[datatable-help] Return Select/Join that does NOT match?

Matthew Dowle mdowle at mdowle.plus.com
Thu Jul 29 02:08:20 CEST 2010


Tom, I'd forgotten about manual secondary keys, good point. Thanks for the
reminder.

There is a todo comment in setkey.R to speed up changing the primary key. At
the moment setkey creates a working copy of the whole table, but it
could instead reorder the columns one by one, requiring only one column of
working memory, reused each time in a similar way to dogroups. I
just added FR#1006 for that since it seems a no-brainer. It may still
make sense to store a copy with a different primary key, but it would
give more options if changing the primary key on the fly were faster
anyway. It could help in maintaining manual secondary key tables too.
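A rough base-R sketch of the column-by-column idea, assuming the table is a data.frame (a list of equal-length column vectors). `reorder_by_columns` is a hypothetical helper, not a data.table function. The new row order is computed once from the key columns; then each column is replaced by its reordered version in turn, so only one reordered column is built per step rather than a copy of the whole table. Because of R's copy-on-modify semantics this only illustrates the access pattern; the memory saving would come from doing it in place at C level.

```r
# Hypothetical sketch of the FR#1006 idea: re-key a table by reordering one
# column at a time instead of building a reordered copy of the whole table.
# Assumes `tbl` is a data.frame (a list of equal-length column vectors).
reorder_by_columns <- function(tbl, key_cols) {
  # Compute the new row order once, from the new key columns.
  o <- do.call(order, unname(as.list(tbl[key_cols])))
  for (nm in names(tbl)) {
    # Only this one reordered column is materialised per iteration.
    tbl[[nm]] <- tbl[[nm]][o]
  }
  tbl
}

df <- data.frame(k1 = c(2L, 1L, 2L, 1L),
                 k2 = c("b", "a", "a", "b"),
                 v  = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)
res <- reorder_by_columns(df, c("k1", "k2"))
print(res)
```

Here `res` is sorted by `k1` then `k2`, with `v` carried along as `20, 40, 30, 10`.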

Will reply in your original thread about secondary keys ...

Matthew


On Wed, 2010-07-28 at 04:40 -0700, Short, Tom wrote:
> > -----Original Message-----
> > From: datatable-help-bounces at lists.r-forge.r-project.org 
> > [mailto:datatable-help-bounces at lists.r-forge.r-project.org] 
> > On Behalf Of Branson Owen
> > Sent: Tuesday, July 27, 2010 23:32
> > To: datatable-help at lists.r-forge.r-project.org
> > Subject: Re: [datatable-help] Return Select/Join that does NOT match?
> ...
> 
> > [2] Assume I have a DataTable with four keys. How can I 
> > efficiently select/join and skip the first two keys in my join?
> > 
> > This is what I am doing now:
> > 
> > DataTable[ CJ( unique(key1), unique(key2), "target key3", "a collection of target key4") ]
> > 
> > Am I not supposed to use a join like this? Could CJ(...) create
> > an object comparable in size to the original datatable?
> > The original datatable might already be close to the memory limit.
> > Should I just use a scan in this case (I hope not)?
> > 
> 
> You can create a secondary key manually. See this post:
> 
> http://lists.r-forge.r-project.org/pipermail/datatable-help/2010-May/000028.html
> 
> An even simpler approach is just to make a second copy of your data and
> re-key it. For large data tables, though, the secondary key saves memory and
> is quite fast.
> 
> - Tom
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
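On the memory question in [2]: CJ(...) builds the cross join of its arguments, one row per combination, so its row count is the product of the argument lengths and can be checked before building it. A base-R sketch; `cj_rows` is a hypothetical helper and `key1`/`key2` are made-up illustration data. If the product approaches nrow(DataTable), the secondary-key approach Tom describes lets you join on key3 and key4 directly without enumerating key1 and key2.

```r
# Hypothetical helper: number of rows CJ(...) would produce.
# The cross join has one row per combination of its arguments,
# so its row count is the product of the argument lengths.
cj_rows <- function(...) prod(lengths(list(...)))

key1 <- c("A", "B", "C", "A")
key2 <- c(1L, 2L, 1L, 3L)
n <- cj_rows(unique(key1), unique(key2), "target key3", c("x", "y"))
print(n)
```

With 3 unique `key1` values, 3 unique `key2` values, 1 target key3 and 2 target key4 values, this gives 3 * 3 * 1 * 2 = 18 rows.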
