[datatable-help] Return Select/Join that does NOT match?

mdowle at mdowle.plus.com
Wed Jul 28 10:40:03 CEST 2010


Welcome Branson.

[1] Try DT[-DT[<join>,which=TRUE],...]
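A minimal sketch of [1], assuming a recent data.table release. The table, the column `id` and the value "exclude" are hypothetical. The join finds the matching row numbers by binary search on the key, and those rows are then dropped. Newer releases also offer a not-join, DT[!J(...)], which should return the same rows.

```r
library(data.table)

# Hypothetical keyed table; "exclude" is the value we want to leave out
DT <- data.table(id = c("a", "b", "exclude", "c", "exclude"), v = 1:5)
setkey(DT, id)

# Row numbers of DT matched by the join (binary search on the key);
# nomatch = 0L drops unmatched values instead of returning NA
hit <- DT[J("exclude"), which = TRUE, nomatch = 0L]

# Drop those rows. Guard the empty case: as in base R, negating an
# empty index (DT[-integer(0)]) would select no rows at all
kept <- if (length(hit) > 0L) DT[-hit] else DT

# Newer releases also support a not-join, which should give the same rows
kept2 <- DT[!J("exclude")]
```

The guard matters when the excluded value is absent: without it, an empty `hit` would silently return an empty table instead of the whole table.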

[2] You mean one key of 4 columns I think. You can't skip columns as
that's fundamental to the binary search. You can call setkey again with
the 3rd and 4th column (i.e. change the key) then do the join. Thanks to
Tom adding radix sorting, setting the key is very fast. Adding secondary
keys is on the to do list, but a secondary key will always be slower than
joining on the primary key (due to page fetches). The pros and cons depend
a lot on the dataset and the task in hand.
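A minimal sketch of [2], assuming a recent data.table release. The column names c1 to c4, the values and the join target ("x", 10L) are hypothetical. Instead of joining on CJ(unique(key1), unique(key2), ...), which enumerates every combination of the first two key columns, the key is changed to the 3rd and 4th columns and the join is done on that.

```r
library(data.table)

# Hypothetical table with one key of 4 columns (c1, c2, c3, c4)
DT <- data.table(
  c1 = c("a", "a", "b", "b"),
  c2 = c(1L, 2L, 1L, 2L),
  c3 = c("x", "y", "x", "y"),
  c4 = c(10L, 20L, 10L, 30L),
  v  = 1:4
)
setkey(DT, c1, c2, c3, c4)

# The binary search cannot skip leading key columns, so change the key
# to the 3rd and 4th columns rather than building a large CJ(...) table
setkey(DT, c3, c4)

# Join on the new key; nomatch = 0L drops values that are not present
res <- DT[J("x", 10L), nomatch = 0L]
```

Note that setkey reorders the table by reference, so if the original 4-column key is needed again afterwards, setkey has to be called again with all four columns.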

[3] Might be a bug there. Please can you provide a small reproducible
example and version information. I'm assuming by key1, key2 you really
mean col1, col2.

Thanks, Matthew


> To everyone who contributed to data.table,
>
> I think data.table is the greatest package I have used. It not only
> provides convenience, but also changes the way I program and think!
> I can't wait to see the new version with more power coming! I found
> that many of the features we wished for have already been implemented.
> I want to say a million thanks to this community before I ask my
> questions.
>
> Here are questions relevant to join/select:
>
> [1] Is it possible to return the rows of a select/join that do NOT
> match? This is easy with a logical index like !(x==1), but then we are
> back to a vector scan and lose the benefit of binary search. I'm not
> sure about the syntax; maybe something like the "invert = TRUE"
> argument of grep?
>
> DataTable[ CJ("exclude") , invert = TRUE]
>
> For now, I wonder whether
>
> DataTable[ CJ( unique(column) %NOT IN% "exclude" ) ]
>
> (where %NOT IN% is a custom function that returns the unselected items)
> is faster than a scan, DataTable[column != "exclude"]?
>
>
> [2] Assume I have a DataTable with four keys. How can I efficiently
> select/join and skip the first two keys in my join?
>
> This is what I am doing now:
>
> DataTable[ CJ( unique(key1), unique(key2), "target key3", "a collection of target key4") ]
>
> Am I not supposed to use a join like this? Could CJ(...) create an
> object comparable in size to the original data.table? The original
> data.table might already be close to the memory limit. Should I just
> use a vector scan in this case (I hope not)?
>
> [3] I thought I could do this:
>
> DataTable[ CJ( FN(key1), FN(key2), FN(key3) ) ]
>
> but it complains about column names (FN is a function).
>
> Later I found that this works:
>
> DataTable[ { CJ( FN(key1), FN(key2), FN(key3) ) } ]
>
> I just wrapped CJ in { }.
>
> I don't understand why, but at least it works. Should I do this, or is
> there a more correct syntax?
>
>
> Again, thank you very, very much for all your efforts. Your work is
> fantastic and has real impact! I seriously believe data.table should
> become a standard data type that replaces data.frame!
>
> Best regards,
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>



