[datatable-help] List-valued column

Stavros Macrakis macrakis at alum.mit.edu
Wed Oct 26 23:57:29 CEST 2011


Some quick testing seems to show that the cost of joining table A to table B
(on preexisting keys) is proportional to size(A)*log2(size(B)), that is,
each element of table A is looked up (using binary search) in table B.  This
is very efficient when size(A) << size(B), but when they're of similar size,
a linear merge can be done in time size(A)+size(B), which can be several
times faster.  Binary merge has a smooth transition from one regime to the
other, but a simple cross-over is pretty good already.

I would put together a nice script to show this explicitly, but that will
take some time, so I wanted to check whether this is a known issue or if I
need to demonstrate this issue in more detail.  (Or if I've made some stupid
blunder.)

Thanks!

             -s
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20111026/95e79a1a/attachment.htm>


More information about the datatable-help mailing list