[datatable-help] New package bit64 over package int64

Branson Owen branson.owen at gmail.com
Mon Feb 27 18:10:05 CET 2012


Thank you very much for responding to the draft idea!  Matthew, your
opinions are very educational and enjoyable!

> Great, that is promising. The main difficulty in 'allowing' different
> column types is sorting them efficiently. By efficiently we mean as
> fast, or close to as fast, as radix sorting (actually a counting sort)
> of integers.  If there is a way to sort bit64 then it should be fine.
> I'm not quite clear if bit64 is for 64bit machines only or not. But that
> can be switched without too much difficulty.


I am fairly confident that bit64 also supports 32-bit machines, based on the
following evidence:

   1. I can't find any warning about bit64 not supporting 32-bit machines. I
   can't imagine it lacking that support without a warning.
   2. I did find the compiled bit64.dll in the bit64\libs\i386 folder. If it
   did not compile for 32-bit machines, this folder and DLL would not even exist.

As for sorting, on page 9:

*Limitations planned to be removed with the next release*
*• sort is not yet implemented*
*• order is not yet implemented*
*• match is not yet implemented*
*• duplicated is not yet implemented*
*• unique is not yet implemented*
*• table is not yet implemented*
*• as.factor is not yet implemented*

*Further limitations*
*• subscripting non-existing elements and subscripting with NAs is
currently not supported. Such subscripting currently returns
9218868437227407266 instead of NA (the NA value of the underlying double
code). Following the full R behaviour here would either destroy
performance or require extensive C-coding*
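
The odd number in that last limitation can be reproduced by hand. The sketch below assumes that R encodes NA_real_ as a NaN whose low 32-bit word holds 1954, and that bit64 keeps its 64-bit integers in the bits of a double vector, as the quoted text suggests. The helper names are made up for illustration. It is written in Python rather than R only to keep the bit manipulation explicit.

```python
import math
import struct

# Assumption: R's NA_real_ is a NaN with high word 0x7FF00000 and low word 1954 (0x7A2).
R_NA_REAL_BITS = 0x7FF00000000007A2


def bits_as_int64(bits: int) -> int:
    """Reinterpret a 64-bit pattern as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<Q", bits))[0]


def bits_as_double(bits: int) -> float:
    """Reinterpret the same 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


na_as_int64 = bits_as_int64(R_NA_REAL_BITS)
na_as_double = bits_as_double(R_NA_REAL_BITS)

print(na_as_int64)               # the value reported in the limitation above
print(math.isnan(na_as_double))  # the same bits read as a double are a NaN
```

Read as a double the pattern is a NaN, but read as a 64-bit integer it is just a large positive number, which would explain why the subscript result does not look like NA.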

   1. I am not sure whether data.table uses its own customized sorting or R's
   default sorting method. I presume it is the latter.
   2. If so, what bit64 is going to implement becomes critical. I am not sure
   whether the author (Dr. Jens Oehlschlägel) plans something as fast as
   counting sort.
   3. Maybe we can kindly remind him? He is likely to be interested as well,
   because he is clearly a fan of high-performance computing (I later found
   that Dr. Jens Oehlschlägel is also the author of the ff package). I
   sincerely hope he will be happy to see the great potential of his new
   package in the data.table community. :)))
   4. Does this imply that data.table could also support the double type as a
   key column once fast sorting for bit64 is available, since bit64 is stored
   internally as double?
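
To make the "as fast as counting sort" question concrete, here is a minimal sketch of how 64-bit integers could be sorted with repeated counting sorts (an LSD radix sort). It is not how bit64 or data.table do it; the function name and the choice of 8 bits per pass are assumptions made for illustration, and Python is used only to show the idea.

```python
def radix_sort_int64(values, bits_per_pass=8):
    """Sort signed 64-bit integers with one stable counting sort per digit."""
    bias = 1 << 63                      # shifts the signed range to 0 .. 2^64 - 1
    mask = (1 << bits_per_pass) - 1     # selects one digit of bits_per_pass bits
    keys = [v + bias for v in values]   # order-preserving unsigned keys

    for shift in range(0, 64, bits_per_pass):
        # Histogram of the current digit, offset by one for the prefix sum.
        counts = [0] * (mask + 2)
        for k in keys:
            counts[((k >> shift) & mask) + 1] += 1
        # Prefix sums: counts[d] becomes the first output slot for digit d.
        for i in range(1, len(counts)):
            counts[i] += counts[i - 1]
        # Stable scatter into the output buffer.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[counts[d]] = k
            counts[d] += 1
        keys = out

    return [k - bias for k in keys]


data = [3, -1, 2**62, -2**63, 0, 3, 2**63 - 1]
print(radix_sort_int64(data))
```

Each pass is linear in the number of rows, so the total cost is a fixed number of counting sorts (8 passes here) rather than a comparison sort.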

> Nope, 64bit R is still limited to 2^31 vector length. What is freed in
> 64bit R is that you can have many more 2^31 vectors in memory at once.
> So a data.table can be 2 billion rows and as many columns that can fit
> in RAM. Remember a 2 billion (2^31) numeric vector is 2^31 * 8 / 1024^3
> = 16GB. That's quite a bit for a single vector! Let's say hardware
> limitations are 128GB of RAM currently (at reasonable cost).  With just
> 8 columns and 2 billion rows, your RAM is full anyway with no room for
> copies, let alone the OS itself. In practice the vector length
> limitation rarely bites.


Thank you very much for pointing that out. Aha, that's why I didn't remember
the 2^31 vector length being a problem. I couldn't recall the details, though,
and so I was alarmed when you raised the issue.
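
For my own reference, the arithmetic in the quoted reply can be checked with a few lines. This is only a sketch: the 128GB figure is the example budget from the reply, and the names are made up for illustration.

```python
BYTES_PER_DOUBLE = 8
MAX_VECTOR_LENGTH = 2**31   # rounded figure from the quoted reply
GIB = 1024**3
RAM_GIB = 128               # example hardware budget from the quoted reply


def vector_gib(n_elements, bytes_per_element=BYTES_PER_DOUBLE):
    """Memory taken by one vector, in units of 1024^3 bytes."""
    return n_elements * bytes_per_element / GIB


per_column = vector_gib(MAX_VECTOR_LENGTH)
columns_in_ram = int(RAM_GIB // per_column)

print(per_column)       # size of one full-length numeric column
print(columns_in_ram)   # how many such columns fill the example RAM
```

So a single full-length numeric column takes 16GB, and 8 of them already fill 128GB with nothing left for copies or the OS.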

Best regards,