Thank you very much for responding to the draft idea!
Matthew, your opinions are very educational and enjoyable! <br><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Great, that is promising. The main difficulty in 'allowing' different<br>
column types is sorting them efficiently. By efficiently we mean as<br>
fast, or close to as fast, as radix sorting (actually a counting sort)<br>
of integers. If there is a way to sort bit64 then it should be fine.<br>
I'm not quite clear if bit64 is for 64bit machines only or not. But that<br>
can be switched without too much difficulty.</blockquote><div><br></div><div>I am more confident that bit64 also supports 32-bit machines, for the following reasons: </div><div><ol><li>I can't find any warning that bit64 does not support 32-bit machines, and I can't imagine it would lack support without a warning. </li>
<li>I did find the compiled bit64.dll in the bit64\libs\i386 folder. If the package were not compiled for 32-bit machines, this folder and DLL would not even exist. </li></ol></div><div>As for sorting, on page 9: </div><div><br></div>
<div><div><b><i>Limitations planned to be removed with the next release</i></b></div><div><i>• <b>sort </b>is not yet implemented</i></div><div><i>• <b>order </b>is not yet implemented</i></div><div><i>• <b>match </b>is not yet implemented</i></div>
<div><i>• duplicated is not yet implemented</i></div><div><i>• unique is not yet implemented</i></div><div><i>• table is not yet implemented</i></div><div><i>• as.factor is not yet implemented</i></div><div><i><br></i></div>
<div><b><i>Further limitations</i></b></div><div><i>• subscripting non-existing elements and subscripting with NAs is currently not supported. Such subscripting currently returns 9218868437227407266 instead of NA (the NA value of the underlying double code). Following the full R behaviour here would either destroy performance or require extensive C-coding</i></div>
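<div>As a side note on the number quoted above: it appears to be the bit pattern of R's NA_real_ read as a signed 64-bit integer. The sketch below is only an illustration in Python, not bit64 code. It assumes that NA_real_ is a NaN whose low 32-bit word is 1954 (hex 7A2); the helper names are hypothetical.</div>

```python
import struct

# Assumption: R's NA_real_ is a NaN with exponent bits all set and a
# low 32-bit word of 1954 (0x7A2). This constant encodes that pattern.
NA_REAL_BITS = 0x7FF00000000007A2


def bits_as_int64(bits):
    """Reinterpret an unsigned 64-bit pattern as a signed 64-bit integer."""
    return struct.unpack("=q", struct.pack("=Q", bits))[0]


def int64_as_double(value):
    """Reinterpret the 8 bytes of a signed 64-bit integer as a double."""
    return struct.unpack("=d", struct.pack("=q", value))[0]


# Reading the NA_real_ bytes as an integer gives the value from the manual.
na_as_int64 = bits_as_int64(NA_REAL_BITS)
print(na_as_int64)  # 9218868437227407266

# The same 8 bytes mean something very different when read as a double:
# the integer 1 becomes the smallest positive subnormal double.
one_as_double = int64_as_double(1)
print(one_as_double)
```

<div>This is why an integer stored in a double's 8 bytes cannot simply be compared or sorted as a double: the sorting code has to treat the payload as an integer.</div>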
</div><div><ol><li>I am not sure whether data.table uses its own customized sorting or R's default sorting method. I presume it is the latter. </li><li>In the latter case, what bit64 is going to implement becomes critical. I am not sure whether the author (Dr. Jens Oehlschlägel) plans something as fast as counting sort.</li>
<li>Maybe we can kindly remind him? He must be very interested too, because we can tell that he is also a fan of high-performance computing (actually, I later found that Dr. Jens Oehlschlägel is also the author of the ff package). I sincerely hope he will be happy to see the great potential in leveraging his new package in the data.table community. :)))</li>
<li>Does this imply that data.table could also support the double type as a key column once fast bit64 sorting is available, since bit64 is internally a double type? </li></ol></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">Nope, 64bit R is still limited to 2^31 vector length. What is freed in</div>
64bit R is that you can have many more 2^31 vectors in memory at once.<br>
So a data.table can be 2 billion rows and as many columns that can fit<br>
in RAM. Remember a 2 billion (2^31) numeric vector is 2^31 * 8 / 1024^3<br>
= 16GB. That's quite a bit for a single vector! Let's say hardware<br>
limitations are 128GB of RAM currently (at reasonable cost). With just<br>
8 columns and 2 billion rows, your RAM is full anyway with no room for<br>
copies, let alone the OS itself. In practice the vector length<br>
limitation rarely bites. </blockquote><div><br></div><div>Thank you very much for pointing that out. Aha, that's why I didn't remember the 2^31 vector length being a problem. But I couldn't remember the details and so was scared when you raised the issue. </div>
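<div>To convince myself of the arithmetic, here is a small sketch in Python. It uses the rounded 2^31 figure and the 8-byte numeric size from your message; the function and variable names are just mine.</div>

```python
BYTES_PER_NUMERIC = 8        # one double-precision value
MAX_VECTOR_LENGTH = 2 ** 31  # rounded vector length limit quoted above


def vector_size_gib(length, bytes_per_element=BYTES_PER_NUMERIC):
    """Memory needed by one vector, in units of 1024^3 bytes."""
    return length * bytes_per_element / 1024 ** 3


# One full-length numeric vector.
single_vector_gib = vector_size_gib(MAX_VECTOR_LENGTH)
print(single_vector_gib)  # 16.0

# Eight such columns fill a 128GB machine with nothing left over.
eight_columns_gib = 8 * single_vector_gib
print(eight_columns_gib)  # 128.0
```

<div>So with 8 numeric columns of that length, RAM runs out well before the vector length limit becomes the real constraint.</div>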
<div><br></div><div>Best regards,</div><div> </div></div>