[datatable-help] integer64 group by doesn't find all groups

Yike Lu yikelu.home at gmail.com
Fri Feb 14 16:07:30 CET 2014


Thanks for the info, guys! Is there any way I can help?


On Wed, Feb 12, 2014 at 11:17 AM, caneff at gmail.com <caneff at gmail.com> wrote:

> Yes, this isn't a criticism of data.table, just of bit64 in general.
>
>
> On Wed Feb 12 2014 at 11:39:47 AM, Matt Dowle <mdowle at mdowle.plus.com>
> wrote:
>
>>
>> Sometimes we take the hard road in data.table, to get to a better place.
>> Once bit64::integer64 is fully supported, it'll be much easier.  For
>> example, all the recent radix work for double applies almost automatically
>> to integer64, but that radix work had to be done first.
>>
>>
>> On 12/02/14 16:26, caneff at gmail.com wrote:
>>
>> FYI (and this is a long-standing argument), this is why I don't like
>> the bit64 package: these sorts of errors happen silently.  I understand
>> that data.table can't use the other integer64 package, but at least there
>> it is obvious when things are being coerced.
>>
>> In my situations, if I am grouping by an int64, it is usually either an
>> ID, so I can just make it a character vector instead, or it is something
>> where I don't mind lost precision, so I just make it numeric.
>>
>> On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle <mdowle at mdowle.plus.com>
>> wrote:
>>
>>
>> Hi,
>>
>> You're doing nothing wrong.  Although you can load integer64 columns with
>> fread and create them directly, data.table's grouping and keys don't work
>> on them yet.  Sorry, it's just not implemented yet.  Because integer64
>> values are stored internally as type double (a good idea by package bit64),
>> data.table sees them as double and doesn't catch that the type isn't
>> supported yet (hence no error message such as you get for type 'complex').
>> The particular integer64 numbers in this example are quite small, so they
>> only use the lower bits.  Read as a double, those are the lowest-order bits
>> of the significand, so the values are all tiny and differ by tiny amounts,
>> which would explain why only one group comes out here, since data.table
>> groups and joins floating point data within tolerance.
>>
>> Matt
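
The explanation above can be illustrated in base R without bit64. This is only a sketch: as_double_bits() is a hypothetical helper that mimics reading the 8 bytes of a small 64-bit integer as a double, and tol is an illustrative tolerance, not the value data.table actually uses.

```r
# Hypothetical helper: lay out a small non-negative integer as a 64-bit
# little-endian integer (low 32 bits = value, high 32 bits = 0), then
# reinterpret those same 8 bytes as a double.
as_double_bits <- function(i) {
  bytes <- writeBin(c(as.integer(i), 0L), raw(), endian = "little")
  readBin(bytes, what = "double", n = 1L, size = 8L, endian = "little")
}

ids <- 0:2                                   # the three group ids in the example
as_dbl <- vapply(ids, as_double_bits, numeric(1))
print(as_dbl)                                # zero and two denormal doubles

tol <- 1e-8                                  # illustrative tolerance (assumption)
same_group <- abs(diff(as_dbl)) < tol        # compare neighbouring ids
print(same_group)
```

Under any tolerance much larger than the smallest denormal double, the three ids are indistinguishable when compared as doubles, which is consistent with a single group coming out.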
>>
>> On 06/02/14 23:38, Yike Lu wrote:
>> > After a long hiatus, I am back to using data.table. Unfortunately,
>> > I've encountered a problem. Am I doing something wrong here?
>> >
>> > require(data.table)
>> >
>> > dt = data.table(idx = 1:100 %% 3, 1:100)
>> > dt[, list(sum(V2)), by = idx]
>> > # normal
>> >
>> > require(bit64)
>> >
>> > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
>> > dt2[, list(sum(V2)), by = idx]
>> > # only has one group:
>> > #   idx   V1
>> > #1:   1 5050
>> >
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>>