[datatable-help] integer64 group by doesn't find all groups
Yike Lu
yikelu.home at gmail.com
Fri Feb 14 16:07:30 CET 2014
Thanks for the info, guys! Is there any way I can help?
On Wed, Feb 12, 2014 at 11:17 AM, caneff at gmail.com <caneff at gmail.com> wrote:
> Yes this isn't a data.table criticism, just a bit64 one in general.
>
>
> On Wed Feb 12 2014 at 11:39:47 AM, Matt Dowle <mdowle at mdowle.plus.com>
> wrote:
>
>>
>> Sometimes we take the hard road in data.table, to get to a better place.
>> Once bit64::integer64 is fully supported, it'll be much easier. All the
>> recent radix work for double applies almost automatically to integer64 for
>> example, but that radix work had to be done first.
>>
>>
>> On 12/02/14 16:26, caneff at gmail.com wrote:
>>
>> FYI (and this is a long-standing argument) this is why I don't like
>> the bit64 package. These sorts of errors happen silently. I understand
>> that data.table can't use the other integer64 package, but at least there
>> it is obvious when things are being coerced.
>>
>> In my situation, if I am grouping by an int64, it is usually either an
>> ID so I can just make it a character vector instead, or it is something
>> where I don't mind lost precision so I just make it numeric.
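The two workarounds described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names `idx_chr` and `idx_dbl` and the result names `by_chr` and `by_dbl` are made up for the example, and the numeric route is only safe when the values fit exactly in a double (magnitude below 2^53).

```r
library(data.table)
library(bit64)

# Same shape as the example in the original report, with explicit column names.
dt2 <- data.table(idx = as.integer64(1:100 %% 3), V2 = 1:100)

# Workaround 1: the integer64 column is just an ID, so group on a
# character copy of it (hypothetical column name idx_chr).
dt2[, idx_chr := as.character(idx)]
by_chr <- dt2[, list(total = sum(V2)), by = idx_chr]

# Workaround 2: lost precision is acceptable, so group on a plain
# double copy (hypothetical column name idx_dbl).
dt2[, idx_dbl := as.double(idx)]
by_dbl <- dt2[, list(total = sum(V2)), by = idx_dbl]

print(by_chr)
print(by_dbl)
```

Both copies are ordinary column types that data.table already groups on, so neither depends on integer64 support in grouping.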
>>
>> On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle <mdowle at mdowle.plus.com>
>> wrote:
>>
>>
>> Hi,
>>
>> You're doing nothing wrong. Although you can load integer64 using fread
>> and create them directly, data.table's grouping and keys don't work on
>> them yet. Sorry, just not yet implemented. Because integer64 are
>> internally stored as type double (a good idea by package bit64),
>> data.table sees them internally as double and doesn't catch that the
>> type isn't supported yet (hence no error message such as you get for
>> type 'complex'). The particular integer64 numbers in this example are
>> quite small so will use only the lowest bits. Read as a double, those
>> bits are the least significant part of the significand, so the values
>> differ only by tiny amounts, which would explain why only one group
>> comes out here since data.table groups and joins floating point data
>> within tolerance.
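A small sketch of the storage point made above, assuming only that bit64 is installed. The names `x` and `raw_doubles` are illustrative. It strips the class from a few small integer64 values to show what code that treats the column as plain double would see.

```r
library(bit64)

x <- as.integer64(0:2)

# integer64 is an S3 class on top of a double vector.
print(class(x))
print(typeof(x))

# Dropping the class exposes the 64-bit patterns reinterpreted as doubles.
# For small integers these are values extremely close to zero.
raw_doubles <- unclass(x)
print(raw_doubles)

# The values are distinct as integer64, but as doubles they sit far
# inside any reasonable floating point comparison tolerance.
print(all(abs(raw_doubles) < 1e-300))
print(isTRUE(all.equal(raw_doubles[2], raw_doubles[3])))
```

This is consistent with the single group seen in the report: any comparison made on the underlying doubles within a tolerance would treat these values as equal.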
>>
>> Matt
>>
>> On 06/02/14 23:38, Yike Lu wrote:
>> > After a long hiatus, I am back to using data.table. Unfortunately,
>> > I've encountered a problem. Am I doing something wrong here?
>> >
>> > require(data.table)
>> >
>> > dt = data.table(idx = 1:100 %% 3, 1:100)
>> > dt[, list(sum(V2)), by = idx]
>> > # normal
>> >
>> > require(bit64)
>> >
>> > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
>> > dt2[, list(sum(V2)), by = idx]
>> > # only has one group:
>> > # idx V1
>> > #1: 1 5050
>> >
>>