[datatable-help] integer64 group by doesn't find all groups

Matt Dowle mdowle at mdowle.plus.com
Wed Feb 12 17:39:44 CET 2014


Sometimes we take the hard road in data.table to get to a better
place.  Once bit64::integer64 is fully supported, it'll be much
easier.  All the recent radix work for double applies almost
automatically to integer64, for example, but that radix work had to be
done first.

On 12/02/14 16:26, caneff at gmail.com wrote:
> FYI (and this is a long-standing argument), this is why I don't like 
> the bit64 package.  These sorts of errors happen silently.  I 
> understand that data.table can't use the other integer64 package, but 
> at least there it is obvious when things are being coerced.
>
> In my situations, if I am grouping by an int64, it is usually either an 
> ID so I can just make it a character vector instead, or it is 
> something where I don't mind lost precision so I just make it numeric.
>
> On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle <mdowle at mdowle.plus.com> wrote:
>
>
>     Hi,
>
>     You're doing nothing wrong.  Although you can load integer64 columns
>     using fread and create them directly, data.table's grouping and keys
>     don't work on them yet.  Sorry, just not yet implemented.  Because
>     integer64 values are stored internally as type double (a good idea
>     by package bit64), data.table sees them as double and doesn't catch
>     that the type isn't supported yet (hence no error message such as
>     you get for type 'complex').  The particular integer64 numbers in
>     this example are quite small, so they use only the lower bits.  In a
>     double, those are the least significant bits of the significand,
>     which would explain why only one group comes out here, since
>     data.table groups and joins floating point data within tolerance.
>
>     Matt
>
>     On 06/02/14 23:38, Yike Lu wrote:
>     > After a long hiatus, I am back to using data.table. Unfortunately,
>     > I've encountered a problem. Am I doing something wrong here?
>     >
>     > require(data.table)
>     >
>     > dt = data.table(idx = 1:100 %% 3, 1:100)
>     > dt[, list(sum(V2)), by = idx]
>     > # normal
>     >
>     > require(bit64)
>     >
>     > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
>     > dt2[, list(sum(V2)), by = idx]
>     > # only has one group:
>     > #   idx   V1
>     > #1:   1 5050
>     >
>
>     _______________________________________________
>     datatable-help mailing list
>     datatable-help at lists.r-forge.r-project.org
>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
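
To make the explanation quoted above concrete, here is a minimal sketch
(not part of the original thread) of what a routine that only looks at
the storage type would see, followed by the character-key workaround
suggested in the reply. It assumes the bit64 and data.table packages are
installed; the names x, raw, idx_chr and res are illustrative only.

```r
library(bit64)
library(data.table)

# integer64 is an S3 class layered on a double vector: the 64-bit integer
# bit pattern is stored in the 8 bytes of each double.
x <- as.integer64(0:2)
typeof(x)                # "double"

# Dropping the class shows the doubles that type-unaware code would see.
# Small integers occupy only the low bits, so they appear as denormal
# values extremely close to zero.
raw <- unclass(x)
all(abs(raw) < 1e-300)   # TRUE

# Workaround from the thread: group by a character copy of the ID column.
dt2 <- data.table(idx = integer64(100) + 1:100 %% 3, V2 = 1:100)
dt2[, idx_chr := as.character(idx)]
res <- dt2[, list(V1 = sum(V2)), by = idx_chr]
nrow(res)                # 3
```

Because the three raw values differ only far below any reasonable
floating point tolerance, a tolerance-based comparison of the doubles
would treat them as equal, which is consistent with the single group
reported. Converting to character costs memory and speed but keeps every
ID distinct.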
