[datatable-help] integer64 group by doesn't find all groups

Yike Lu yikelu.home at gmail.com
Sun Mar 2 19:45:17 CET 2014


Yes, I'm up for it. The C edits sound relatively straightforward actually.

It's the other parts I'm not as familiar with: what's the SCM procedure,
and what's the build procedure going to be?


On Sun, Mar 2, 2014 at 6:26 AM, Matt Dowle <mdowle at mdowle.plus.com> wrote:

>
> On 14/02/14 15:07, Yike Lu wrote:
>
> Thanks for the info guys! Wondering if there's any way I can help?
>
>
> Thanks for your offer. The function iradix in forder.c needs copying and
> tweaking to become i64radix (8 passes instead of 4), or making general so
> that 4 or 8 can be passed in. We should also check first how the bit64
> package sorts integer64. Then in bmerge.c add a case to the switch for
> integer64 to cast to long long, add tests to tests.Rraw for grouping and
> joining, update the documentation (.Rd) files and add checks to init.c.
>
> Is that something you could do? If you are rusty on C I don't mind
> guiding you through.
>
> Matt
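
For reference, the "8 passes instead of 4" idea above can be sketched as a least-significant-byte-first radix sort over 64-bit keys. This is only an illustration under stated assumptions, not the iradix code in forder.c: the function name i64radix_sketch, its signature, and the choice to return a 0-based ordering are all hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not data.table's iradix: fills `order` with the
   0-based permutation that sorts x ascending, using 8 byte-wide passes.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int i64radix_sketch(const int64_t *x, size_t n, size_t *order)
{
    if (n == 0) return 0;
    size_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;

    for (size_t i = 0; i < n; i++) order[i] = i;

    for (int pass = 0; pass < 8; pass++) {
        size_t counts[257] = {0};
        int shift = pass * 8;

        /* Histogram of the current byte. Flipping the sign bit makes the
           unsigned byte order agree with signed integer order. */
        for (size_t i = 0; i < n; i++) {
            uint64_t key = (uint64_t)x[order[i]] ^ 0x8000000000000000ULL;
            counts[((key >> shift) & 0xFF) + 1]++;
        }

        /* Prefix sums: counts[b] becomes the first output slot for byte b. */
        for (int b = 0; b < 256; b++) counts[b + 1] += counts[b];

        /* Stable scatter into tmp, then copy back for the next pass. */
        for (size_t i = 0; i < n; i++) {
            uint64_t key = (uint64_t)x[order[i]] ^ 0x8000000000000000ULL;
            tmp[counts[(key >> shift) & 0xFF]++] = order[i];
        }
        memcpy(order, tmp, n * sizeof *order);
    }

    free(tmp);
    return 0;
}
```

A 32-bit version of the same loop would run 4 passes, which is why making the pass count a parameter is one way to generalise it. Each pass is stable, so equal keys keep their original relative order.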
>
>
>
>
> On Wed, Feb 12, 2014 at 11:17 AM, caneff at gmail.com <caneff at gmail.com> wrote:
>
>> Yes this isn't a data.table criticism, just a bit64 one in general.
>>
>>
>>  On Wed Feb 12 2014 at 11:39:47 AM, Matt Dowle <mdowle at mdowle.plus.com>
>> wrote:
>>
>>>
>>> Sometimes we take the hard road in data.table to get to a better
>>> place. Once bit64::integer64 is fully supported, it'll be much easier.
>>> All the recent radix work for double applies almost automatically to
>>> integer64, for example, but that radix work had to be done first.
>>>
>>>
>>> On 12/02/14 16:26, caneff at gmail.com wrote:
>>>
>>> FYI (and this is a long outstanding argument) this is why I don't like
>>> the bit64 package.  These sorts of errors happen silently.  I understand
>>> that data.table can't use the other integer64 package, but at least there
>>> it is obvious when things are being coerced.
>>>
>>> In my situations, if I am grouping by an int64, it is usually either an
>>> ID, so I can just make it a character vector instead, or it is something
>>> where I don't mind lost precision, so I just make it numeric.
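
To make the "lost precision" trade-off above concrete: a double has a 53-bit significand, so not every 64-bit integer survives conversion to numeric. The helper below is a hypothetical illustration in C (its name is not from bit64 or data.table) that checks whether a value round-trips through double unchanged.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if v is unchanged after converting to
   double and back, 0 if the conversion loses information. */
int survives_double_round_trip(int64_t v)
{
    double d = (double)v;

    /* Values near INT64_MAX round up to 2^63, which cannot be cast back
       to int64_t safely, so treat that case as a loss. */
    if (d >= 9223372036854775808.0) return 0;

    return (int64_t)d == v;
}
```

Under this check, 2^53 (9007199254740992) survives, while 2^53 + 1 does not, so two distinct IDs of that size would collapse into one group after conversion to numeric.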
>>>
>>> On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle <mdowle at mdowle.plus.com>
>>> wrote:
>>>
>>>
>>> Hi,
>>>
>>> You're doing nothing wrong. Although you can load integer64 using fread
>>> and create them directly, data.table's grouping and keys don't work on
>>> them yet. Sorry, just not yet implemented. Because integer64 values are
>>> internally stored as type double (a good idea by package bit64),
>>> data.table sees them internally as double and doesn't catch that the
>>> type isn't supported yet (hence no error message such as you get for
>>> type 'complex'). The particular integer64 numbers in this example are
>>> quite small, so only the lower bits are set. In a double, those are the
>>> least significant (finest) bits of the significand, which would explain
>>> why only one group comes out here, since data.table groups and joins
>>> floating point data within a tolerance.
>>>
>>> Matt
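
The explanation above can be checked with a small C sketch. It reinterprets the bytes of a small 64-bit integer as a double, which is roughly what code unaware of the integer64 class would see. Both function names and the tolerance value used in the note below are illustrative assumptions, not data.table's actual comparison rule.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 8 bytes of an int64 as a double without converting
   the value (memcpy avoids aliasing problems). */
double int64_bits_as_double(int64_t v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Hypothetical tolerance check: returns 1 if a and b differ by at most
   tol. The real rule in data.table may differ. */
int same_within_tolerance(double a, double b, double tol)
{
    double diff = a - b;
    if (diff < 0) diff = -diff;
    return diff <= tol;
}
```

With this sketch, the bit patterns for 0, 1 and 2 (the idx values in the original example) come out as zero and two tiny subnormal doubles. Any reasonable tolerance, say 1e-9, treats all three as equal, which is consistent with a single group being returned.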
>>>
>>> On 06/02/14 23:38, Yike Lu wrote:
>>> > After a long hiatus, I am back to using data.table. Unfortunately,
>>> > I've encountered a problem. Am I doing something wrong here?
>>> >
>>> > require(data.table)
>>> >
>>> > dt = data.table(idx = 1:100 %% 3, 1:100)
>>> > dt[, list(sum(V2)), by = idx]
>>> > # normal
>>> >
>>> > require(bit64)
>>> >
>>> > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
>>> > dt2[, list(sum(V2)), by = idx]
>>> > # only has one group:
>>> > #   idx   V1
>>> > #1:   1 5050
>>> >
>>>