[datatable-help] integer64 group by doesn't find all groups
Matt Dowle
mdowle at mdowle.plus.com
Mon Mar 3 02:14:28 CET 2014
Great. Just click 'join project', follow the instructions on the
R-Forge homepage to connect and then commit. We can discuss the finer
points off-list.
Matt
On 02/03/14 18:45, Yike Lu wrote:
> Yes, I'm up for it. The C edits sound relatively straightforward actually.
>
> It's the other parts I'm not as familiar with: what's the SCM
> procedure, what's the build procedure going to be?
>
>
> On Sun, Mar 2, 2014 at 6:26 AM, Matt Dowle <mdowle at mdowle.plus.com> wrote:
>
>
> On 14/02/14 15:07, Yike Lu wrote:
>> Thanks for the info guys! Wondering if there's any way I can help?
>
> Thanks for your offer. The function iradix in forder.c needs
> copying and tweaking to become i64radix (8 passes instead of 4),
> or making general so that 4 or 8 can be passed in. Should also
> check first how the bit64 package sorts integer64. Then in
> bmerge.c add a case to the switch for integer64 to cast to long
> long, add tests to tests.Rraw for grouping and joining, update
> documentation (.Rd) files and add checks to init.c.
>
> Is that something you could do? If you are rusty on C I don't
> mind guiding you through.
>
> Matt
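
For readers following the thread, the 8-pass idea in the message above can be sketched in C roughly as follows. This is not the code in forder.c. The function name i64radix_sketch, the in-place interface and the sign-bit flip are assumptions made for illustration, and NA handling is left out. It sorts signed 64-bit integers least significant byte first, one byte per pass, so 8 passes where a 32-bit sort needs 4.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, NOT data.table's forder.c: LSD radix sort of signed
   64-bit integers, 8 passes of one byte each. Returns 0 on success and -1
   if scratch memory cannot be allocated. */
int i64radix_sketch(int64_t *x, size_t n)
{
    const uint64_t sign = UINT64_C(1) << 63;

    assert(x != NULL || n == 0);
    if (n < 2) return 0;

    uint64_t *a = malloc(n * sizeof *a);
    uint64_t *b = malloc(n * sizeof *b);
    if (a == NULL || b == NULL) { free(a); free(b); return -1; }

    /* Flip the sign bit so that unsigned order matches signed order. */
    for (size_t i = 0; i < n; i++)
        a[i] = (uint64_t)x[i] ^ sign;

    for (int pass = 0; pass < 8; pass++) {
        size_t count[256] = {0};
        int shift = pass * 8;

        /* Histogram of the current byte. */
        for (size_t i = 0; i < n; i++)
            count[(a[i] >> shift) & 0xFF]++;

        /* Turn the counts into starting offsets. */
        size_t sum = 0;
        for (int k = 0; k < 256; k++) {
            size_t c = count[k];
            count[k] = sum;
            sum += c;
        }

        /* Stable scatter into the other buffer, then swap buffers. */
        for (size_t i = 0; i < n; i++)
            b[count[(a[i] >> shift) & 0xFF]++] = a[i];

        uint64_t *t = a; a = b; b = t;
    }

    /* Undo the sign flip and copy the bit patterns back as int64. */
    for (size_t i = 0; i < n; i++) {
        uint64_t u = a[i] ^ sign;
        memcpy(&x[i], &u, sizeof u);
    }

    free(a);
    free(b);
    return 0;
}
```

Calling i64radix_sketch(v, n) on an array of int64_t values sorts it ascending in place. A real implementation for grouping would more likely return an ordering vector than move the data, which this sketch does not attempt.
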
>
>
>>
>>
>> On Wed, Feb 12, 2014 at 11:17 AM, caneff at gmail.com wrote:
>>
>> Yes, this isn't a data.table criticism, just a bit64 one in
>> general.
>>
>>
>> On Wed Feb 12 2014 at 11:39:47 AM, Matt Dowle
>> <mdowle at mdowle.plus.com> wrote:
>>
>>
>> Sometimes we take the hard road in data.table, to get to
>> a better place. Once bit64::integer64 is fully
>> supported, it'll be much easier. All the recent radix
>> work for double applies almost automatically to integer64
>> for example, but that radix work had to be done first.
>>
>>
>> On 12/02/14 16:26, caneff at gmail.com wrote:
>>> FYI (and this is a long outstanding argument) this is
>>> why I don't like the bit64 package. These sorts of
>>> errors happen silently. I understand that data.table
>>> can't use the other integer64 package, but at least
>>> there it is obvious when things are being coerced.
>>>
>>> In my case, when I group by an int64, it is usually either
>>> an ID, so I can just make it a character vector instead,
>>> or something where I don't mind lost precision, so I just
>>> make it numeric.
>>>
>>> On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle
>>> <mdowle at mdowle.plus.com> wrote:
>>>
>>>
>>> Hi,
>>>
>>> You're doing nothing wrong. Although you can load integer64 using
>>> fread and create them directly, data.table's grouping and keys
>>> don't work on them yet. Sorry, just not yet implemented. Because
>>> integer64 are internally stored as type double (a good idea by
>>> package bit64), data.table sees them internally as double and
>>> doesn't catch that the type isn't supported yet (hence no error
>>> message such as you get for type 'complex'). The particular
>>> integer64 numbers in this example are quite small so will use the
>>> lower bits. In double, those are the most precise part of the
>>> significand, which would explain why only one group comes out here
>>> since data.table groups and joins floating point data within
>>> tolerance.
>>>
>>> Matt
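
To make the explanation above concrete, here is a small C sketch under stated assumptions. It copies the bytes of small 64-bit integers into a double, which is how an integer64 vector looks to code that reads it as plain double. The helper names and the tol argument are hypothetical, and data.table's actual tolerance rule is not reproduced here. On IEEE 754 platforms the integers 1 and 2 turn into distinct but extremely small positive doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(int64_t), "sketch expects a 64-bit double");

/* Reinterpret the 8 bytes of an int64 as a double (no numeric conversion). */
double i64_bits_as_double(int64_t v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Copy the same 8 bytes back, recovering the original integer. */
int64_t double_bits_as_i64(double d)
{
    int64_t v;
    memcpy(&v, &d, sizeof v);
    return v;
}

/* Hypothetical absolute-tolerance comparison, for illustration only;
   this is NOT data.table's actual rule. */
int equal_within_tolerance(double a, double b, double tol)
{
    double diff = a > b ? a - b : b - a;
    return diff < tol;
}
```

With an everyday tolerance such as 1e-9, equal_within_tolerance(i64_bits_as_double(1), i64_bits_as_double(2), 1e-9) is true even though the two doubles are not identical, which is consistent with every row landing in one group. Copying the bits back with double_bits_as_i64 recovers the integer exactly, so an exact comparison on the 64-bit integer values does not have this problem.
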
>>>
>>> On 06/02/14 23:38, Yike Lu wrote:
>>> > After a long hiatus, I am back to using data.table.
>>> > Unfortunately, I've encountered a problem. Am I doing something
>>> > wrong here?
>>> >
>>> > require(data.table)
>>> >
>>> > dt = data.table(idx = 1:100 %% 3, 1:100)
>>> > dt[, list(sum(V2)), by = idx]
>>> > # normal
>>> >
>>> > require(bit64)
>>> >
>>> > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
>>> > dt2[, list(sum(V2)), by = idx]
>>> > # only has one group:
>>> > # idx V1
>>> > #1: 1 5050
>>> >
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> <mailto:datatable-help at lists.r-forge.r-project.org>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>>
>
>