[datatable-help] integer64 group by doesn't find all groups
Matt Dowle
mdowle at mdowle.plus.com
Sun Mar 2 13:26:42 CET 2014
On 14/02/14 15:07, Yike Lu wrote:
> Thanks for the info guys! Wondering if there's any way I can help?
Thanks for your offer. The function iradix in forder.c needs copying
and tweaking to become i64radix (8 passes instead of 4), or generalising
so that either 4 or 8 can be passed in. We should also first check how the
bit64 package sorts integer64. Then, in bmerge.c, add a case to the switch
for integer64 that casts to long long, add tests to tests.Rraw for
grouping and joining, update the documentation (.Rd) files, and add checks to
init.c.
Is that something you could do? If you are rusty on C, I don't mind
guiding you through it.
Matt
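The iradix-to-i64radix change described above comes down to one counting pass per key byte: 4 passes for 32-bit keys, 8 for 64-bit keys. Below is a minimal, generic sketch of a least-significant-digit radix ordering whose key width is a parameter, in the spirit of the "4 or 8 can be passed in" suggestion. It is not the code in forder.c. The function name `radix_order` and its signature are hypothetical, and keys are held as `int64_t` for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch (not data.table's forder.c): compute the ascending
   sort order of n signed integer keys with an LSD radix sort, one pass
   per byte. nbytes is 4 (keys must fit in int32, 4 passes) or 8 (int64,
   8 passes). order receives the indices 0..n-1 in sorted order. */
static void radix_order(const int64_t *keys, size_t n, int nbytes, size_t *order)
{
    assert(nbytes == 4 || nbytes == 8);
    if (n == 0) return;
    size_t *tmp = malloc(n * sizeof *tmp);
    uint64_t *ukey = malloc(n * sizeof *ukey);
    /* Keep only the low nbytes and flip that width's sign bit, so that
       unsigned byte order agrees with signed numeric order. */
    uint64_t sign = (uint64_t)1 << (8 * nbytes - 1);
    uint64_t mask = nbytes == 8 ? UINT64_MAX : 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        ukey[i] = ((uint64_t)keys[i] & mask) ^ sign;
        order[i] = i;
    }
    for (int pass = 0; pass < nbytes; pass++) {
        size_t count[257] = {0};
        int shift = 8 * pass;
        /* Histogram of the current byte, offset by one for the prefix sum. */
        for (size_t i = 0; i < n; i++)
            count[((ukey[order[i]] >> shift) & 0xFF) + 1]++;
        for (int b = 0; b < 256; b++)
            count[b + 1] += count[b];
        /* Stable scatter preserves the order from earlier (lower-byte) passes. */
        for (size_t i = 0; i < n; i++)
            tmp[count[(ukey[order[i]] >> shift) & 0xFF]++] = order[i];
        memcpy(order, tmp, n * sizeof *order);
    }
    free(tmp);
    free(ukey);
}
```

Each pass is stable, which is what allows the higher-byte passes to keep the ordering established by the lower-byte ones. Going from 4 to 8 passes changes only the loop bound and the width of the sign-bit flip.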
>
>
> On Wed, Feb 12, 2014 at 11:17 AM, caneff at gmail.com wrote:
>
> Yes, this isn't a criticism of data.table, just of bit64 in general.
>
>
> On Wed Feb 12 2014 at 11:39:47 AM, Matt Dowle
> <mdowle at mdowle.plus.com> wrote:
>
>
> Sometimes we take the hard road in data.table to get to a
> better place. Once bit64::integer64 is fully supported, it'll
> be much easier. For example, all the recent radix work for double
> applies almost automatically to integer64, but that radix
> work had to be done first.
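One way to see why the radix work for double carries over: a radix sort on 8-byte keys only needs each value turned into an unsigned 64-bit integer whose order matches the original numeric order. The sketch below shows the standard bit transforms for an IEEE 754 double and for a signed 64-bit integer; after either transform, the same 8 byte-wise passes apply. This is a generic illustration under that assumption, not data.table's implementation (which also has to handle numeric tolerance, not shown). `double_key` and `int64_key` are hypothetical names, and NaN handling is omitted.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT UINT64_C(0x8000000000000000)

/* Map a double to a uint64_t whose unsigned order matches numeric order
   (NaN handling omitted; -0.0 sorts just below +0.0). Negative values:
   flip all bits so larger magnitudes sort lower. Otherwise: flip only the
   sign bit so non-negative values sort above all negatives. */
static uint64_t double_key(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);   /* reinterpret the 8 bytes, no conversion */
    return (u & SIGN_BIT) ? ~u : (u ^ SIGN_BIT);
}

/* For a two's-complement 64-bit integer, flipping the sign bit is enough. */
static uint64_t int64_key(int64_t x)
{
    return (uint64_t)x ^ SIGN_BIT;
}
```

The integer transform is a strict subset of the double one, which is consistent with integer64 being the easier case once the double machinery exists.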
>
>
> On 12/02/14 16:26, caneff at gmail.com wrote:
>> FYI (and this is a long-standing argument), this is why I
>> don't like the bit64 package: these sorts of errors happen
>> silently. I understand that data.table can't use the other
>> integer64 package, but at least there it is obvious when
>> things are being coerced.
>>
>> In my situations, if I am grouping by an int64, it is usually
>> either an ID, so I can just make it a character vector
>> instead, or something where I don't mind lost precision,
>> so I just make it numeric.
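The "lost precision" in the numeric workaround is concrete: a double has a 53-bit significand, so conversion from a 64-bit integer is exact only up to 2^53 in magnitude. Above that, distinct integers can round to the same double and would then fall into the same group. The small check below illustrates this; `same_as_double` is a hypothetical helper name.

```c
#include <stdint.h>

/* Returns nonzero if two distinct 64-bit integers become the same double
   after conversion. Exact for |x| <= 2^53; above that, rounding to the
   nearest representable double can merge neighbouring integers. */
static int same_as_double(int64_t a, int64_t b)
{
    return (double)a == (double)b;
}
```

For example, 9007199254740992 (2^53) and 9007199254740993 convert to the same double, while their character representations stay distinct. That is why the character route is the safe choice for IDs.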
>>
>> On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle
>> <mdowle at mdowle.plus.com> wrote:
>>
>>
>> Hi,
>>
>> You're doing nothing wrong. Although you can load integer64
>> using fread and create them directly, data.table's grouping
>> and keys don't work on them yet. Sorry, just not yet
>> implemented. Because integer64 values are stored internally
>> as type double (a good idea by package bit64), data.table
>> sees them internally as double and doesn't catch that the
>> type isn't supported yet (hence no error message such as the
>> one you get for type 'complex'). The particular integer64
>> numbers in this example are quite small, so they only use the
>> lower bits. Read as a double, those are the least significant
>> (finest) bits of the significand, which would explain why only
>> one group comes out here, since data.table groups and joins
>> floating-point data within a tolerance.
>>
>> Matt
>>
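The explanation above can be checked directly: put the bits of a small 64-bit integer into a double and read it as a double, as code unaware of integer64 would. Values such as 0, 1 and 2 become +0.0 and tiny subnormal numbers (about 4.9e-324 apart), so any tolerance-based comparison lumps them together. Reading the same bytes back as a 64-bit integer, which is roughly the idea behind casting to long long for integer64, keeps them distinct. The helper names are hypothetical, and `equal_within` uses a simple absolute tolerance purely for illustration. data.table's own tolerance handling works differently and is not reproduced here.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 8 bytes of an int64 as a double (no numeric conversion). */
static double bits_as_double(int64_t x)
{
    double d;
    memcpy(&d, &x, sizeof d);
    return d;
}

/* Reinterpret the 8 bytes of a double as an int64, recovering exact values. */
static int64_t bits_as_int64(double d)
{
    int64_t x;
    memcpy(&x, &d, sizeof x);
    return x;
}

/* Nonzero, but smaller in magnitude than the smallest normal double. */
static int is_subnormal(double d)
{
    return d != 0.0 && d > -DBL_MIN && d < DBL_MIN;
}

/* Hypothetical stand-in for tolerance-based equality: |a - b| <= tol. */
static int equal_within(double a, double b, double tol)
{
    double diff = a > b ? a - b : b - a;
    return diff <= tol;
}
```

With any realistic tolerance, `bits_as_double(0)`, `bits_as_double(1)` and `bits_as_double(2)` all compare equal, which matches the single group seen in the report. Comparing the integer reading of the same bytes separates them exactly.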
>> On 06/02/14 23:38, Yike Lu wrote:
>> > After a long hiatus, I am back to using data.table. Unfortunately,
>> > I've encountered a problem. Am I doing something wrong here?
>> >
>> > require(data.table)
>> >
>> > dt = data.table(idx = 1:100 %% 3, 1:100)
>> > dt[, list(sum(V2)), by = idx]
>> > # normal: three groups (idx = 1, 2, 0)
>> >
>> > require(bit64)
>> >
>> > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
>> > dt2[, list(sum(V2)), by = idx]
>> > # only has one group:
>> > # idx V1
>> > #1: 1 5050
>> >
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> <mailto:datatable-help at lists.r-forge.r-project.org>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>