[datatable-help] integer64 group by doesn't find all groups

Matt Dowle mdowle at mdowle.plus.com
Mon Mar 3 02:14:28 CET 2014


Great.  Just click 'join project', follow the instructions on the 
R-Forge homepage to connect and then commit.  We can discuss the finer 
points off-list.

Matt

On 02/03/14 18:45, Yike Lu wrote:
> Yes, I'm up for it. The C edits sound relatively straightforward actually.
>
> It's the other parts I'm not as familiar with: what's the SCM 
> procedure, what's the build procedure going to be?
>
>
> On Sun, Mar 2, 2014 at 6:26 AM, Matt Dowle <mdowle at mdowle.plus.com> wrote:
>
>
>     On 14/02/14 15:07, Yike Lu wrote:
>>     Thanks for the info guys! Wondering if there's any way I can help?
>
>     Thanks for your offer.  The function iradix in forder.c needs
>     copying and tweaking to become i64radix (8 passes instead of 4),
>     or making general so that 4 or 8 can be passed in.  We should
>     also check first how the bit64 package sorts integer64.  Then, in
>     bmerge.c, add a case to the switch for integer64 that casts to
>     long long, add tests to tests.Rraw for grouping and joining,
>     update the documentation (.Rd) files and add checks to init.c.
>
>     Is that something you could do?  If you are rusty on C I don't
>     mind guiding you through.
>
>     Matt
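To make the "8 passes instead of 4" point concrete, here is a minimal sketch of a byte-wise least-significant-digit radix sort over signed 64-bit integers. It is not the iradix code from forder.c; the function name i64_radix_sort_sketch, the sign-bit flip and the use of two scratch buffers are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIGN_BIT UINT64_C(0x8000000000000000)

/* Hypothetical helper, not data.table's iradix: sorts n signed 64-bit
   integers in place with one counting pass per byte, i.e. 8 passes
   (a 32-bit integer would need 4). Returns 0 on success, -1 if the
   scratch buffers cannot be allocated. */
static int i64_radix_sort_sketch(int64_t *x, size_t n)
{
    assert(x != NULL || n == 0);
    if (n == 0) return 0;

    uint64_t *a = malloc(n * sizeof *a);
    uint64_t *b = malloc(n * sizeof *b);
    if (a == NULL || b == NULL) { free(a); free(b); return -1; }

    /* Copy the raw bits and flip the sign bit so that unsigned order
       matches signed order. */
    for (size_t i = 0; i < n; i++) {
        uint64_t u;
        memcpy(&u, &x[i], sizeof u);
        a[i] = u ^ SIGN_BIT;
    }

    for (int pass = 0; pass < 8; pass++) {
        size_t count[256] = {0};
        int shift = pass * 8;

        /* Histogram of the current byte. */
        for (size_t i = 0; i < n; i++)
            count[(a[i] >> shift) & 0xFF]++;

        /* Turn counts into starting offsets. */
        size_t pos = 0;
        for (int k = 0; k < 256; k++) {
            size_t c = count[k];
            count[k] = pos;
            pos += c;
        }

        /* Stable scatter into the other buffer, then swap roles. */
        for (size_t i = 0; i < n; i++)
            b[count[(a[i] >> shift) & 0xFF]++] = a[i];
        uint64_t *tmp = a; a = b; b = tmp;
    }

    /* a points at the most recently written buffer; undo the sign flip. */
    for (size_t i = 0; i < n; i++) {
        uint64_t u = a[i] ^ SIGN_BIT;
        memcpy(&x[i], &u, sizeof u);
    }

    free(a);
    free(b);
    return 0;
}
```

Making the number of passes a parameter (4 or 8) would be the "general" variant mentioned above; the sketch hard-codes 8 to keep it short.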
>
>
>>
>>
>>     On Wed, Feb 12, 2014 at 11:17 AM, caneff at gmail.com wrote:
>>
>>         Yes, this isn't a data.table criticism, just a bit64 one in
>>         general.
>>
>>
>>         On Wed Feb 12 2014 at 11:39:47 AM, Matt Dowle
>>         <mdowle at mdowle.plus.com> wrote:
>>
>>
>>             Sometimes we take the hard road in data.table, to get to
>>             a better place.  Once bit64::integer64 is fully
>>             supported, it'll be much easier.   All the recent radix
>>             work for double applies almost automatically to integer64
>>             for example,  but that radix work had to be done first.
>>
>>
>>             On 12/02/14 16:26, caneff at gmail.com wrote:
>>>             FYI (and this is a long outstanding argument) this is
>>>             why I don't like the bit64 package.  These sorts of
>>>             errors happen silently.  I understand that data.table
>>>             can't use the other integer64 package, but at least
>>>             there it is obvious when things are being coerced.
>>>
>>>             In my situations, if I am grouping by an int64, it is
>>>             usually either an ID so I can just make it a character
>>>             vector instead, or it is something where I don't mind
>>>             lost precision so I just make it numeric.
>>>
>>>             On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle
>>>             <mdowle at mdowle.plus.com> wrote:
>>>
>>>
>>>                 Hi,
>>>
>>>                 You're doing nothing wrong.  Although you can load
>>>                 integer64 using fread and create them directly,
>>>                 data.table's grouping and keys don't work on them
>>>                 yet.  Sorry, just not yet implemented.  Because
>>>                 integer64 are internally stored as type double (a
>>>                 good idea by package bit64), data.table sees them
>>>                 internally as double and doesn't catch that the type
>>>                 isn't supported yet (hence no error message such as
>>>                 you get for type 'complex').  The particular
>>>                 integer64 numbers in this example are quite small so
>>>                 will use only the lower bits.  In double, those are
>>>                 the lowest-order (finest-precision) bits of the
>>>                 significand, which would explain why only one group
>>>                 comes out here, since data.table groups and joins
>>>                 floating point data within tolerance.
>>>
>>>                 Matt
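A small sketch of the effect described above, assuming IEEE 754 doubles: the bit patterns of the integer64 values 0, 1 and 2 (the idx values in the example), viewed as doubles, are zero or subnormal numbers that differ from each other by about 5e-324. The helper names and the absolute tolerance of 1e-8 are hypothetical and only illustrate the idea; data.table's actual tolerance rule is not reproduced here.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 8 bytes of an int64 as a double, which is how the
   column looks to code that only sees the double storage type. */
static double i64_bits_as_double(int64_t v)
{
    double d;
    assert(sizeof d == sizeof v);
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Returns 1 if v, viewed as a double, is zero or a positive subnormal,
   i.e. smaller in magnitude than DBL_MIN. */
static int looks_tiny_as_double(int64_t v)
{
    double d = i64_bits_as_double(v);
    return d >= 0.0 && d < DBL_MIN;
}

/* Hypothetical absolute-tolerance comparison, for illustration only. */
static int equal_within_tol(double a, double b, double tol)
{
    double diff = a > b ? a - b : b - a;
    return diff <= tol;
}
```

With these helpers, looks_tiny_as_double() holds for 0, 1 and 2, and equal_within_tol() treats any pair of them as equal at a tolerance of 1e-8, even though the underlying bit patterns are distinct. That is consistent with all 100 rows landing in a single group.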
>>>
>>>                 On 06/02/14 23:38, Yike Lu wrote:
>>>                 > After a long hiatus, I am back to using data.table.
>>>                 > Unfortunately, I've encountered a problem. Am I doing
>>>                 > something wrong here?
>>>                 >
>>>                 > require(data.table)
>>>                 >
>>>                 > dt = data.table(idx = 1:100 %% 3, 1:100)
>>>                 > dt[, list(sum(V2)), by = idx]
>>>                 > # normal
>>>                 >
>>>                 > require(bit64)
>>>                 >
>>>                 > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
>>>                 > dt2[, list(sum(V2)), by = idx]
>>>                 > # only has one group:
>>>                 > #   idx   V1
>>>                 > #1:   1 5050
>>>                 >
>>>
>>>                 _______________________________________________
>>>                 datatable-help mailing list
>>>                 datatable-help at lists.r-forge.r-project.org
>>>                 https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>>
>
>
