[datatable-help] aggregating data

Matthew Dowle mdowle at mdowle.plus.com
Wed Apr 10 15:25:05 CEST 2013


 

But data.table is floating point aware. You _can_ join to floating
point values, and you _can_ group by floating point values. data.table
will do that within machine tolerance and take care of it for you. 

So this may explain why your 'agg' only had 119 rows (because data.table
is doing the rounding for you automatically), while
length(unique(DT$x)) returned 331?
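
A minimal base R sketch of how exact comparison can count more distinct
values than a tolerance-based comparison. It does not call data.table at
all; the vector x and the choice of 10 digits in round() are assumptions
made purely for illustration.

```r
# Illustrative values only: 0.1 + 0.2 and 0.3 print the same at default
# precision but differ in their last bits; 1.5 - 1.0 is exactly 0.5.
x <- c(0.3, 0.1 + 0.2, 0.5, 1.5 - 1.0)

# unique() compares doubles exactly, so the two "0.3" values stay apart.
n_exact <- length(unique(x))

# Rounding to a fixed number of digits (10 is an arbitrary choice here)
# collapses values that differ only by representation error.
n_round <- length(unique(round(x, 10)))

# all.equal() compares within a tolerance instead of exactly.
same_within_tol <- isTRUE(all.equal(0.1 + 0.2, 0.3))

print(c(exact = n_exact, rounded = n_round))
print(same_within_tol)
```

Here the exact count is larger than the rounded count, which is the same
kind of gap as 331 versus 119.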

But there was a bug or two in this area a few versions ago, mentioned
in NEWS. That is why I asked for sessionInfo() and str(DT): I suspected
you had a double column with a slightly older version of data.table.
Or, there might be a new problem. If you have to round() in data.table,
that doesn't sound right to me.
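
One way to check whether a double column holds near-duplicate values is
sketched below in base R. The vector x is a hypothetical stand-in for
DT$x, and the tolerance sqrt(.Machine$double.eps) is an assumption
chosen for the example, not the tolerance data.table uses internally.

```r
# Hypothetical stand-in for DT$x; replace with the real column.
x <- c(0, 0.5, 1.5, 0.1 + 0.2, 0.3, 1.5)

# Confirm the column is stored as double rather than integer.
col_is_double <- is.double(x)

# Sort the exactly-distinct values and look at the gaps between neighbours.
ux   <- sort(unique(x))
gaps <- diff(ux)

# Assumed tolerance for "practically equal"; adjust to the data's scale.
tol <- sqrt(.Machine$double.eps)

# Values that sit within tol of the previous distinct value.
near_dupes <- ux[c(FALSE, gaps < tol)]

print(col_is_double)
print(length(ux))
print(format(near_dupes, digits = 17))
```

If near_dupes is non-empty, the count from length(unique(x)) includes
values that differ only by representation error.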

Matthew 

On 10.04.2013 13:50, David Bellot wrote:

> Actually, I found the issue. It was not related to data.table: because
> I'm comparing float values, it breaks all the time if I do not round()
> my values first. Basically I have values like 0,1, 1.5, 0.5 etc...
> I know it's bad to do that but I'm not the boss in this project ;-)
> 
> Just in case other users are reading my email, I can only advise
> reading this again and again:
> http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html [1]
> 
> Best,
> David 
> 
> On Tue, Apr 9, 2013 at 11:39 AM, Matthew Dowle <mdowle at mdowle.plus.com [2]> wrote:
> 
>> That's odd. Please provide the result of sessionInfo() and str(DT).
>> 
>> Matthew 
>> 
>> On 09.04.2013 11:32, David Bellot wrote:
>> 
>>> Hi,
>>> 
>>> I have a data.table DT with one column named x and other columns
>>> named, let's say, a1, a2, ... aN. The key of this data.table is made
>>> of a1...aN.
>>>
>>> Later on, I aggregate my DT with x like this:
>>> agg = DT[ , list(m=mean(y), c=length(y)), by = c("x") ]
>>> 
>>> The problem is that "x" has 331 unique values as found by
>>> length(unique(DT$x)) but my result "agg" only has 119 rows. I tried
>>> changing the key to "x" alone but the problem persists. My DT table
>>> has a few million rows, by the way.
>>> 
>>> I'm sure I'm missing something totally obvious :-( !!!!
>>>
>>> Any idea?
>>> Best,
>>> David

 

Links:
------
[1] http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
[2] mailto:mdowle at mdowle.plus.com