[datatable-help] 'by' on a numeric column produces inconsistent output

Thu Dec 19 09:26:12 CET 2013

Hi Arun,

here are the results on Mac OS X Mavericks with gcc 4.8.2:

data.table 1.8.10:

> set.seed(32)
> n <- 3
> dt <- data.table(
+ y=rnorm(n),
+ by=round( rnorm(n), 1)
+ )
>
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871
>
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871

data.table 1.8.11:

> set.seed(32)
> n <- 3
> dt <- data.table(
+ y=rnorm(n),
+ by=round( rnorm(n), 1)
+ )
>
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871
>
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871

Best

Simon

On 19 Dec 2013, at 09:05, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:

> Simon, sure.
> 
> set.seed(32)
> n <- 3
> dt <- data.table(
> y=rnorm(n),
> by=round( rnorm(n), 1)
> )
> 
> dt[,
> list(max=max(y, na.rm=TRUE)),
> by=list(by)
> ]
> 
> dt[,
> list(max=max(y, na.rm=TRUE)),
> by=list(by)
> ]
> 
> 
> 
> Arun
> 
> On Thursday, December 19, 2013 at 8:49 AM, Simon Zehnder wrote:
> 
>> Arun,
>> 
>> if you could send me the reproducible code in copyable form, I can try it as well on Mac OS X Mavericks with gcc 4.8.
>> 
>> Best
>> 
>> Simon
>> 
>> On 19 Dec 2013, at 08:44, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
>> 
>>> Aha, the issue seems to be with 'uniqlist'. Not sure why
>>>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>>> gives 1,2,3 for you and 1,3 consistently for me. I'll revert this back to `duplist` for now. Not sure how to solve this, though. I've tried it so far on 3 machines:
>>> 
>>> 1) OS X 10.8.5 + llvm-gcc
>>> 2) OS X Mavericks + Clang
>>> 3) Debian Wheezy + gcc
>>> 
>>> All of them give consistent output. Man, this is such a drag.
>>> 
>>> Arun
>>> 
>>> On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
>>> 
>>>> Hi Arun,
>>>> 
>>>> Here's the output on my machine. Some information I left out
>>>> before: it's OS X Mavericks, with R and data.table compiled with
>>>> Apple clang.
>>>> 
>>>> ---
>>>> 
>>>>> library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
>>>>> set.seed(32)
>>>>> n <- 3
>>>>> dt <- data.table(
>>>> + y=rnorm(n),
>>>> + by=round( rnorm(n), 1)
>>>> + )
>>>> ## run one
>>>>> byval <- list(by=dt$by)
>>>>> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>>>> [1] 2 3 1
>>>>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>>>> [1] 1 2 3
>>>>> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>>>> [1] 1 1 1
>>>>> (firstofeachgroup = o__[f__]) # 2,1
>>>> [1] 2 3 1
>>>>> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>>>> [1] 3 1 2
>>>>> (f__ = f__[origorder]) # 3,1
>>>> [1] 3 1 2
>>>>> (len__ = len__[origorder]) # 2,1
>>>> [1] 1 1 1
>>>> 
>>>> ## run two
>>>>> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>>>> [1] 1 2 3
>>>>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>>>> [1] 1 3
>>>>> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>>>> [1] 2 1
>>>>> (firstofeachgroup = o__[f__]) # 2,1
>>>> [1] 1 3
>>>>> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>>>> [1] 1 2
>>>>> (f__ = f__[origorder]) # 3,1
>>>> [1] 1 3
>>>>> (len__ = len__[origorder]) # 2,1
>>>> [1] 2 1
>>>> 
>>>> On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
>>>> <aragorn168b at gmail.com> wrote:
>>>>> Not sure how to debug without being able to reproduce. I've tried on Mac OS X
>>>>> 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
>>>>> machine. It consistently gives me this:
>>>>> 
>>>>>> dt[,
>>>>> + list(max=max(y, na.rm=TRUE)),
>>>>> + by=list(by)
>>>>> + ]
>>>>>     by        max
>>>>> 1: 0.7 0.01464054
>>>>> 2: 0.4 0.87328871
>>>>>> 
>>>>>> dt[,
>>>>> + list(max=max(y, na.rm=TRUE)),
>>>>> + by=list(by)
>>>>> + ]
>>>>>     by        max
>>>>> 1: 0.7 0.01464054
>>>>> 2: 0.4 0.87328871
>>>>> 
>>>>> Can either of you provide me with the output of these steps in cases where
>>>>> there's an error? I've commented the output I get for each step.
>>>>> 
>>>>> byval <- list(by=dt$by)
>>>>> o__ <- data.table:::fastorder(byval) # 2,3,1
>>>>> f__ = data.table:::uniqlist(byval, order=o__) # 1,3
>>>>> len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
>>>>> firstofeachgroup = o__[f__] # 2,1
>>>>> origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
>>>>> f__ = f__[origorder] # 3,1
>>>>> len__ = len__[origorder] # 2,1
>>>>> 
>>>>> 
>>>>> Arun
>>>>> 
>>>>> <...snip...>
>>> 
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
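For reference, the grouping steps Arun lists above (order the rows, find where each run of equal values starts, take the run lengths, then reorder the groups by first appearance) can be mirrored in plain base R. This is a hypothetical sketch: `group_steps()` is an illustrative name and not part of data.table. It assumes the `by` column holds c(0.7, 0.4, 0.4), which is what the outputs in this thread imply (row 1 is 0.7, rows 2 and 3 are 0.4). It also compares doubles with exact `!=`, which says nothing about how `uniqlist` compares them internally.

```r
# Hypothetical base-R sketch of the grouping steps listed in this thread.
# group_steps() is an illustrative name, not a data.table function; each
# step only mirrors the role of the data.table internal named in the comment.
group_steps <- function(x) {
  o <- order(x)                          # cf. fastorder(): rows sorted by x
  xs <- x[o]                             # grouping values in sorted order
  n <- length(xs)
  # cf. uniqlist(): positions in the sorted order where a new value starts.
  # Assumption: exact equality on doubles, no rounding or tolerance.
  f <- which(c(TRUE, xs[-1L] != xs[-n]))
  len <- diff(c(f, n + 1L))              # cf. uniqlengths(): size of each group
  first <- o[f]                          # first row of each group in the data
  origorder <- order(first)              # groups in order of first appearance
  list(o = o, f = f[origorder], len = len[origorder])
}

# Values implied by the output in this thread (row 1 = 0.7, rows 2-3 = 0.4).
res <- group_steps(c(0.7, 0.4, 0.4))
res$o    # 2 3 1
res$f    # 3 1
res$len  # 1 2
```

With those values the sketch gives group starts 3,1 and lengths 1,2, i.e. one row for 0.7 and two rows for 0.4, matching the two-group result Simon and Arun get.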