[datatable-help] 'by' on a numeric column produces inconsistent output
Simon Zehnder
szehnder at uni-bonn.de
Thu Dec 19 08:49:38 CET 2013
Arun,
If you could send me the reproducible code in copyable form, I can try it as well on Mac OS X Mavericks with gcc 4.8.
Best
Simon
On 19 Dec 2013, at 08:44, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> Aha, the issue seems to be with 'uniqlist', not sure why it gives
>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> 1,2,3 for you and 1,3 consistently for me. I'll revert this back to `duplist` for now. Not sure how to solve this though. I've tried it so far on 3 machines:
>
> 1) OS X 10.8.5 + llvm (gcc)
> 2) OS X Mavericks + Clang
> 3) Debian Wheezy + gcc
>
> All of them give consistent output. Man this is such a drag.
>
> Arun
>
> On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
>
>> Hi Arun,
>>
>> Here's the output on my machine -- other information missing from
>> before; it's with OSX Mavericks, with R and data.table compiled with
>> Apple clang.
>>
>> ---
>>
>>> library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
>>> set.seed(32)
>>> n <- 3
>>> dt <- data.table(
>> + y=rnorm(n),
>> + by=round( rnorm(n), 1)
>> + )
>> ## run one
>>> byval <- list(by=dt$by)
>>> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>> [1] 2 3 1
>>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>> [1] 1 2 3
>>> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>> [1] 1 1 1
>>> (firstofeachgroup = o__[f__]) # 2,1
>> [1] 2 3 1
>>> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>> [1] 3 1 2
>>> (f__ = f__[origorder]) # 3,1
>> [1] 3 1 2
>>> (len__ = len__[origorder]) # 2,1
>> [1] 1 1 1
>>
>> ## run two
>>> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>> [1] 1 2 3
>>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>> [1] 1 3
>>> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>> [1] 2 1
>>> (firstofeachgroup = o__[f__]) # 2,1
>> [1] 1 3
>>> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>> [1] 1 2
>>> (f__ = f__[origorder]) # 3,1
>> [1] 1 3
>>> (len__ = len__[origorder]) # 2,1
>> [1] 2 1
>>
>> On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
>> <aragorn168b at gmail.com> wrote:
>>> Not sure how to debug without being able to reproduce. Tried on Mac OS X
>>> 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
>>> machine. It consistently gives me this:
>>>
>>>> dt[,
>>> + list(max=max(y, na.rm=TRUE)),
>>> + by=list(by)
>>> + ]
>>>     by        max
>>> 1: 0.7 0.01464054
>>> 2: 0.4 0.87328871
>>>>
>>>> dt[,
>>> + list(max=max(y, na.rm=TRUE)),
>>> + by=list(by)
>>> + ]
>>>     by        max
>>> 1: 0.7 0.01464054
>>> 2: 0.4 0.87328871
>>>
>>> Can either of you provide me with the output of these steps in cases where
>>> there's an error? I've commented the output I get for each step.
>>>
>>> byval <- list(by=dt$by)
>>> o__ <- data.table:::fastorder(byval) # 2,3,1
>>> f__ = data.table:::uniqlist(byval, order=o__) # 1,3
>>> len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
>>> firstofeachgroup = o__[f__] # 2,1
>>> origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
>>> f__ = f__[origorder] # 3,1
>>> len__ = len__[origorder] # 2,1
>>>
>>>
>>> Arun
>>>
>>> <...snip...>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
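
For readers following along without data.table's internals, the grouping steps Arun lists can be approximated in base R. This is only a sketch under stated assumptions: the `by` values below are hypothetical (chosen to be consistent with the commented outputs in the thread), and `order`, `which`, and `diff` stand in for the internal `fastorder`, `uniqlist`, and `uniqlengths`. It is not how data.table implements them.

```r
# Hypothetical grouping column, consistent with the "2,3,1" ordering in the thread
by <- c(0.7, 0.4, 0.4)

# Step 1: ordering permutation (stand-in for data.table:::fastorder)
o <- order(by)                                   # 2 3 1

# Step 2: positions in the sorted vector where a new group starts
# (stand-in for data.table:::uniqlist); uses exact comparison of doubles
sorted <- by[o]
f <- which(c(TRUE, sorted[-1] != sorted[-length(sorted)]))   # 1 3

# Step 3: size of each group (stand-in for data.table:::uniqlengths)
len <- diff(c(f, length(by) + 1L))               # 2 1

# Step 4: original row number of the first row of each group
first <- o[f]                                    # 2 1

# Step 5: permutation that puts groups back in order of first appearance
orig <- order(first)                             # 2 1
```

With these inputs the group starts come out as 1,3. A result of 1,2,3, as in Kevin's first run, would mean every row was treated as its own group.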