[datatable-help] 'by' on a numeric column produces inconsistent output

Kevin Ushey kevinushey at gmail.com
Thu Dec 19 08:55:18 CET 2013


Hmm, I am seeing that after the data.table:::fastorder call, the dt
itself is modified. Notice that 'by' is rearranged without modifying
'y'.

> dt
             y  by
1:  0.01464054 0.7
2:  0.87328871 0.4
3: -1.02794620 0.4
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
[1] 2 3 1
> dt
             y  by
1:  0.01464054 0.4
2:  0.87328871 0.4
3: -1.02794620 0.7
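
A quick way to confirm this kind of in-place modification is to snapshot the
input before the call and compare afterwards. This is only a sketch in base R:
modifies_input is a hypothetical helper name (not part of data.table), and the
commented-out call assumes the internal fastorder function is reachable in
your build. Presumably dt changes because byval shares the column's memory
with dt, so anything that alters byval in place shows up in dt as well.

```r
# Hypothetical helper (not part of data.table): reports whether calling
# f(x) changed x in place.
modifies_input <- function(f, x) {
  # Round-trip through serialize() to get a snapshot that shares no
  # memory with x, so later in-place changes cannot affect it.
  before <- unserialize(serialize(x, NULL))
  f(x)
  !identical(before, x)
}

# A well-behaved R function leaves its argument alone:
modifies_input(sort, c(3, 1, 2))  # FALSE: sort() returns a new vector

# With the objects from this thread (assumes the internal exists):
# byval <- list(by = dt$by)
# modifies_input(data.table:::fastorder, byval)
```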

On Wed, Dec 18, 2013 at 11:44 PM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Aha, the issue seems to be with 'uniqlist', not sure why it gives
>
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>
> 1,2,3 for you and 1,3 consistently for me. I'll revert this back to
> `duplist` for now. Not sure how to solve this though. I've tried it so far
> on 3 machines:
>
> 1) OS X 10.8.5 + llvm (gcc)
> 2) OS X Mavericks + Clang
> 3) Debian Wheezy + gcc
>
> All of them give consistent output. Man this is such a drag.
>
> Arun
>
> On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
>
> Hi Arun,
>
> Here's the output on my machine -- other information missing from
> before; it's with OSX Mavericks, with R and data.table compiled with
> Apple clang.
>
> ---
>
> library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> set.seed(32)
> n <- 3
> dt <- data.table(
>
> + y=rnorm(n),
> + by=round( rnorm(n), 1)
> + )
>
> ## run one
>
> byval <- list(by=dt$by)
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>
> [1] 2 3 1
>
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>
> [1] 1 2 3
>
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>
> [1] 1 1 1
>
> (firstofeachgroup = o__[f__]) # 2,1
>
> [1] 2 3 1
>
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>
> [1] 3 1 2
>
> (f__ = f__[origorder]) # 3,1
>
> [1] 3 1 2
>
> (len__ = len__[origorder]) # 2,1
>
> [1] 1 1 1
>
> ## run two
>
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>
> [1] 1 2 3
>
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>
> [1] 1 3
>
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>
> [1] 2 1
>
> (firstofeachgroup = o__[f__]) # 2,1
>
> [1] 1 3
>
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>
> [1] 1 2
>
> (f__ = f__[origorder]) # 3,1
>
> [1] 1 3
>
> (len__ = len__[origorder]) # 2,1
>
> [1] 2 1
>
> On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
>
> Not sure how to debug without being able to reproduce. Tried on Mac OS X
> 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
> machine. It consistently gives me this:
>
> dt[,
>
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
>     by        max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
>
>
> dt[,
>
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
>     by        max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
>
> Can either of you provide me with the output of these steps in cases where
> there's an error? I've commented the output I get for each step.
>
> byval <- list(by=dt$by)
> o__ <- data.table:::fastorder(byval) # 2,3,1
> f__ = data.table:::uniqlist(byval, order=o__) # 1,3
> len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
> firstofeachgroup = o__[f__] # 2,1
> origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
> f__ = f__[origorder] # 3,1
> len__ = len__[origorder] # 2,1
>
>
> Arun
>
> <...snip...>
>
>
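
For comparison, the grouping steps in the quoted message can be approximated
with base R alone, which gives a reference result that does not depend on the
data.table internals. This is a rough sketch, not how data.table implements
it: group_runs is a hypothetical name, order() stands in for fastorder and
iradixorder, and values are compared with exact equality, which may differ
from what uniqlist does.

```r
# Hypothetical base R stand-in for the steps in the quoted message.
# Assumes a non-empty atomic vector `by`.
group_runs <- function(by) {
  o <- order(by)                      # sorted order (ties keep original order)
  s <- by[o]
  # Position, within the sorted data, where each group starts.
  f <- which(c(TRUE, s[-1] != s[-length(s)]))
  len <- diff(c(f, length(by) + 1L))  # size of each group
  first <- o[f]                       # original row of each group's first element
  origorder <- order(first)           # groups in order of first appearance
  list(starts = f[origorder], lengths = len[origorder])
}

res <- group_runs(c(0.7, 0.4, 0.4))
res$starts   # 3 1
res$lengths  # 1 2
```

Here 0.7 appears first in the data with one row and 0.4 second with two rows,
so the lengths come back in order of first appearance.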

