[datatable-help] 'by' on a numeric column produces inconsistent output

Kevin Ushey kevinushey at gmail.com
Thu Dec 19 08:37:21 CET 2013


Hi Arun,

Here's the output on my machine. Some information I left out earlier:
this is on OS X Mavericks, with R and data.table compiled with
Apple clang.

---

> library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> set.seed(32)
> n <- 3
> dt <- data.table(
+   y=rnorm(n),
+   by=round( rnorm(n), 1)
+ )
>
## run one
> byval <- list(by=dt$by)
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
[1] 2 3 1
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
[1] 1 2 3
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
[1] 1 1 1
> (firstofeachgroup = o__[f__]) # 2,1
[1] 2 3 1
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
[1] 3 1 2
> (f__ = f__[origorder]) # 3,1
[1] 3 1 2
> (len__ = len__[origorder]) # 2,1
[1] 1 1 1

## run two
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
[1] 1 2 3
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
[1] 1 3
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
[1] 2 1
> (firstofeachgroup = o__[f__]) # 2,1
[1] 1 3
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
[1] 1 2
> (f__ = f__[origorder]) # 3,1
[1] 1 3
> (len__ = len__[origorder]) # 2,1
[1] 2 1
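
For comparison, the same sequence of steps can be sketched in base R,
without any data.table internals. This is only a rough cross-check and not
how data.table implements grouping: the values in `by` are assumed
(inferred from the 0.7 and 0.4 groups in the quoted output below), and
every variable name here is made up for illustration.

```r
# Assumed input: row 1 is in group 0.7, rows 2 and 3 are in group 0.4.
by <- c(0.7, 0.4, 0.4)

# Ordering permutation; order() keeps ties in their original order.
o <- order(by)                                   # 2 3 1
sorted <- by[o]                                  # 0.4 0.4 0.7

# Start position of each run of equal values in the sorted vector.
starts <- which(c(TRUE, sorted[-1] != sorted[-length(sorted)]))  # 1 3

# Length of each run.
lens <- diff(c(starts, length(by) + 1L))         # 2 1

# Original row number of the first row in each group.
first_of_each_group <- o[starts]                 # 2 1

# Permutation that puts the groups back in order of first appearance.
orig_order <- order(first_of_each_group)         # 2 1

starts_by_appearance <- starts[orig_order]       # 3 1
lens_by_appearance <- lens[orig_order]           # 1 2
```

With these assumed values the results do not change between repeated runs,
so a base R run like this may help separate a problem in the data from a
problem in the internal ordering step.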

On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Not sure how to debug without being able to reproduce. Tried on Mac OS X
> 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
> machine. It consistently gives me this:
>
>> dt[,
> +    list(max=max(y, na.rm=TRUE)),
> +    by=list(by)
> +    ]
>     by        max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
>>
>> dt[,
> +    list(max=max(y, na.rm=TRUE)),
> +    by=list(by)
> +    ]
>     by        max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
>
> Can either of you provide me with the output of these steps in cases where
> there's an error? I've commented the output I get for each step.
>
> byval <- list(by=dt$by)
> o__ <- data.table:::fastorder(byval) # 2,3,1
> f__ = data.table:::uniqlist(byval, order=o__) # 1,3
> len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
> firstofeachgroup = o__[f__] # 2,1
> origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
> f__ = f__[origorder] # 3,1
> len__ = len__[origorder] # 2,1
>
>
> Arun
>
> <...snip...>

