[datatable-help] 'by' on a numeric column produces inconsistent output

Arunkumar Srinivasan aragorn168b at gmail.com
Thu Dec 19 09:43:17 CET 2013


Simon, 

Thanks. That's one more result matching mine :). I think we've narrowed the problem down to the R-devel version. I'll write again once I've discussed it with Kevin.

Arun


On Thursday, December 19, 2013 at 9:26 AM, Simon Zehnder wrote:

> Hi Arun,
> 
> here are the results on Mac OS X Mavericks with gcc 4.8.2
> 
> data.table 1.8.10:
> 
> > set.seed(32)
> > n <- 3
> > dt <- data.table(
> + y=rnorm(n),
> + by=round( rnorm(n), 1)
> + )
> > 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
>     by        max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
> > 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
>     by        max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
> 
> data.table 1.8.11:
> 
> > set.seed(32)
> > n <- 3
> > dt <- data.table(
> + y=rnorm(n),
> + by=round( rnorm(n), 1)
> + )
> > 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
>     by        max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
> > 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
>     by        max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
> 
> Best
> 
> Simon
> 
> 
> On 19 Dec 2013, at 09:05, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> 
> > Simon, sure.
> > 
> > set.seed(32)
> > n <- 3
> > dt <- data.table(
> > y=rnorm(n),
> > by=round( rnorm(n), 1)
> > )
> > 
> > dt[,
> > list(max=max(y, na.rm=TRUE)),
> > by=list(by)
> > ]
> > 
> > dt[,
> > list(max=max(y, na.rm=TRUE)),
> > by=list(by)
> > ]
> > 
> > 
> > 
> > Arun
> > 
> > On Thursday, December 19, 2013 at 8:49 AM, Simon Zehnder wrote:
> > 
> > > Arun,
> > > 
> > > if you could send me the reproducible code in copyable form, I can also try it on Mac OS X Mavericks with gcc 4.8.
> > > 
> > > Best
> > > 
> > > Simon
> > > 
> > > On 19 Dec 2013, at 08:44, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> > > 
> > > > Aha, the issue seems to be with 'uniqlist'. I'm not sure why
> > > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > > 
> > > > gives 1,2,3 for you and 1,3 consistently for me. I'll revert this to `duplist` for now, though I'm not sure how to solve the underlying problem. So far I've tried it on 3 machines:
> > > > 
> > > > 1) OS X 10.8.5 + llvm (gcc)
> > > > 2) OS X Mavericks + Clang
> > > > 3) Debian Wheezy + gcc
> > > > 
> > > > All of them give consistent output. Man this is such a drag.
> > > > 
> > > > Arun
> > > > 
> > > > On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
> > > > 
> > > > > Hi Arun,
> > > > > 
> > > > > Here's the output on my machine -- other information missing from
> > > > > before; it's with OSX Mavericks, with R and data.table compiled with
> > > > > Apple clang.
> > > > > 
> > > > > ---
> > > > > 
> > > > > > library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> > > > > > set.seed(32)
> > > > > > n <- 3
> > > > > > > dt <- data.table(
> > > > > > + y=rnorm(n),
> > > > > + by=round( rnorm(n), 1)
> > > > > + )
> > > > > ## run one
> > > > > > byval <- list(by=dt$by)
> > > > > > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> > > > > > 
> > > > > 
> > > > > [1] 2 3 1
> > > > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > > > 
> > > > > [1] 1 2 3
> > > > > > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> > > > > 
> > > > > [1] 1 1 1
> > > > > > (firstofeachgroup = o__[f__]) # 2,1
> > > > > 
> > > > > [1] 2 3 1
> > > > > > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> > > > > 
> > > > > [1] 3 1 2
> > > > > > (f__ = f__[origorder]) # 3,1
> > > > > 
> > > > > [1] 3 1 2
> > > > > > (len__ = len__[origorder]) # 2,1
> > > > > 
> > > > > [1] 1 1 1
> > > > > 
> > > > > ## run two
> > > > > > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> > > > > 
> > > > > [1] 1 2 3
> > > > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > > > 
> > > > > [1] 1 3
> > > > > > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> > > > > 
> > > > > [1] 2 1
> > > > > > (firstofeachgroup = o__[f__]) # 2,1
> > > > > 
> > > > > [1] 1 3
> > > > > > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> > > > > 
> > > > > [1] 1 2
> > > > > > (f__ = f__[origorder]) # 3,1
> > > > > 
> > > > > [1] 1 3
> > > > > > (len__ = len__[origorder]) # 2,1
> > > > > 
> > > > > [1] 2 1
> > > > > 
> > > > > On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
> > > > > > <aragorn168b at gmail.com> wrote:
> > > > > > > Not sure how to debug without being able to reproduce. Tried on Mac OS X
> > > > > > > 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
> > > > > > > machine. It consistently gives me this:
> > > > > > 
> > > > > > > dt[,
> > > > > > + list(max=max(y, na.rm=TRUE)),
> > > > > > + by=list(by)
> > > > > > + ]
> > > > > > >     by        max
> > > > > > 1: 0.7 0.01464054
> > > > > > 2: 0.4 0.87328871
> > > > > > > 
> > > > > > > dt[,
> > > > > > + list(max=max(y, na.rm=TRUE)),
> > > > > > + by=list(by)
> > > > > > + ]
> > > > > > >     by        max
> > > > > > 1: 0.7 0.01464054
> > > > > > 2: 0.4 0.87328871
> > > > > > 
> > > > > > Can either of you provide me with the output of these steps in cases where
> > > > > > there's an error? I've commented the output I get for each step.
> > > > > > 
> > > > > > byval <- list(by=dt$by)
> > > > > > o__ <- data.table:::fastorder(byval) # 2,3,1
> > > > > > f__ = data.table:::uniqlist(byval, order=o__) # 1,3
> > > > > > len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
> > > > > > firstofeachgroup = o__[f__] # 2,1
> > > > > > origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
> > > > > > f__ = f__[origorder] # 3,1
> > > > > > len__ = len__[origorder] # 2,1
> > > > > > 
> > > > > > 
> > > > > > Arun
> > > > > > 
> > > > > > <...snip...>
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 
> 
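For anyone following the step-by-step debugging sequence quoted above, here is a rough base-R sketch of what those steps compute: sort the grouping column, find where each run of equal values starts, count the run lengths, then put the groups back in order of first appearance. This is not data.table's internal code. `order`, `which` and `diff` are stand-ins for `fastorder`, `uniqlist`, `uniqlengths` and `iradixorder`, and the `by` values are an assumption inferred from the printed results (0.7 once, 0.4 twice), not regenerated from the seed.

```r
# Assumed values: inferred from the printed output in the thread,
# not produced by set.seed(32) / rnorm().
by <- c(0.7, 0.4, 0.4)

# Row order that sorts the grouping column (stand-in for fastorder).
o <- order(by)                        # 2 3 1
sorted <- by[o]                       # 0.4 0.4 0.7

# Position, in sorted order, where each run of equal values starts
# (stand-in for uniqlist).
f <- which(c(TRUE, sorted[-1] != sorted[-length(sorted)]))   # 1 3

# Number of rows in each run (stand-in for uniqlengths).
len <- diff(c(f, length(by) + 1L))    # 2 1

# Original row of the first member of each group, and the permutation
# that restores order of first appearance (stand-in for iradixorder).
firstofeachgroup <- o[f]              # 2 1
origorder <- order(firstofeachgroup)  # 2 1

f <- f[origorder]                     # 3 1
len <- len[origorder]                 # 1 2

# One row per group, in order of first appearance.
groups <- data.frame(by = by[o[f]], n = len)
print(groups)
```

With exact equality the three rows collapse into two groups, 0.7 followed by 0.4, which is the ordering shown in the consistent runs. A result of 1,2,3 at the run-start step would correspond to the two 0.4 values being treated as different.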

