From kevinushey at gmail.com  Sat Feb  1 20:50:41 2014
From: kevinushey at gmail.com (Kevin Ushey)
Date: Sat, 1 Feb 2014 11:50:41 -0800
Subject: [datatable-help] R-devel breaks data.table
Message-ID: <CAJXgQP1q2aorXTqR3_VHPos92zizPi31aQKJ4w7zFQ4LEZjxkg@mail.gmail.com>

Hi guys,

See the commit here:

https://github.com/wch/r-source/commit/d0aece456bae5377245eb550a7434ba517be12fe

Now if I run the following code, I see an error:

library(data.table)

DT <- data.table(x=1, y=2, z=3)
DT[, k := 4]

Error in `[.data.table`(DT, , `:=`(k, 4)) :
  attempt to set index 3/3 in SET_STRING_ELT

Is this R-devel being overly picky about data.table's overallocation,
or is this a bug in data.table?

This is with data.table 1.8.11 from R-forge (version from yesterday).

R Under development (unstable) (2014-02-01 r64910)
Platform: x86_64-apple-darwin13.0.0 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.8.11    knitr_1.5.15         devtools_1.4.1.99
BiocInstaller_1.13.3

loaded via a namespace (and not attached):
 [1] compiler_3.1.0 digest_0.6.4   evaluate_0.5.1 formatR_0.10
httr_0.2       memoise_0.1
 [7] parallel_3.1.0 plyr_1.8       RCurl_1.95-4.1 reshape2_1.2.2
stringr_0.6.2  tools_3.1.0
[13] whisker_0.3-2

-Kevin

From mdowle at mdowle.plus.com  Sun Feb  2 03:06:59 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Sun, 02 Feb 2014 02:06:59 +0000
Subject: [datatable-help] R-devel breaks data.table
In-Reply-To: <CAJXgQP1q2aorXTqR3_VHPos92zizPi31aQKJ4w7zFQ4LEZjxkg@mail.gmail.com>
References: <CAJXgQP1q2aorXTqR3_VHPos92zizPi31aQKJ4w7zFQ4LEZjxkg@mail.gmail.com>
Message-ID: <52EDA843.80400@mdowle.plus.com>


Hi Kevin,

Yes R-devel has a new check as of Friday.  No problem in data.table, 
just R getting stricter which is a good.

Fixed and v1.8.11 (r1108) works again on latest R-devel 2014-02-01 r64910.

Thanks,
Matt


On 01/02/14 19:50, Kevin Ushey wrote:
> Hi guys,
>
> See the commit here:
>
> https://github.com/wch/r-source/commit/d0aece456bae5377245eb550a7434ba517be12fe
>
> Now if I run the following code, I see an error:
>
> library(data.table)
>
> DT <- data.table(x=1, y=2, z=3)
> DT[, k := 4]
>
> Error in `[.data.table`(DT, , `:=`(k, 4)) :
>    attempt to set index 3/3 in SET_STRING_ELT
>
> Is this R-devel being overly picky about data.table's overallocation,
> or is this a bug in data.table?
>
> This is with data.table 1.8.11 from R-forge (version from yesterday).
>
> R Under development (unstable) (2014-02-01 r64910)
> Platform: x86_64-apple-darwin13.0.0 (64-bit)
>
> locale:
> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] data.table_1.8.11    knitr_1.5.15         devtools_1.4.1.99
> BiocInstaller_1.13.3
>
> loaded via a namespace (and not attached):
>   [1] compiler_3.1.0 digest_0.6.4   evaluate_0.5.1 formatR_0.10
> httr_0.2       memoise_0.1
>   [7] parallel_3.1.0 plyr_1.8       RCurl_1.95-4.1 reshape2_1.2.2
> stringr_0.6.2  tools_3.1.0
> [13] whisker_0.3-2
>
> -Kevin
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>


From ggrothendieck at gmail.com  Sun Feb  2 13:27:36 2014
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Sun, 2 Feb 2014 07:27:36 -0500
Subject: [datatable-help] datatable roll="next" takes 150 times longer than
	findInterval
Message-ID: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>

The benchmark at the bottom of this post shows a problem where a data.table
roll="next" took nearly 150x longer than a base findInterval() solution.
 (The data.table solution is easier to write though.) This suggests an area
for possible speed improvement.

http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140202/339e2573/attachment.html>

From mdowle at mdowle.plus.com  Sun Feb  2 19:57:43 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Sun, 02 Feb 2014 18:57:43 +0000
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
Message-ID: <52EE9527.10608@mdowle.plus.com>


But this is at the *micro* second level ?!!

I confirm those results on my slow netbook but remember these are 
**micro** seconds i.e. 71,000 here is less than 0.1 of a second.

 > microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
Unit: microseconds
          expr       min        lq      median          uq max neval
  flodel(X, Y)   330.798   369.369    402.7935    455.3225 17996.26   100
     GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
     GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100

To put it in some perspective :

 > system.time(GG2(X,Y))
    user  system elapsed
   0.072   0.000   0.072
 > system.time(GG2(X,Y))
    user  system elapsed
   0.080   0.000   0.079
 > system.time(GG2(X,Y))
    user  system elapsed
   0.072   0.000   0.072

Where those times are in seconds.   So the task in question here, takes 
0.07 seconds ?!

The 150x longer figure is actually (using figures from the S.O. answer)  
24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds 
(0.000168 seconds).  0.024 seconds / 0.000168 = "150 times".   If you 
rounded to milliseconds you could say data.table is infinitely slower  
(24ms / 0ms = Inf).

I can believe there's scope for improvement, sure,  but not from this 
benchmark. The vectors need to be *much* bigger and replications needs 
to be *much* smaller, say 3.   The task being timed needs to take a 
meaningful amount of time (say 5 seconds) *for a single run*.

Matt


On 02/02/14 12:27, Gabor Grothendieck wrote:
> The benchmark at the bottom of this post shows a problem where a 
> data.table roll="next" took nearly 150x longer than a base 
> findInterval() solution.  (The data.table solution is easier to write 
> though.) This suggests an area for possible speed improvement.
>
> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com <http://gmail.com>
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140202/e58c3d48/attachment.html>

From mdowle at mdowle.plus.com  Mon Feb  3 12:46:23 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Mon, 03 Feb 2014 11:46:23 +0000
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <52EE9527.10608@mdowle.plus.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com>
Message-ID: <52EF818F.8090907@mdowle.plus.com>

Gabor,

With that said about it being a micro benchmark,  by-without-by might be 
at play in GG2(X,Y) here; i.e. running j for each row of i, where it 
could run once.  I remember you and others quite rightly said 
by-without-by should be explicit ... still got to make that change.  A 
similar speed issue came up recently somewhere else as well which the 
change in default should help.

Matt

On 02/02/14 18:57, Matt Dowle wrote:
>
> But this is at the *micro* second level ?!!
>
> I confirm those results on my slow netbook but remember these are 
> **micro** seconds i.e. 71,000 here is less than 0.1 of a second.
>
> > microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
> Unit: microseconds
>          expr       min        lq      median          uq max neval
>  flodel(X, Y)   330.798   369.369    402.7935    455.3225 17996.26   100
>     GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
>     GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100
>
> To put it in some perspective :
>
> > system.time(GG2(X,Y))
>    user  system elapsed
>   0.072   0.000   0.072
> > system.time(GG2(X,Y))
>    user  system elapsed
>   0.080   0.000   0.079
> > system.time(GG2(X,Y))
>    user  system elapsed
>   0.072   0.000   0.072
>
> Where those times are in seconds.   So the task in question here,  
> takes 0.07 seconds ?!
>
> The 150x longer figure is actually (using figures from the S.O. 
> answer)  24695 microseconds (i.e. 0.024 seconds) divided by 168 
> microseconds (0.000168 seconds).  0.024 seconds / 0.000168 = "150 
> times".   If you rounded to milliseconds you could say data.table is 
> infinitely slower  (24ms / 0ms = Inf).
>
> I can believe there's scope for improvement, sure,  but not from this 
> benchmark. The vectors need to be *much* bigger and replications needs 
> to be *much* smaller, say 3.   The task being timed needs to take a 
> meaningful amount of time (say 5 seconds) *for a single run*.
>
> Matt
>
>
> On 02/02/14 12:27, Gabor Grothendieck wrote:
>> The benchmark at the bottom of this post shows a problem where a 
>> data.table roll="next" took nearly 150x longer than a base 
>> findInterval() solution.  (The data.table solution is easier to write 
>> though.) This suggests an area for possible speed improvement.
>>
>> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>>
>> -- 
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com <http://gmail.com>
>>
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140203/42c1828e/attachment.html>

From npgraham1 at gmail.com  Wed Feb  5 02:36:38 2014
From: npgraham1 at gmail.com (Nathaniel Graham)
Date: Tue, 4 Feb 2014 20:36:38 -0500
Subject: [datatable-help] merging data tables on date ranges
Message-ID: <CALhihUjqDgPR1gyH+UENKEWatYFTBUc65D4MPH4fOnceB5Jkjw@mail.gmail.com>

I'm trying to figure out how to merge two data tables using the dates in
both.  One table a set of people, which has the to and from dates and their
address at each place they've lived.  The other has their work history,
also with to and from dates.  Obviously, there isn't a one-to-one
relationship; individuals may have several jobs while staying in the same
place, several homes over the course of a job, and any sort of overlapping
you can imagine.  Both tables are reasonably large; the residences has
about 950k rows, and the employment has about 1.2M rows.

To give you a bit of flavor, the first ten rows of each:
> work.history[1:10, list(icrdn, fromdate, todate, state, postalcode)]
    icrdn fromdate   todate state postalcode
 1:   145 Apr 1988 Jan 1990    FL      33432
 2:   145 Jan 1990 Jan 1997    FL      33432
 3:   145 Jan 1997 Dec 2011    FL      33444
 4:   145 Jan 1997 Dec 2011    FL      33444
 5:   145 Jan 1997 Dec 2011    FL      33444
 6:   170 Oct 1983 Apr 2002    NE      68114
 7:   170 Sep 1972 Dec 2011    IL      60443
 8:   170 Sep 1972 Dec 2011    IL 61821-3066
 9:   183 Aug 2000 Dec 2011    GA      30305
10:   183 Aug 2000 Dec 2011    GA      30305
> residences[1:10]
    icrdn fromdate  todate state postalcode
 1:   145  10/1992 03/2004    FL      33432
 2:   145  03/2004            FL      33487
 3:   170  09/1995            IL      61821
 4:   183  05/1993 08/2000    GA      30342
 5:   183  08/2000 09/2001    GA      30342
 6:   183  09/2001 08/2004    GA      30305
 7:   183  08/2004            GA      30073
 8:   183  02/2005            GA      30342
 9:   183  06/2006            GA      30075
10:   183  07/1974 05/1993    GA      30338

The 'icrdn' column is an identifier unique to each person.

What I'm looking for is a data table with a row for each residence-job
pair.  Any residence that doesn't have a job in the sample can be safely
dropped, and vice-versa.

Thanks in advance for any help anyone can offer.
-------
Nathaniel Graham
npgraham1 at gmail.com
npgraham1 at uky.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140204/5b74f166/attachment.html>

From ggrothendieck at gmail.com  Wed Feb  5 16:22:32 2014
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Wed, 5 Feb 2014 10:22:32 -0500
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <52EF818F.8090907@mdowle.plus.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
Message-ID: <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>

There was anoither benchmark posted with larger data and longer times
but this time data.table stopped with an error.  See:

http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855

On Mon, Feb 3, 2014 at 6:46 AM, Matt Dowle <mdowle at mdowle.plus.com> wrote:
> Gabor,
>
> With that said about it being a micro benchmark,  by-without-by might be at
> play in GG2(X,Y) here; i.e. running j for each row of i, where it could run
> once.  I remember you and others quite rightly said by-without-by should be
> explicit ... still got to make that change.  A similar speed issue came up
> recently somewhere else as well which the change in default should help.
>
> Matt
>
>
> On 02/02/14 18:57, Matt Dowle wrote:
>
>
> But this is at the *micro* second level ?!!
>
> I confirm those results on my slow netbook but remember these are **micro**
> seconds i.e. 71,000 here is less than 0.1 of a second.
>
>> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
> Unit: microseconds
>          expr       min        lq      median          uq       max neval
>  flodel(X, Y)   330.798   369.369    402.7935    455.3225  17996.26   100
>     GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
>     GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100
>
> To put it in some perspective :
>
>> system.time(GG2(X,Y))
>    user  system elapsed
>   0.072   0.000   0.072
>> system.time(GG2(X,Y))
>    user  system elapsed
>   0.080   0.000   0.079
>> system.time(GG2(X,Y))
>    user  system elapsed
>   0.072   0.000   0.072
>
> Where those times are in seconds.   So the task in question here,  takes
> 0.07 seconds ?!
>
> The 150x longer figure is actually (using figures from the S.O. answer)
> 24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds
> (0.000168 seconds).  0.024 seconds / 0.000168 = "150 times".   If you
> rounded to milliseconds you could say data.table is infinitely slower  (24ms
> / 0ms = Inf).
>
> I can believe there's scope for improvement, sure,  but not from this
> benchmark. The vectors need to be *much* bigger and replications needs to be
> *much* smaller, say 3.   The task being timed needs to take a meaningful
> amount of time (say 5 seconds) *for a single run*.
>
> Matt
>
>
> On 02/02/14 12:27, Gabor Grothendieck wrote:
>
> The benchmark at the bottom of this post shows a problem where a data.table
> roll="next" took nearly 150x longer than a base findInterval() solution.
> (The data.table solution is easier to write though.) This suggests an area
> for possible speed improvement.
>
> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

From aragorn168b at gmail.com  Wed Feb  5 16:32:03 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Wed, 5 Feb 2014 16:32:03 +0100
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
 <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
Message-ID: <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>

Just tested. Works just fine (on 1.8.11). Takes 16 seconds as opposed to
Flodel's which takes 1.4 seconds on my laptop. Also identical returned TRUE.
Will see where's the delay coming from.


On Wed, Feb 5, 2014 at 4:22 PM, Gabor Grothendieck
<ggrothendieck at gmail.com>wrote:

> There was anoither benchmark posted with larger data and longer times
> but this time data.table stopped with an error.  See:
>
>
> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>
> On Mon, Feb 3, 2014 at 6:46 AM, Matt Dowle <mdowle at mdowle.plus.com> wrote:
> > Gabor,
> >
> > With that said about it being a micro benchmark,  by-without-by might be
> at
> > play in GG2(X,Y) here; i.e. running j for each row of i, where it could
> run
> > once.  I remember you and others quite rightly said by-without-by should
> be
> > explicit ... still got to make that change.  A similar speed issue came
> up
> > recently somewhere else as well which the change in default should help.
> >
> > Matt
> >
> >
> > On 02/02/14 18:57, Matt Dowle wrote:
> >
> >
> > But this is at the *micro* second level ?!!
> >
> > I confirm those results on my slow netbook but remember these are
> **micro**
> > seconds i.e. 71,000 here is less than 0.1 of a second.
> >
> >> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
> > Unit: microseconds
> >          expr       min        lq      median          uq       max neval
> >  flodel(X, Y)   330.798   369.369    402.7935    455.3225  17996.26   100
> >     GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
> >     GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100
> >
> > To put it in some perspective :
> >
> >> system.time(GG2(X,Y))
> >    user  system elapsed
> >   0.072   0.000   0.072
> >> system.time(GG2(X,Y))
> >    user  system elapsed
> >   0.080   0.000   0.079
> >> system.time(GG2(X,Y))
> >    user  system elapsed
> >   0.072   0.000   0.072
> >
> > Where those times are in seconds.   So the task in question here,  takes
> > 0.07 seconds ?!
> >
> > The 150x longer figure is actually (using figures from the S.O. answer)
> > 24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds
> > (0.000168 seconds).  0.024 seconds / 0.000168 = "150 times".   If you
> > rounded to milliseconds you could say data.table is infinitely slower
>  (24ms
> > / 0ms = Inf).
> >
> > I can believe there's scope for improvement, sure,  but not from this
> > benchmark. The vectors need to be *much* bigger and replications needs
> to be
> > *much* smaller, say 3.   The task being timed needs to take a meaningful
> > amount of time (say 5 seconds) *for a single run*.
> >
> > Matt
> >
> >
> > On 02/02/14 12:27, Gabor Grothendieck wrote:
> >
> > The benchmark at the bottom of this post shows a problem where a
> data.table
> > roll="next" took nearly 150x longer than a base findInterval() solution.
> > (The data.table solution is easier to write though.) This suggests an
> area
> > for possible speed improvement.
> >
> >
> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
> >
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com
> >
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> >
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >
> >
> >
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140205/69ca4083/attachment.html>

From aragorn168b at gmail.com  Wed Feb  5 16:42:10 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Wed, 5 Feb 2014 16:42:10 +0100
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
 <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
 <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>
Message-ID: <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>

Seems like the "by-without-by" is what's slowing things down:

require(data.table)
dtx <- data.table(x=which(X), key="x")
dty <- data.table(y=which(Y), key="y")
dtx[, x1 := x]
dty[, y1 := y]
system.time(ans <- dty[dtx, roll="nearest"][, abs(x1-y1)])
   user  system elapsed
  1.321   0.076   1.396
system.time(ans2 <- flodel(x,y))
   user  system elapsed
  0.936   0.044   0.977

identical(ans, ans2) # [1] TRUE


On Wed, Feb 5, 2014 at 4:32 PM, Arunkumar Srinivasan
<aragorn168b at gmail.com>wrote:

> Just tested. Works just fine (on 1.8.11). Takes 16 seconds as opposed to
> Flodel's which takes 1.4 seconds on my laptop. Also identical returned TRUE.
> Will see where's the delay coming from.
>
>
> On Wed, Feb 5, 2014 at 4:22 PM, Gabor Grothendieck <
> ggrothendieck at gmail.com> wrote:
>
>> There was anoither benchmark posted with larger data and longer times
>> but this time data.table stopped with an error.  See:
>>
>>
>> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>>
>> On Mon, Feb 3, 2014 at 6:46 AM, Matt Dowle <mdowle at mdowle.plus.com>
>> wrote:
>> > Gabor,
>> >
>> > With that said about it being a micro benchmark,  by-without-by might
>> be at
>> > play in GG2(X,Y) here; i.e. running j for each row of i, where it could
>> run
>> > once.  I remember you and others quite rightly said by-without-by
>> should be
>> > explicit ... still got to make that change.  A similar speed issue came
>> up
>> > recently somewhere else as well which the change in default should help.
>> >
>> > Matt
>> >
>> >
>> > On 02/02/14 18:57, Matt Dowle wrote:
>> >
>> >
>> > But this is at the *micro* second level ?!!
>> >
>> > I confirm those results on my slow netbook but remember these are
>> **micro**
>> > seconds i.e. 71,000 here is less than 0.1 of a second.
>> >
>> >> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
>> > Unit: microseconds
>> >          expr       min        lq      median          uq       max
>> neval
>> >  flodel(X, Y)   330.798   369.369    402.7935    455.3225  17996.26
>> 100
>> >     GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77
>> 100
>> >     GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62
>> 100
>> >
>> > To put it in some perspective :
>> >
>> >> system.time(GG2(X,Y))
>> >    user  system elapsed
>> >   0.072   0.000   0.072
>> >> system.time(GG2(X,Y))
>> >    user  system elapsed
>> >   0.080   0.000   0.079
>> >> system.time(GG2(X,Y))
>> >    user  system elapsed
>> >   0.072   0.000   0.072
>> >
>> > Where those times are in seconds.   So the task in question here,  takes
>> > 0.07 seconds ?!
>> >
>> > The 150x longer figure is actually (using figures from the S.O. answer)
>> > 24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds
>> > (0.000168 seconds).  0.024 seconds / 0.000168 = "150 times".   If you
>> > rounded to milliseconds you could say data.table is infinitely slower
>>  (24ms
>> > / 0ms = Inf).
>> >
>> > I can believe there's scope for improvement, sure,  but not from this
>> > benchmark. The vectors need to be *much* bigger and replications needs
>> to be
>> > *much* smaller, say 3.   The task being timed needs to take a meaningful
>> > amount of time (say 5 seconds) *for a single run*.
>> >
>> > Matt
>> >
>> >
>> > On 02/02/14 12:27, Gabor Grothendieck wrote:
>> >
>> > The benchmark at the bottom of this post shows a problem where a
>> data.table
>> > roll="next" took nearly 150x longer than a base findInterval() solution.
>> > (The data.table solution is easier to write though.) This suggests an
>> area
>> > for possible speed improvement.
>> >
>> >
>> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>> >
>> > --
>> > Statistics & Software Consulting
>> > GKX Group, GKX Associates Inc.
>> > tel: 1-877-GKX-GROUP
>> > email: ggrothendieck at gmail.com
>> >
>> >
>> > _______________________________________________
>> > datatable-help mailing list
>> > datatable-help at lists.r-forge.r-project.org
>> >
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>> >
>> >
>> >
>>
>>
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140205/5f64f128/attachment-0001.html>

From aragorn168b at gmail.com  Wed Feb  5 17:12:03 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Wed, 5 Feb 2014 17:12:03 +0100
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
 <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
 <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>
 <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>
Message-ID: <CAAf756PCfiXD2rek9zE8593ddqQdBYs0wQ9d7rx_S8bgV4FysQ@mail.gmail.com>

Have edited here now:
http://stackoverflow.com/a/21500855/559784


On Wed, Feb 5, 2014 at 4:42 PM, Arunkumar Srinivasan
<aragorn168b at gmail.com>wrote:

> Seems like the "by-without-by" is what's slowing things down:
>
> require(data.table)
> dtx <- data.table(x=which(X), key="x")
> dty <- data.table(y=which(Y), key="y")
> dtx[, x1 := x]
> dty[, y1 := y]
> system.time(ans <- dty[dtx, roll="nearest"][, abs(x1-y1)])
>    user  system elapsed
>   1.321   0.076   1.396
> system.time(ans2 <- flodel(x,y))
>    user  system elapsed
>   0.936   0.044   0.977
>
> identical(ans, ans2) # [1] TRUE
>
>
> On Wed, Feb 5, 2014 at 4:32 PM, Arunkumar Srinivasan <
> aragorn168b at gmail.com> wrote:
>
>> Just tested. Works just fine (on 1.8.11). Takes 16 seconds as opposed to
>> Flodel's which takes 1.4 seconds on my laptop. Also identical returned TRUE.
>> Will see where's the delay coming from.
>>
>>
>> On Wed, Feb 5, 2014 at 4:22 PM, Gabor Grothendieck <
>> ggrothendieck at gmail.com> wrote:
>>
>>> There was anoither benchmark posted with larger data and longer times
>>> but this time data.table stopped with an error.  See:
>>>
>>>
>>> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>>>
>>> On Mon, Feb 3, 2014 at 6:46 AM, Matt Dowle <mdowle at mdowle.plus.com>
>>> wrote:
>>> > Gabor,
>>> >
>>> > With that said about it being a micro benchmark,  by-without-by might
>>> be at
>>> > play in GG2(X,Y) here; i.e. running j for each row of i, where it
>>> could run
>>> > once.  I remember you and others quite rightly said by-without-by
>>> should be
>>> > explicit ... still got to make that change.  A similar speed issue
>>> came up
>>> > recently somewhere else as well which the change in default should
>>> help.
>>> >
>>> > Matt
>>> >
>>> >
>>> > On 02/02/14 18:57, Matt Dowle wrote:
>>> >
>>> >
>>> > But this is at the *micro* second level ?!!
>>> >
>>> > I confirm those results on my slow netbook but remember these are
>>> **micro**
>>> > seconds i.e. 71,000 here is less than 0.1 of a second.
>>> >
>>> >> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
>>> > Unit: microseconds
>>> >          expr       min        lq      median          uq       max
>>> neval
>>> >  flodel(X, Y)   330.798   369.369    402.7935    455.3225  17996.26
>>> 100
>>> >     GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77
>>> 100
>>> >     GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62
>>> 100
>>> >
>>> > To put it in some perspective :
>>> >
>>> >> system.time(GG2(X,Y))
>>> >    user  system elapsed
>>> >   0.072   0.000   0.072
>>> >> system.time(GG2(X,Y))
>>> >    user  system elapsed
>>> >   0.080   0.000   0.079
>>> >> system.time(GG2(X,Y))
>>> >    user  system elapsed
>>> >   0.072   0.000   0.072
>>> >
>>> > Where those times are in seconds.   So the task in question here,
>>>  takes
>>> > 0.07 seconds ?!
>>> >
>>> > The 150x longer figure is actually (using figures from the S.O. answer)
>>> > 24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds
>>> > (0.000168 seconds).  0.024 seconds / 0.000168 = "150 times".   If you
>>> > rounded to milliseconds you could say data.table is infinitely slower
>>>  (24ms
>>> > / 0ms = Inf).
>>> >
>>> > I can believe there's scope for improvement, sure,  but not from this
>>> > benchmark. The vectors need to be *much* bigger and replications needs
>>> to be
>>> > *much* smaller, say 3.   The task being timed needs to take a
>>> meaningful
>>> > amount of time (say 5 seconds) *for a single run*.
>>> >
>>> > Matt
>>> >
>>> >
>>> > On 02/02/14 12:27, Gabor Grothendieck wrote:
>>> >
>>> > The benchmark at the bottom of this post shows a problem where a
>>> data.table
>>> > roll="next" took nearly 150x longer than a base findInterval()
>>> solution.
>>> > (The data.table solution is easier to write though.) This suggests an
>>> area
>>> > for possible speed improvement.
>>> >
>>> >
>>> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>>> >
>>> > --
>>> > Statistics & Software Consulting
>>> > GKX Group, GKX Associates Inc.
>>> > tel: 1-877-GKX-GROUP
>>> > email: ggrothendieck at gmail.com
>>> >
>>> >
>>> > _______________________________________________
>>> > datatable-help mailing list
>>> > datatable-help at lists.r-forge.r-project.org
>>> >
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Statistics & Software Consulting
>>> GKX Group, GKX Associates Inc.
>>> tel: 1-877-GKX-GROUP
>>> email: ggrothendieck at gmail.com
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140205/4a0038d2/attachment.html>

From ggrothendieck at gmail.com  Thu Feb  6 12:55:41 2014
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Thu, 6 Feb 2014 06:55:41 -0500
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
 <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
 <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>
 <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>
Message-ID: <CAP01uRnqrTCK7z7Nn=cJ=7PVduk3bx_sst8WUyxLQ60mxBofMw@mail.gmail.com>

On Wed, Feb 5, 2014 at 10:42 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Seems like the "by-without-by" is what's slowing things down:
>
> require(data.table)
> dtx <- data.table(x=which(X), key="x")
> dty <- data.table(y=which(Y), key="y")
> dtx[, x1 := x]
> dty[, y1 := y]
> system.time(ans <- dty[dtx, roll="nearest"][, abs(x1-y1)])
>    user  system elapsed
>   1.321   0.076   1.396
> system.time(ans2 <- flodel(x,y))
>    user  system elapsed
>   0.936   0.044   0.977
>
> identical(ans, ans2) # [1] TRUE

What will the code look like after the explicit by-without-by feature is added?

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

From aragorn168b at gmail.com  Thu Feb  6 14:23:31 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 6 Feb 2014 14:23:31 +0100
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAP01uRnqrTCK7z7Nn=cJ=7PVduk3bx_sst8WUyxLQ60mxBofMw@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
 <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
 <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>
 <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>
 <CAP01uRnqrTCK7z7Nn=cJ=7PVduk3bx_sst8WUyxLQ60mxBofMw@mail.gmail.com>
Message-ID: <CAAf756NGeNWtZOV8uDRus3sXMk_3O23_HfDAAH0yy4WzxfJDnw@mail.gmail.com>

In this case? Then nothing'll be different.

I'm not sure what you mean because the problem here is that this *doesn't*
require *by-without-by* as the j-operations are not necessary to be
performed *during* the join. So, we can just perform the join and then take
the "abs" once at the end, rather than calling it about 1e5+ times (the
number of groups).

So, if your question is: "apart from this question, how would an explicit
by-without-by look like?", then I guess it'd be the same as the normal
aggregation, but "by" would take a data.table as well. This has not yet
been discussed or conceptualised. But this is how I imagine it to be:

DT1[, list(...), by=DT2]

Where, DT1's key columns have to be set as usual.


On Thu, Feb 6, 2014 at 12:55 PM, Gabor Grothendieck <ggrothendieck at gmail.com
> wrote:

> On Wed, Feb 5, 2014 at 10:42 AM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
> > Seems like the "by-without-by" is what's slowing things down:
> >
> > require(data.table)
> > dtx <- data.table(x=which(X), key="x")
> > dty <- data.table(y=which(Y), key="y")
> > dtx[, x1 := x]
> > dty[, y1 := y]
> > system.time(ans <- dty[dtx, roll="nearest"][, abs(x1-y1)])
> >    user  system elapsed
> >   1.321   0.076   1.396
> > system.time(ans2 <- flodel(x,y))
> >    user  system elapsed
> >   0.936   0.044   0.977
> >
> > identical(ans, ans2) # [1] TRUE
>
> What will the code look like after the explicit by-without-by feature is
> added?
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140206/2c0164a6/attachment.html>

From ggrothendieck at gmail.com  Thu Feb  6 14:45:10 2014
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Thu, 6 Feb 2014 08:45:10 -0500
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAAf756NGeNWtZOV8uDRus3sXMk_3O23_HfDAAH0yy4WzxfJDnw@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
 <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
 <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>
 <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>
 <CAP01uRnqrTCK7z7Nn=cJ=7PVduk3bx_sst8WUyxLQ60mxBofMw@mail.gmail.com>
 <CAAf756NGeNWtZOV8uDRus3sXMk_3O23_HfDAAH0yy4WzxfJDnw@mail.gmail.com>
Message-ID: <CAP01uR=RMVukcMe0ywYySVdeG_qOvxn0=RZTkp_yBUwoTrnXqA@mail.gmail.com>

On Thu, Feb 6, 2014 at 8:23 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> In this case? Then nothing'll be different.
>
> I'm not sure what you mean because the problem here is that this *doesn't*
> require *by-without-by* as the j-operations are not necessary to be
> performed *during* the join. So, we can just perform the join and then take
> the "abs" once at the end, rather than calling it about 1e5+ times (the
> number of groups).
>
> So, if your question is: "apart from this question, how would an explicit
> by-without-by look like?", then I guess it'd be the same as the normal
> aggregation, but "by" would take a data.table as well. This has not yet been
> discussed or conceptualised. But this is how I imagine it to be:
>
> DT1[, list(...), by=DT2]
>
> Where, DT1's key columns have to be set as usual.

My original code was this:

dtx <- data.table(x = which(x))
dty <- data.table(y = which(y), key = "y")
dty[dtx, abs(x - y), roll = "nearest"]

With that feature would this code not use by-within-by and therefore
become fast?

From aragorn168b at gmail.com  Thu Feb  6 14:53:18 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 6 Feb 2014 14:53:18 +0100
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAP01uR=RMVukcMe0ywYySVdeG_qOvxn0=RZTkp_yBUwoTrnXqA@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
 <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
 <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>
 <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>
 <CAP01uRnqrTCK7z7Nn=cJ=7PVduk3bx_sst8WUyxLQ60mxBofMw@mail.gmail.com>
 <CAAf756NGeNWtZOV8uDRus3sXMk_3O23_HfDAAH0yy4WzxfJDnw@mail.gmail.com>
 <CAP01uR=RMVukcMe0ywYySVdeG_qOvxn0=RZTkp_yBUwoTrnXqA@mail.gmail.com>
Message-ID: <CAAf756Njoi8=zE5gqzAdW8oQQD=edRV=7Ej1HP9suk5gsaSbfQ@mail.gmail.com>

Not really. Because it still doing a "by". Meaning, for every grouping in
"by"  - abs(x-y) will be evaluated. If there are 1e5 groups, there'll be
1e5 calls. And that can be expensive depending on the function + the time
to call eval from within C.

However, since it's not necessary to do a by-without-by, we can perform the
join and then compute once the difference between columns. There's no
grouping, no eval from C, and no multiple calls to abs. Hope this clears it
up?


On Thu, Feb 6, 2014 at 2:45 PM, Gabor Grothendieck
<ggrothendieck at gmail.com>wrote:

> On Thu, Feb 6, 2014 at 8:23 AM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
> > In this case? Then nothing'll be different.
> >
> > I'm not sure what you mean because the problem here is that this
> *doesn't*
> > require *by-without-by* as the j-operations are not necessary to be
> > performed *during* the join. So, we can just perform the join and then
> take
> > the "abs" once at the end, rather than calling it about 1e5+ times (the
> > number of groups).
> >
> > So, if your question is: "apart from this question, how would an explicit
> > by-without-by look like?", then I guess it'd be the same as the normal
> > aggregation, but "by" would take a data.table as well. This has not yet
> been
> > discussed or conceptualised. But this is how I imagine it to be:
> >
> > DT1[, list(...), by=DT2]
> >
> > Where, DT1's key columns have to be set as usual.
>
> My original code was this:
>
> dtx <- data.table(x = which(x))
> dty <- data.table(y = which(y), key = "y")
> dty[dtx, abs(x - y), roll = "nearest"]
>
> With that feature would this code not use by-within-by and therefore
> become fast?
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140206/0de1336d/attachment.html>

From ggrothendieck at gmail.com  Thu Feb  6 15:20:37 2014
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Thu, 6 Feb 2014 09:20:37 -0500
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAAf756Njoi8=zE5gqzAdW8oQQD=edRV=7Ej1HP9suk5gsaSbfQ@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
 <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
 <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>
 <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>
 <CAP01uRnqrTCK7z7Nn=cJ=7PVduk3bx_sst8WUyxLQ60mxBofMw@mail.gmail.com>
 <CAAf756NGeNWtZOV8uDRus3sXMk_3O23_HfDAAH0yy4WzxfJDnw@mail.gmail.com>
 <CAP01uR=RMVukcMe0ywYySVdeG_qOvxn0=RZTkp_yBUwoTrnXqA@mail.gmail.com>
 <CAAf756Njoi8=zE5gqzAdW8oQQD=edRV=7Ej1HP9suk5gsaSbfQ@mail.gmail.com>
Message-ID: <CAP01uRktbinOAZa7XvC8K=P5RUB4Kahzf0-vTAz64YG-Gm=n0g@mail.gmail.com>

On Thu, Feb 6, 2014 at 8:53 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Not really. Because it still doing a "by". Meaning, for every grouping in
> "by"  - abs(x-y) will be evaluated. If there are 1e5 groups, there'll be 1e5
> calls. And that can be expensive depending on the function + the time to
> call eval from within C.
>
> However, since it's not necessary to do a by-without-by, we can perform the
> join and then compute once the difference between columns. There's no
> grouping, no eval from C, and no multiple calls to abs. Hope this clears it
> up?
>
>

In that case what is the proposed user interface?

I thought that the idea was that one would have to explicitly specify
the by= clause for by-within-by  it to occur.  In the code I had just
posted there is a join = "nearest" but no by= clause is specified.

From aragorn168b at gmail.com  Thu Feb  6 15:58:28 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 6 Feb 2014 15:58:28 +0100
Subject: [datatable-help] datatable roll="next" takes 150 times longer
 than findInterval
In-Reply-To: <CAP01uRktbinOAZa7XvC8K=P5RUB4Kahzf0-vTAz64YG-Gm=n0g@mail.gmail.com>
References: <CAP01uRno0PLSLOmA=vQ8L2PU+XTM79AnP7fko2iFR9qsy_7v9A@mail.gmail.com>
 <52EE9527.10608@mdowle.plus.com> <52EF818F.8090907@mdowle.plus.com>
 <CAP01uRmAA2bYD2_JXD3-mM4KPfrW4DQSsOFSQagVA++qLoSMhg@mail.gmail.com>
 <CAAf756MF7CjfqNJ+_zzfwLTgBwQkiJhhRJXp4dOuOtZg6Ma1+w@mail.gmail.com>
 <CAAf756OmX_DEe8xXogm9LUhDnt1dR=8ckZ6e+gpteaL7ni+JZw@mail.gmail.com>
 <CAP01uRnqrTCK7z7Nn=cJ=7PVduk3bx_sst8WUyxLQ60mxBofMw@mail.gmail.com>
 <CAAf756NGeNWtZOV8uDRus3sXMk_3O23_HfDAAH0yy4WzxfJDnw@mail.gmail.com>
 <CAP01uR=RMVukcMe0ywYySVdeG_qOvxn0=RZTkp_yBUwoTrnXqA@mail.gmail.com>
 <CAAf756Njoi8=zE5gqzAdW8oQQD=edRV=7Ej1HP9suk5gsaSbfQ@mail.gmail.com>
 <CAP01uRktbinOAZa7XvC8K=P5RUB4Kahzf0-vTAz64YG-Gm=n0g@mail.gmail.com>
Message-ID: <CAAf756Nw-DqdL4WNrncb26ysPmgfGLEbLu+1LQpPksP0R3vpJw@mail.gmail.com>

Gabor,

I think now I understand what your earlier post was about. You mean after
the external by-without-by, doing DT1[DT2, ..., ] will be faster as it
shouldn't do a by-without-by. Yes, that's true. So basically, the statement:

dty[dtx, abs(x - y), roll = "nearest"]

once external by-without-by is implemented, will/should first do the join
and then do the "j' operation. And therefore it'll be as fast as the
solution I wrote. If one wants to perform the j-operation for each group,
then they'll have to do something like

DT1[, j, by=DT2] (or any other solutions we end up on)

Sorry for the misunderstanding.


On Thu, Feb 6, 2014 at 3:20 PM, Gabor Grothendieck
<ggrothendieck at gmail.com>wrote:

> On Thu, Feb 6, 2014 at 8:53 AM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
> > Not really. Because it still doing a "by". Meaning, for every grouping in
> > "by"  - abs(x-y) will be evaluated. If there are 1e5 groups, there'll be
> 1e5
> > calls. And that can be expensive depending on the function + the time to
> > call eval from within C.
> >
> > However, since it's not necessary to do a by-without-by, we can perform
> the
> > join and then compute once the difference between columns. There's no
> > grouping, no eval from C, and no multiple calls to abs. Hope this clears
> it
> > up?
> >
> >
>
> In that case what is the proposed user interface?
>
> I thought that the idea was that one would have to explicitly specify
> the by= clause for by-within-by  it to occur.  In the code I had just
> posted there is a join = "nearest" but no by= clause is specified.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140206/a8e7b72b/attachment-0001.html>

From yikelu.home at gmail.com  Fri Feb  7 00:38:49 2014
From: yikelu.home at gmail.com (Yike Lu)
Date: Thu, 6 Feb 2014 17:38:49 -0600
Subject: [datatable-help] integer64 group by doesn't find all groups
Message-ID: <CAO=aP+fO5r+LzChLDCJAW42JVznho2rAKxY6Mw+k_4PVbOb4AQ@mail.gmail.com>

After a long hiatus, I am back to using data.table. Unfortunately, I've
encountered a problem. Am I doing something wrong here?

require(data.table)

dt = data.table(idx = 1:100 %% 3, 1:100)
dt[, list(sum(V2)), by = idx]
# normal

require(bit64)

dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
dt2[, list(sum(V2)), by = idx]
# only has one group:
#   idx   V1
#1:   1 5050
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140206/d69c699b/attachment.html>

From caneff at gmail.com  Wed Feb 12 17:01:56 2014
From: caneff at gmail.com (caneff at gmail.com)
Date: Wed, 12 Feb 2014 16:01:56 +0000
Subject: [datatable-help] Infinite numeric key doesn't collapse
Message-ID: <CAAuY0RV0W3Qia0d8iQjcCH2aCDm6Sqz-PuXcwQJ9-oAf9cYJgQ@mail.gmail.com>

I have a numeric key in a data.table that sometimes has infinite values. I
discovered today that Inf does not collapse when used in a by.  Is this
expected? It surprised me:

DT <- data.table(x=rep(c(1,Inf), each=10), y=1:20)

DT[, sum(y), by=x] # The x==1 cases collapse, but the Inf cases don't
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/e1a56ddc/attachment.html>

From aragorn168b at gmail.com  Wed Feb 12 17:04:07 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Wed, 12 Feb 2014 17:04:07 +0100
Subject: [datatable-help] Infinite numeric key doesn't collapse
In-Reply-To: <CAAuY0RV0W3Qia0d8iQjcCH2aCDm6Sqz-PuXcwQJ9-oAf9cYJgQ@mail.gmail.com>
References: <CAAuY0RV0W3Qia0d8iQjcCH2aCDm6Sqz-PuXcwQJ9-oAf9cYJgQ@mail.gmail.com>
Message-ID: <etPan.52fb9b77.580bd78f.ef94@Arunkumars-MacBook-Pro.local>

Caneff,

I'm guessing you're using 1.8.10. This has been fixed a while ago in the current devel version 1.8.11. Or you can wait until the next release (which should be very soon now).
Arun
From:?caneff at gmail.com caneff at gmail.com
Reply:?caneff at gmail.com caneff at gmail.com
Date:?February 12, 2014 at 5:02:14 PM
To:?datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:? [datatable-help] Infinite numeric key doesn't collapse  
I have a numeric key in a data.table that sometimes has infinite values. I discovered today that Inf does not collapse when used in a by. ?Is this expected? It surprised me:

DT <- data.table(x=rep(c(1,Inf), each=10), y=1:20)

DT[, sum(y), by=x] # The x==1 cases collapse, but the Inf cases don't
_______________________________________________  
datatable-help mailing list  
datatable-help at lists.r-forge.r-project.org  
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/0cd8c188/attachment.html>

From caneff at gmail.com  Wed Feb 12 17:07:04 2014
From: caneff at gmail.com (caneff at gmail.com)
Date: Wed, 12 Feb 2014 16:07:04 +0000
Subject: [datatable-help] Infinite numeric key doesn't collapse
References: <CAAuY0RV0W3Qia0d8iQjcCH2aCDm6Sqz-PuXcwQJ9-oAf9cYJgQ@mail.gmail.com>
 <etPan.52fb9b77.580bd78f.ef94@Arunkumars-MacBook-Pro.local>
Message-ID: <CAAuY0RX6yv469ZPCq=-1n_LwjebMJWaMXUjkW=j4RNT=YVjeww@mail.gmail.com>

Whoops! Sorry I try to keep synced to the latest devel version, but
sometimes because of work related updates packages get overwritten back to
the latest public version.  Sorry about that.

I also found an easy workaround since the number of unique values is low, I
can make it an ordered factor.

On Wed Feb 12 2014 at 11:04:10 AM, Arunkumar Srinivasan <
aragorn168b at gmail.com> wrote:

> Caneff,
>
> I'm guessing you're using 1.8.10. This has been fixed a while ago in the
> current devel version 1.8.11. Or you can wait until the next release (which
> should be very soon now).
> Arun
> ------------------------------
> From: caneff at gmail.com caneff at gmail.com
> Reply: caneff at gmail.com caneff at gmail.com
> Date: February 12, 2014 at 5:02:14 PM
> To: datatable-help at lists.r-forge.r-project.org
> datatable-help at lists.r-forge.r-project.org
> Subject:  [datatable-help] Infinite numeric key doesn't collapse
>
> I have a numeric key in a data.table that sometimes has infinite values. I
> discovered today that Inf does not collapse when used in a by.  Is this
> expected? It surprised me:
>
> DT <- data.table(x=rep(c(1,Inf), each=10), y=1:20)
>
> DT[, sum(y), by=x] # The x==1 cases collapse, but the Inf cases don't
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/c4e13df3/attachment.html>

From mdowle at mdowle.plus.com  Wed Feb 12 17:22:26 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Wed, 12 Feb 2014 16:22:26 +0000
Subject: [datatable-help] integer64 group by doesn't find all groups
In-Reply-To: <CAO=aP+fO5r+LzChLDCJAW42JVznho2rAKxY6Mw+k_4PVbOb4AQ@mail.gmail.com>
References: <CAO=aP+fO5r+LzChLDCJAW42JVznho2rAKxY6Mw+k_4PVbOb4AQ@mail.gmail.com>
Message-ID: <52FB9FC2.4000305@mdowle.plus.com>


Hi,

You're doing nothing wrong.  Although you can load integer64 using fread 
and create them directly,  data.table's grouping and keys don't work on 
them yet.  Sorry,  just not yet implemented. Because integer64 are 
internally stored as type double  (a good idea by package bit64),  
data.table sees them internally as double and doesn't catch that the 
type isn't supported yet (hence no error message such as you get for 
type 'complex').   The particular integer64 numbers in this example are 
quite small so will use the lower bits.  In double, those are the most 
precise part of the significand, which would explain why only one group 
comes out here since data.table groups and joins floating point data 
within tolerance.

Matt

On 06/02/14 23:38, Yike Lu wrote:
> After a long hiatus, I am back to using data.table. Unfortunately, 
> I've encountered a problem. Am I doing something wrong here?
>
> require(data.table)
>
> dt = data.table(idx = 1:100 %% 3, 1:100)
> dt[, list(sum(V2)), by = idx]
> # normal
>
> require(bit64)
>
> dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
> dt2[, list(sum(V2)), by = idx]
> # only has one group:
> #   idx   V1
> #1:   1 5050
>


From caneff at gmail.com  Wed Feb 12 17:26:06 2014
From: caneff at gmail.com (caneff at gmail.com)
Date: Wed, 12 Feb 2014 16:26:06 +0000
Subject: [datatable-help] integer64 group by doesn't find all groups
References: <CAO=aP+fO5r+LzChLDCJAW42JVznho2rAKxY6Mw+k_4PVbOb4AQ@mail.gmail.com>
 <52FB9FC2.4000305@mdowle.plus.com>
Message-ID: <CAAuY0RWEqyVHmmd29hZwjPsNvt76frsJ+Jc9DFiifVNFHWutzA@mail.gmail.com>

FYI (and this is a long outstanding argument) this is why I don't like the
bit64 package.  These sorts of errors happen silently.  I understand that
data.table can't use the other integer64 package, but at least there it is
obvious when things are being coerced.

In my situations, if I am grouping by a int64, it is usually either an ID
so I can just make it a character vector instead, or it is something where
I don't mind lost precision so I just make it numeric.

On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle <mdowle at mdowle.plus.com>
wrote:

>
> Hi,
>
> You're doing nothing wrong.  Although you can load integer64 using fread
> and create them directly,  data.table's grouping and keys don't work on
> them yet.  Sorry,  just not yet implemented. Because integer64 are
> internally stored as type double  (a good idea by package bit64),
> data.table sees them internally as double and doesn't catch that the
> type isn't supported yet (hence no error message such as you get for
> type 'complex').   The particular integer64 numbers in this example are
> quite small so will use the lower bits.  In double, those are the most
> precise part of the significand, which would explain why only one group
> comes out here since data.table groups and joins floating point data
> within tolerance.
>
> Matt
>
> On 06/02/14 23:38, Yike Lu wrote:
> > After a long hiatus, I am back to using data.table. Unfortunately,
> > I've encountered a problem. Am I doing something wrong here?
> >
> > require(data.table)
> >
> > dt = data.table(idx = 1:100 %% 3, 1:100)
> > dt[, list(sum(V2)), by = idx]
> > # normal
> >
> > require(bit64)
> >
> > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
> > dt2[, list(sum(V2)), by = idx]
> > # only has one group:
> > #   idx   V1
> > #1:   1 5050
> >
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/
> listinfo/datatable-help
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/81a7f412/attachment.html>

From mdowle at mdowle.plus.com  Wed Feb 12 17:39:44 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Wed, 12 Feb 2014 16:39:44 +0000
Subject: [datatable-help] integer64 group by doesn't find all groups
In-Reply-To: <CAAuY0RWEqyVHmmd29hZwjPsNvt76frsJ+Jc9DFiifVNFHWutzA@mail.gmail.com>
References: <CAO=aP+fO5r+LzChLDCJAW42JVznho2rAKxY6Mw+k_4PVbOb4AQ@mail.gmail.com>
 <52FB9FC2.4000305@mdowle.plus.com>
 <CAAuY0RWEqyVHmmd29hZwjPsNvt76frsJ+Jc9DFiifVNFHWutzA@mail.gmail.com>
Message-ID: <52FBA3D0.60109@mdowle.plus.com>


Sometimes we take the hard road in data.table, to get to a better 
place.  Once bit64::integer64 is fully supported, it'll be much 
easier.   All the recent radix work for double applies almost 
automatically to integer64 for example,  but that radix work had to be 
done first.

On 12/02/14 16:26, caneff at gmail.com wrote:
> FYI (and this is a long outstanding argument) this is why I don't like 
> the bit64 package.  These sorts of errors happen silently.  I 
> understand that data.table can't use the other integer64 package, but 
> at least there it is obvious when things are being coerced.
>
> In my situations, if I am grouping by a int64, it is usually either an 
> ID so I can just make it a character vector instead, or it is 
> something where I don't mind lost precision so I just make it numeric.
>
> On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle <mdowle at mdowle.plus.com 
> <mailto:mdowle at mdowle.plus.com>> wrote:
>
>
>     Hi,
>
>     You're doing nothing wrong.  Although you can load integer64 using
>     fread
>     and create them directly,  data.table's grouping and keys don't
>     work on
>     them yet.  Sorry,  just not yet implemented. Because integer64 are
>     internally stored as type double  (a good idea by package bit64),
>     data.table sees them internally as double and doesn't catch that the
>     type isn't supported yet (hence no error message such as you get for
>     type 'complex').   The particular integer64 numbers in this
>     example are
>     quite small so will use the lower bits.  In double, those are the most
>     precise part of the significand, which would explain why only one
>     group
>     comes out here since data.table groups and joins floating point data
>     within tolerance.
>
>     Matt
>
>     On 06/02/14 23:38, Yike Lu wrote:
>     > After a long hiatus, I am back to using data.table. Unfortunately,
>     > I've encountered a problem. Am I doing something wrong here?
>     >
>     > require(data.table)
>     >
>     > dt = data.table(idx = 1:100 %% 3, 1:100)
>     > dt[, list(sum(V2)), by = idx]
>     > # normal
>     >
>     > require(bit64)
>     >
>     > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
>     > dt2[, list(sum(V2)), by = idx]
>     > # only has one group:
>     > #   idx   V1
>     > #1:   1 5050
>     >
>
>     _______________________________________________
>     datatable-help mailing list
>     datatable-help at lists.r-forge.r-project.org
>     <mailto:datatable-help at lists.r-forge.r-project.org>
>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/f4ac06a6/attachment-0001.html>

From caneff at gmail.com  Wed Feb 12 18:17:16 2014
From: caneff at gmail.com (caneff at gmail.com)
Date: Wed, 12 Feb 2014 17:17:16 +0000
Subject: [datatable-help] integer64 group by doesn't find all groups
References: <CAO=aP+fO5r+LzChLDCJAW42JVznho2rAKxY6Mw+k_4PVbOb4AQ@mail.gmail.com>
 <52FB9FC2.4000305@mdowle.plus.com>
 <CAAuY0RWEqyVHmmd29hZwjPsNvt76frsJ+Jc9DFiifVNFHWutzA@mail.gmail.com>
 <52FBA3D0.60109@mdowle.plus.com>
Message-ID: <CAAuY0RVqNdVo1MERu1WjDeO+=ffpo2DEuFN_TSR7zeDniEYx8A@mail.gmail.com>

Yes this isn't a data.table criticism, just a bit64 one in general.

On Wed Feb 12 2014 at 11:39:47 AM, Matt Dowle <mdowle at mdowle.plus.com>
wrote:

>
> Sometimes we take the hard road in data.table, to get to a better place.
> Once bit64::integer64 is fully supported, it'll be much easier.   All the
> recent radix work for double applies almost automatically to integer64 for
> example,  but that radix work had to be done first.
>
>
> On 12/02/14 16:26, caneff at gmail.com wrote:
>
> FYI (and this is a long outstanding argument) this is why I don't like the
> bit64 package.  These sorts of errors happen silently.  I understand that
> data.table can't use the other integer64 package, but at least there it is
> obvious when things are being coerced.
>
>  In my situations, if I am grouping by a int64, it is usually either an
> ID so I can just make it a character vector instead, or it is something
> where I don't mind lost precision so I just make it numeric.
>
> On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle <mdowle at mdowle.plus.com>
> wrote:
>
>
> Hi,
>
> You're doing nothing wrong.  Although you can load integer64 using fread
> and create them directly,  data.table's grouping and keys don't work on
> them yet.  Sorry,  just not yet implemented. Because integer64 are
> internally stored as type double  (a good idea by package bit64),
> data.table sees them internally as double and doesn't catch that the
> type isn't supported yet (hence no error message such as you get for
> type 'complex').   The particular integer64 numbers in this example are
> quite small so will use the lower bits.  In double, those are the most
> precise part of the significand, which would explain why only one group
> comes out here since data.table groups and joins floating point data
> within tolerance.
>
> Matt
>
> On 06/02/14 23:38, Yike Lu wrote:
> > After a long hiatus, I am back to using data.table. Unfortunately,
> > I've encountered a problem. Am I doing something wrong here?
> >
> > require(data.table)
> >
> > dt = data.table(idx = 1:100 %% 3, 1:100)
> > dt[, list(sum(V2)), by = idx]
> > # normal
> >
> > require(bit64)
> >
> > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
> > dt2[, list(sum(V2)), by = idx]
> > # only has one group:
> > #   idx   V1
> > #1:   1 5050
> >
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/54f7ac80/attachment.html>

From john.laing at gmail.com  Wed Feb 12 18:24:27 2014
From: john.laing at gmail.com (John Laing)
Date: Wed, 12 Feb 2014 12:24:27 -0500
Subject: [datatable-help] Force evaluation of first argument to [
Message-ID: <CAA3Wa=vp5PuqvdWjFVPtF+vcpnKG9awBqCUAv6y0JkdnUyC_fw@mail.gmail.com>

Let's say I merge together several data.tables such that I wind up
with lots of NAs:

require(data.table)
foo <- data.table(k=1:4, foo=TRUE, key="k")
bar <- data.table(k=3:6, bar=TRUE, key="k")
qux <- data.table(k=5:8, qux=TRUE, key="k")
fbq <- merge(merge(foo, bar, all=TRUE), qux, all=TRUE)
print(fbq)
#    k  foo  bar  qux
# 1: 1 TRUE   NA   NA
# 2: 2 TRUE   NA   NA
# 3: 3 TRUE TRUE   NA
# 4: 4 TRUE TRUE   NA
# 5: 5   NA TRUE TRUE
# 6: 6   NA TRUE TRUE
# 7: 7   NA   NA TRUE
# 8: 8   NA   NA TRUE

I want to go through those columns and turn each NA into FALSE. I can
do this by writing code for each column:

fbq.cp <- copy(fbq)
fbq.cp[is.na(foo), foo:=FALSE]
fbq.cp[is.na(bar), bar:=FALSE]
fbq.cp[is.na(qux), qux:=FALSE]
print(fbq.cp)
#    k   foo   bar   qux
# 1: 1  TRUE FALSE FALSE
# 2: 2  TRUE FALSE FALSE
# 3: 3  TRUE  TRUE FALSE
# 4: 4  TRUE  TRUE FALSE
# 5: 5 FALSE  TRUE  TRUE
# 6: 6 FALSE  TRUE  TRUE
# 7: 7 FALSE FALSE  TRUE
# 8: 8 FALSE FALSE  TRUE

But I can't figure out how to do it in a loop. More precisely, I can't
figure out how to make the [ operator evaluate its first argument in
the context of the data.table. All of these have no effect:
for (x in c("foo", "bar", "qux")) fbq[is.na(x), eval(x):=FALSE]
for (x in c("foo", "bar", "qux")) fbq[is.na(eval(x)), eval(x):=FALSE]
for (x in c("foo", "bar", "qux")) fbq[eval(is.na(x)), eval(x):=FALSE]

I'm running R 3.0.2 on Linux, data.table 1.8.10.

Thanks in advance,
John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/eb2ca1f6/attachment.html>

From mdowle at mdowle.plus.com  Wed Feb 12 18:44:11 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Wed, 12 Feb 2014 17:44:11 +0000
Subject: [datatable-help] Force evaluation of first argument to [
In-Reply-To: <CAA3Wa=vp5PuqvdWjFVPtF+vcpnKG9awBqCUAv6y0JkdnUyC_fw@mail.gmail.com>
References: <CAA3Wa=vp5PuqvdWjFVPtF+vcpnKG9awBqCUAv6y0JkdnUyC_fw@mail.gmail.com>
Message-ID: <52FBB2EB.2070000@mdowle.plus.com>


Hi John,

In examples like this I'd use set() and [[,  since it's a bit easier to 
write but memory efficient too.

for (x in c("foo", "bar", "qux"))   set(fbq, is.na(fbq[[x]]), x, 
FALSE)           [untested]

A downside here is one repetition of the "fbq" symbol,  but can live 
with that.  If you have a large number of columns  (and I've been 
surprised just how many columns some poeple have!) then calling set() 
many times has lower overhead than DT[, :=],  see ?set.   Note also that 
[[ is base R, doesn't copy the column and often useful to use with 
data.table.

Or, use get() in either i or j rather than eval().

HTH, Matt


On 12/02/14 17:24, John Laing wrote:
> Let's say I merge together several data.tables such that I wind up
> with lots of NAs:
>
> require(data.table)
> foo <- data.table(k=1:4, foo=TRUE, key="k")
> bar <- data.table(k=3:6, bar=TRUE, key="k")
> qux <- data.table(k=5:8, qux=TRUE, key="k")
> fbq <- merge(merge(foo, bar, all=TRUE), qux, all=TRUE)
> print(fbq)
> #    k  foo  bar  qux
> # 1: 1 TRUE   NA   NA
> # 2: 2 TRUE   NA   NA
> # 3: 3 TRUE TRUE   NA
> # 4: 4 TRUE TRUE   NA
> # 5: 5   NA TRUE TRUE
> # 6: 6   NA TRUE TRUE
> # 7: 7   NA   NA TRUE
> # 8: 8   NA   NA TRUE
>
> I want to go through those columns and turn each NA into FALSE. I can
> do this by writing code for each column:
>
> fbq.cp <- copy(fbq)
> fbq.cp[is.na <http://is.na>(foo), foo:=FALSE]
> fbq.cp[is.na <http://is.na>(bar), bar:=FALSE]
> fbq.cp[is.na <http://is.na>(qux), qux:=FALSE]
> print(fbq.cp)
> #    k   foo   bar   qux
> # 1: 1  TRUE FALSE FALSE
> # 2: 2  TRUE FALSE FALSE
> # 3: 3  TRUE  TRUE FALSE
> # 4: 4  TRUE  TRUE FALSE
> # 5: 5 FALSE  TRUE  TRUE
> # 6: 6 FALSE  TRUE  TRUE
> # 7: 7 FALSE FALSE  TRUE
> # 8: 8 FALSE FALSE  TRUE
>
> But I can't figure out how to do it in a loop. More precisely, I can't
> figure out how to make the [ operator evaluate its first argument in
> the context of the data.table. All of these have no effect:
> for (x in c("foo", "bar", "qux")) fbq[is.na <http://is.na>(x), 
> eval(x):=FALSE]
> for (x in c("foo", "bar", "qux")) fbq[is.na <http://is.na>(eval(x)), 
> eval(x):=FALSE]
> for (x in c("foo", "bar", "qux")) fbq[eval(is.na <http://is.na>(x)), 
> eval(x):=FALSE]
>
> I'm running R 3.0.2 on Linux, data.table 1.8.10.
>
> Thanks in advance,
> John
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/c2260013/attachment.html>

From john.laing at gmail.com  Wed Feb 12 18:58:57 2014
From: john.laing at gmail.com (John Laing)
Date: Wed, 12 Feb 2014 12:58:57 -0500
Subject: [datatable-help] Force evaluation of first argument to [
In-Reply-To: <52FBB2EB.2070000@mdowle.plus.com>
References: <CAA3Wa=vp5PuqvdWjFVPtF+vcpnKG9awBqCUAv6y0JkdnUyC_fw@mail.gmail.com>
 <52FBB2EB.2070000@mdowle.plus.com>
Message-ID: <CAA3Wa=u2PwRW_Od4XTodtZ4uDSN7RZmOs33Uaaix8GN7OuRpag@mail.gmail.com>

Thanks, Matt! With a slight amendment that works great:
for (x in c("foo", "bar", "qux")) set(fbq, which(is.na(fbq[[x]])), x, FALSE)

Which highlights an opportunity to say that I really appreciate the
unusually helpful error messages in this package.

-John


On Wed, Feb 12, 2014 at 12:44 PM, Matt Dowle <mdowle at mdowle.plus.com> wrote:

>
> Hi John,
>
> In examples like this I'd use set() and [[,  since it's a bit easier to
> write but memory efficient too.
>
> for (x in c("foo", "bar", "qux"))   set(fbq, is.na(fbq[[x]]), x,
> FALSE)           [untested]
>
> A downside here is one repetition of the "fbq" symbol,  but can live with
> that.  If you have a large number of columns  (and I've been surprised just
> how many columns some poeple have!) then calling set() many times has lower
> overhead than DT[, :=],  see ?set.   Note also that [[ is base R, doesn't
> copy the column and often useful to use with data.table.
>
> Or, use get() in either i or j rather than eval().
>
> HTH, Matt
>
>
>
> On 12/02/14 17:24, John Laing wrote:
>
> Let's say I merge together several data.tables such that I wind up
> with lots of NAs:
>
> require(data.table)
> foo <- data.table(k=1:4, foo=TRUE, key="k")
> bar <- data.table(k=3:6, bar=TRUE, key="k")
> qux <- data.table(k=5:8, qux=TRUE, key="k")
> fbq <- merge(merge(foo, bar, all=TRUE), qux, all=TRUE)
> print(fbq)
> #    k  foo  bar  qux
> # 1: 1 TRUE   NA   NA
> # 2: 2 TRUE   NA   NA
> # 3: 3 TRUE TRUE   NA
> # 4: 4 TRUE TRUE   NA
> # 5: 5   NA TRUE TRUE
> # 6: 6   NA TRUE TRUE
> # 7: 7   NA   NA TRUE
> # 8: 8   NA   NA TRUE
>
> I want to go through those columns and turn each NA into FALSE. I can
> do this by writing code for each column:
>
> fbq.cp <- copy(fbq)
> fbq.cp[is.na(foo), foo:=FALSE]
> fbq.cp[is.na(bar), bar:=FALSE]
> fbq.cp[is.na(qux), qux:=FALSE]
> print(fbq.cp)
> #    k   foo   bar   qux
> # 1: 1  TRUE FALSE FALSE
> # 2: 2  TRUE FALSE FALSE
> # 3: 3  TRUE  TRUE FALSE
> # 4: 4  TRUE  TRUE FALSE
> # 5: 5 FALSE  TRUE  TRUE
> # 6: 6 FALSE  TRUE  TRUE
> # 7: 7 FALSE FALSE  TRUE
> # 8: 8 FALSE FALSE  TRUE
>
> But I can't figure out how to do it in a loop. More precisely, I can't
> figure out how to make the [ operator evaluate its first argument in
> the context of the data.table. All of these have no effect:
> for (x in c("foo", "bar", "qux")) fbq[is.na(x), eval(x):=FALSE]
> for (x in c("foo", "bar", "qux")) fbq[is.na(eval(x)), eval(x):=FALSE]
> for (x in c("foo", "bar", "qux")) fbq[eval(is.na(x)), eval(x):=FALSE]
>
> I'm running R 3.0.2 on Linux, data.table 1.8.10.
>
> Thanks in advance,
> John
>
>
> _______________________________________________
> datatable-help mailing listdatatable-help at lists.r-forge.r-project.orghttps://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/aefa3b7e/attachment-0001.html>

From mdowle at mdowle.plus.com  Wed Feb 12 20:22:04 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Wed, 12 Feb 2014 19:22:04 +0000
Subject: [datatable-help] Force evaluation of first argument to [
In-Reply-To: <CAA3Wa=u2PwRW_Od4XTodtZ4uDSN7RZmOs33Uaaix8GN7OuRpag@mail.gmail.com>
References: <CAA3Wa=vp5PuqvdWjFVPtF+vcpnKG9awBqCUAv6y0JkdnUyC_fw@mail.gmail.com>
 <52FBB2EB.2070000@mdowle.plus.com>
 <CAA3Wa=u2PwRW_Od4XTodtZ4uDSN7RZmOs33Uaaix8GN7OuRpag@mail.gmail.com>
Message-ID: <52FBC9DC.2010809@mdowle.plus.com>


Ha.  Yes we certainly don't hold back from making the messages as long 
and as helpful as possible.  If the code knows, or can know what exactly 
is wrong, it's a deliberate policy to put that info right there into the 
message. data.table is written by users; i.e. we wrote it for ourselves 
doing real jobs. I think that may be the root of that.  If any messages 
could more helpful,  those suggestions are very welcome.

Matt

On 12/02/14 17:58, John Laing wrote:
> Thanks, Matt! With a slight amendment that works great:
> for (x in c("foo", "bar", "qux")) set(fbq, which(is.na 
> <http://is.na>(fbq[[x]])), x, FALSE)
>
> Which highlights an opportunity to say that I really appreciate the 
> unusually helpful error messages in this package.
>
> -John
>
>
> On Wed, Feb 12, 2014 at 12:44 PM, Matt Dowle <mdowle at mdowle.plus.com 
> <mailto:mdowle at mdowle.plus.com>> wrote:
>
>
>     Hi John,
>
>     In examples like this I'd use set() and [[,  since it's a bit
>     easier to write but memory efficient too.
>
>     for (x in c("foo", "bar", "qux"))   set(fbq, is.na
>     <http://is.na>(fbq[[x]]), x, FALSE)           [untested]
>
>     A downside here is one repetition of the "fbq" symbol, but can
>     live with that.  If you have a large number of columns  (and I've
>     been surprised just how many columns some poeple have!) then
>     calling set() many times has lower overhead than DT[, :=],  see
>     ?set.   Note also that [[ is base R, doesn't copy the column and
>     often useful to use with data.table.
>
>     Or, use get() in either i or j rather than eval().
>
>     HTH, Matt
>
>
>
>     On 12/02/14 17:24, John Laing wrote:
>>     Let's say I merge together several data.tables such that I wind up
>>     with lots of NAs:
>>
>>     require(data.table)
>>     foo <- data.table(k=1:4, foo=TRUE, key="k")
>>     bar <- data.table(k=3:6, bar=TRUE, key="k")
>>     qux <- data.table(k=5:8, qux=TRUE, key="k")
>>     fbq <- merge(merge(foo, bar, all=TRUE), qux, all=TRUE)
>>     print(fbq)
>>     #    k  foo  bar  qux
>>     # 1: 1 TRUE   NA   NA
>>     # 2: 2 TRUE   NA   NA
>>     # 3: 3 TRUE TRUE   NA
>>     # 4: 4 TRUE TRUE   NA
>>     # 5: 5   NA TRUE TRUE
>>     # 6: 6   NA TRUE TRUE
>>     # 7: 7   NA   NA TRUE
>>     # 8: 8   NA   NA TRUE
>>
>>     I want to go through those columns and turn each NA into FALSE. I can
>>     do this by writing code for each column:
>>
>>     fbq.cp <- copy(fbq)
>>     fbq.cp[is.na <http://is.na>(foo), foo:=FALSE]
>>     fbq.cp[is.na <http://is.na>(bar), bar:=FALSE]
>>     fbq.cp[is.na <http://is.na>(qux), qux:=FALSE]
>>     print(fbq.cp)
>>     #    k   foo   bar   qux
>>     # 1: 1  TRUE FALSE FALSE
>>     # 2: 2  TRUE FALSE FALSE
>>     # 3: 3  TRUE  TRUE FALSE
>>     # 4: 4  TRUE  TRUE FALSE
>>     # 5: 5 FALSE  TRUE  TRUE
>>     # 6: 6 FALSE  TRUE  TRUE
>>     # 7: 7 FALSE FALSE  TRUE
>>     # 8: 8 FALSE FALSE  TRUE
>>
>>     But I can't figure out how to do it in a loop. More precisely, I
>>     can't
>>     figure out how to make the [ operator evaluate its first argument in
>>     the context of the data.table. All of these have no effect:
>>     for (x in c("foo", "bar", "qux")) fbq[is.na <http://is.na>(x),
>>     eval(x):=FALSE]
>>     for (x in c("foo", "bar", "qux")) fbq[is.na
>>     <http://is.na>(eval(x)), eval(x):=FALSE]
>>     for (x in c("foo", "bar", "qux")) fbq[eval(is.na
>>     <http://is.na>(x)), eval(x):=FALSE]
>>
>>     I'm running R 3.0.2 on Linux, data.table 1.8.10.
>>
>>     Thanks in advance,
>>     John
>>
>>
>>     _______________________________________________
>>     datatable-help mailing list
>>     datatable-help at lists.r-forge.r-project.org  <mailto:datatable-help at lists.r-forge.r-project.org>
>>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140212/1164a806/attachment.html>

From caneff at gmail.com  Thu Feb 13 17:05:37 2014
From: caneff at gmail.com (caneff at gmail.com)
Date: Thu, 13 Feb 2014 16:05:37 +0000
Subject: [datatable-help] Merging strings claim that the encodings don't
	match
Message-ID: <CAAuY0RWrRdzEMCkLTkhkd6DAsWvYJcMX1-3=RD-sCRRxJ42tMw@mail.gmail.com>

I have a master DT. I aggregate it in one way, and aggregate it in another
with a common key between them.   When I  try to merge these two, it says
that the key does not have the same encoding on both sides.  If I call
Encoding() on each of the keys, they both are listed as "unknown", so from
what I can see they still look the same.

I can't create a safe to share reproducible case unfortunately, the simple
ones I've tried all work.  If you can give more advice on how to debug
maybe I can.

This is using the latest devel version. I did not have this issue i 1.8.10
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140213/08a37fa6/attachment.html>

From mel at mbacou.com  Fri Feb 14 12:52:08 2014
From: mel at mbacou.com (Bacou, Melanie)
Date: Fri, 14 Feb 2014 06:52:08 -0500
Subject: [datatable-help] Force evaluation of first argument to [
In-Reply-To: <52FBC9DC.2010809@mdowle.plus.com>
References: <CAA3Wa=vp5PuqvdWjFVPtF+vcpnKG9awBqCUAv6y0JkdnUyC_fw@mail.gmail.com>
 <52FBB2EB.2070000@mdowle.plus.com>
 <CAA3Wa=u2PwRW_Od4XTodtZ4uDSN7RZmOs33Uaaix8GN7OuRpag@mail.gmail.com>
 <52FBC9DC.2010809@mdowle.plus.com>
Message-ID: <52FE0368.6080603@mbacou.com>

Hi John, Matt,

In this case, why not simply using the standard data.table approach with 
.SD?

fbq.cp[, lapply(.SD, function(x) ifelse(is.na(x), FALSE, x)), 
.SDcols=c("foo", "bar", "qux")]

--Mel.


On 2/12/2014 2:22 PM, Matt Dowle wrote:
>
> Ha.  Yes we certainly don't hold back from making the messages as long 
> and as helpful as possible.  If the code knows, or can know what 
> exactly is wrong, it's a deliberate policy to put that info right 
> there into the message. data.table is written by users; i.e. we wrote 
> it for ourselves doing real jobs. I think that may be the root of 
> that.  If any messages could more helpful,  those suggestions are very 
> welcome.
>
> Matt
>
> On 12/02/14 17:58, John Laing wrote:
>> Thanks, Matt! With a slight amendment that works great:
>> for (x in c("foo", "bar", "qux")) set(fbq, which(is.na 
>> <http://is.na>(fbq[[x]])), x, FALSE)
>>
>> Which highlights an opportunity to say that I really appreciate the 
>> unusually helpful error messages in this package.
>>
>> -John
>>
>>
>> On Wed, Feb 12, 2014 at 12:44 PM, Matt Dowle <mdowle at mdowle.plus.com 
>> <mailto:mdowle at mdowle.plus.com>> wrote:
>>
>>
>>     Hi John,
>>
>>     In examples like this I'd use set() and [[,  since it's a bit
>>     easier to write but memory efficient too.
>>
>>     for (x in c("foo", "bar", "qux"))   set(fbq, is.na
>>     <http://is.na>(fbq[[x]]), x, FALSE)           [untested]
>>
>>     A downside here is one repetition of the "fbq" symbol,  but can
>>     live with that.  If you have a large number of columns  (and I've
>>     been surprised just how many columns some poeple have!) then
>>     calling set() many times has lower overhead than DT[, :=],  see
>>     ?set.   Note also that [[ is base R, doesn't copy the column and
>>     often useful to use with data.table.
>>
>>     Or, use get() in either i or j rather than eval().
>>
>>     HTH, Matt
>>
>>
>>
>>     On 12/02/14 17:24, John Laing wrote:
>>>     Let's say I merge together several data.tables such that I wind up
>>>     with lots of NAs:
>>>
>>>     require(data.table)
>>>     foo <- data.table(k=1:4, foo=TRUE, key="k")
>>>     bar <- data.table(k=3:6, bar=TRUE, key="k")
>>>     qux <- data.table(k=5:8, qux=TRUE, key="k")
>>>     fbq <- merge(merge(foo, bar, all=TRUE), qux, all=TRUE)
>>>     print(fbq)
>>>     #    k  foo  bar  qux
>>>     # 1: 1 TRUE   NA   NA
>>>     # 2: 2 TRUE   NA   NA
>>>     # 3: 3 TRUE TRUE   NA
>>>     # 4: 4 TRUE TRUE   NA
>>>     # 5: 5   NA TRUE TRUE
>>>     # 6: 6   NA TRUE TRUE
>>>     # 7: 7   NA   NA TRUE
>>>     # 8: 8   NA   NA TRUE
>>>
>>>     I want to go through those columns and turn each NA into FALSE.
>>>     I can
>>>     do this by writing code for each column:
>>>
>>>     fbq.cp <- copy(fbq)
>>>     fbq.cp[is.na <http://is.na>(foo), foo:=FALSE]
>>>     fbq.cp[is.na <http://is.na>(bar), bar:=FALSE]
>>>     fbq.cp[is.na <http://is.na>(qux), qux:=FALSE]
>>>     print(fbq.cp)
>>>     #    k   foo   bar   qux
>>>     # 1: 1  TRUE FALSE FALSE
>>>     # 2: 2  TRUE FALSE FALSE
>>>     # 3: 3  TRUE  TRUE FALSE
>>>     # 4: 4  TRUE  TRUE FALSE
>>>     # 5: 5 FALSE  TRUE  TRUE
>>>     # 6: 6 FALSE  TRUE  TRUE
>>>     # 7: 7 FALSE FALSE  TRUE
>>>     # 8: 8 FALSE FALSE  TRUE
>>>
>>>     But I can't figure out how to do it in a loop. More precisely, I
>>>     can't
>>>     figure out how to make the [ operator evaluate its first argument in
>>>     the context of the data.table. All of these have no effect:
>>>     for (x in c("foo", "bar", "qux")) fbq[is.na <http://is.na>(x),
>>>     eval(x):=FALSE]
>>>     for (x in c("foo", "bar", "qux")) fbq[is.na
>>>     <http://is.na>(eval(x)), eval(x):=FALSE]
>>>     for (x in c("foo", "bar", "qux")) fbq[eval(is.na
>>>     <http://is.na>(x)), eval(x):=FALSE]
>>>
>>>     I'm running R 3.0.2 on Linux, data.table 1.8.10.
>>>
>>>     Thanks in advance,
>>>     John
>>>
>>>
>>>     _______________________________________________
>>>     datatable-help mailing list
>>>     datatable-help at lists.r-forge.r-project.org  <mailto:datatable-help at lists.r-forge.r-project.org>
>>>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

-- 
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail mel at mbacou.com
Visit harvestchoice.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140214/0e2b1d62/attachment.html>

From aragorn168b at gmail.com  Fri Feb 14 13:07:58 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Fri, 14 Feb 2014 13:07:58 +0100
Subject: [datatable-help] Force evaluation of first argument to [
In-Reply-To: <52FE0368.6080603@mbacou.com>
References: <CAA3Wa=vp5PuqvdWjFVPtF+vcpnKG9awBqCUAv6y0JkdnUyC_fw@mail.gmail.com>
 <52FBB2EB.2070000@mdowle.plus.com>
 <CAA3Wa=u2PwRW_Od4XTodtZ4uDSN7RZmOs33Uaaix8GN7OuRpag@mail.gmail.com>
 <52FBC9DC.2010809@mdowle.plus.com> <52FE0368.6080603@mbacou.com>
Message-ID: <etPan.52fe071e.1dbabf00.ef94@Arunkumars-MacBook-Pro.local>

Melanie,
`set` modifies by reference. Yours'll make a copy.?
Arun
From:?Bacou, Melanie Bacou, Melanie
Reply:?Bacou, Melanie mel at mbacou.com
Date:?February 14, 2014 at 12:52:56 PM
To:?Matt Dowle mdowle at mdowle.plus.com, John Laing john.laing at gmail.com
Subject:? Re: [datatable-help] Force evaluation of first argument to [  
Hi John, Matt,

In this case, why not simply using the standard data.table approach with .SD?

fbq.cp[, lapply(.SD, function(x) ifelse(is.na(x), FALSE, x)), .SDcols=c("foo", "bar", "qux")]

--Mel.


On 2/12/2014 2:22 PM, Matt Dowle wrote:

Ha.? Yes we certainly don't hold back from making the messages as long and as helpful as possible.? If the code knows, or can know what exactly is wrong, it's a deliberate policy to put that info right there into the message. data.table is written by users; i.e. we wrote it for ourselves doing real jobs. I think that may be the root of that.? If any messages could more helpful,? those suggestions are very welcome.

Matt

On 12/02/14 17:58, John Laing wrote:
Thanks, Matt! With a slight amendment that works great:
for (x in c("foo", "bar", "qux")) set(fbq, which(is.na(fbq[[x]])), x, FALSE)

Which highlights an opportunity to say that I really appreciate the unusually helpful error messages in this package.

-John


On Wed, Feb 12, 2014 at 12:44 PM, Matt Dowle <mdowle at mdowle.plus.com> wrote:

Hi John,

In examples like this I'd use set() and [[,? since it's a bit easier to write but memory efficient too.

for (x in c("foo", "bar", "qux"))?? set(fbq, is.na(fbq[[x]]), x, FALSE)?????????? [untested]

A downside here is one repetition of the "fbq" symbol,? but can live with that.? If you have a large number of columns? (and I've been surprised just how many columns some poeple have!) then calling set() many times has lower overhead than DT[, :=],? see ?set.?? Note also that [[ is base R, doesn't copy the column and often useful to use with data.table.

Or, use get() in either i or j rather than eval().

HTH, Matt


On 12/02/14 17:24, John Laing wrote:
Let's say I merge together several data.tables such that I wind up
with lots of NAs:

require(data.table)
foo <- data.table(k=1:4, foo=TRUE, key="k")
bar <- data.table(k=3:6, bar=TRUE, key="k")
qux <- data.table(k=5:8, qux=TRUE, key="k")
fbq <- merge(merge(foo, bar, all=TRUE), qux, all=TRUE)
print(fbq)
# ? ?k ?foo ?bar ?qux
# 1: 1 TRUE ? NA ? NA
# 2: 2 TRUE ? NA ? NA
# 3: 3 TRUE TRUE ? NA
# 4: 4 TRUE TRUE ? NA
# 5: 5 ? NA TRUE TRUE
# 6: 6 ? NA TRUE TRUE
# 7: 7 ? NA ? NA TRUE
# 8: 8 ? NA ? NA TRUE

I want to go through those columns and turn each NA into FALSE. I can
do this by writing code for each column:

fbq.cp <- copy(fbq)
fbq.cp[is.na(foo), foo:=FALSE]
fbq.cp[is.na(bar), bar:=FALSE]
fbq.cp[is.na(qux), qux:=FALSE]
print(fbq.cp)
# ? ?k ? foo ? bar ? qux
# 1: 1 ?TRUE FALSE FALSE
# 2: 2 ?TRUE FALSE FALSE
# 3: 3 ?TRUE ?TRUE FALSE
# 4: 4 ?TRUE ?TRUE FALSE
# 5: 5 FALSE ?TRUE ?TRUE
# 6: 6 FALSE ?TRUE ?TRUE
# 7: 7 FALSE FALSE ?TRUE
# 8: 8 FALSE FALSE ?TRUE

But I can't figure out how to do it in a loop. More precisely, I can't
figure out how to make the [ operator evaluate its first argument in
the context of the data.table. All of these have no effect:
for (x in c("foo", "bar", "qux")) fbq[is.na(x), eval(x):=FALSE]
for (x in c("foo", "bar", "qux")) fbq[is.na(eval(x)), eval(x):=FALSE]
for (x in c("foo", "bar", "qux")) fbq[eval(is.na(x)), eval(x):=FALSE]

I'm running R 3.0.2 on Linux, data.table 1.8.10.

Thanks in advance,
John


_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

--  
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail mel at mbacou.com
Visit harvestchoice.org  
_______________________________________________  
datatable-help mailing list  
datatable-help at lists.r-forge.r-project.org  
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140214/e860cd17/attachment-0001.html>

From mel at mbacou.com  Fri Feb 14 13:47:16 2014
From: mel at mbacou.com (Bacou, Melanie)
Date: Fri, 14 Feb 2014 07:47:16 -0500
Subject: [datatable-help] data.table and sp classes - any best practices?
Message-ID: <52FE1054.1020406@mbacou.com>

I often use data.table in combination with large spatial objects 
(SpatialPolygonsDataFrame, SpatialPixelsDataFrame, etc.), but I am 
always worried about using setkey()  on a @data slot thinking that I 
might mess up the link between the data attributes and the spatial 
features (polygons, points, pixels).

I am hoping some of you might be able to clarify how best to manipulate 
data attributes inside a spatial object using data.table without running 
into potential errors.

Here is a typical use case:

# Load a sample SpatialPolygonsDataFrame from GADM
load(url("http://biogeo.ucdavis.edu/data/gadm2/R/ETH_adm3.RData"))

# My understanding is the data.frame row names should always match the 
polygon ID slots
gadm.rn <- row.names(gadm)
gadm.rn[1:5]
# [1] "1" "2" "3" "4" "5"

pid <- lapply(gadm at polygons, slot, "ID")
pid[1:5]
# [[1]]
# [1] "1"
#
# [[2]]
# [1] "2"
#
# [[3]]
# [1] "3"
#
# [[4]]
# [1] "4"
#
# [[5]]
# [1] "5"


# Let's say I need to merge external data into gadm at data using setkey()
# Here is my approach
gadm at data <- data.table(gadm at data)
row.names(gadm at data)[1:5]
# [1] "1" "2" "3" "4" "5"
# Til now row names are preserved, good.

# Let's create an explicit `rn` column to keep the initial `gadm` row names
gadm at data[, rn := gadm.rn]

# Check the ordering of the first data column
gadm at data[, PID][1:5]
# [1] 30825 30826 30827 30828 30829

# Now index gadm at data by another column
setkey(gadm at data, NAME_3)

# Verify that the row order has changed
gadm at data[, PID][1:5]
# [1] 30859 31100 31101 31145 31016

# What about row names?
row.names(gadm at data)[1:5]
# [1] "1" "2" "3" "4" "5"
# Row names are not preserved, does that mean attributes are now associated
# with the wrong polygons?

# Let's try to fix that
setkey(gadm at data, rn)
gadm at data <- gadm at data[gadm.rn]
gadm at data[, PID][1:5]
# [1] 30825 30826 30827 30828 30829
# I'm now back to the original row order, note that row names are still 
unchanged
row.names(gadm at data)[1:5]
# [1] "1" "2" "3" "4" "5"
# I assume my spatial object is now correct

I don't know whether this approach makes sense at all, or if I should 
stay away from using data.table inside sp: classes?

I much appreciate any suggestion.
Thanks, --Mel.

-- 
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail mel at mbacou.com
Visit harvestchoice.org


From mel at mbacou.com  Fri Feb 14 13:59:05 2014
From: mel at mbacou.com (Bacou, Melanie)
Date: Fri, 14 Feb 2014 07:59:05 -0500
Subject: [datatable-help] Force evaluation of first argument to [
In-Reply-To: <etPan.52fe071e.1dbabf00.ef94@Arunkumars-MacBook-Pro.local>
References: <CAA3Wa=vp5PuqvdWjFVPtF+vcpnKG9awBqCUAv6y0JkdnUyC_fw@mail.gmail.com>
 <52FBB2EB.2070000@mdowle.plus.com>
 <CAA3Wa=u2PwRW_Od4XTodtZ4uDSN7RZmOs33Uaaix8GN7OuRpag@mail.gmail.com>
 <52FBC9DC.2010809@mdowle.plus.com> <52FE0368.6080603@mbacou.com>
 <etPan.52fe071e.1dbabf00.ef94@Arunkumars-MacBook-Pro.local>
Message-ID: <52FE1319.7040504@mbacou.com>

Arun, thanks for the clarification -- I see I didn't read that thread fully.
--Mel.

On 2/14/2014 7:07 AM, Arunkumar Srinivasan wrote:
> Melanie,
> `set` modifies by reference. Yours'll make a copy.
> Arun
> ------------------------------------------------------------------------
> From: Bacou, Melanie Bacou, Melanie <mailto:mel at mbacou.com>
> Reply: Bacou, Melanie mel at mbacou.com <mailto:mel at mbacou.com>
> Date: February 14, 2014 at 12:52:56 PM
> To: Matt Dowle mdowle at mdowle.plus.com <mailto:mdowle at mdowle.plus.com>, 
> John Laing john.laing at gmail.com <mailto:john.laing at gmail.com>
> Subject: Re: [datatable-help] Force evaluation of first argument to [
>> Hi John, Matt,
>>
>> In this case, why not simply using the standard data.table approach 
>> with .SD?
>>
>> fbq.cp[, lapply(.SD, function(x) ifelse(is.na(x), FALSE, x)), 
>> .SDcols=c("foo", "bar", "qux")]
>>
>> --Mel.
>>
>>
>> On 2/12/2014 2:22 PM, Matt Dowle wrote:
>>>
>>> Ha.  Yes we certainly don't hold back from making the messages as 
>>> long and as helpful as possible.  If the code knows, or can know 
>>> what exactly is wrong, it's a deliberate policy to put that info 
>>> right there into the message. data.table is written by users; i.e. 
>>> we wrote it for ourselves doing real jobs. I think that may be the 
>>> root of that.  If any messages could more helpful, those suggestions 
>>> are very welcome.
>>>
>>> Matt
>>>
>>> On 12/02/14 17:58, John Laing wrote:
>>>> Thanks, Matt! With a slight amendment that works great:
>>>> for (x in c("foo", "bar", "qux")) set(fbq, which(is.na 
>>>> <http://is.na>(fbq[[x]])), x, FALSE)
>>>>
>>>> Which highlights an opportunity to say that I really appreciate the 
>>>> unusually helpful error messages in this package.
>>>>
>>>> -John
>>>>
>>>>
>>>> On Wed, Feb 12, 2014 at 12:44 PM, Matt Dowle 
>>>> <mdowle at mdowle.plus.com <mailto:mdowle at mdowle.plus.com>> wrote:
>>>>
>>>>
>>>>     Hi John,
>>>>
>>>>     In examples like this I'd use set() and [[, since it's a bit
>>>>     easier to write but memory efficient too.
>>>>
>>>>     for (x in c("foo", "bar", "qux"))   set(fbq, is.na
>>>>     <http://is.na>(fbq[[x]]), x, FALSE) [untested]
>>>>
>>>>     A downside here is one repetition of the "fbq" symbol,  but can
>>>>     live with that.  If you have a large number of columns  (and
>>>>     I've been surprised just how many columns some poeple have!)
>>>>     then calling set() many times has lower overhead than DT[,
>>>>     :=],  see ?set.   Note also that [[ is base R, doesn't copy the
>>>>     column and often useful to use with data.table.
>>>>
>>>>     Or, use get() in either i or j rather than eval().
>>>>
>>>>     HTH, Matt
>>>>
>>>>
>>>>
>>>>     On 12/02/14 17:24, John Laing wrote:
>>>>>     Let's say I merge together several data.tables such that I wind up
>>>>>     with lots of NAs:
>>>>>
>>>>>     require(data.table)
>>>>>     foo <- data.table(k=1:4, foo=TRUE, key="k")
>>>>>     bar <- data.table(k=3:6, bar=TRUE, key="k")
>>>>>     qux <- data.table(k=5:8, qux=TRUE, key="k")
>>>>>     fbq <- merge(merge(foo, bar, all=TRUE), qux, all=TRUE)
>>>>>     print(fbq)
>>>>>     #    k  foo  bar  qux
>>>>>     # 1: 1 TRUE   NA   NA
>>>>>     # 2: 2 TRUE   NA   NA
>>>>>     # 3: 3 TRUE TRUE   NA
>>>>>     # 4: 4 TRUE TRUE   NA
>>>>>     # 5: 5   NA TRUE TRUE
>>>>>     # 6: 6   NA TRUE TRUE
>>>>>     # 7: 7   NA   NA TRUE
>>>>>     # 8: 8   NA   NA TRUE
>>>>>
>>>>>     I want to go through those columns and turn each NA into
>>>>>     FALSE. I can
>>>>>     do this by writing code for each column:
>>>>>
>>>>>     fbq.cp <- copy(fbq)
>>>>>     fbq.cp[is.na <http://is.na>(foo), foo:=FALSE]
>>>>>     fbq.cp[is.na <http://is.na>(bar), bar:=FALSE]
>>>>>     fbq.cp[is.na <http://is.na>(qux), qux:=FALSE]
>>>>>     print(fbq.cp)
>>>>>     #    k   foo   bar   qux
>>>>>     # 1: 1  TRUE FALSE FALSE
>>>>>     # 2: 2  TRUE FALSE FALSE
>>>>>     # 3: 3  TRUE  TRUE FALSE
>>>>>     # 4: 4  TRUE  TRUE FALSE
>>>>>     # 5: 5 FALSE  TRUE  TRUE
>>>>>     # 6: 6 FALSE  TRUE  TRUE
>>>>>     # 7: 7 FALSE FALSE  TRUE
>>>>>     # 8: 8 FALSE FALSE  TRUE
>>>>>
>>>>>     But I can't figure out how to do it in a loop. More precisely,
>>>>>     I can't
>>>>>     figure out how to make the [ operator evaluate its first
>>>>>     argument in
>>>>>     the context of the data.table. All of these have no effect:
>>>>>     for (x in c("foo", "bar", "qux")) fbq[is.na <http://is.na>(x),
>>>>>     eval(x):=FALSE]
>>>>>     for (x in c("foo", "bar", "qux")) fbq[is.na
>>>>>     <http://is.na>(eval(x)), eval(x):=FALSE]
>>>>>     for (x in c("foo", "bar", "qux")) fbq[eval(is.na
>>>>>     <http://is.na>(x)), eval(x):=FALSE]
>>>>>
>>>>>     I'm running R 3.0.2 on Linux, data.table 1.8.10.
>>>>>
>>>>>     Thanks in advance,
>>>>>     John
>>>>>
>>>>>
>>>>>     _______________________________________________
>>>>>     datatable-help mailing list
>>>>>     datatable-help at lists.r-forge.r-project.org  <mailto:datatable-help at lists.r-forge.r-project.org>
>>>>>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>>
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>> --
>> Melanie BACOU
>> International Food Policy Research Institute
>> Agricultural Economist, HarvestChoice
>> Work +1(202)862-5699
>> E-mailmel at mbacou.com
>> Visit harvestchoice.org
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

-- 
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail mel at mbacou.com
Visit harvestchoice.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140214/7402fa22/attachment.html>

From yikelu.home at gmail.com  Fri Feb 14 16:07:30 2014
From: yikelu.home at gmail.com (Yike Lu)
Date: Fri, 14 Feb 2014 09:07:30 -0600
Subject: [datatable-help] integer64 group by doesn't find all groups
In-Reply-To: <CAAuY0RVqNdVo1MERu1WjDeO+=ffpo2DEuFN_TSR7zeDniEYx8A@mail.gmail.com>
References: <CAO=aP+fO5r+LzChLDCJAW42JVznho2rAKxY6Mw+k_4PVbOb4AQ@mail.gmail.com>
 <52FB9FC2.4000305@mdowle.plus.com>
 <CAAuY0RWEqyVHmmd29hZwjPsNvt76frsJ+Jc9DFiifVNFHWutzA@mail.gmail.com>
 <52FBA3D0.60109@mdowle.plus.com>
 <CAAuY0RVqNdVo1MERu1WjDeO+=ffpo2DEuFN_TSR7zeDniEYx8A@mail.gmail.com>
Message-ID: <CAO=aP+dZmtC5d1c62dTeSfDRSn0Fb_XWstpRA6kA5uTd=eaS5w@mail.gmail.com>

Thanks for the info guys! Wondering if there's any way I can help?


On Wed, Feb 12, 2014 at 11:17 AM, caneff at gmail.com <caneff at gmail.com> wrote:

> Yes this isn't a data.table criticism, just a bit64 one in general.
>
>
> On Wed Feb 12 2014 at 11:39:47 AM, Matt Dowle <mdowle at mdowle.plus.com>
> wrote:
>
>>
>> Sometimes we take the hard road in data.table, to get to a better place.
>> Once bit64::integer64 is fully supported, it'll be much easier.   All the
>> recent radix work for double applies almost automatically to integer64 for
>> example,  but that radix work had to be done first.
>>
>>
>> On 12/02/14 16:26, caneff at gmail.com wrote:
>>
>> FYI (and this is a long outstanding argument) this is why I don't like
>> the bit64 package.  These sorts of errors happen silently.  I understand
>> that data.table can't use the other integer64 package, but at least there
>> it is obvious when things are being coerced.
>>
>>  In my situations, if I am grouping by a int64, it is usually either an
>> ID so I can just make it a character vector instead, or it is something
>> where I don't mind lost precision so I just make it numeric.
>>
>> On Wed Feb 12 2014 at 11:22:40 AM, Matt Dowle <mdowle at mdowle.plus.com>
>> wrote:
>>
>>
>> Hi,
>>
>> You're doing nothing wrong.  Although you can load integer64 using fread
>> and create them directly,  data.table's grouping and keys don't work on
>> them yet.  Sorry,  just not yet implemented. Because integer64 are
>> internally stored as type double  (a good idea by package bit64),
>> data.table sees them internally as double and doesn't catch that the
>> type isn't supported yet (hence no error message such as you get for
>> type 'complex').   The particular integer64 numbers in this example are
>> quite small so will use the lower bits.  In double, those are the most
>> precise part of the significand, which would explain why only one group
>> comes out here since data.table groups and joins floating point data
>> within tolerance.
>>
>> Matt
>>
>> On 06/02/14 23:38, Yike Lu wrote:
>> > After a long hiatus, I am back to using data.table. Unfortunately,
>> > I've encountered a problem. Am I doing something wrong here?
>> >
>> > require(data.table)
>> >
>> > dt = data.table(idx = 1:100 %% 3, 1:100)
>> > dt[, list(sum(V2)), by = idx]
>> > # normal
>> >
>> > require(bit64)
>> >
>> > dt2 = data.table(idx = integer64(100) + 1:100 %% 3, 1:100)
>> > dt2[, list(sum(V2)), by = idx]
>> > # only has one group:
>> > #   idx   V1
>> > #1:   1 5050
>> >
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140214/ad8cbafe/attachment-0001.html>

From mhawkes at gcmlp.com  Fri Feb 14 22:34:03 2014
From: mhawkes at gcmlp.com (Malcolm Hawkes)
Date: Fri, 14 Feb 2014 21:34:03 +0000
Subject: [datatable-help] CJ and setkey sort differently
Message-ID: <002E5054D2B84346B6F551575E3A8CA3030A7BE8@DC-GCM-MB-02.gcmlp.com>

Ran in to the warning

Warning in setkeyv(x, cols, verbose = verbose) :
  Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.

You can reproduce it with
vec1 <- c("CMDTY", "Copper", "CORPOAS")
vec2 <-  1:3
dt <- CJ(vec1, Date)
setkey(dt, V1, V2)

Issue seems to be that CJ (..., sorted = TRUE) and setkey want to sort the character data in different orders, one case-sensitive, one not.

CJ creates

     V1 V2
1: Corp  1
2: Corp  2
3: Corp  3
4: CORP  1
5: CORP  2
6: CORP  3

And it's keyed as you would expect by V1 then V2
> key(dt)
[1] "V1" "V2"


But after doing setkey you have

     V1 V2
1: CORP  1
2: CORP  2
3: CORP  3
4: Corp  1
5: Corp  2
6: Corp  3


data.table version 1.8.10

> R.version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          0.2
year           2013
month          09
day            25
svn rev        63987
language       R
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing
>


Malcolm Hawkes
On-Site Consultant, Investments - RiskManagement
Grosvenor Capital Management, L.P.
900 N. Michigan Avenue, Suite 1100
Chicago, IL  60611
mhawkes at gcmlp.com


---

Disclosure and Statement of Confidentiality

Grosvenor Securities LLC, Member FINRA, Serves as Placement Agent or Distributor for Certain Investment Products Managed/Advised by GCM Grosvenor-Affiliated Entities.

The contents of this e-mail message and its attachments (if any) may be proprietary and/or confidential and are intended solely for the addressee(s) hereof. In addition, this e-mail message and its attachments (if any) may be subject to non-disclosure or confidentiality agreements or applicable legal privileges, including privileges protecting communications between attorneys or solicitors and their clients or the work product of attorneys and solicitors. If you are not the named addressee, or if this e-mail message has been addressed to you in error, please do not read, disclose, reproduce, distribute, disseminate or otherwise use this message or any of its attachments. Delivery of this e-mail message to any person other than the intended recipient(s) is not intended in any way to waive privilege or confidentiality. If you have received this e-mail message in error, please alert the sender by reply e-mail; we also request that you immediately delete this e-mail message and its attachments (if any). Grosvenor Capital Management, L.P., GCM Customized Fund Investment Group, L.P. and their affiliated entities (collectively, "GCM Grosvenor") reserve the right to monitor all e-mail communications through their networks. GCM Grosvenor gives no assurances that this e-mail message and its attachments (if any) are free of viruses and other harmful code.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140214/5863f28d/attachment.html>

From aragorn168b at gmail.com  Fri Feb 14 22:38:16 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Fri, 14 Feb 2014 22:38:16 +0100
Subject: [datatable-help] CJ and setkey sort differently
In-Reply-To: <002E5054D2B84346B6F551575E3A8CA3030A7BE8@DC-GCM-MB-02.gcmlp.com>
References: <002E5054D2B84346B6F551575E3A8CA3030A7BE8@DC-GCM-MB-02.gcmlp.com>
Message-ID: <etPan.52fe8cc8.5db70ae5.ef94@Arunkumars-MacBook-Pro.local>

Malcolm,

Thanks for the nice report. I suppose your `dt` creation should be: `dt <- CJ(vec1, vec2)`. The reason is pretty clear. It's an easy fix. Could you please file a bug report? Thank you.

Arun
From:?Malcolm Hawkes Malcolm Hawkes
Reply:?Malcolm Hawkes mhawkes at gcmlp.com
Date:?February 14, 2014 at 10:34:21 PM
To:?datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:? [datatable-help] CJ and setkey sort differently  
Ran in to the warning
?
Warning in setkeyv(x, cols, verbose = verbose) :
? Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.
?
You can reproduce it with
vec1 <- c("CMDTY", "Copper", "CORPOAS")
vec2 <-? 1:3
dt <- CJ(vec1, Date)
setkey(dt, V1, V2)
?
Issue seems to be that CJ (..., sorted = TRUE) and setkey want to sort the character data in different orders, one case-sensitive, one not.
?
CJ creates
?
???? V1 V2
1: Corp? 1
2: Corp? 2
3: Corp? 3
4: CORP? 1
5: CORP? 2
6: CORP? 3
?
And it?s keyed as you would expect by V1 then V2
> key(dt)
[1] "V1" "V2"
?
?
But after doing setkey you have
?
???? V1 V2
1: CORP? 1
2: CORP? 2
3: CORP? 3
4: Corp? 1
5: Corp? 2
6: Corp? 3
?
?
?
data.table version 1.8.10
?
> R.version
?????????????? _??????????????????????????
platform?????? x86_64-w64-mingw32?????????
arch?????????? x86_64?????????????????????
os???????????? mingw32????????????????????
system???????? x86_64, mingw32????????????
status????????????????????????????????????
major????????? 3??????????????????????????
minor????????? 0.2????????????????????????
year?????????? 2013???????????????????????
month????????? 09?????????????????????????
day??????????? 25?????????????????????????
svn rev??????? 63987??????????????????????
language?????? R??????????????????????????
version.string R version 3.0.2 (2013-09-25)
nickname?????? Frisbee Sailing????????????
>
?
?
Malcolm Hawkes
On-Site Consultant, Investments - RiskManagement
Grosvenor Capital Management, L.P.
900 N. Michigan Avenue, Suite 1100
Chicago, IL? 60611
mhawkes at gcmlp.com
?
?
?
---

Disclosure and Statement of Confidentiality
?

Grosvenor Securities LLC, Member FINRA, Serves as Placement Agent or Distributor for Certain Investment Products Managed/Advised by GCM Grosvenor-Affiliated Entities.
?

The contents of this e-mail message and its attachments (if any) may be proprietary and/or confidential and are intended solely for the addressee(s) hereof. In addition, this e-mail message and its attachments (if any) may be subject to non-disclosure or confidentiality agreements or applicable legal privileges, including privileges protecting communications between attorneys or solicitors and their clients or the work product of attorneys and solicitors. If you are not the named addressee, or if this e-mail message has been addressed to you in error, please do not read, disclose, reproduce, distribute, disseminate or otherwise use this message or any of its attachments. Delivery of this e-mail message to any person other than the intended recipient(s) is not intended in any way to waive privilege or confidentiality. If you have received this e-mail message in error, please alert the sender by reply e-mail; we also request that you immediately delete this e-mail message and its attachments (if any). Grosvenor Capital Management, L.P., GCM Customized Fund Investment Group, L.P. and their affiliated entities (collectively, ?GCM Grosvenor?) reserve the right to monitor all e-mail communications through their networks. GCM Grosvenor gives no assurances that this e-mail message and its attachments (if any) are free of viruses and other harmful code.
_______________________________________________  
datatable-help mailing list  
datatable-help at lists.r-forge.r-project.org  
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140214/f6e4304d/attachment-0001.html>

From mhawkes at gcmlp.com  Fri Feb 14 22:42:00 2014
From: mhawkes at gcmlp.com (Malcolm Hawkes)
Date: Fri, 14 Feb 2014 21:42:00 +0000
Subject: [datatable-help] CJ and setkey sort differently
In-Reply-To: <etPan.52fe8cc8.5db70ae5.ef94@Arunkumars-MacBook-Pro.local>
References: <002E5054D2B84346B6F551575E3A8CA3030A7BE8@DC-GCM-MB-02.gcmlp.com>
 <etPan.52fe8cc8.5db70ae5.ef94@Arunkumars-MacBook-Pro.local>
Message-ID: <002E5054D2B84346B6F551575E3A8CA3030A7C06@DC-GCM-MB-02.gcmlp.com>

Arun

Oops, yes it should.  And vec1 <- c("Corp", "CORP")

Took me while to track down, what was causing but got it in the end ?

Where / how do I file a bug report ?

Thanks

Malcolm

Malcolm Hawkes
On-Site Consultant, Investments - RiskManagement
Grosvenor Capital Management, L.P.
900 N. Michigan Avenue, Suite 1100
Chicago, IL  60611
mhawkes at gcmlp.com

From: Arunkumar Srinivasan [mailto:aragorn168b at gmail.com]
Sent: Friday, February 14, 2014 3:38 PM
To: datatable-help at lists.r-forge.r-project.org; Malcolm Hawkes
Subject: Re: [datatable-help] CJ and setkey sort differently

Malcolm,

Thanks for the nice report. I suppose your `dt` creation should be: `dt <- CJ(vec1, vec2)`. The reason is pretty clear. It's an easy fix. Could you please file a bug report? Thank you.

Arun
________________________________
From: Malcolm Hawkes Malcolm Hawkes<mailto:mhawkes at gcmlp.com>
Reply: Malcolm Hawkes mhawkes at gcmlp.com<mailto:mhawkes at gcmlp.com>
Date: February 14, 2014 at 10:34:21 PM
To: datatable-help at lists.r-forge.r-project.org<mailto:datatable-help at lists.r-forge.r-project.org> datatable-help at lists.r-forge.r-project.org<mailto:datatable-help at lists.r-forge.r-project.org>
Subject:  [datatable-help] CJ and setkey sort differently
Ran in to the warning

Warning in setkeyv(x, cols, verbose = verbose) :
  Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.

You can reproduce it with
vec1 <- c("CMDTY", "Copper", "CORPOAS")
vec2 <-  1:3
dt <- CJ(vec1, Date)
setkey(dt, V1, V2)

Issue seems to be that CJ (..., sorted = TRUE) and setkey want to sort the character data in different orders, one case-sensitive, one not.

CJ creates

     V1 V2
1: Corp  1
2: Corp  2
3: Corp  3
4: CORP  1
5: CORP  2
6: CORP  3

And it?s keyed as you would expect by V1 then V2
> key(dt)
[1] "V1" "V2"


But after doing setkey you have

     V1 V2
1: CORP  1
2: CORP  2
3: CORP  3
4: Corp  1
5: Corp  2
6: Corp  3


data.table version 1.8.10

> R.version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          0.2
year           2013
month          09
day            25
svn rev        63987
language       R
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing
>


Malcolm Hawkes
On-Site Consultant, Investments - RiskManagement
Grosvenor Capital Management, L.P.
900 N. Michigan Avenue, Suite 1100
Chicago, IL  60611
mhawkes at gcmlp.com<mailto:mhawkes at gcmlp.com>


---

Disclosure and Statement of Confidentiality


Grosvenor Securities LLC, Member FINRA, Serves as Placement Agent or Distributor for Certain Investment Products Managed/Advised by GCM Grosvenor-Affiliated Entities.


The contents of this e-mail message and its attachments (if any) may be proprietary and/or confidential and are intended solely for the addressee(s) hereof. In addition, this e-mail message and its attachments (if any) may be subject to non-disclosure or confidentiality agreements or applicable legal privileges, including privileges protecting communications between attorneys or solicitors and their clients or the work product of attorneys and solicitors. If you are not the named addressee, or if this e-mail message has been addressed to you in error, please do not read, disclose, reproduce, distribute, disseminate or otherwise use this message or any of its attachments. Delivery of this e-mail message to any person other than the intended recipient(s) is not intended in any way to waive privilege or confidentiality. If you have received this e-mail message in error, please alert the sender by reply e-mail; we also request that you immediately delete this e-mail message and its attachments (if any). Grosvenor Capital Management, L.P., GCM Customized Fund Investment Group, L.P. and their affiliated entities (collectively, ?GCM Grosvenor?) reserve the right to monitor all e-mail communications through their networks. GCM Grosvenor gives no assurances that this e-mail message and its attachments (if any) are free of viruses and other harmful code.
_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org<mailto:datatable-help at lists.r-forge.r-project.org>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140214/209464d7/attachment-0001.html>

From aragorn168b at gmail.com  Fri Feb 14 22:43:00 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Fri, 14 Feb 2014 22:43:00 +0100
Subject: [datatable-help] CJ and setkey sort differently
In-Reply-To: <002E5054D2B84346B6F551575E3A8CA3030A7C06@DC-GCM-MB-02.gcmlp.com>
References: <002E5054D2B84346B6F551575E3A8CA3030A7BE8@DC-GCM-MB-02.gcmlp.com>
 <etPan.52fe8cc8.5db70ae5.ef94@Arunkumars-MacBook-Pro.local>
 <002E5054D2B84346B6F551575E3A8CA3030A7C06@DC-GCM-MB-02.gcmlp.com>
Message-ID: <etPan.52fe8de4.6590700b.ef94@Arunkumars-MacBook-Pro.local>

Here:?https://r-forge.r-project.org/tracker/?atid=975&group_id=240&func=browse
You've to create an account, but that's super easy.
Arun
From:?Malcolm Hawkes Malcolm Hawkes
Reply:?Malcolm Hawkes mhawkes at gcmlp.com
Date:?February 14, 2014 at 10:42:07 PM
To:?Arunkumar Srinivasan aragorn168b at gmail.com, datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:? RE: [datatable-help] CJ and setkey sort differently  
Arun

?

Oops, yes it should.? And vec1 <- c("Corp", "CORP")

?

Took me while to track down, what was causing but got it in the end J

?

Where / how do I file a bug report ?

?

Thanks

?

Malcolm

?

Malcolm Hawkes

On-Site Consultant, Investments - RiskManagement

Grosvenor Capital Management, L.P.

900 N. Michigan Avenue, Suite 1100

Chicago, IL? 60611

mhawkes at gcmlp.com

?

From: Arunkumar Srinivasan [mailto:aragorn168b at gmail.com]
Sent: Friday, February 14, 2014 3:38 PM
To: datatable-help at lists.r-forge.r-project.org; Malcolm Hawkes
Subject: Re: [datatable-help] CJ and setkey sort differently

?

Malcolm,

?

Thanks for the nice report. I suppose your `dt` creation should be: `dt <- CJ(vec1, vec2)`. The reason is pretty clear. It's an easy fix. Could you please file a bug report? Thank you.

?

Arun

From:?Malcolm HawkesMalcolm Hawkes
Reply:?Malcolm Hawkesmhawkes at gcmlp.com
Date:?February 14, 2014 at 10:34:21 PM
To:?datatable-help at lists.r-forge.r-project.orgdatatable-help@lists.r-forge.r-project.org
Subject:? [datatable-help] CJ and setkey sort differently

Ran in to the warning

?

Warning in setkeyv(x, cols, verbose = verbose) :

? Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.

?

You can reproduce it with

vec1 <- c("CMDTY", "Copper", "CORPOAS")

vec2 <-? 1:3

dt <- CJ(vec1, Date)

setkey(dt, V1, V2)

?

Issue seems to be that CJ (..., sorted = TRUE) and setkey want to sort the character data in different orders, one case-sensitive, one not.

?

CJ creates

?

???? V1 V2

1: Corp? 1

2: Corp? 2

3: Corp? 3

4: CORP? 1

5: CORP? 2

6: CORP? 3

?

And it?s keyed as you would expect by V1 then V2

>key(dt)

[1] "V1" "V2"

?

?

But after doing setkey you have

?

???? V1 V2

1: CORP? 1

2: CORP? 2

3: CORP? 3

4: Corp? 1

5: Corp? 2

6: Corp? 3

?

?

?

data.table version 1.8.10

?

> R.version

?????????????? _??????????????????????????

platform?????? x86_64-w64-mingw32?????????

arch?????????? x86_64?????????????????????

os???????????? mingw32????????????????????

system???????? x86_64, mingw32????????????

status????????????????????????????????????

major????????? 3??????????????????????????

minor????????? 0.2????????????????????????

year?????????? 2013???????????????????????

month????????? 09?????????????????????????

day??????????? 25?????????????????????????

svn rev??????? 63987??????????????????????

language?????? R??????????????????????????

version.string R version 3.0.2 (2013-09-25)

nickname?????? Frisbee Sailing????????????

>?

?

?

Malcolm Hawkes

On-Site Consultant, Investments - RiskManagement

Grosvenor Capital Management, L.P.

900 N. Michigan Avenue, Suite 1100

Chicago, IL? 60611

mhawkes at gcmlp.com

?

?

?

---

?

Disclosure and Statement of Confidentiality

?

?

Grosvenor Securities LLC, Member FINRA, Serves as Placement Agent or Distributor for Certain Investment Products Managed/Advised by GCM Grosvenor-Affiliated Entities.

?

?

The contents of this e-mail message and its attachments (if any) may be proprietary and/or confidential and are intended solely for the addressee(s) hereof. In addition, this e-mail message and its attachments (if any) may be subject to non-disclosure or confidentiality agreements or applicable legal privileges, including privileges protecting communications between attorneys or solicitors and their clients or the work product of attorneys and solicitors. If you are not the named addressee, or if this e-mail message has been addressed to you in error, please do not read, disclose, reproduce, distribute, disseminate or otherwise use this message or any of its attachments. Delivery of this e-mail message to any person other than the intended recipient(s) is not intended in any way to waive privilege or confidentiality. If you have received this e-mail message in error, please alert the sender by reply e-mail; we also request that you immediately delete this e-mail message and its attachments (if any). Grosvenor Capital Management, L.P., GCM Customized Fund Investment Group, L.P. and their affiliated entities (collectively, ?GCM Grosvenor?) reserve the right to monitor all e-mail communications through their networks. GCM Grosvenor gives no assurances that this e-mail message and its attachments (if any) are free of viruses and other harmful code.

_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140214/9bd4b0e6/attachment-0001.html>

From mel at mbacou.com  Mon Feb 17 11:14:45 2014
From: mel at mbacou.com (Bacou, Melanie)
Date: Mon, 17 Feb 2014 05:14:45 -0500
Subject: [datatable-help] Problem with data.table and FastRWeb
Message-ID: <5301E115.5080806@mbacou.com>

Hi,

I am testing an R script using FastRWeb (through Rserve). FastRWeb works 
as expected and I can successfully runs Simon Urbanek's examples. 
Problems arise when I try to merge datatables. It seems FastRWeb cannot 
find merge.data.table().

I'm using plenty of other libraries (ggplot, raster, RJDBC, etc.) that 
execute successfully through FastRWeb scripts, so I'm guessing it's 
something peculiar to data.table.

Thanks for any help! --Mel.


Here are reproducible examples.

Test #1: the code below (the entire content of my R script) SUCCEEDS:

# test1.R
library(data.table)

run <- function(...) {
   oclear()
   d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
   d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
   otable(d1)
   otable(d2)
}

This returns a simple web page showing 2 tables:
1 	a
2 	b
3 	c

v 	4
a 	6
b 	7


Test #2: the code below (the entire content of my R script) FAILS with:
Error in `[.default`(x, i) : invalid subscript type 'list'

# test2.R
library(data.table)

run <- function(...) {
   oclear()
   d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
   d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
   otable(d1)
   otable(d2)
   setkey(d1, b)
   setkey(d2, e)
   otable(d1[d2])
}


-- 
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail mel at mbacou.com
Visit harvestchoice.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140217/532a421d/attachment.html>

From aragorn168b at gmail.com  Mon Feb 17 12:58:55 2014
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Mon, 17 Feb 2014 12:58:55 +0100
Subject: [datatable-help] Problem with data.table and FastRWeb
In-Reply-To: <5301E115.5080806@mbacou.com>
References: <5301E115.5080806@mbacou.com>
Message-ID: <CAAf756O7Z+_J5Zrmcu3dpNVU6CKaS82Vitd-pPYB1iwUKMDhfA@mail.gmail.com>

Mel,
I'm not able to reproduce this on 1.8.11. Which version are you using?
I'm not aware of this package, and what 'otable' is supposed to do. But I
get no output while running your script, and not the error message as well.


On Mon, Feb 17, 2014 at 11:14 AM, Bacou, Melanie <mel at mbacou.com> wrote:

>  Hi,
>
> I am testing an R script using FastRWeb (through Rserve). FastRWeb works
> as expected and I can successfully runs Simon Urbanek's examples. Problems
> arise when I try to merge datatables. It seems FastRWeb cannot find
> merge.data.table().
>
> I'm using plenty of other libraries (ggplot, raster, RJDBC, etc.) that
> execute successfully through FastRWeb scripts, so I'm guessing it's
> something peculiar to data.table.
>
> Thanks for any help! --Mel.
>
>
> Here are reproducible examples.
>
> Test #1: the code below (the entire content of my R script) SUCCEEDS:
>
> # test1.R
> library(data.table)
>
> run <- function(...) {
>   oclear()
>   d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
>   d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
>   otable(d1)
>   otable(d2)
> }
>
> This returns a simple web page showing 2 tables:
>   1 a  2 b  3 c    v 4  a 6  b 7
>
> Test #2: the code below (the entire content of my R script) FAILS with:
> Error in `[.default`(x, i) : invalid subscript type 'list'
>
> # test2.R
>  library(data.table)
>
>  run <- function(...) {
>   oclear()
>    d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
>    d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
>    otable(d1)
>    otable(d2)
>    setkey(d1, b)
>    setkey(d2, e)
>    otable(d1[d2])
>  }
>
>
>
>
> --
> Melanie BACOU
> International Food Policy Research Institute
> Agricultural Economist, HarvestChoice
> Work +1(202)862-5699
> E-mail mel at mbacou.com
> Visit harvestchoice.org
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140217/edec2a78/attachment.html>

From mdowle at mdowle.plus.com  Mon Feb 17 23:39:11 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Mon, 17 Feb 2014 22:39:11 +0000
Subject: [datatable-help] Merging strings claim that the encodings don't
 match
In-Reply-To: <CAAuY0RWrRdzEMCkLTkhkd6DAsWvYJcMX1-3=RD-sCRRxJ42tMw@mail.gmail.com>
References: <CAAuY0RWrRdzEMCkLTkhkd6DAsWvYJcMX1-3=RD-sCRRxJ42tMw@mail.gmail.com>
Message-ID: <53028F8F.4030102@mdowle.plus.com>


Think you may have ended up with some strings internally marked ASCII by 
R,  which Encoding() returns as "unknown".  That shouldn't be a problem 
and they should join fine.   I've change the new warning in v1.8.11 so 
if it was that, it should be ok now (commit 1153), please confirm.

Matt

On 13/02/14 16:05, caneff at gmail.com wrote:
> I have a master DT. I aggregate it in one way, and aggregate it in 
> another with a common key between them.   When I  try to merge these 
> two, it says that the key does not have the same encoding on both 
> sides.  If I call Encoding() on each of the keys, they both are listed 
> as "unknown", so from what I can see they still look the same.
>
> I can't create a safe to share reproducible case unfortunately, the 
> simple ones I've tried all work.  If you can give more advice on how 
> to debug maybe I can.
>
> This is using the latest devel version. I did not have this issue i 1.8.10
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140217/b6b57669/attachment.html>

From mel at mbacou.com  Tue Feb 18 06:31:51 2014
From: mel at mbacou.com (Bacou, Melanie)
Date: Tue, 18 Feb 2014 00:31:51 -0500
Subject: [datatable-help] Problem with data.table and FastRWeb
In-Reply-To: <CAAf756O7Z+_J5Zrmcu3dpNVU6CKaS82Vitd-pPYB1iwUKMDhfA@mail.gmail.com>
References: <5301E115.5080806@mbacou.com>
 <CAAf756O7Z+_J5Zrmcu3dpNVU6CKaS82Vitd-pPYB1iwUKMDhfA@mail.gmail.com>
Message-ID: <5302F047.6020404@mbacou.com>

Hi Arun,

This is a little tricky to reproduce unless you have installed FastRWeb, 
and then started the FastRWeb server. I'm executing these scripts from 
the browser through a call to FastRWeb running on a local port.

Installation is documented here and is quick and straightforward on Linux:
https://rforge.net/FastRWeb/
and an example here:
http://jayemerson.blogspot.mx/2011/10/setting-up-fastrwebrserve-on-ubuntu.html

I'm using FastRWeb to build a simple web service. As long as I stick to 
data.frame methods, everything works fine and I get the expected plots 
and HTML output in the browser. But calls to data.table methods (merge, 
extract) all seem to default to data.frame, and I really don't know how 
to debug that.

I am copying Simon Urbanek who's the maintainer of FastRWeb, in case 
this is more of a FastRWeb issue.

Here is my session info (I am on CentOS 5 and cannot easily upgrade to R 
3.0.2).

 > sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods base

other attached packages:
  [1] ggmap_2.3          ggplot2_0.9.3.1    RColorBrewer_1.0-5 
raster_2.2-12
  [5] rgeos_0.3-3        rgdal_0.8-16       sp_1.0-14 data.table_1.8.10
  [9] RJDBC_0.2-3        rJava_0.9-6        DBI_0.2-7 rj_1.1.2-3

loaded via a namespace (and not attached):
  [1] MASS_7.3-23         RJSONIO_1.0-3       RgoogleMaps_1.2.0.5
  [4] colorspace_1.2-4    dichromat_2.0-0     digest_0.6.4
  [7] grid_2.15.2         gtable_0.1.2        labeling_0.2
[10] lattice_0.20-24     mapproj_1.2-2       maps_2.3-6
[13] munsell_0.4.2       plyr_1.8            png_0.1-7
[16] proto_0.3-10        reshape2_1.2.2      rj.gd_1.1.0-1
[19] rjson_0.2.13        scales_0.2.3        stringr_0.6.2
[22] tools_2.15.2

Thanks all!
--Mel.


On 2/17/2014 6:58 AM, Arunkumar Srinivasan wrote:
> Mel,
> I'm not able to reproduce this on 1.8.11. Which version are you using?
> I'm not aware of this package, and what 'otable' is supposed to do. 
> But I get no output while running your script, and not the error 
> message as well.
>
>
> On Mon, Feb 17, 2014 at 11:14 AM, Bacou, Melanie <mel at mbacou.com 
> <mailto:mel at mbacou.com>> wrote:
>
>     Hi,
>
>     I am testing an R script using FastRWeb (through Rserve). FastRWeb
>     works as expected and I can successfully runs Simon Urbanek's
>     examples. Problems arise when I try to merge datatables. It seems
>     FastRWeb cannot find merge.data.table().
>
>     I'm using plenty of other libraries (ggplot, raster, RJDBC, etc.)
>     that execute successfully through FastRWeb scripts, so I'm
>     guessing it's something peculiar to data.table.
>
>     Thanks for any help! --Mel.
>
>
>     Here are reproducible examples.
>
>     Test #1: the code below (the entire content of my R script) SUCCEEDS:
>
>     # test1.R
>     library(data.table)
>
>     run <- function(...) {
>       oclear()
>       d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
>       d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
>       otable(d1)
>       otable(d2)
>     }
>
>     This returns a simple web page showing 2 tables:
>     1 	a
>     2 	b
>     3 	c
>
>     v 	4
>     a 	6
>     b 	7
>
>
>
>     Test #2: the code below (the entire content of my R script) FAILS
>     with:
>     Error in `[.default`(x, i) : invalid subscript type 'list'
>
>     # test2.R
>     library(data.table)
>
>     run <- function(...) {
>       oclear()
>       d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
>       d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
>       otable(d1)
>       otable(d2)
>       setkey(d1, b)
>       setkey(d2, e)
>       otable(d1[d2])
>     }
>
>
>
>
>     -- 
>     Melanie BACOU
>     International Food Policy Research Institute
>     Agricultural Economist, HarvestChoice
>     Work+1(202)862-5699  <tel:%2B1%28202%29862-5699>
>     E-mailmel at mbacou.com  <mailto:mel at mbacou.com>
>     Visitharvestchoice.org  <http://harvestchoice.org>  
>
>
>     _______________________________________________
>     datatable-help mailing list
>     datatable-help at lists.r-forge.r-project.org
>     <mailto:datatable-help at lists.r-forge.r-project.org>
>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>

-- 
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail mel at mbacou.com
Visit harvestchoice.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140218/6557a635/attachment-0001.html>

From mdowle at mdowle.plus.com  Tue Feb 18 11:43:03 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Tue, 18 Feb 2014 10:43:03 +0000
Subject: [datatable-help] Problem with data.table and FastRWeb
In-Reply-To: <5302F047.6020404@mbacou.com>
References: <5301E115.5080806@mbacou.com>
 <CAAf756O7Z+_J5Zrmcu3dpNVU6CKaS82Vitd-pPYB1iwUKMDhfA@mail.gmail.com>
 <5302F047.6020404@mbacou.com>
Message-ID: <53033937.2090607@mdowle.plus.com>


Hi Mel,

Thanks for the info.  It's likely related to cedta() and we can handle 
it from data.table's side as follows.

Background :

     http://stackoverflow.com/a/10529888/403310

Type "data.table:::cedta" so you can see the rules.  I guess FastWeb is 
running your code in its own environment.  First thing,  turn on verbosity :

     options(data.table.verbose=TRUE)     #  or for one statement rather 
than globally,   d1[d2,verbose=TRUE]

and run your code again.  You should see a message  "cedta decided 
'<nsname>' wasn't data.table aware",  where <nsname> is probably "FastRWeb".

This calling environment (let's assume "FastRWeb" from now on) is more 
like .GlobalEnv than a package;  i.e.,  it's where you run your own 
code, you've done library(data.table) in that environment,  and so it is 
data.table aware as far as you're concerned.  So what to do?  There are 
two override mechanisms :

The data.table package contains a character vector :

 > data.table:::cedta.override
[1] "gWidgetsWWW"

It already contains one package which is similar in nature.  You can add 
FastRWeb to that vector yourself as follows :

 > assignInNamespace("cedta.override", c("gWidgetsWWW","FastRWeb"), 
"data.table")
 > data.table:::cedta.override
[1] "gWidgetsWWW" "FastRWeb"

But I'll also add FastRWeb to that vector in data.table, so from the 
next version of data.table you won't have to do it yourself. We'll add 
new packages as we become aware of them.

Alternatively,  the package author (Simon in this case) can provide 
data.table-awareness optionally.  This mechanism was added for dplyr so 
it can control data.table awareness from the caller's end.  That's done 
by setting a variable .datatable.aware=TRUE|FALSE in the calling 
package's namespace. However, in the case of FastRWeb,  the 
cedta.override on data.table's side seems the right way to go.

Matt


On 18/02/14 05:31, Bacou, Melanie wrote:
> Hi Arun,
>
> This is a little tricky to reproduce unless you have installed 
> FastRWeb, and then started the FastRWeb server. I'm executing these 
> scripts from the browser through a call to FastRWeb running on a local 
> port.
>
> Installation is documented here and is quick and straightforward on Linux:
> https://rforge.net/FastRWeb/
> and an example here:
> http://jayemerson.blogspot.mx/2011/10/setting-up-fastrwebrserve-on-ubuntu.html
>
> I'm using FastRWeb to build a simple web service. As long as I stick 
> to data.frame methods, everything works fine and I get the expected 
> plots and HTML output in the browser. But calls to data.table methods 
> (merge, extract) all seem to default to data.frame, and I really don't 
> know how to debug that.
>
> I am copying Simon Urbanek who's the maintainer of FastRWeb, in case 
> this is more of a FastRWeb issue.
>
> Here is my session info (I am on CentOS 5 and cannot easily upgrade to 
> R 3.0.2).
>
> > sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  utils     datasets  grDevices methods base
>
> other attached packages:
>  [1] ggmap_2.3          ggplot2_0.9.3.1    RColorBrewer_1.0-5 
> raster_2.2-12
>  [5] rgeos_0.3-3        rgdal_0.8-16       sp_1.0-14 data.table_1.8.10
>  [9] RJDBC_0.2-3        rJava_0.9-6        DBI_0.2-7 rj_1.1.2-3
>
> loaded via a namespace (and not attached):
>  [1] MASS_7.3-23         RJSONIO_1.0-3       RgoogleMaps_1.2.0.5
>  [4] colorspace_1.2-4    dichromat_2.0-0     digest_0.6.4
>  [7] grid_2.15.2         gtable_0.1.2        labeling_0.2
> [10] lattice_0.20-24     mapproj_1.2-2       maps_2.3-6
> [13] munsell_0.4.2       plyr_1.8            png_0.1-7
> [16] proto_0.3-10        reshape2_1.2.2      rj.gd_1.1.0-1
> [19] rjson_0.2.13        scales_0.2.3        stringr_0.6.2
> [22] tools_2.15.2
>
> Thanks all!
> --Mel.
>
>
>
>
> On 2/17/2014 6:58 AM, Arunkumar Srinivasan wrote:
>> Mel,
>> I'm not able to reproduce this on 1.8.11. Which version are you using?
>> I'm not aware of this package, and what 'otable' is supposed to do. 
>> But I get no output while running your script, and not the error 
>> message as well.
>>
>>
>> On Mon, Feb 17, 2014 at 11:14 AM, Bacou, Melanie <mel at mbacou.com 
>> <mailto:mel at mbacou.com>> wrote:
>>
>>     Hi,
>>
>>     I am testing an R script using FastRWeb (through Rserve).
>>     FastRWeb works as expected and I can successfully runs Simon
>>     Urbanek's examples. Problems arise when I try to merge
>>     datatables. It seems FastRWeb cannot find merge.data.table().
>>
>>     I'm using plenty of other libraries (ggplot, raster, RJDBC, etc.)
>>     that execute successfully through FastRWeb scripts, so I'm
>>     guessing it's something peculiar to data.table.
>>
>>     Thanks for any help! --Mel.
>>
>>
>>     Here are reproducible examples.
>>
>>     Test #1: the code below (the entire content of my R script) SUCCEEDS:
>>
>>     # test1.R
>>     library(data.table)
>>
>>     run <- function(...) {
>>       oclear()
>>       d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
>>       d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
>>       otable(d1)
>>       otable(d2)
>>     }
>>
>>     This returns a simple web page showing 2 tables:
>>     1 	a
>>     2 	b
>>     3 	c
>>
>>     v 	4
>>     a 	6
>>     b 	7
>>
>>
>>
>>     Test #2: the code below (the entire content of my R script) FAILS
>>     with:
>>     Error in `[.default`(x, i) : invalid subscript type 'list'
>>
>>     # test2.R
>>     library(data.table)
>>
>>     run <- function(...) {
>>       oclear()
>>       d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
>>       d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
>>       otable(d1)
>>       otable(d2)
>>       setkey(d1, b)
>>       setkey(d2, e)
>>       otable(d1[d2])
>>     }
>>
>>
>>
>>
>>     -- 
>>     Melanie BACOU
>>     International Food Policy Research Institute
>>     Agricultural Economist, HarvestChoice
>>     Work+1(202)862-5699  <tel:%2B1%28202%29862-5699>
>>     E-mailmel at mbacou.com  <mailto:mel at mbacou.com>
>>     Visitharvestchoice.org  <http://harvestchoice.org>  
>>
>>
>>     _______________________________________________
>>     datatable-help mailing list
>>     datatable-help at lists.r-forge.r-project.org
>>     <mailto:datatable-help at lists.r-forge.r-project.org>
>>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>
> -- 
> Melanie BACOU
> International Food Policy Research Institute
> Agricultural Economist, HarvestChoice
> Work +1(202)862-5699
> E-mailmel at mbacou.com
> Visit harvestchoice.org
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140218/97674b5d/attachment.html>

From mel at mbacou.com  Wed Feb 19 07:37:36 2014
From: mel at mbacou.com (Bacou, Melanie)
Date: Wed, 19 Feb 2014 01:37:36 -0500
Subject: [datatable-help] Problem with data.table and FastRWeb
In-Reply-To: <53033937.2090607@mdowle.plus.com>
References: <5301E115.5080806@mbacou.com>
 <CAAf756O7Z+_J5Zrmcu3dpNVU6CKaS82Vitd-pPYB1iwUKMDhfA@mail.gmail.com>
 <5302F047.6020404@mbacou.com> <53033937.2090607@mdowle.plus.com>
Message-ID: <53045130.1070603@mbacou.com>

Hi Matt,

Thanks very much for your detailed explanation, and for offering to 
patch data.table.

Adding
 > assignInNamespace("cedta.override", c("gWidgetsWWW","FastRWeb"), 
"data.table")
did the trick here.

Perfect! --Mel.


On 2/18/2014 5:43 AM, Matt Dowle wrote:
>
> Hi Mel,
>
> Thanks for the info.  It's likely related to cedta() and we can handle 
> it from data.table's side as follows.
>
> Background :
>
> http://stackoverflow.com/a/10529888/403310
>
> Type "data.table:::cedta" so you can see the rules.  I guess FastWeb 
> is running your code in its own environment.  First thing,  turn on 
> verbosity :
>
>     options(data.table.verbose=TRUE)     #  or for one statement 
> rather than globally,   d1[d2,verbose=TRUE]
>
> and run your code again.  You should see a message  "cedta decided 
> '<nsname>' wasn't data.table aware",  where <nsname> is probably 
> "FastRWeb".
>
> This calling environment (let's assume "FastRWeb" from now on) is more 
> like .GlobalEnv than a package;  i.e.,  it's where you run your own 
> code, you've done library(data.table) in that environment,  and so it 
> is data.table aware as far as you're concerned.  So what to do?  There 
> are two override mechanisms :
>
> The data.table package contains a character vector :
>
> > data.table:::cedta.override
> [1] "gWidgetsWWW"
>
> It already contains one package which is similar in nature.  You can 
> add FastRWeb to that vector yourself as follows :
>
> > assignInNamespace("cedta.override", c("gWidgetsWWW","FastRWeb"), 
> "data.table")
> > data.table:::cedta.override
> [1] "gWidgetsWWW" "FastRWeb"
>
> But I'll also add FastRWeb to that vector in data.table, so from the 
> next version of data.table you won't have to do it yourself.  We'll 
> add new packages as we become aware of them.
>
> Alternatively,  the package author (Simon in this case) can provide 
> data.table-awareness optionally.  This mechanism was added for dplyr 
> so it can control data.table awareness from the caller's end.  That's 
> done by setting a variable .datatable.aware=TRUE|FALSE in the calling 
> package's namespace.    However, in the case of FastRWeb,  the 
> cedta.override on data.table's side seems the right way to go.
>
> Matt
>
>
> On 18/02/14 05:31, Bacou, Melanie wrote:
>> Hi Arun,
>>
>> This is a little tricky to reproduce unless you have installed 
>> FastRWeb, and then started the FastRWeb server. I'm executing these 
>> scripts from the browser through a call to FastRWeb running on a 
>> local port.
>>
>> Installation is documented here and is quick and straightforward on 
>> Linux:
>> https://rforge.net/FastRWeb/
>> and an example here:
>> http://jayemerson.blogspot.mx/2011/10/setting-up-fastrwebrserve-on-ubuntu.html
>>
>> I'm using FastRWeb to build a simple web service. As long as I stick 
>> to data.frame methods, everything works fine and I get the expected 
>> plots and HTML output in the browser. But calls to data.table methods 
>> (merge, extract) all seem to default to data.frame, and I really 
>> don't know how to debug that.
>>
>> I am copying Simon Urbanek who's the maintainer of FastRWeb, in case 
>> this is more of a FastRWeb issue.
>>
>> Here is my session info (I am on CentOS 5 and cannot easily upgrade 
>> to R 3.0.2).
>>
>> > sessionInfo()
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  utils     datasets  grDevices methods base
>>
>> other attached packages:
>>  [1] ggmap_2.3          ggplot2_0.9.3.1    RColorBrewer_1.0-5 
>> raster_2.2-12
>>  [5] rgeos_0.3-3        rgdal_0.8-16       sp_1.0-14 data.table_1.8.10
>>  [9] RJDBC_0.2-3        rJava_0.9-6        DBI_0.2-7 rj_1.1.2-3
>>
>> loaded via a namespace (and not attached):
>>  [1] MASS_7.3-23         RJSONIO_1.0-3       RgoogleMaps_1.2.0.5
>>  [4] colorspace_1.2-4    dichromat_2.0-0     digest_0.6.4
>>  [7] grid_2.15.2         gtable_0.1.2        labeling_0.2
>> [10] lattice_0.20-24     mapproj_1.2-2       maps_2.3-6
>> [13] munsell_0.4.2       plyr_1.8            png_0.1-7
>> [16] proto_0.3-10        reshape2_1.2.2      rj.gd_1.1.0-1
>> [19] rjson_0.2.13        scales_0.2.3        stringr_0.6.2
>> [22] tools_2.15.2
>>
>> Thanks all!
>> --Mel.
>>
>>
>>
>>
>> On 2/17/2014 6:58 AM, Arunkumar Srinivasan wrote:
>>> Mel,
>>> I'm not able to reproduce this on 1.8.11. Which version are you using?
>>> I'm not aware of this package, and what 'otable' is supposed to do. 
>>> But I get no output while running your script, and not the error 
>>> message as well.
>>>
>>>
>>> On Mon, Feb 17, 2014 at 11:14 AM, Bacou, Melanie <mel at mbacou.com 
>>> <mailto:mel at mbacou.com>> wrote:
>>>
>>>     Hi,
>>>
>>>     I am testing an R script using FastRWeb (through Rserve).
>>>     FastRWeb works as expected and I can successfully runs Simon
>>>     Urbanek's examples. Problems arise when I try to merge
>>>     datatables. It seems FastRWeb cannot find merge.data.table().
>>>
>>>     I'm using plenty of other libraries (ggplot, raster, RJDBC,
>>>     etc.) that execute successfully through FastRWeb scripts, so I'm
>>>     guessing it's something peculiar to data.table.
>>>
>>>     Thanks for any help! --Mel.
>>>
>>>
>>>     Here are reproducible examples.
>>>
>>>     Test #1: the code below (the entire content of my R script)
>>>     SUCCEEDS:
>>>
>>>     # test1.R
>>>     library(data.table)
>>>
>>>     run <- function(...) {
>>>       oclear()
>>>       d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
>>>       d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
>>>       otable(d1)
>>>       otable(d2)
>>>     }
>>>
>>>     This returns a simple web page showing 2 tables:
>>>     1 	a
>>>     2 	b
>>>     3 	c
>>>
>>>     v 	4
>>>     a 	6
>>>     b 	7
>>>
>>>
>>>
>>>     Test #2: the code below (the entire content of my R script)
>>>     FAILS with:
>>>     Error in `[.default`(x, i) : invalid subscript type 'list'
>>>
>>>     # test2.R
>>>     library(data.table)
>>>
>>>     run <- function(...) {
>>>       oclear()
>>>       d1 <- data.table(a=c(1,2,3), b=c("a","b","c"))
>>>       d2 <- data.table(e=c("v","a","b"), f=c(4,6,7))
>>>       otable(d1)
>>>       otable(d2)
>>>       setkey(d1, b)
>>>       setkey(d2, e)
>>>       otable(d1[d2])
>>>     }
>>>
>>>
>>>
>>>
>>>     -- 
>>>     Melanie BACOU
>>>     International Food Policy Research Institute
>>>     Agricultural Economist, HarvestChoice
>>>     Work+1(202)862-5699  <tel:%2B1%28202%29862-5699>
>>>     E-mailmel at mbacou.com  <mailto:mel at mbacou.com>
>>>     Visitharvestchoice.org  <http://harvestchoice.org>  
>>>
>>>
>>>     _______________________________________________
>>>     datatable-help mailing list
>>>     datatable-help at lists.r-forge.r-project.org
>>>     <mailto:datatable-help at lists.r-forge.r-project.org>
>>>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>>
>>
>> -- 
>> Melanie BACOU
>> International Food Policy Research Institute
>> Agricultural Economist, HarvestChoice
>> Work +1(202)862-5699
>> E-mailmel at mbacou.com
>> Visit harvestchoice.org
>>
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>

-- 
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail mel at mbacou.com
Visit harvestchoice.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140219/9fc5e826/attachment.html>

From bradleydemarest at gmail.com  Thu Feb 27 04:26:56 2014
From: bradleydemarest at gmail.com (bradley demarest)
Date: Wed, 26 Feb 2014 20:26:56 -0700
Subject: [datatable-help] Obtain data.table_1.8.11.tar.gz source?
Message-ID: <CAAetXrzaRsY6Lm+g4g5bTM-7+QDtkSXaePmV6+7Cu78DwbwfRQ@mail.gmail.com>

I deleted data.table 1.8.11 by prematurely trying to update to 1.9.0.

Now I'm stuck with 1.8.10, but I really need melt and cast for an
ongoing project.

Can anyone provide a link to the 1.8.11 source while the cran build
issues are being resolved?

Sincerely,
Bradley Demarest

From mdowle at mdowle.plus.com  Thu Feb 27 04:38:51 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Thu, 27 Feb 2014 03:38:51 +0000
Subject: [datatable-help] Obtain data.table_1.8.11.tar.gz source?
In-Reply-To: <CAAetXrzaRsY6Lm+g4g5bTM-7+QDtkSXaePmV6+7Cu78DwbwfRQ@mail.gmail.com>
References: <CAAetXrzaRsY6Lm+g4g5bTM-7+QDtkSXaePmV6+7Cu78DwbwfRQ@mail.gmail.com>
Message-ID: <530EB34B.7040804@mdowle.plus.com>


Sure,  now in the homepage directory.  That takes an hour or so to 
update, so this link should work now :

https://r-forge.r-project.org/scm/viewvc.php/*checkout*/www/data.table_1.8.11.tar.gz?root=datatable

if not, try here and browse from there :

https://r-forge.r-project.org/scm/viewvc.php/www/?root=datatable

Matt


On 27/02/14 03:26, bradley demarest wrote:
> I deleted data.table 1.8.11 by prematurely trying to update to 1.9.0.
>
> Now I'm stuck with 1.8.10, but I really need melt and cast for an
> ongoing project.
>
> Can anyone provide a link to the 1.8.11 source while the cran build
> issues are being resolved?
>
> Sincerely,
> Bradley Demarest
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>


From mdowle at mdowle.plus.com  Thu Feb 27 15:43:50 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Thu, 27 Feb 2014 14:43:50 +0000
Subject: [datatable-help] v1.9.2 is now on CRAN
Message-ID: <530F4F26.8050801@mdowle.plus.com>


It usually takes a few days for binaries to make their way to all 
mirrors and all platforms.  You can install now from source from CRAN,  
or within an hour the data.table homepage should refresh with the 
Windows .zip for v1.9.2  (Ctrl-F5 and even clearing the browser cache 
may be required to refresh).

NEWS is on CRAN :

     http://cran.r-project.org/web/packages/data.table/NEWS

Real-time NEWS as we now move on to v1.9.3 is here (nothing yet) :

https://r-forge.r-project.org/scm/viewvc.php/pkg/NEWS?view=markup&root=datatable

Matt


From carrieromichele at gmail.com  Thu Feb 27 16:05:07 2014
From: carrieromichele at gmail.com (carrieromichele)
Date: Thu, 27 Feb 2014 15:05:07 +0000
Subject: [datatable-help] Possible bug in 1.9.x versions
Message-ID: <CAP=9f+rmd3=Znha+pRQU0S7k=XpxaPCw6P=ZmUtDFo2crKEOqg@mail.gmail.com>

I just installed the new data.table versions. I tried both 1.9.0, available
(binary) at http://datatable.r-forge.r-project.org/data.table_1.9.0.zip,
and 1.9.2 (CRAN) building from source (using Rtools)

After installing I run my BAU scripts and found out that I had different
results... this is what I could made reproducible

1.8.10

> library(data.table)
data.table 1.8.10  For help type: help("data.table")
> set.seed(1)
> dt <- data.table(id=rep(1:4, each=3),
+                  var1 = rep(letters[1:3], 4),
+                  var2 = rnorm(12),
+                  key="id,var1")
> dt
    id var1       var2
 1:  1    a -0.6264538
 2:  1    b  0.1836433
 3:  1    c -0.8356286
 4:  2    a  1.5952808
 5:  2    b  0.3295078
 6:  2    c -0.8204684
 7:  3    a  0.4874291
 8:  3    b  0.7383247
 9:  3    c  0.5757814
10:  4    a -0.3053884
11:  4    b  1.5117812
12:  4    c  0.3898432
>
> key(dt)
[1] "id"   "var1"
> dt[.(unique(id)), list(var1, var2)]
    id var1       var2
 1:  1    a -0.6264538
 2:  1    b  0.1836433
 3:  1    c -0.8356286
 4:  2    a  1.5952808
 5:  2    b  0.3295078
 6:  2    c -0.8204684
 7:  3    a  0.4874291
 8:  3    b  0.7383247
 9:  3    c  0.5757814
10:  4    a -0.3053884
11:  4    b  1.5117812
12:  4    c  0.3898432

1.9.0


> library(data.table)
data.table 1.9.0  For help type: help("data.table")
Warning message:
package 'data.table' was built under R version 3.1.0
> set.seed(1)
> dt <- data.table(id=rep(1:4, each=3),
+                  var1 = rep(letters[1:3], 4),
+                  var2 = rnorm(12),
+                  key="id,var1")
> dt
    id var1       var2
 1:  1    a -0.6264538
 2:  1    b  0.1836433
 3:  1    c -0.8356286
 4:  2    a  1.5952808
 5:  2    b  0.3295078
 6:  2    c -0.8204684
 7:  3    a  0.4874291
 8:  3    b  0.7383247
 9:  3    c  0.5757814
10:  4    a -0.3053884
11:  4    b  1.5117812
12:  4    c  0.3898432
>
> key(dt)
[1] "id"   "var1"
> dt[.(unique(id)), list(var1, var2)]
    id var1       var2
 1:  1    a -0.6264538
 2:  1    a  0.1836433
 3:  1    a -0.8356286
 4:  2    a  1.5952808
 5:  2    a  0.3295078
 6:  2    a -0.8204684
 7:  3    a  0.4874291
 8:  3    a  0.7383247
 9:  3    a  0.5757814
10:  4    a -0.3053884
11:  4    a  1.5117812
12:  4    a  0.3898432

1.9.2

> library("data.table", lib.loc="C:/Program Files/R/R-3.0.2/library")
data.table 1.9.2  For help type: help("data.table")
> set.seed(1)
> dt <- data.table(id=rep(1:4, each=3),
+                  var1 = rep(letters[1:3], 4),
+                  var2 = rnorm(12),
+                  key="id,var1")
Error in forder(x, cols, sort = TRUE, retGrp = FALSE) :
  object 'Cforder' not found
> dt
    id var1       var2
 1:  1    a -0.6264538
 2:  1    b  0.1836433
 3:  1    c -0.8356286
 4:  2    a  1.5952808
 5:  2    b  0.3295078
 6:  2    c -0.8204684
 7:  3    a  0.4874291
 8:  3    b  0.7383247
 9:  3    c  0.5757814
10:  4    a -0.3053884
11:  4    b  1.5117812
12:  4    c  0.3898432
>
> key(dt)
[1] "id"   "var1"
> dt[.(unique(id)), list(var1, var2)]
Error in `[.data.table`(dt, .(unique(id)), list(var1, var2)) :
  object 'Cbmerge' not found

It seems that in the 1.9.0 version when you join using fewer keys than the
whole set of keys, the first values of the remaining keys are "carried
forward". Other column looks fine.

In the 1.9.2 instead some dependencies seem missing.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140227/97f3390b/attachment.html>

From mdowle at mdowle.plus.com  Thu Feb 27 16:14:30 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Thu, 27 Feb 2014 15:14:30 +0000
Subject: [datatable-help] Possible bug in 1.9.x versions
In-Reply-To: <CAP=9f+rmd3=Znha+pRQU0S7k=XpxaPCw6P=ZmUtDFo2crKEOqg@mail.gmail.com>
References: <CAP=9f+rmd3=Znha+pRQU0S7k=XpxaPCw6P=ZmUtDFo2crKEOqg@mail.gmail.com>
Message-ID: <530F5656.2030801@mdowle.plus.com>


 From those messages,  it looks like the install didn't work properly.  
This can happen on Windows if another process is still using the older .dll.

On every release we usually do get reports like this.

Since it is Windows,  let's try overkill first :

1. Close all R sessions
2. To be sure,  reboot.  This ensures all locks on open .dlls are fully 
cleared.
3. Start R
4. remove.package("data.table")
5. install.packages("data.table")
6. require(data.table)
7. test.data.table()  -- does it work?
8. Rerun test

The Windows .zip for 1.9.2 is now on the homepage,  so it's best to use 
that one please.

Matt

On 27/02/14 15:05, carrieromichele wrote:
> I just installed the new data.table versions. I tried both 
> 1.9.0, available (binary) at 
> http://datatable.r-forge.r-project.org/data.table_1.9.0.zip, and 1.9.2 
> (CRAN) building from source (using Rtools)
>
> After installing I run my BAU scripts and found out that I had 
> different results... this is what I could made reproducible
>
> 1.8.10
>
> > library(data.table)
> data.table 1.8.10  For help type: help("data.table")
> > set.seed(1)
> > dt <- data.table(id=rep(1:4, each=3),
> +                  var1 = rep(letters[1:3], 4),
> +                  var2 = rnorm(12),
> +                  key="id,var1")
> > dt
>     id var1       var2
>  1:  1    a -0.6264538
>  2:  1    b  0.1836433
>  3:  1    c -0.8356286
>  4:  2    a  1.5952808
>  5:  2    b  0.3295078
>  6:  2    c -0.8204684
>  7:  3    a  0.4874291
>  8:  3    b  0.7383247
>  9:  3    c  0.5757814
> 10:  4    a -0.3053884
> 11:  4    b  1.5117812
> 12:  4    c  0.3898432
> >
> > key(dt)
> [1] "id"   "var1"
> > dt[.(unique(id)), list(var1, var2)]
>     id var1       var2
>  1:  1    a -0.6264538
>  2:  1    b  0.1836433
>  3:  1    c -0.8356286
>  4:  2    a  1.5952808
>  5:  2    b  0.3295078
>  6:  2    c -0.8204684
>  7:  3    a  0.4874291
>  8:  3    b  0.7383247
>  9:  3    c  0.5757814
> 10:  4    a -0.3053884
> 11:  4    b  1.5117812
> 12:  4    c  0.3898432
>
> 1.9.0
>
>
> > library(data.table)
> data.table 1.9.0  For help type: help("data.table")
> Warning message:
> package 'data.table' was built under R version 3.1.0
> > set.seed(1)
> > dt <- data.table(id=rep(1:4, each=3),
> +                  var1 = rep(letters[1:3], 4),
> +                  var2 = rnorm(12),
> +                  key="id,var1")
> > dt
>     id var1       var2
>  1:  1    a -0.6264538
>  2:  1    b  0.1836433
>  3:  1    c -0.8356286
>  4:  2    a  1.5952808
>  5:  2    b  0.3295078
>  6:  2    c -0.8204684
>  7:  3    a  0.4874291
>  8:  3    b  0.7383247
>  9:  3    c  0.5757814
> 10:  4    a -0.3053884
> 11:  4    b  1.5117812
> 12:  4    c  0.3898432
> >
> > key(dt)
> [1] "id"   "var1"
> > dt[.(unique(id)), list(var1, var2)]
>     id var1       var2
>  1:  1    a -0.6264538
>  2:  1    a  0.1836433
>  3:  1    a -0.8356286
>  4:  2    a  1.5952808
>  5:  2    a  0.3295078
>  6:  2    a -0.8204684
>  7:  3    a  0.4874291
>  8:  3    a  0.7383247
>  9:  3    a  0.5757814
> 10:  4    a -0.3053884
> 11:  4    a  1.5117812
> 12:  4    a  0.3898432
>
> 1.9.2
>
> > library("data.table", lib.loc="C:/Program Files/R/R-3.0.2/library")
> data.table 1.9.2  For help type: help("data.table")
> > set.seed(1)
> > dt <- data.table(id=rep(1:4, each=3),
> +                  var1 = rep(letters[1:3], 4),
> +                  var2 = rnorm(12),
> +                  key="id,var1")
> Error in forder(x, cols, sort = TRUE, retGrp = FALSE) :
>   object 'Cforder' not found
> > dt
>     id var1       var2
>  1:  1    a -0.6264538
>  2:  1    b  0.1836433
>  3:  1    c -0.8356286
>  4:  2    a  1.5952808
>  5:  2    b  0.3295078
>  6:  2    c -0.8204684
>  7:  3    a  0.4874291
>  8:  3    b  0.7383247
>  9:  3    c  0.5757814
> 10:  4    a -0.3053884
> 11:  4    b  1.5117812
> 12:  4    c  0.3898432
> >
> > key(dt)
> [1] "id"   "var1"
> > dt[.(unique(id)), list(var1, var2)]
> Error in `[.data.table`(dt, .(unique(id)), list(var1, var2)) :
>   object 'Cbmerge' not found
>
> It seems that in the 1.9.0 version when you join using fewer keys than 
> the whole set of keys, the first values of the remaining keys are 
> "carried forward". Other column looks fine.
>
> In the 1.9.2 instead some dependencies seem missing.
>
>
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140227/a37a6c6a/attachment-0001.html>

From carrieromichele at gmail.com  Thu Feb 27 16:26:27 2014
From: carrieromichele at gmail.com (Michele)
Date: Thu, 27 Feb 2014 07:26:27 -0800 (PST)
Subject: [datatable-help] Possible bug in 1.9.x versions
In-Reply-To: <530F5656.2030801@mdowle.plus.com>
References: <CAP=9f+rmd3=Znha+pRQU0S7k=XpxaPCw6P=ZmUtDFo2crKEOqg@mail.gmail.com>
 <530F5656.2030801@mdowle.plus.com>
Message-ID: <1393514787756-4685932.post@n4.nabble.com>

Hi, thanks for the quick response. Still nothing though. Using the .zip form
r-forge, at http://datatable.r-forge.r-project.org/data.table_1.9.2.zip,
doesn't give me errors like `object 'Cforder' not found
`, but the below join is still incorrect.

> remove.packages("data.table")
Removing package from ?C:/Program Files/R/R-3.0.2/library?
(as ?lib? is unspecified)
> install.packages("C:/Users/MCarrie/Downloads/data.table_1.9.2.zip", repos
> = NULL)
Warning in install.packages :
  package ?C:/Users/MCarrie/Downloads/data.table_1.9.2.zip? is not available
(for R version 3.0.2)
package ?data.table? successfully unpacked and MD5 sums checked
> require(data.table)
Loading required package: data.table
data.table 1.9.2  For help type: help("data.table")
Warning message:
package ?data.table? was built under R version 3.1.0 
> test.data.table()
Running C:/Program Files/R/R-3.0.2/library/data.table/tests/tests.Rraw 
Loading required package: reshape2
Loading required package: reshape
Loading required package: plyr
Loading required package: ggplot2
Loading required package: hexbin
Loading required package: nlme
Loading required package: xts
Loading required package: zoo

Attaching package: ?zoo?

The following objects are masked from ?package:base?:

    as.Date, as.Date.numeric


Attaching package: ?xts?

The following object is masked from ?package:data.table?:

    last

Loading required package: bit64
Loading required package: gdata
gdata: read.xls support for 'XLS' (Excel 97-2004) files
gdata: ENABLED.

gdata: read.xls support for 'XLSX' (Excel 2007+) files
gdata: ENABLED.

Attaching package: ?gdata?

The following object is masked from ?package:stats?:

    nobs

The following object is masked from ?package:utils?:

    object.size

Test 167.2 not run. If required call library(hexbin) first.
Don't know how to automatically pick scale for object of type ITime.
Defaulting to continuous
Don't know how to automatically pick scale for object of type ITime.
Defaulting to continuous
Tests 487 and 488 not run. If required call library(reshape) first.
Tests 897-899 not run. If required call library(bit64) first.
All 1220 tests in inst/tests/tests.Rraw completed ok in 22.115sec on Thu Feb
27 15:19:35 2014

library(data.table)
set.seed(1)

> dt <- data.table(id=rep(1:4, each=3),
+                  var1 = rep(letters[1:3], 4),
+                  var2 = rnorm(12),
+                  key="i ..." ... [TRUNCATED] 

> dt
    id var1       var2
 1:  1    a -0.6264538
 2:  1    b  0.1836433
 3:  1    c -0.8356286
 4:  2    a  1.5952808
 5:  2    b  0.3295078
 6:  2    c -0.8204684
 7:  3    a  0.4874291
 8:  3    b  0.7383247
 9:  3    c  0.5757814
10:  4    a -0.3053884
11:  4    b  1.5117812
12:  4    c  0.3898432

> key(dt)
[1] "id"   "var1"

> dt[.(unique(id)), list(var1, var2)]
    id var1       var2
 1:  1    a -0.6264538
 2:  1    a  0.1836433
 3:  1    a -0.8356286
 4:  2    a  1.5952808
 5:  2    a  0.3295078
 6:  2    a -0.8204684
 7:  3    a  0.4874291
 8:  3    a  0.7383247
 9:  3    a  0.5757814
10:  4    a -0.3053884
11:  4    a  1.5117812
12:  4    a  0.3898432


--
View this message in context: http://r.789695.n4.nabble.com/Possible-bug-in-1-9-x-versions-tp4685930p4685932.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mdowle at mdowle.plus.com  Thu Feb 27 16:49:12 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Thu, 27 Feb 2014 15:49:12 +0000
Subject: [datatable-help] Possible bug in 1.9.x versions
In-Reply-To: <1393514787756-4685932.post@n4.nabble.com>
References: <CAP=9f+rmd3=Znha+pRQU0S7k=XpxaPCw6P=ZmUtDFo2crKEOqg@mail.gmail.com>
 <530F5656.2030801@mdowle.plus.com> <1393514787756-4685932.post@n4.nabble.com>
Message-ID: <530F5E78.3070906@mdowle.plus.com>


Thanks.  Yes, I see the same.  1,220 tests plus tests from 37 dependent 
packages still isn't enough is it.  Sigh.  Will fix.

Matt


On 27/02/14 15:26, Michele wrote:
> Hi, thanks for the quick response. Still nothing though. Using the .zip form
> r-forge, at http://datatable.r-forge.r-project.org/data.table_1.9.2.zip,
> doesn't give me errors like `object 'Cforder' not found
> `, but the below join is still incorrect.
>
>> remove.packages("data.table")
> Removing package from ?C:/Program Files/R/R-3.0.2/library?
> (as ?lib? is unspecified)
>> install.packages("C:/Users/MCarrie/Downloads/data.table_1.9.2.zip", repos
>> = NULL)
> Warning in install.packages :
>    package ?C:/Users/MCarrie/Downloads/data.table_1.9.2.zip? is not available
> (for R version 3.0.2)
> package ?data.table? successfully unpacked and MD5 sums checked
>> require(data.table)
> Loading required package: data.table
> data.table 1.9.2  For help type: help("data.table")
> Warning message:
> package ?data.table? was built under R version 3.1.0
>> test.data.table()
> Running C:/Program Files/R/R-3.0.2/library/data.table/tests/tests.Rraw
> Loading required package: reshape2
> Loading required package: reshape
> Loading required package: plyr
> Loading required package: ggplot2
> Loading required package: hexbin
> Loading required package: nlme
> Loading required package: xts
> Loading required package: zoo
>
> Attaching package: ?zoo?
>
> The following objects are masked from ?package:base?:
>
>      as.Date, as.Date.numeric
>
>
> Attaching package: ?xts?
>
> The following object is masked from ?package:data.table?:
>
>      last
>
> Loading required package: bit64
> Loading required package: gdata
> gdata: read.xls support for 'XLS' (Excel 97-2004) files
> gdata: ENABLED.
>
> gdata: read.xls support for 'XLSX' (Excel 2007+) files
> gdata: ENABLED.
>
> Attaching package: ?gdata?
>
> The following object is masked from ?package:stats?:
>
>      nobs
>
> The following object is masked from ?package:utils?:
>
>      object.size
>
> Test 167.2 not run. If required call library(hexbin) first.
> Don't know how to automatically pick scale for object of type ITime.
> Defaulting to continuous
> Don't know how to automatically pick scale for object of type ITime.
> Defaulting to continuous
> Tests 487 and 488 not run. If required call library(reshape) first.
> Tests 897-899 not run. If required call library(bit64) first.
> All 1220 tests in inst/tests/tests.Rraw completed ok in 22.115sec on Thu Feb
> 27 15:19:35 2014
>
> library(data.table)
> set.seed(1)
>
>> dt <- data.table(id=rep(1:4, each=3),
> +                  var1 = rep(letters[1:3], 4),
> +                  var2 = rnorm(12),
> +                  key="i ..." ... [TRUNCATED]
>
>> dt
>      id var1       var2
>   1:  1    a -0.6264538
>   2:  1    b  0.1836433
>   3:  1    c -0.8356286
>   4:  2    a  1.5952808
>   5:  2    b  0.3295078
>   6:  2    c -0.8204684
>   7:  3    a  0.4874291
>   8:  3    b  0.7383247
>   9:  3    c  0.5757814
> 10:  4    a -0.3053884
> 11:  4    b  1.5117812
> 12:  4    c  0.3898432
>
>> key(dt)
> [1] "id"   "var1"
>
>> dt[.(unique(id)), list(var1, var2)]
>      id var1       var2
>   1:  1    a -0.6264538
>   2:  1    a  0.1836433
>   3:  1    a -0.8356286
>   4:  2    a  1.5952808
>   5:  2    a  0.3295078
>   6:  2    a -0.8204684
>   7:  3    a  0.4874291
>   8:  3    a  0.7383247
>   9:  3    a  0.5757814
> 10:  4    a -0.3053884
> 11:  4    a  1.5117812
> 12:  4    a  0.3898432
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Possible-bug-in-1-9-x-versions-tp4685930p4685932.html
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


From carrieromichele at gmail.com  Thu Feb 27 16:55:55 2014
From: carrieromichele at gmail.com (Michele)
Date: Thu, 27 Feb 2014 07:55:55 -0800 (PST)
Subject: [datatable-help] Possible bug in 1.9.x versions
In-Reply-To: <530F5E78.3070906@mdowle.plus.com>
References: <CAP=9f+rmd3=Znha+pRQU0S7k=XpxaPCw6P=ZmUtDFo2crKEOqg@mail.gmail.com>
 <530F5656.2030801@mdowle.plus.com> <1393514787756-4685932.post@n4.nabble.com>
 <530F5E78.3070906@mdowle.plus.com>
Message-ID: <1393516555862-4685944.post@n4.nabble.com>

:-) it happens to best ones as well! May I ask if you are using R-devel for
your development? Just because when loading the .zip version, R says:

> Warning message: 
> package ?data.table? was built under R version 3.1.0 

aren't we at 3.0.2?


--
View this message in context: http://r.789695.n4.nabble.com/Possible-bug-in-1-9-x-versions-tp4685930p4685944.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mdowle at mdowle.plus.com  Thu Feb 27 17:11:48 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Thu, 27 Feb 2014 16:11:48 +0000
Subject: [datatable-help] Possible bug in 1.9.x versions
In-Reply-To: <1393516555862-4685944.post@n4.nabble.com>
References: <CAP=9f+rmd3=Znha+pRQU0S7k=XpxaPCw6P=ZmUtDFo2crKEOqg@mail.gmail.com>
 <530F5656.2030801@mdowle.plus.com> <1393514787756-4685932.post@n4.nabble.com>
 <530F5E78.3070906@mdowle.plus.com> <1393516555862-4685944.post@n4.nabble.com>
Message-ID: <530F63C4.3020702@mdowle.plus.com>


On 27/02/14 15:55, Michele wrote:
> :-) it happens to best ones as well! May I ask if you are using R-devel for
> your development?
Not only Rdevel, but Rdevel compiled with ASAN, 2.14.0, 3.0.2, and some 
3.0.3beta as well for good measure.

Here was the CRAN submission covering email :

====
I have rerun R CMD check on :
    Stated dependency (R 2.14.0)
    Winbuilder Rdevel.
    Rdevel r65060 with ASAN
    Causata, gems and treemap (both Rdevel and 3.0.2 but not 3.0.3beta)
    devtools::revdep() on R 3.0.2  (43 packages and all their dependencies)
====

But yes that Windows .zip came from Winbuilder R-devel.  Ok, so it needs 
to be the R-release option on Winbuilder that goes on the homepage. Will 
remember that when the patch is ready. Thanks.

Matt

> Just because when loading the .zip version, R says:
>
>> Warning message:
>> package ?data.table? was built under R version 3.1.0
> aren't we at 3.0.2?
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Possible-bug-in-1-9-x-versions-tp4685930p4685944.html
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


From mdowle at mdowle.plus.com  Fri Feb 28 15:52:16 2014
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Fri, 28 Feb 2014 14:52:16 +0000
Subject: [datatable-help] Possible bug in 1.9.x versions
In-Reply-To: <1393516555862-4685944.post@n4.nabble.com>
References: <CAP=9f+rmd3=Znha+pRQU0S7k=XpxaPCw6P=ZmUtDFo2crKEOqg@mail.gmail.com>
 <530F5656.2030801@mdowle.plus.com> <1393514787756-4685932.post@n4.nabble.com>
 <530F5E78.3070906@mdowle.plus.com> <1393516555862-4685944.post@n4.nabble.com>
Message-ID: <5310A2A0.1050407@mdowle.plus.com>


Now fixed and new Windows .zip for R-release uploaded to data.table 
homepage.
We'll aim to release to CRAN fairly soon.

Thanks again Michele.

 From NEWS :

o  When joining to fewer columns than the key has, using one of the 
later key columns explicitly in j repeated the first value. A problem 
introduced by v1.9.2 and not caught by our 1,220 tests, or tests in 37 
dependent packages. Test added. Many thanks to Michele Carrier for 
reporting.
     DT = data.table(a=1:2, b=letters[1:6], key="a,b")    # keyed by a and b
     DT[.(1), b]    # correct result again (joining just to a not b but 
using b)

Matt