[datatable-help] subsetting by second key

Arunkumar Srinivasan aragorn168b at gmail.com
Sun Jun 15 18:06:34 CEST 2014


Note that `CJ` by default sorts its columns and sets the key on all of them, which means the result will be sorted as well. If that's not desirable, use `CJ` with `sorted=FALSE`.
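
For illustration, a minimal sketch of the difference. The column names `x` and `y` and the toy values are made up for this example and are not taken from the thread:

```r
library(data.table)

# Default (sorted = TRUE): inputs are sorted and the result is keyed on all columns.
a <- CJ(x = c("b", "a"), y = c(2, 1))
key(a)   # key is set on both columns

# sorted = FALSE: rows follow the order of the inputs and no key is set.
b <- CJ(x = c("b", "a"), y = c(2, 1), sorted = FALSE)
key(b)   # no key
```

In the second call the first row is the first combination as supplied ("b", 2), whereas the default call starts from the smallest values.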

Arun

From: Arunkumar Srinivasan aragorn168b at gmail.com
Reply: Arunkumar Srinivasan aragorn168b at gmail.com
Date: June 15, 2014 at 6:04:59 PM
To: G See gsee000 at gmail.com
Cc: datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:  Re: [datatable-help] subsetting by second key  

Sure, you can update it. No, there's no advantage. I just didn't think of CJ at the time (probably because I tried it with J and it worked, since there's just one value for the 2nd key column).
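
A quick way to check this, sketched with the same iris setup as in the original question (the names `species`, `r_j` and `r_cj` are just placeholders for this example):

```r
library(data.table)

DT <- data.table(iris, key = c("Species", "Petal.Width"))
species <- as.character(unique(DT$Species))

# J(): the single value 1.5 is recycled to length 3 (one row per species), no warning.
r_j <- DT[J(species, 1.5), nomatch = 0L]

# CJ(): the cross join of 3 species x 1 value gives the same 3 lookup rows.
r_cj <- DT[CJ(species, 1.5), nomatch = 0L]

nrow(r_j) == nrow(r_cj)
```

With more than one value in the second column, J(.) has to recycle unevenly and the two forms are no longer interchangeable, which is what the warning in the original question was about.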

Arun

From: G See gsee000 at gmail.com
Reply: G See gsee000 at gmail.com
Date: June 15, 2014 at 6:03:13 PM
To: Arunkumar Srinivasan aragorn168b at gmail.com
Cc: datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:  Re: [datatable-help] subsetting by second key

Thank you Arun. Should that answer be updated to use CJ(.), then? Is
there an advantage to using J(.) over CJ(.) if you know that you're
only looking for one value in the second column?

On Sun, Jun 15, 2014 at 10:56 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> unique(Species) is of length 3, whereas the 2nd entry, c(1.5, 2), is of
> length 2.
>
> J in J(.) is replaced with list(.) internally (using lazy evaluation),
> following which it’s converted to a data.table using as.data.table(list(.)).
>
> And here your list is:
>
> list(c("setosa", "versicolor", "virginica") , c(1.5, 2.0)) which results in
> the warning because it has to recycle to convert it to a data.table.
>
> In the example you’ve linked, J(.) and CJ(.) will return the same result
> (because there’s just one value in the 2nd column), so the results don’t
> change. But the general approach is to use CJ(.) along with nomatch=0L, as
> you’ve done.
>
> Those two expressions are equivalent, yes.
>
>
> Arun
>
> From: G See gsee000 at gmail.com
> Reply: G See gsee000 at gmail.com
> Date: June 15, 2014 at 5:45:11 PM
> To: datatable-help at lists.r-forge.r-project.org
> datatable-help at lists.r-forge.r-project.org
> Subject: [datatable-help] subsetting by second key
>
> Hi,
>
> I want to subset a data.table using only its second key, which is
> demonstrated here
> http://stackoverflow.com/questions/15597685/subsetting-data-table-by-2nd-column-only-of-a-2-column-key-using-binary-search/15597713#15597713
>
> However, I need to subset with more than one value in the secondary key.
>
> Is this warning expected? What exactly is it telling me?
>
> library(data.table)
> DT <- data.table(iris, key="Species,Petal.Width")
> DT[J(unique(Species), c(1.5, 2.0)), nomatch=0L]
> #    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> # 1:          6.0         2.2          5.0         1.5 virginica
> # 2:          6.3         2.8          5.1         1.5 virginica
> #Warning message:
> #In as.data.table.list(i) :
> # Item 2 is of size 2 but maximum size is 3 (recycled leaving a remainder of 1 items)
>
>
> It looks like I can get what I want with either of these; can you
> confirm that both of these will always return the same result?
>
> DT[Petal.Width %in% c(1.5, 2.0)] # vector scan
> DT[CJ(unique(Species), c(1.5, 2.0)), nomatch=0L]
>
>
> Thanks,
> Garrett
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help