[datatable-help] using J() to select for a value that is in something other than the first key

Joseph Voelkel jgvcqa at rit.edu
Tue Feb 7 19:57:14 CET 2012


Well, I don't know what's faster, but I have to agree with Farrel's philosophy. In addition, I KNOW that if I were to look at the code for the $i 'trick' a few weeks later, I would have uneasy feelings about it. I would prefer to be able to read my code easily: the less thought, the better.

Here are the two ways I would think about doing it. No idea of speed issues.

library(data.table)

dt <- data.table(a=c('a','a','a','a','b','b','b','b'),
  b=c('a','b','a','b','a','b','b','a'),c=1:8,key=c('a','b'))

# method 1. Just include all possible values of the first key in J. To me, this is conceptually the simplest
dt[J(unique(a),"b")]

# method 2. Swap the keys, twice (note that this overwrites dt with the subset)
setkey(dt,b,a)
dt<-dt[J("b")]
setkey(dt,a,b)
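
As a quick sanity check, the two methods can be compared against a plain vector scan. This is only a sketch: the names m1, m2, tmp and vs are made up for illustration, and method 2 is run on a copy (via copy()) so that dt itself is left untouched, which differs slightly from the version above.

```r
library(data.table)

dt <- data.table(a = c("a","a","a","a","b","b","b","b"),
                 b = c("a","b","a","b","a","b","b","a"),
                 c = 1:8, key = c("a", "b"))

# method 1: keyed join that lists every value of the first key
m1 <- dt[J(unique(a), "b")]

# method 2, on a copy so that dt keeps all of its rows and its key
tmp <- copy(dt)
setkey(tmp, b, a)
m2 <- tmp[J("b")]
setkey(m2, a, b)

# traditional vector scan, for comparison
vs <- dt[b == "b"]

# all three should select the same rows, in the same order
stopifnot(identical(m1$c, vs$c), identical(m2$c, vs$c))
```

No timing is attempted here; the check only confirms that the approaches agree on this small table.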


From: datatable-help-bounces at r-forge.wu-wien.ac.at [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On Behalf Of Farrel Buchinsky
Sent: Thursday, January 19, 2012 8:59 PM
To: Steve Lianoglou
Cc: datatable-help at r-forge.wu-wien.ac.at
Subject: Re: [datatable-help] using J() to select for a value that is in something other than the first key

I do not know if that is how all indexes work. I am not really a card-carrying database manager or programmer. I just play one in my spare time. The price I pay is not remembering how to write the syntax when I need to do something. That, to me, is a higher price than slow subsetting. If the syntax is not easy, I would rather just use the traditional vector scan methods that one sees in conventional data.frame subset commands.

Notwithstanding my idiosyncratic needs, I thank you very much for your explanation.
Farrel



On Thu, Jan 19, 2012 at 18:26, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
On Thu, Jan 19, 2012 at 6:09 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
> Oy gevalt! Am I correct to believe that the technique is rearranging the
> data.table so that J can accept the input as pertaining to a secondary key?
> That seems as if it is too much work for me and my computer. I would rather
> stick to the vector scan methods for now.
Not the entire data.table, just the key columns.

Depending on how many queries you're going to make against the 2nd key
only, the pay off for your troubles could be anywhere from zero to
mucho. Of course if you simply don't have the RAM to make the idx
data.table in the first place, then that's that.
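
The original message describing the idx data.table is not quoted in this thread, so the following is only a guess at its general shape, not the code that was actually posted. It assumes a small lookup table (here called idx, with a made-up column row_id) that holds just the second key column plus the row numbers of dt, keyed by that column.

```r
library(data.table)

dt <- data.table(a = c("a","a","a","a","b","b","b","b"),
                 b = c("a","b","a","b","a","b","b","a"),
                 c = 1:8, key = c("a", "b"))

# hypothetical lookup table: only the second key column and the row
# numbers of dt are copied, not the whole data.table
idx <- data.table(b = dt$b, row_id = seq_len(nrow(dt)), key = "b")

# binary search on idx gives the matching row numbers ...
rows <- idx[J("b")]$row_id

# ... which are then used to pull the rows out of dt by position
res <- dt[sort(rows)]
```

The extra memory is one column plus an integer vector of length nrow(dt), and it is paid once, so whether it is worth it depends on how many such lookups follow.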

That's how all indexes work though, no? In a database for instance, if
you have a compound key/index over two or more columns, the index will
only help queries that use a prefix (or the whole) of the key, and
not just any subset of its columns (as you want to do here), right?
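
A small illustration of the prefix point, using the example table from elsewhere in this thread (the name first_key is made up here): with the key set to (a, b), a one-column J() is matched against the first key column, a, and cannot be aimed at b alone.

```r
library(data.table)

dt <- data.table(a = c("a","a","a","a","b","b","b","b"),
                 b = c("a","b","a","b","a","b","b","a"),
                 c = 1:8, key = c("a", "b"))

# a single value in J() is looked up in the first key column only
first_key <- dt[J("b")]

# every returned row has a == "b"; column b is not restricted at all
stopifnot(all(first_key$a == "b"))
stopifnot(any(first_key$b != "b"))
```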

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
