[datatable-help] Using a data.table to perform a two-way lookup
Short, Tom
TShort at epri.com
Tue Apr 12 12:53:45 CEST 2011
> -----Original Message-----
> From: datatable-help-bounces at r-forge.wu-wien.ac.at
> [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On
> Behalf Of Karl Ove Hufthammer
>
> I have found one way of achieving this, creating two
> identical data.tables with different keys:
>
> options(stringsAsFactors=FALSE)
> dat=data.frame(x=c("1","1","2","3"), y=c("a","b","a","c"))
> dat
> A <- B <- data.table(dat)
> key(A)="x"
> key(B)="y"
>
> A[B["a"][,x]][,y]
>
> The problem is performance (my real-life data.table is *much*
> larger), since B["a"][,x] outputs a character vector. When
> this is used in A[...], the character is converted to a factor
> with appropriate levels, and it turns out (shown using
> 'Rprof') that the majority of the time running the function
> is taken up by 'levels<-', i.e., creating this factor /
> attaching the levels.
>
> I believe one potential solution would be to have both 'x'
> and 'y' being factors, so that there is no conversion to/from
> characters. This would eliminate both the conversion '"a" to
> factor' and 'B["a"][,x] to factor'.
> However, 'data.table' doesn't accept 'i' being a factor (and
> if I convert it to the internal numeric codes, it thinks I
> mean row numbers).
>
> Any suggestions on how to solve this?
To answer part of your inquiry: you can use factors in i by wrapping i
in J(), as follows:
options(stringsAsFactors=TRUE)
dat=data.frame(x=c("1","1","2","3"), y=c("a","b","a","c"))
A <- B <- data.table(dat)
key(A)="x"
key(B)="y"
A[J(B["a"][,x])][,y]
- Tom
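
For readers on a newer data.table release, the same two-way lookup can be sketched as below. This is only an illustration under stated assumptions, not part of the original reply: it assumes a version where setkey() replaces the `key(A)=` form, where `.()` is available as an alias for J(), and where character columns can be used as keys directly. The names A, B, and res are reused or made up for the example.

```r
library(data.table)

# Same toy data as in the thread, kept as character columns
# (assumption: character keys are supported by the installed version).
dat <- data.table(x = c("1", "1", "2", "3"),
                  y = c("a", "b", "a", "c"))

# Two independent copies with different keys; copy() avoids both names
# pointing at the same table when setkey() modifies it by reference.
A <- copy(dat); setkey(A, x)   # A is looked up by x
B <- copy(dat); setkey(B, y)   # B is looked up by y

# Step 1: all x values that occur together with y == "a".
xs <- B[.("a"), x]

# Step 2: all y values that occur together with any of those x values.
res <- A[.(xs), y]
print(res)   # expected: "a" "b" "a"
```

Whether this avoids the `levels<-` overhead described in the quoted message depends on the data.table version and on the real data, so it is worth profiling with Rprof before relying on it.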