[datatable-help] Using a data.table to perform a two-way lookup

Matthew Dowle mdowle at mdowle.plus.com
Thu Apr 14 22:06:12 CEST 2011


Oh and if the total size of your real table is a large part of RAM and
holding a copy is a struggle, then you can always do a 'manual'
secondary key. Basically create an indexing table storing the row
numbers to look up in the big table, and set its key to be the other
column. There are some past threads on that, search for "secondary".
Matthew
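
A minimal sketch of the 'manual' secondary key described above, not taken from the thread itself. The table names (big, idx), the column names (x, y, v, big_row) and the data are all made up for illustration; only setkey(), J() and ordinary row subsetting are used. The index table is built after keying the big table, because setkey() reorders rows and the stored row numbers must refer to the final order.

```r
library(data.table)

# Hypothetical "big" table, keyed by its primary lookup column x.
big <- data.table(x = c("a", "b", "c", "d"),
                  y = c("q", "p", "q", "r"),
                  v = 1:4)
setkey(big, x)

# Manual secondary key: a narrow table holding only the other column (y)
# and the row number of each row in big, keyed by y. This avoids keeping
# a second full copy of big with a different key.
idx <- data.table(y = big$y, big_row = seq_len(nrow(big)))
setkey(idx, y)

# Look up by y: join on idx to get the row numbers, then subset big.
rows <- idx[J("q")]$big_row
res  <- big[rows]

# Lookups by x still use big's own key directly.
res_x <- big[J("c")]
```

If big is re-sorted or rows are added or removed, idx has to be rebuilt, since it stores positions rather than values.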

On Thu, 2011-04-14 at 21:00 +0100, Matthew Dowle wrote:
> Hi Karl,
> 
> > I’m not sure what you mean with ‘iterate bulk joins’. Or, I thought that
> > was what I was doing … :)
> 
> It was the [1] in this :
> 
>   y.current=as.character(y2x$y[which(!processed)[1]])
>                                                 ^^^
> that made me think it was joining one value at a time. By bulk join I
> meant doing all the unique(y) in one join, rather than one by one, if
> possible.
> 
> > > More to the point, why store the x integers as character? Can't they be
> > > kept as integers and the levels<- thing goes away.
> > 
> > It’s only in this simplified example that they’re numbers. In my real
> > dataset they’re character strings.
> Ah ok.  Makes sense now.  Tom's idea to drop the levels and put them
> back afterwards sounds like a good way to go then.
> 
> It's on the FR list to add secondary keys, btw. If we had that then you
> wouldn't need to copy the table just to give it a different key.
> 
> Matthew
> 
> 



