[datatable-help] Not getting behavior described in FAQ

Short, Tom TShort at epri.com
Fri May 7 18:59:29 CEST 2010


> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org 
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org] 
> On Behalf Of Harish
> Sent: Friday, May 07, 2010 11:08 AM
> To: datatable-help at lists.r-forge.r-project.org
> Subject: [datatable-help] Not getting behavior described in FAQ
> 
> Hi,
> 
> I recently ran across data.table and am experimenting with 
> it.  I have a few questions that I am hoping to get your help with.
> 
> *** Question #1 ***
> I like the concept of data.tables quite a bit and would like 
> to use it as much as possible.  However, some other packages 
> I want to use (e.g. ggplot2) need data frames as inputs.
> 
> Would I take a huge performance hit if I convert to data 
> frames intermittently and use data.tables as much as 
> possible?  Or should I use data.tables only when I have 
> significant querying to do?

Converting back and forth between data tables and data frames is
efficient. Just use:

as.data.table(df) or as.data.frame(dt)

This still involves a copy, though. For very large data tables
or data frames, you can set the class directly to avoid the
copy:

class(dd) <- "data.frame"
attr(dd,"row.names") <- seq_along(dd[[1]]) # data frames need row names

or the other way:

class(dd) <- "data.table"
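
As a rough illustration of why the row names matter, here is a
minimal base-R sketch. It uses a plain list of equal-length columns
as a stand-in for a data table (an assumption about the internal
layout; `cols` is a made-up name) and takes the row count from the
first column:

```r
# Hypothetical stand-in: a plain list of equal-length columns.
cols <- list(id = 1:3, grp = c("x", "y", "z"))

# A data frame needs a row.names attribute; size it from column 1.
attr(cols, "row.names") <- seq_along(cols[[1]])
class(cols) <- "data.frame"

print(dim(cols))
```

Without the row.names attribute, the object would report zero rows
once its class is "data.frame".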
 
> *** Question #2 ***
> For some reason, FAQ 1.1 and 1.2 do not work for me as described.
> 
> I have version 1.4 of the data.table package (downloaded from 
> r-forge today).
> 
> Example:
>    a <- as.data.table(installed.packages())
>    a[,5]
>    a[,"Version"]
> 
> ==> a[,5] does not return 5 as indicated in the FAQ 1.1, but 
> returns the 5th column.
> ==> a[,"Version"] does not return "Version" as indicated in 
> the FAQ 1.2, but returns the "Version" column.
> 
> What is the bug -- the FAQ or the behavior of the code?

I think the bug is in the FAQ. This behavior was changed only
recently. When j is a single number or a single character
string, it is treated as if you had used "with=FALSE". So,

a[,5] returns a data table consisting of the fifth column, but
a[,5 + 0] returns the number five, and a[,5:6] returns 5:6.  

a[,"Version"] returns a data table consisting of the "Version"
column, but a[,c("Version")] and a[,c("Version","Package")] are
evaluated as expressions and return the character vectors
themselves.
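
If you would rather not depend on the single-value shortcut, one
option is to spell out with=FALSE yourself. The sketch below assumes
the data.table package is installed; `dt` and its contents are made
up for illustration:

```r
library(data.table)

# Small made-up table standing in for installed.packages().
dt <- data.table(Package = c("pkgA", "pkgB"),
                 Version = c("1.0", "2.1"))

# with = FALSE tells data.table to treat j as column names,
# so the result is a data table holding the selected column.
cols <- dt[, "Version", with = FALSE]
print(cols)
```

Being explicit keeps the call's meaning independent of how a bare
number or string in j happens to be interpreted.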


> *** Question #3 ***
> 
> FAQ 1.10 says "although it appears as though x[y] does not 
> return the columns in y, you can actually use the columns 
> from y in the j expression."  However, I don't get this behavior.
> 
> Example:
>    x <- data.table( a=rep(letters[1:4], each=3), 
> b=rep(letters[10:12], times=4), v=1:12, key='a,b')
> 
>    y <- data.table( a=letters[1:3], b=letters[10:12], 
> col3=1:3, key='a,b' )
> 
> Then, I expect one of the following to give me col3 from y...
>    x[y,"col3", with=FALSE]
>    x[y, list(col3)]
> ...but I don't.
> 
> What am I doing wrong?

In the move to Matthew's new code to speed up grouping, I think
this feature got removed. I think we'd like to re-introduce it,
but I'm not sure about the feasibility or timetable for that.

For now, if you want to mix columns like that, you have to use
merge and then possibly index further on the merged data.

z <- merge(x,y)
z

     a b v col3
[1,] a j 1    1
[2,] b k 5    2
[3,] c l 9    3
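
To show the "index further" step, here is a small sketch of the
merge-then-subset workaround using the x and y from the question. It
assumes the data.table package is installed; the key is written as a
character vector, and `res` and `total` are names invented for the
example:

```r
library(data.table)

x <- data.table(a = rep(letters[1:4], each = 3),
                b = rep(letters[10:12], times = 4),
                v = 1:12, key = c("a", "b"))
y <- data.table(a = letters[1:3], b = letters[10:12],
                col3 = 1:3, key = c("a", "b"))

# Join on the shared key columns so v and col3 sit side by side.
z <- merge(x, y, by = c("a", "b"))

# Then index the merged table, using columns from both inputs in j.
res <- z[col3 > 1, list(a, b, total = v + col3)]
print(res)
```

The merge costs an extra copy of the matched rows, but after it any
column from either table can be used in i or j.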

Thanks for the input, Harish.

- Tom

