[datatable-help] Skipping some Vi names

Joseph Voelkel jgvcqa at rit.edu
Fri Jul 15 16:06:14 CEST 2011


Thanks.

1. Where the quotes came from: no good reason. As I said, I hadn't used data.table in a while and was a bit unsure of the syntax. I know the syntax is all rational, but, for example, I first tried to use by= via something like by=names(oldDataFrame)[1:3], which was, say, c("x1","x2","x3"), but then found I needed the syntax "x1,x2,x3" (which to me does not seem very R-like, because I can't use a simple R function like names() in by= to pass the column names). So I used a similar (?) syntax for the j term. To answer your question: no, nothing in the documentation misled me; it shows this correctly. Sorry about that.
2. Thanks for mentioning .SD. I have wanted to use this several times, but in every case I have other variables in the data table as well. Is there a natural way to use something like dt[,lapply(.SD, sum), by="x,y"] if, for example, dt's variables are x, y, A1, A2, A3, B1, B2, B3, B4, but I only want to sum over the Ai's? (Imagine a case where there are 40 Ai's and 40 Bi's.)

Joe
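
For readers following up on question 2: a minimal sketch of one way to restrict .SD to a subset of columns, assuming a current release of data.table in which the .SDcols argument is available. The table below is hypothetical (a cut-down stand-in for the x, y, Ai, Bi layout described above), and the names a_cols and res are made up for illustration.

```r
library(data.table)

# Hypothetical data: two grouping columns, two A columns, one B column
dt <- data.table(
  x  = c(1, 1, 2, 2),
  y  = c("a", "a", "b", "b"),
  A1 = 1:4,
  A2 = 5:8,
  B1 = 9:12
)

# Pick out the columns whose names start with "A" using base R
a_cols <- grep("^A", names(dt), value = TRUE)

# .SDcols limits .SD to a_cols, so B1 is left out of the sums;
# by= is given here as a character vector of column names
res <- dt[, lapply(.SD, sum), by = c("x", "y"), .SDcols = a_cols]

print(res)
```

With 40 Ai's and 40 Bi's the same pattern should apply unchanged, since a_cols is built from names(dt) rather than typed out by hand.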

-----Original Message-----
From: Matthew Dowle [mailto:mdowlenoreply at virginmedia.com] On Behalf Of Matthew Dowle
Sent: Friday, July 15, 2011 4:04 AM
To: Joseph Voelkel
Cc: datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] Skipping some Vi names

Hi,

Yes, no quotes please. Also interested how you arrived at using quotes;
anything misleading in documentation please shout.

I've called it a bug though, fixed it, and added a test, so it can't
trip up users again. It was deparsing and reparsing j, which it didn't
need to do.

So in future this will happen :

> DT[,list("sum(a),sum(b)")]
                V1
[1,] sum(a),sum(b)

or, if DT contained 3 groups, this :

> DT[,list("sum(a),sum(b)"),by=a]
     a            V1
[1,] 1 sum(a),sum(b)
[2,] 2 sum(a),sum(b)
[3,] 3 sum(a),sum(b)
> 

Matthew


On Thu, 2011-07-14 at 18:14 -0400, Steve Lianoglou wrote:
> Hi Joseph,
> 
> On Thu, Jul 14, 2011 at 4:43 PM, Joseph Voelkel <jgvcqa at rit.edu> wrote:
> > Continuing with the example below, here is another problem: names() returns
> > NA’s for all Vi names except V1.
> >
> >> (dt2<-dt[,list("sum(A1),sum(A2),sum(A3)"),by="x,y"])
> >
> >      x y V1 V4 V5
> > [1,] 1 1  1  7 13
> > [2,] 1 2  4 10 16
> > [3,] 2 1  2  8 14
> > [4,] 2 3  5 11 17
> > [5,] 3 2  3  9 15
> > [6,] 3 3  6 12 18
> >
> >> names(dt2)
> >
> > [1] "x"  "y"  "V1" NA   NA
> 
> I'm curious why you put your expression in quotes? Did you see that in
> the manual somewhere?
> 
> Not doing that fixes your problems:
> 
> R> (dt2<-dt[,list(sum(A1),sum(A2),sum(A3)),by="x,y"])
>      x y V1 V2 V3
> [1,] 1 1  1  7 13
> [2,] 1 2  4 10 16
> [3,] 2 1  2  8 14
> [4,] 2 3  5 11 17
> [5,] 3 2  3  9 15
> [6,] 3 3  6 12 18
> 
> and
> 
> R> names(dt2)
> [1] "x"  "y"  "V1" "V2" "V3"
> 
> All columns are named, and there are no "gaps" in the colnames ...
> 
> I guess Matthew can comment on why this happens, but just use normal
> expressions/blocks for your `j` expression in the meantime :-)
> 
> If it's because you have a lot of column names, note that you can also do:
> 
> R> dt[,lapply(.SD, sum), by="x,y"]
>      x y A1 A2 A3
> [1,] 1 1  1  7 13
> [2,] 1 2  4 10 16
> [3,] 2 1  2  8 14
> [4,] 2 3  5 11 17
> [5,] 3 2  3  9 15
> [6,] 3 3  6 12 18
> 
> Note also that the column names are different here, too.
> 
> HTH,
> -steve
> 
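
The difference in column names noted in the quoted message (V1, V2, V3 versus A1, A2, A3) can be sketched as follows. This is an illustration on hypothetical data, not the poster's original table; the names unnamed, named, sumA1 and sumA2 are made up for the example.

```r
library(data.table)

# Hypothetical data, not the table from the thread
dt <- data.table(x = c(1, 1, 2), A1 = 1:3, A2 = 4:6)

# Unnamed elements of list() in j receive default names V1, V2, ...
unnamed <- dt[, list(sum(A1), sum(A2)), by = "x"]

# Named elements of list() carry their names into the result
named <- dt[, list(sumA1 = sum(A1), sumA2 = sum(A2)), by = "x"]

print(names(unnamed))
print(names(named))
```

Naming the elements of list() in j is one way to avoid relying on the default Vi names at all.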



