[datatable-help] Subsetting columns in data.table

Matthew Dowle mdowle at mdowle.plus.com
Sat Nov 17 02:08:33 CET 2012


Hi Berto,

I think we may be experiencing a language barrier here. This is
datatable-help; i.e., not r-help. You *can* write in another language
on this list, if you'd like to, in case someone else here understands
better. The rules are less strict here. Nobody has yet done so, but
there is no rule against it. Why not?

Proceeding in English for now ...

I like the lack of spaces, but what do the * mean?  In other words,
you've presented a line of code :
     DT[y>=3&*v<=7&w<=7*,sum(y), by=x]
but that doesn't actually evaluate to anything, does it? So that's
pseudo-code. I don't even need to copy and paste that into R to know
it's invalid.  That cannot possibly give the expected result, because of
the "*" characters.

Might you be looking for something like :

     sapply(.SD, `<`, 7)

?   Dunno. Guessing.
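
If that guess is anywhere near right, here is a rough sketch of how it might
go. The table DT below is made up purely for illustration (it is not your
data), and I am assuming that x is the grouping column, y is the column with
the lower bound, and every remaining column should be <= 7.

```r
library(data.table)

# Made-up example table (an assumption, not the poster's data)
DT <- data.table(x = c("a", "a", "b", "b"),
                 y = c(4, 5, 3, 1),
                 v = c(1, 2, 3, 9),
                 w = c(7, 6, 5, 4))

# Every column except the grouping column and the lower-bound column
cols <- setdiff(names(DT), c("x", "y"))

# One logical per row: TRUE only when all of the columns in cols are <= 7
keep <- DT[, Reduce(`&`, lapply(.SD, `<=`, 7)), .SDcols = cols]

# Same shape as the manual version, without naming v and w by hand
res <- DT[y >= 3 & keep, sum(y), by = x]
print(res)
```

Adding more columns to DT needs no change to the last two statements, since
cols is built from names(DT).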

But, focussing on this part of your email :

> But if the number of columns grows, I can't specify all columns 
> anymore,
> maybe should I use column names?

You actually do, really, honestly, need to show us, physically, in
email, what you mean. Columns of what? Growing how? Show us 2,3,4,5
columns. Show us the manual way. Show us the input and show us the
output.

Your email can be very long. It can contain very little English.  But
you actually need to show what the output is you would like, for me (at
least) to understand.

What I am certain of is that whatever you want to do is possible. And 
if it isn't, then
we will likely enhance data.table to do it.

Matthew


On 16.11.2012 18:32, Berto wrote:
> Hi Matthew, thanks for the quick reply.
>
> I want to find all the rows that are above a threshold for one column
> (e.g. y>=3) and below another threshold for all the rest (e.g.
> v<=7&w<=7, for a threshold<7).
>
> Once I have this subsetting, I'd like to use the sum function (e.g.
> sum(y), by=x).
>
> I know how to do it for a low number of columns, specifying all
> columns names<threshold:
>
> DT[y>=3&*v<=7&w<=7*,sum(y), by=x]
>
> which gives the expected result:
>
>     x V1
> 1: a  9
> 2: b  3
>
> But if the number of columns grows, I can't specify all columns 
> anymore,
> maybe should I use column names?
>
> cols <- names(DT)[!(names(DT) %in% "y")]  # column names excluding the
> # one with the higher threshold
>
> Hope to be clearer this time, otherwise please let me know!

