[datatable-help] Selecting rows that don't match

Matthew Dowle mdowle at mdowle.plus.com
Sun Aug 21 12:12:46 CEST 2011


Hi Ben,
Welcome to the list. Two answers inline ...
Matthew

On Fri, 2011-08-19 at 17:17 -0700, Ben Goldstein wrote:
> Sorry for the double post. But on the same topic, in more general
> sense, how does one select rows based off of any logical?
> 
> 
> For example:
> 
> 
> DF = data.frame(x=1:3,y=4:6,z=7:9)
> ds <- 1 
> new.DF <- DF[which(DF$x > ds),]
> 
DT[,which(x>ds)]

or

DT[x>ds,which=TRUE]

Note you don't need the 'DF$' in data.table. It's a little less typing,
and if you have many similarly named objects then there's a little less
room for bugs due to typos.
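
A minimal sketch of the two forms above, reusing the DF and ds from the question. Both forms return the matching row numbers. To get the rows themselves (the equivalent of new.DF), the logical expression can be passed as i on its own. The names rows, idx and idx2 are made up for illustration.

```r
library(data.table)

DF <- data.frame(x = 1:3, y = 4:6, z = 7:9)
DT <- as.data.table(DF)
ds <- 1

# The rows where x > ds (data.table counterpart of DF[which(DF$x > ds), ])
rows <- DT[x > ds]

# The row numbers where x > ds, written two ways
idx  <- DT[x > ds, which = TRUE]
idx2 <- DT[, which(x > ds)]
```

Column names such as x are looked up inside DT, so the 'DF$' prefix is not needed in either form.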

> 
> Thanks,
> 
> 
> Ben
> 
> On Fri, Aug 19, 2011 at 4:57 PM, Ben Goldstein
> <ben.goldstein at gmail.com> wrote:
>         Hello All,
>         
>         
>         I just started using data.table and like some of the features.
>         I apologize if this is a basic question, but I can't find it
>         posted on the list. I want to select the rows that don't match
>         a key.
>         
>         
>         For example, in data.frame i'd do:
>         
>         
>         DF = data.frame(x=1:3,y=4:6,z=7:9)
>         ds <- c(1,2)
>         new.DF <- DF[which(! DF$x %in% ds),]
>         
>         
>         where, ds is a vector of values that I don't want matched. So,
>         new.DF would just consist of the third row.
>         
>         
>         How do I do something comparable with data.table?

> DT = as.data.table(DF)
> DT[!x%in%ds]
     x y z
[1,] 3 6 9
> 

But, this isn't a good example of 'rows not matching a key', because the
%in% is a vector scan, and so is '>'. The general form for a 'not join'
is:

w = DT[J(...),which=TRUE]  # fast binary search
DT[-w]
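
As a sketch only, here is the two-step form applied to the example from the question. The nomatch = 0 argument and the guard for an empty w are additions of this sketch, not part of the general form above: without them, key values absent from DT would put NA into w, and an empty w would make DT[-w] select no rows. The name not_matched is made up for illustration.

```r
library(data.table)

DF <- data.frame(x = 1:3, y = 4:6, z = 7:9)
DT <- as.data.table(DF)
setkey(DT, x)            # the binary search needs a key on x

ds <- c(1L, 2L)          # key values to exclude (integer, like column x)

# Step 1: row numbers of DT matching the key values.
# nomatch = 0 drops key values that are not present in DT.
w <- DT[J(ds), which = TRUE, nomatch = 0]

# Step 2: drop those rows. -integer(0) is integer(0), which would
# select zero rows rather than all of them, so guard that case.
not_matched <- if (length(w)) DT[-w] else DT
```

With this data, not_matched holds only the row where x is 3.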

There is a feature request to either add a 'not' argument to [.data.table
to make it one step, or, thinking about it now, the following would be
nicer:

DT[-J(...)]
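
For readers on a later data.table release: a one-step not-join was subsequently added, written with a '!' prefix on i. The sketch below assumes such a release is installed; it will not work on the version current when this message was written. The name not_matched is made up for illustration.

```r
library(data.table)

DT <- data.table(x = 1:3, y = 4:6, z = 7:9)
setkey(DT, x)

ds <- c(1L, 2L)          # key values to exclude

# Not-join: all rows of DT whose key does not match any value in ds
not_matched <- DT[!J(ds)]
```

This gives the same row as the vector-scan form DT[!x %in% ds], but goes through the key.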

Matthew


>         
>         
>         I have:
>         DT = data.table(DF)
>         setkey(DT,x)
>         and then I know I can select ds, by:
>         DT[ds]
>         but something like:
>         DT[!ds] obviously doesn't work...
>         
>         
>         Thanks,
>         
>         
>         Ben
> 
> 



