[datatable-help] BUG: droplevels mangles subsetted data.table

Matthew Dowle mdowle at mdowle.plus.com
Tue Feb 21 22:42:27 CET 2012


Yes, could do. Building on that, here's a quick stab at
droplevels.data.table. This does it by reference, or it could take a
copy(). Taking a copy() would be consistent with base (probably
required), but then how best to make a non-copying version available?

droplevels.data.table = function(dt) {
    oldkey = key( dt )
    for (i in names(dt)) {
        if (is.factor(dt[[i]])) dt[,i:=droplevels(dt[[i]]),with=FALSE]
    }
    setkeyv( dt, oldkey )
    dt
}
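
One possible answer to the copy-versus-reference question, as a rough sketch only: copy by default, as base does, and let the caller opt in to modifying the table by reference. The function name drop_levels_dt and the in_place argument are hypothetical, made up for illustration, and are not part of data.table. The sketch uses set() in the loop instead of := and assumes a reasonably recent data.table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): drop unused factor levels,
# on a copy by default (like base droplevels) or by reference on request.
drop_levels_dt <- function(dt, in_place = FALSE) {
    if (!in_place) dt <- copy(dt)      # leave the caller's table untouched
    oldkey <- key(dt)                  # remember the key; it is restored below
    # names of the factor columns only
    fcols <- names(dt)[vapply(dt, is.factor, logical(1))]
    for (col in fcols) {
        # replace the column by reference with its level-dropped version
        set(dt, j = col, value = droplevels(dt[[col]]))
    }
    if (!is.null(oldkey)) setkeyv(dt, oldkey)
    invisible(dt)
}

dt  <- data.table(name = factor(c("a", "b", "c")), value = 1:3, key = "name")
sub <- dt[name != "a"]        # subset still carries the unused level "a"
res <- drop_levels_dt(sub)    # copying variant

levels(sub$name)              # "a" "b" "c"  (sub is unchanged)
levels(res$name)              # "b" "c"
```

Calling drop_levels_dt(sub, in_place = TRUE) would instead change sub itself, which avoids the copy for large tables at the cost of differing from base semantics.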

On Tue, 2012-02-21 at 15:38 -0500, Prasad Chalasani wrote:
> Meanwhile as a work-around, I suppose one should do:
> 
> keys <- key( dt ) # this could in general be a large set of keys
> sub_d <- droplevels( as.data.frame( dt[ name != 'a' ] ) )
> sub_dt <- data.table( sub_d )
> setkeyv( sub_dt, keys )
> 
> 
> 
> On Feb 21, 2012, at 1:59 PM, Matthew Dowle wrote:
> 
> > 
> > I see the problem too but (just) adding droplevels.data.table might miss
> > the root cause.
> > 
> >> because the way the
> >> droplevels.data.frame method works isn't compatible with data.table
> >> indexing.
> > 
> > > But it's intended to be. I can see that the switch at the top of
> > > `[.data.table` detects that the caller isn't data.table-aware and then
> > > dispatches to `[.data.frame`, but I'm not sure why that then doesn't work.
> > > Something to do with the missing j or missing drop not being passed
> > > through correctly, perhaps.
> > 
> > I have heard it said (once or twice) that data.table is "almost"
> > compatible with non-data.table-aware packages, but never had an example
> > before. I wonder if this is it!
> > 
> > A (fast) droplevels.data.table using := would be good anyway, though.
> > 
> > Matthew
> > 
> > 
> > 
> >> Hi,
> >> 
> >> I see what the problem is -- we need to provide a
> >> droplevels.data.table S3 method, because the way the
> >> droplevels.data.frame method works isn't compatible with data.table
> >> indexing.
> >> 
> >> Will fix:
> >> 
> >> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1841&group_id=240&atid=975
> >> 
> >> Thanks for raising the flag.
> >> 
> >> Cheers,
> >> -steve
> >> 
> >> On Tue, Feb 21, 2012 at 12:38 PM, pchalasani <pchalasani at gmail.com> wrote:
> >>> Surprising that this wasn't noticed before, or perhaps I'm not following
> >>> some recommended idiom to drop levels when using data.table. The following
> >>> code illustrates the bug clearly. The bug remains regardless of whether I
> >>> use "subset" or simply use dt1 = dt[ name != 'a' ].
> >>> 
> >>> 
> >>> 
> >>>    d <- data.table(name = c('a','b','c'), value = 1:3)
> >>>    dt <- data.table(d)
> >>>    setkey(dt,'name')
> >>>    dt1 <- subset(dt,name != 'a')  # or dt1 <- dt[ name != 'a' ]
> >>>    > dt1
> >>>          name value
> >>>     [1,]    b     2
> >>>     [2,]    c     3
> >>> 
> >>>    > droplevels(dt1)
> >>>          name value
> >>>     [1,]    b     1
> >>>     [2,]    c     3
> >>> 
> >>> 
> >>> 
> >>> --
> >>> View this message in context:
> >>> http://r.789695.n4.nabble.com/BUG-droplevels-mangles-subsetted-data-table-tp4407694p4407694.html
> >>> Sent from the datatable-help mailing list archive at Nabble.com.
> >>> _______________________________________________
> >>> datatable-help mailing list
> >>> datatable-help at lists.r-forge.r-project.org
> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >> 
> >> 
> >> 
> >> --
> >> Steve Lianoglou
> >> Graduate Student: Computational Systems Biology
> >>  | Memorial Sloan-Kettering Cancer Center
> >>  | Weill Medical College of Cornell University
> >> Contact Info: http://cbio.mskcc.org/~lianos/contact
> >> 
> > 
> > 
> 



