[datatable-help] BUG: droplevels mangles subsetted data.table

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Feb 21 23:52:40 CET 2012


Hi,

I guess I'm missing something, but why isn't your proposed
droplevels.data.table consistent with base? Is it maybe because the
ordering of the rows might change?

-steve
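
A minimal sketch of the copying variant discussed in this thread (not code from the thread itself). It assumes a current data.table release, and the helper name droplevels_copy is hypothetical, not part of data.table. It copy()s first, so the caller's table is left alone the way base droplevels() leaves its argument alone. It then drops unused levels column by column with set() and restores the key. In the usage, name is made a factor explicitly so there are levels to drop.

```r
library(data.table)

# Hypothetical helper, not part of data.table: a copying variant of the
# droplevels.data.table idea quoted below. Base droplevels() returns a
# modified object and leaves its argument alone, so we copy() first.
droplevels_copy <- function(x, ...) {
  ans <- copy(x)          # deep copy: set() below cannot touch the caller's x
  oldkey <- key(ans)      # remember the key in case modifying a key column drops it
  for (col in names(ans)) {
    if (is.factor(ans[[col]])) {
      # replace the column (on the copy) with its level-dropped version
      set(ans, j = col, value = droplevels(ans[[col]]))
    }
  }
  if (!is.null(oldkey)) setkeyv(ans, oldkey)   # restore the original key
  ans
}

# Usage, based on the example from the original report
dt <- data.table(name = factor(c("a", "b", "c")), value = 1:3)
setkey(dt, name)
dt1 <- dt[name != "a"]
dt2 <- droplevels_copy(dt1)
print(dt2)               # b/2 and c/3 stay paired
print(levels(dt1$name))  # dt1 still has all three levels
```

Deleting the copy() line turns this back into by-reference behaviour: no extra memory, but the caller's table is modified. That is the trade-off under discussion.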

On Tue, Feb 21, 2012 at 4:42 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Yes, could do. Building on that, here's a quick stab at
> droplevels.data.table. This does it by reference, but it could take a
> copy() instead. If it takes a copy() it would be consistent with base
> (probably required), but then how best to make a non-copying version
> available?
>
> droplevels.data.table = function(dt) {
>    oldkey = key( dt )
>    for (i in names(dt)) {
>        if (is.factor(dt[[i]])) dt[,i:=droplevels(dt[[i]]),with=FALSE]
>    }
>    setkeyv( dt, oldkey )
>    dt
> }
>
> On Tue, 2012-02-21 at 15:38 -0500, Prasad Chalasani wrote:
>> Meanwhile as a work-around, I suppose one should do:
>>
>> keys <- key( dt ) # this could in general be a large set of keys
>> sub_d <- droplevels( as.data.frame( dt[ name != 'a' ] ) )
>> sub_dt <- data.table( sub_d )
>> setkeyv( sub_dt, keys )
>>
>>
>>
>> On Feb 21, 2012, at 1:59 PM, Matthew Dowle wrote:
>>
>> >
>> > I see the problem too but (just) adding droplevels.data.table might miss
>> > the root cause.
>> >
>> >> because the way the
>> >> droplevels.data.frame method works isn't compatible with data.table
>> >> indexing.
>> >
>> > But it's intended to be. I can see the switch at the top of [.data.table
>> > detecting that the caller isn't data.table aware and dispatching to
>> > `[.data.frame`, but I'm not sure why it then isn't working. Something to
>> > do with the missing j or missing drop not being passed through correctly,
>> > perhaps.
>> >
>> > I have heard it said (once or twice) that data.table is "almost"
>> > compatible with non-data.table-aware packages, but never had an example
>> > before. I wonder if this is it!
>> >
>> > A (fast) droplevels.data.table using := would be good anyway, though.
>> >
>> > Matthew
>> >
>> >
>> >
>> >> Hi,
>> >>
>> >> I see what the problem is -- we need to provide a
>> >> droplevels.data.table S3 method, because the way the
>> >> droplevels.data.frame method works isn't compatible with data.table
>> >> indexing.
>> >>
>> >> Will fix:
>> >>
>> >> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1841&group_id=240&atid=975
>> >>
>> >> Thanks for raising the flag.
>> >>
>> >> Cheers,
>> >> -steve
>> >>
>> >> On Tue, Feb 21, 2012 at 12:38 PM, pchalasani <pchalasani at gmail.com> wrote:
>> >>> Surprising that this wasn't noticed before, or perhaps I'm not
>> >>> following some recommended idiom to drop levels when using data.table.
>> >>> The following code illustrates the bug clearly. The bug remains
>> >>> regardless of whether I use "subset" or simply use dt1 = dt[ name != 'a' ].
>> >>>
>> >>>
>> >>>
>> >>>    d <- data.table(name = c('a','b','c'), value = 1:3)
>> >>>    dt <- data.table(d)
>> >>>    setkey(dt,'name')
>> >>>    dt1 <- subset(dt,name != 'a')  # or dt1 <- dt[ name != 'a' ]
>> >>>    > dt1
>> >>>          name value
>> >>>     [1,]    b     2
>> >>>     [2,]    c     3
>> >>>
>> >>>    > droplevels(dt1)
>> >>>          name value
>> >>>     [1,]    b     1
>> >>>     [2,]    c     3
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> View this message in context:
>> >>> http://r.789695.n4.nabble.com/BUG-droplevels-mangles-subsetted-data-table-tp4407694p4407694.html
>> >>> Sent from the datatable-help mailing list archive at Nabble.com.
>> >>> _______________________________________________
>> >>> datatable-help mailing list
>> >>> datatable-help at lists.r-forge.r-project.org
>> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>> >>
>> >>
>> >>
>> >> --
>> >> Steve Lianoglou
>> >> Graduate Student: Computational Systems Biology
>> >>  | Memorial Sloan-Kettering Cancer Center
>> >>  | Weill Medical College of Cornell University
>> >> Contact Info: http://cbio.mskcc.org/~lianos/contact
>> >>
>> >
>> >
>>
>
>




