[datatable-help] Using list valued columns with by (Matthew Dowle)

Wed Feb 22 12:31:13 CET 2012

Hi,

That's passing 'd' to f(), which happens to be the name of the whole data
set (there isn't a column 'd'), so each group gets all 20 rows and you end
up with 2 x 20 = 40. Try d[,f(.SD), by=x].
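
A rough sketch of the difference, under some assumptions: the covariate z and the seed are made up for illustration and are not in the original toy data. x is constant within a group and, in current data.table versions, is not part of .SD, so the sketch regresses on z instead of x.

```r
library(data.table)

set.seed(42)  # assumed seed, only so the sketch is reproducible

# Toy data as in the question, plus a hypothetical covariate z
d <- data.table(x = rep(1:2, each = 10), z = rnorm(20), y = rnorm(20), key = "x")

# f() fits on whatever table it is handed
f <- function(dat) list(pred = fitted(lm(y ~ z, data = dat)))

# Passing d: every group is handed the full 20-row table, so 2 x 20 = 40 rows
p_all <- d[, f(d), by = x]

# Passing .SD: each group is handed only its own 10 rows, so 20 rows in total
p_grp <- d[, f(.SD), by = x]

dim(p_all)
dim(p_grp)
```

The row count of p_grp matches d, so its pred column lines up with the original rows group by group.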

Thanks for the kind words. If you haven't already done so, please do vote
for the package on Crantastic (the 'I use it' button, and, also, the vote
button). That may help others when they consider data.table for the first
time. The more users, the more feedback and the more edge cases we catch,
hopefully. Same goes for other packages.

    http://crantastic.org/packages/data-table

Matthew

> I wanted to follow up on this as I am trying to do something similar
> to what Chris asked about. But first, let me say thanks for the work
> on this. I have several different situations where a call to ddply
> takes about 10 minutes but only ~1 second with data.table. So I'm very
> thankful for the package, but I'm still very much a novice with it.
> Here's the present problem.
>
> here's the toy data again:
> d<-  data.table(x=rep(1:2,each=10), y=rnorm(20), key="x")
>
>> dim(d)
> [1] 20  2
>
> I would like to generate a column of fitted values from lm that I'll
> later cbind to the original data.
>
> f<- function(d) list(pred = fitted(lm(y ~ x,d)))
>
> p<- d[,f(d), by = x]
>
>> dim(p)
> [1] 40  2
>
> For reasons I don't understand, this generates 2 sets of (correct)
> "pred" values, but the "x" values are wrong. Why does this generate
> two duplicate sets? I should say that the real data has ~2 million
> rows and the call will be something closer to:
> p<- d[,f(d), by = list(X1, X2, X3, X4)].
>
> Matthew
>
>
>
>
>
>
>> or functional form :
>>
>> f <- function(y) list(a=mean(y), b=list(rep(y[1],3)) )
>> data[, f(y), by=x]
>>     x           a                                  b
>> [1,] 1 -0.07760762 -0.1715334, -0.1715334, -0.1715334
>> [2,] 2 0.36923570          1.01892, 1.01892, 1.01892
>>
>
>
>
>
>
>>> data <- data.table(x=rep(1:2,each=10), y=rnorm(20), key="x")
>>>
>>> f <- function(y) {
>>>   return( list(a=mean(y), b=rep(y[1],10)) )
>>> }
>>>
>>> result <- data[, list(f(y)), by=x]
>>>
>>>
>>> What winds up happening is that result winds up having V1 alternate
>>> between f(y)$a and f(y)$b, resulting in 4 rows, 2 for each value of x.
>>> What I want instead is result to have 2 rows,  with V1 being the list
>>> that gets returned from f(y).
>>>
>>> I have found that this works:
>>>
>>> result <- data[, list(list(f(y))), by=x]
>>>
>>> But then I have to do:
>>>
>>> result[J(1),][,V1][[1]]
>>>
>>> to get the same thing I would get from f(result[J(1),][,V1]).  I want
>>> to lose the [[1]] but I can't seem to see how I would do so.  Really
>>> what I would envision is like with sapply, I want to do
>>>
>>>
>>> result <- data[, f(y), by=x, simplify=FALSE]
>>>
>>> But of course simplify isn't an argument for data.table. Thoughts?
>>>
>>> -Chris
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 21 Feb 2012 17:52:40 -0500
>> From: Steve Lianoglou <mailinglist.honeypot at gmail.com>
>> To: mdowle at mdowle.plus.com
>> Cc: datatable-help at r-forge.wu-wien.ac.at, Prasad Chalasani
>>        <pchalasani at gmail.com>
>> Subject: Re: [datatable-help] BUG: droplevels mangles subsetted
>>        data.table
>> Message-ID:
>>      
>>  <CAHA9McNBLauiNK9FjR2JNOw10r8VFHyrFVU0upEa4qJWFdEvOw at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Hi,
>>
>> I guess I'm missing something, but ... why isn't your proposed
>> droplevels.data.table consistent with base? Because the ordering of
>> the rows might change (maybe(?))?
>>
>> -steve
>>
>> On Tue, Feb 21, 2012 at 4:42 PM, Matthew Dowle <mdowle at mdowle.plus.com>
>> wrote:
>>>
>>> Yes, could do. Building on that here's a quick stab at
>>> droplevels.data.table. This does it by reference, or it could take a
>>> copy(). If it takes a copy() it would be consistent with base (probably
>>> required), but then how best to make a non-copying version available?
>>>
>>> droplevels.data.table = function(dt) {
>>>    oldkey = key( dt )
>>>    for (i in names(dt)) {
>>>        if (is.factor(dt[[i]])) dt[,i:=droplevels(dt[[i]]),with=FALSE]
>>>    }
>>>    setkeyv( dt, oldkey )
>>>    dt
>>> }
>>>
>>> On Tue, 2012-02-21 at 15:38 -0500, Prasad Chalasani wrote:
>>>> Meanwhile as a work-around, I suppose one should do:
>>>>
>>>> keys <- key( dt ) # this could in general be a large set of keys
>>>> sub_d <- droplevels( as.data.frame( dt[ name != 'a' ] ) )
>>>> sub_dt <- data.table( sub_d )
>>>> setkeyv( sub_dt, keys )
>>>>
>>>>
>>>>
>>>> On Feb 21, 2012, at 1:59 PM, Matthew Dowle wrote:
>>>>
>>>> >
>>>> > I see the problem too but (just) adding droplevels.data.table might
>>>> > miss the root cause.
>>>> >
>>>> >> because the way the
>>>> >> droplevels.data.frame method works isn't compatible with data.table
>>>> >> indexing.
>>>> >
>>>> > But it's intended to be. I can see the switch at the top of
>>>> > [.data.table is detecting the caller isn't data.table aware, and it
>>>> > is then dispatching to `[.data.frame` but why it then isn't working
>>>> > I'm not sure. Something to do with the missing j or missing drop not
>>>> > being passed through correctly, perhaps.
>>>> >
>>>> > I have heard it said (once or twice) that data.table is "almost"
>>>> > compatible with non-data.table-aware packages, but never had an
>>>> > example before. I wonder if this is it!
>>>> >
>>>> > A (fast) droplevels.data.table using := would be good anyway, though.
>>>> >
>>>> > Matthew
>>>> >
>>>> >
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> I see what the problem is -- we need to provide a
>>>> >> droplevels.data.table S3 method, because the way the
>>>> >> droplevels.data.frame method works isn't compatible with data.table
>>>> >> indexing.
>>>> >>
>>>> >> Will fix:
>>>> >>
>>>> >> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1841&group_id=240&atid=975
>>>> >>
>>>> >> Thanks for raising the flag.
>>>> >>
>>>> >> Cheers,
>>>> >> -steve
>>>> >>
>>>> >> On Tue, Feb 21, 2012 at 12:38 PM, pchalasani <pchalasani at gmail.com> wrote:
>>>> >>> Surprising that this wasn't noticed before, or perhaps I'm not
>>>> >>> following some recommended idiom to drop levels when using
>>>> >>> data.table. The following code illustrates the bug clearly: The bug
>>>> >>> remains regardless of whether I use "subset" or simply use
>>>> >>> dt1 = dt[ name != 'a' ].
>>>> >>>
>>>> >>>    d <- data.table(name = c('a','b','c'), value = 1:3)
>>>> >>>    dt <- data.table(d)
>>>> >>>    setkey(dt,'name')
>>>> >>>    dt1 <- subset(dt,name != 'a')  # or dt1 <- dt[ name != 'a' ]
>>>> >>>    > dt1
>>>> >>>          name value
>>>> >>>     [1,]    b     2
>>>> >>>     [2,]    c     3
>>>> >>>
>>>> >>>    > droplevels(dt1)
>>>> >>>          name value
>>>> >>>     [1,]    b     1
>>>> >>>     [2,]    c     3
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> View this message in context:
>>>> >>> http://r.789695.n4.nabble.com/BUG-droplevels-mangles-subsetted-data-table-tp4407694p4407694.html
>>>> >>> Sent from the datatable-help mailing list archive at Nabble.com.
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Steve Lianoglou
>>>> >> Graduate Student: Computational Systems Biology
>>>> >>  | Memorial Sloan-Kettering Cancer Center
>>>> >>  | Weill Medical College of Cornell University
>>>> >> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>> >>
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Tue, 21 Feb 2012 23:22:59 +0000
>> From: Matthew Dowle <mdowle at mdowle.plus.com>
>> To: Steve Lianoglou <mailinglist.honeypot at gmail.com>
>> Cc: datatable-help at r-forge.wu-wien.ac.at
>> Subject: Re: [datatable-help] BUG: droplevels mangles subsetted
>>        data.table
>> Message-ID: <1329866579.2108.208.camel at netbook>
>> Content-Type: text/plain; charset="UTF-8"
>>
>> Hi. Just because as it stands it doesn't copy, so
>>
>>    newDT = dropfactors(DT)
>>
>> would change DT by reference with newDT a new pointer to that same
>> modified object, whereas base would leave DT unchanged with newDT a
>> modified copy.
>>
>> Just adding dt=copy(dt) at the start of the function would make it
>> consistent,  but then how would we (data.table-aware code) call the
>> non-copying version if we wanted that (which is likely needed, given the
>> motivation of dropping unused levels I guess). Could continue the set*
>> theme and create setdropfactors()? but that doesn't roll off the tongue.
>> Or the copy() could be switched in the usual way :
>>
>>     if (!cedta) dt = copy(dt)
>>
>> and then we data.table users would just know that droplevels worked by
>> reference and we should copy() first if we want a copy, in the usual
>> way. Whilst not upsetting non-data.table-aware packages, since they
>> would still copy. Think I prefer the switched copy, carefully
>> documented, which would save yet another new function. I'm thinking that
>> users' expectations of dropfactors() would probably be that it worked by
>> reference on data.tables anyway (or if not, would want it to after the
>> initial surprise).
>>
>> Matthew
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Tue, 21 Feb 2012 21:24:33 -0500
>> From: Steve Lianoglou <mailinglist.honeypot at gmail.com>
>> To: mdowle at mdowle.plus.com
>> Cc: datatable-help at r-forge.wu-wien.ac.at
>> Subject: Re: [datatable-help] BUG: droplevels mangles subsetted
>>        data.table
>> Message-ID:
>>      
>>  <CAHA9McNzNWNS+=4pXwLwfj5GvnpUerJx9otUOV4pY1fEXfk=rw at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Ahh, right ... the copying. Good point.
>>
>> Regarding the logic you suggest as to when to copy or not, how do you
>> feel about going the explicit route instead of trying to take a best
>> guess when we should/shouldn't copy via `cedta` and doing the
>> 'data.frame behavior' by default.
>>
>> By that I mean: since the droplevels function has a `...` param, can
>> we do something like:
>>
>> droplevels.data.table <- function(x, except=NULL, do.copy=TRUE, ...) {
>>  if (do.copy) {
>>    x <- copy(x)
>>  }
>>  oldkey = key(x)
>>  change.me <- names(x)
>>  if (!is.null(except)) {
>>    change.me <- setdiff(change.me, names(x)[except])
>>  }
>>  for (i in change.me) {
>>    if (is.factor(x[[i]])) x[,i:=droplevels(x[[i]]),with=FALSE]
>>  }
>>  setkeyv( x, oldkey )
>> }
>>
>> yay/nay?
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>>
>> ------------------------------
>>
>>
>> End of datatable-help Digest, Vol 24, Issue 9
>> *********************************************
>