[datatable-help] Over-writing factors

Chris Neff caneff at gmail.com
Fri Sep 9 18:48:52 CEST 2011


Once again forgive my ignorance, but why can't whole column
replacements using $ <- just redirect to ":="?
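
For context, the ":=" form of whole-column replacement that the question refers to might look like the sketch below. It assumes a recent data.table release, where character columns are supported; the 2011 release discussed in this thread may behave differently. DT is the table from the example quoted below, and the full-length right-hand side is chosen here for illustration only.

```r
library(data.table)

DT <- data.table(x = 1:10, y = 1:10, key = "x")

# Supplying a full-length RHS (one value per row) replaces the column outright,
# so the column type can change from integer to character.
DT[, y := rep("foo", .N)]

class(DT$y)
```

The replacement happens by reference, so no copy of DT is made and there is no need to assign the result back.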

On 9 September 2011 12:42, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> Oops, I missed that the question was about whole column replacements.
>
> I'm generally using := now.  Will have a think about this ...
>
> "Chris Neff" <caneff at gmail.com> wrote in message
> news:CAAuY0RUusbcvsfTyQBg4m+cwa3ZhwRJMwpL-Q0fBLaQ+EGn6+A at mail.gmail.com...
> This was surprising to me. I would've thought test.dt$X <- "foo" and
> test.dt[["X"]] <- "foo" should do the exact same thing because the
> entire column is being overwritten, no? I would understand the issue
> if it was test.dt$X[i] <- "foo", but it isn't.
>
> On a related note, I'm surprised this doesn't work either:
>
>> DT=data.table(x=1:10,y=1:10,key="x")
>> DT$x="foo"
> Warning message:
> In `[<-.data.table`(x, j = name, value = value) :
>  NAs introduced by coercion
>> DT[["x"]]="foo" # Works fine: DT$x becomes a character vector of "foo" repeated.
>
> DT$x can be overwritten by values of another class (e.g. DT$x <-
> rnorm(10) when DT$x starts as an integer), so why not by a character,
> or by "foo" converted to a factor if that is what must be done?
> The same happens with a non-key variable, like DT$y.
>
>
> On 8 September 2011 18:15, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>> It's intended. data.table requires sorted factor levels. If the levels
>> somehow become unsorted then binary search joins don't work. We might
>> be able to allow it and have data.table maintain the order (with a speed
>> penalty from re-sorting the levels and rewriting the entire integer
>> column), but allowing character columns is (hopefully) not far away,
>> which should be a much better solution.
>>
>> Or, in the meantime, you can go 'under the hood'. Add "something
>> different" to the end of the factor levels using levels()<-, then the
>> assignment to the column should work. If that column is part of a key,
>> make sure to call setkey() again, which will check and re-sort the levels
>> for you (with a warning message). If an existing level is only changing
>> name, it's faster to assign (once) directly to levels(). But again, be
>> careful to keep the levels sorted if that column is to be used in joins.
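
The workaround described above can be sketched in base R. This is an illustration on a plain factor (the variable f is hypothetical), not on a data.table column, so it shows only the levels mechanics and not what setkey() does; behavior in current data.table releases may differ from the 2011 version discussed here.

```r
# Base-R sketch of the factor-levels workaround, using a plain factor
# (hypothetical variable f) instead of a data.table column.
f <- factor(letters[1:10])

# Assigning a value that is not an existing level would yield NA (with a
# warning), so append the new level first.
levels(f) <- c(levels(f), "Something Different")
f[] <- "Something Different"

# The thread says joins need sorted levels, so rebuild the factor with its
# levels in sorted order.
f <- factor(as.character(f), levels = sort(levels(f)))
```

Rebuilding the factor rewrites the underlying integer codes, which is the cost mentioned above for keeping levels sorted.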
>>
>> Matthew
>>
>>
>> On Thu, 2011-09-08 at 14:27 -0500, Damian Betebenner wrote:
>>> Not sure whether the following is intended behavior with
>>> data.table. Perhaps it is something idiosyncratic with factors.
>>>
>>> It is different from what one gets with data.frame:
>>>
>>> test.df <- data.frame(X=letters[1:10], Y=rnorm(10))
>>> test.dt <- data.table(X=letters[1:10], Y=rnorm(10))
>>>
>>> test.df$X <- "Something Different"
>>> test.dt$X <- "Something Different"
>>> Error in `[<-.data.table`(x, j = name, value = value) :
>>>   Some or all RHS not present in factor column levels
>>>
>>> test.df <- data.frame(X=letters[1:10], Y=rnorm(10))
>>> test.dt <- data.table(X=letters[1:10], Y=rnorm(10))
>>>
>>> test.df[["X"]] <- "Something Different"
>>> test.dt[["X"]] <- "Something Different"
>>>
>>> Damian Betebenner
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

