[datatable-help] Over-writing factors
Chris Neff
caneff at gmail.com
Fri Sep 9 18:22:46 CEST 2011
This was surprising to me. I would've thought test.dt$X <- "foo" and
test.dt[["X"]] <- "foo" should do exactly the same thing, because the
entire column is being overwritten in both cases, no? I would understand
the issue if it were test.dt$X[i] <- "foo", but it isn't.
On a related note, I'm surprised this doesn't work either:
> DT=data.table(x=1:10,y=1:10,key="x")
> DT$x="foo"
Warning message:
In `[<-.data.table`(x, j = name, value = value) :
NAs introduced by coercion
> DT[["x"]]="foo" # Works fine and sets DT$x to be a character vector with "foo" repeated.
DT$x can already be overwritten with a value of another class (e.g.
DT$x <- rnorm(10) when DT$x starts as an integer), so why not with a
character vector? Or, if a factor is what is required, why not convert
"foo" to a factor and assign that?
The same thing happens with a non-key column such as DT$y.
On 8 September 2011 18:15, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> It's intended. data.table requires sorted factor levels. If the levels
> somehow become unsorted then binary search joins don't work. We might
> be able to allow it and have data.table maintain the levels (with a
> speed penalty from re-sorting the levels and rewriting the entire
> integer column), but support for character columns is (hopefully) not
> far away, which should be a much better solution.
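
To illustrate why sorted levels matter, here is a minimal base R sketch (no data.table calls). It assumes, as my reading of the explanation above and not something confirmed in this thread, that the join compares the integer codes stored in the factor. The variable names are made up for the example.

```r
# Base R only. A factor stores integer codes that index into its levels.
sorted   <- factor(c("b", "a", "c"))                             # levels: a, b, c
unsorted <- factor(c("b", "a", "c"), levels = c("c", "a", "b"))  # levels: c, a, b

as.integer(sorted)    # codes follow the alphabetical order of the labels
as.integer(unsorted)  # codes follow the arbitrary level order instead

# Ordering by code agrees with ordering by label only when levels are sorted.
codes_match_sorted   <- identical(order(as.integer(sorted)),
                                  order(as.character(sorted)))
codes_match_unsorted <- identical(order(as.integer(unsorted)),
                                  order(as.character(unsorted)))
```

With unsorted levels, a search that relies on the codes being in label order would look in the wrong place.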
>
> Or, in the meantime, you can go 'under the hood'. Add "Something
> Different" to the end of the factor levels using levels()<-, then the
> assignment to the column should work. If that column is part of a key,
> make sure to call setkey() again, which will check and re-sort the
> levels for you (with a warning message). If an existing level is only
> changing name, it's faster to assign (once) directly to levels(). But
> again, be careful to keep the levels sorted if that column is to be
> used in joins.
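
For reference, a minimal sketch of the steps described above, shown on a plain base R factor, not a data.table column. Whether a data.table column behaves identically is an assumption on my part. The final re-sort stands in for what setkey() is said to do for a key column.

```r
# Base R only; f stands in for the factor column X.
f <- factor(letters[1:10])

# Without the new level, the assignment produces NAs (and a warning).
g <- f
suppressWarnings(g[] <- "Something Different")
all_na <- all(is.na(g))

# Step 1: append the new value to the end of the levels.
levels(f) <- c(levels(f), "Something Different")

# Step 2: the assignment now succeeds.
f[] <- "Something Different"

# Step 3: re-sort the levels by hand (for a key column, setkey() is
# described above as doing this check and re-sort).
f <- factor(f, levels = sort(levels(f)))

# Renaming an existing level: assign once, directly to levels().
h <- factor(c("a", "b", "a"))
levels(h)[levels(h) == "a"] <- "apple"
```

Note that a rename can also leave the levels unsorted, so the same re-sort applies before the column is used in a join.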
>
> Matthew
>
>
> On Thu, 2011-09-08 at 14:27 -0500, Damian Betebenner wrote:
>> Not sure whether the following is an intended behavior with
>> data.table. Perhaps it is something idiosyncratic with factors.
>>
>> It is different than what one gets with data.frame
>>
>> test.df <- data.frame(X=letters[1:10], Y=rnorm(10))
>> test.dt <- data.table(X=letters[1:10], Y=rnorm(10))
>>
>> test.df$X <- "Something Different"
>> test.dt$X <- "Something Different"
>> Error in `[<-.data.table`(x, j = name, value = value) :
>>   Some or all RHS not present in factor column levels
>>
>> test.df <- data.frame(X=letters[1:10], Y=rnorm(10))
>> test.dt <- data.table(X=letters[1:10], Y=rnorm(10))
>>
>> test.df[["X"]] <- "Something Different"
>> test.dt[["X"]] <- "Something Different"
>>
>> Damian Betebenner
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help