[datatable-help] Over-writing factors

Matthew Dowle mdowle at mdowle.plus.com
Sat Sep 10 17:17:58 CEST 2011


$<- and [<- already do redirect to := (or rather, to the internal C
function that := uses), but unless R itself changes they can never be
as fast as using := directly, because the copy of the whole table
comes from <- itself (dispatch via `*tmp*`). Avoiding <-'s copy is the
main reason := was needed. It's been quite an adventure!
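
To make the difference concrete, here is a minimal sketch. It is not
from the original exchange: `alias` is just an illustrative second name
for the same table, and it assumes a data.table version in which := is
available. It only shows who sees the change, not the timing.

```r
library(data.table)

DT <- data.table(x = 1:10, y = 1:10)
alias <- DT          # a second name for the same table, not a copy

# := updates column y in place, so the change is visible through alias too
DT[, y := 0L]
by_reference <- identical(alias$y, rep(0L, 10))

# $<- goes through R's <-, which hands the method a copy of the table;
# DT is rebound to the modified copy and alias keeps the old values
DT$y <- 1L
alias_untouched <- identical(alias$y, rep(0L, 10))
```

Both flags should come out TRUE: the first because := changed the one
shared table, the second because $<- left the original object alone and
only rebound the name DT.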

I've had a look now, and yes there's a bug or two here, and more tests
are needed ...

On Fri, 2011-09-09 at 12:48 -0400, Chris Neff wrote:
> Once again forgive my ignorance, but why can't whole column
> replacements using $ <- just redirect to ":="?
> 
> On 9 September 2011 12:42, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > Oops, I missed that the question was about whole column replacements.
> >
> > I'm generally using := now.  Will have a think about this ...
> >
> > "Chris Neff" <caneff at gmail.com> wrote in message
> > news:CAAuY0RUusbcvsfTyQBg4m+cwa3ZhwRJMwpL-Q0fBLaQ+EGn6+A at mail.gmail.com...
> > This was surprising to me. I would've thought test.dt$X <- "foo" and
> > test.dt[["X"]] <- "foo" should do the exact same thing because the
> > entire column is being overwritten, no? I would understand the issue
> > if it was test.dt$X[i] <- "foo", but it isn't.
> >
> > On a related note, I'm surprised this doesn't work either:
> >
> >> DT=data.table(x=1:10,y=1:10,key="x")
> >> DT$x="foo"
> > Warning message:
> > In `[<-.data.table`(x, j = name, value = value) :
> >  NAs introduced by coercion
> >> DT[["x"]]="foo" # Works fine and sets DT$x to be a character vector
> >> # with "foo" repeated.
> >
> > DT$x is allowed to be overwritten by other classes (e.g. DT$x <-
> > rnorm(10) when DT$x starts as an integer), so why not by a character,
> > or why not convert "foo" to a factor and assign it if that is what must
> > be done? This is also true with a non-key variable, like DT$y.
> >
> >
> > On 8 September 2011 18:15, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> >> It's intended. data.table requires sorted factor levels. If the levels
> >> somehow become unsorted then binary search joins don't work. We might
> >> be able to allow it and make data.table maintain things (with a speed
> >> penalty due to the re-sort of levels and re-write of the entire integer
> >> column), but allowing character columns is (hopefully) not far away,
> >> which should be a much better solution.
> >>
> >> Or, in the meantime, you can go 'under the hood'. Add "something
> >> different" to the end of the factor levels using levels()<-, then the
> >> assignment to the column should work. If that column is part of a key
> >> then make sure to call setkey() again, which will check and re-sort the
> >> levels for you (with a warning message). If an existing level is changing name,
> >> then it's faster to assign (once) directly to the levels(). But again,
> >> careful to maintain sorted levels if that column is to be used in joins.
> >>
> >> Matthew
> >>
> >>
> >> On Thu, 2011-09-08 at 14:27 -0500, Damian Betebenner wrote:
> >>> Not sure whether the following is an intended behavior with
> >>> data.table. Perhaps it is something idiosyncratic with factors.
> >>>
> >>> It is different than what one gets with data.frame
> >>>
> >>> test.df <- data.frame(X=letters[1:10], Y=rnorm(10))
> >>> test.dt <- data.table(X=letters[1:10], Y=rnorm(10))
> >>>
> >>> test.df$X <- "Something Different"
> >>> test.dt$X <- "Something Different"
> >>> Error in `[<-.data.table`(x, j = name, value = value) :
> >>>   Some or all RHS not present in factor column levels
> >>>
> >>> test.df <- data.frame(X=letters[1:10], Y=rnorm(10))
> >>> test.dt <- data.table(X=letters[1:10], Y=rnorm(10))
> >>>
> >>> test.df[["X"]] <- "Something Different"
> >>> test.dt[["X"]] <- "Something Different"
> >>>
> >>> Damian Betebenner
> >>> Center for Assessment
> >>> PO Box 351
> >>> Dover, NH 03821-0351
> >>>
> >>> Phone (office): (603) 516-7900
> >>> Phone (cell): (857) 234-2474
> >>> Fax: (603) 516-7910
> >>>
> >>> dbetebenner at nciea.org
> >>> www.nciea.org
> >>>
> >>> _______________________________________________
> >>> datatable-help mailing list
> >>> datatable-help at lists.r-forge.r-project.org
> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help



