[datatable-help] Over-writing factors
Matthew Dowle
mdowle at mdowle.plus.com
Tue Sep 27 00:46:05 CEST 2011
Just committed. I hope this resolves the issues raised in this thread.
o Factor columns on the LHS of :=, [<- and $<- can now be assigned
new levels; e.g.,
DT = data.table(A=c("a","b"))
DT[2,"A"] <- "c" # adds new level automatically
DT[2,A:="c"] # same (faster)
DT$A = "newlevel" # adds new level and recycles it through A
Thanks to Damian Betebenner and Chris Neff for highlighting.
To change the type of a column, provide a full length RHS (i.e.
'replace' column syntax).
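A minimal sketch of the new behaviour, assuming a recent data.table release (details in the 2011 development version discussed here may differ). Recent releases keep strings as character by default, so the sketch creates the factor column explicitly with factor():

```r
library(data.table)

# Recent data.table releases keep strings as character, so the
# factor column is created explicitly here.
DT <- data.table(A = factor(c("a", "b")))

# "c" is not yet a level of A; := adds it instead of failing.
DT[2, A := "c"]
was_factor <- is.factor(DT$A)
after_row2 <- as.character(DT$A)
print(levels(DT$A))

# $<- recycles one value through the whole column (working on a copy of DT).
DT$A <- "newlevel"
print(DT$A)
```

The single-row := form avoids the whole-table copy that $<- incurs, which is why the announcement marks it as faster.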
On Sat, 2011-09-10 at 16:17 +0100, Matthew Dowle wrote:
> $<- and [<- already redirect to := (or rather, to the internal C
> function that := uses), but they can never (unless R itself changes)
> be as fast as using := directly, because the copy of the whole table
> comes from <- itself (dispatch via `*tmp*`). Avoiding that copy is
> most of the reason := was needed in the first place. It's been quite
> an adventure!
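The copy that <- triggers can be observed with data.table's address() function, which reports where an object lives in memory. A minimal sketch, assuming a recent data.table release; the addresses themselves differ between sessions, so the sketch only checks whether they change:

```r
library(data.table)

DT <- data.table(x = 1:5)
before <- address(DT)

# := adds the column by reference: DT keeps the same address.
DT[, y := x * 2L]
same_after_walrus <- identical(address(DT), before)

# $<- goes through R's `*tmp*` mechanism and returns a modified copy,
# so DT is rebound to a new object at a new address.
DT$z <- DT$x + 1L
same_after_dollar <- identical(address(DT), before)

print(c(same_after_walrus, same_after_dollar))
```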
>
> I've had a look now, and yes, there's a bug or two here, and more tests
> are needed ...
>
> On Fri, 2011-09-09 at 12:48 -0400, Chris Neff wrote:
> > Once again, forgive my ignorance, but why can't whole-column
> > replacements using $<- just redirect to ":="?
> >
> > On 9 September 2011 12:42, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > > Oops, I missed that the question was about whole column replacements.
> > >
> > > I'm generally using := now. Will have a think about this ...
> > >
> > > "Chris Neff" <caneff at gmail.com> wrote in message
> > > news:CAAuY0RUusbcvsfTyQBg4m+cwa3ZhwRJMwpL-Q0fBLaQ+EGn6+A at mail.gmail.com...
> > > This was surprising to me. I would've thought test.dt$X <- "foo" and
> > > test.dt[["X"]] <- "foo" should do the exact same thing because the
> > > entire column is being overwritten, no? I would understand the issue
> > > if it was test.dt$X[i] <- "foo", but it isn't.
> > >
> > > On a related note, I'm surprised this doesn't work either:
> > >
> > >> DT=data.table(x=1:10,y=1:10,key="x")
> > >> DT$x="foo"
> > > Warning message:
> > > In `[<-.data.table`(x, j = name, value = value) :
> > > NAs introduced by coercion
> > >> DT[["x"]]="foo" # Works fine and sets DT$x to be a character vector with "foo" repeated.
> > >
> > > DT$x is allowed to be overwritten by other classes (e.g. DT$x <-
> > > rnorm(10) when DT$x starts as an integer), so why not by a character?
> > > Or, if that is what must be done, why not convert "foo" to a factor
> > > and assign that? This is also true with a non-key variable, like DT$y.
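The advice in the announcement at the top of this thread (provide a full-length RHS to change a column's type) applies to Chris's example. A minimal sketch against a recent data.table release, where rep() builds one value per row so that the column is replaced rather than assigned into:

```r
library(data.table)

# Same table as in the question: x is an integer key column.
DT <- data.table(x = 1:10, y = 1:10, key = "x")

# A full-length RHS replaces the whole column, so its type can
# change from integer to character instead of being coerced to NA.
DT[, x := rep("foo", nrow(DT))]

print(class(DT$x))
```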
> > >
> > >
> > > On 8 September 2011 18:15, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > >> It's intended. data.table requires sorted factor levels. If the levels
> > >> somehow become unsorted, binary search joins don't work. We might
> > >> be able to allow it and make data.table maintain things (with a speed
> > >> penalty due to re-sorting the levels and rewriting the entire integer
> > >> column), but allowing character columns is (hopefully) not far away,
> > >> and that should be a much better solution.
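A simplified base-R illustration of why level order matters, assuming (as described above) that keyed joins binary-search a factor's integer codes; this is not data.table's actual join code. Searching the codes gives the same answer as searching the labels only if the codes sort in the same order as the labels, which holds only when the levels are sorted:

```r
# Same values, two different level orders.
sorted_f   <- factor(c("b", "a", "c"), levels = c("a", "b", "c"))
unsorted_f <- factor(c("b", "a", "c"), levels = c("c", "a", "b"))

# With sorted levels, ordering by code agrees with ordering by label.
codes_agree_sorted <- identical(order(as.integer(sorted_f)),
                                order(as.character(sorted_f)))

# With unsorted levels it does not, so a search over the codes
# would look in the wrong place.
codes_agree_unsorted <- identical(order(as.integer(unsorted_f)),
                                  order(as.character(unsorted_f)))

print(c(codes_agree_sorted, codes_agree_unsorted))
```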
> > >>
> > >> Or, in the meantime, you can go 'under the hood'. Add "something
> > >> different" to the end of the factor levels using levels()<-; the
> > >> assignment to the column should then work. If that column is part of a
> > >> key, make sure to call setkey() again, which will check and re-sort the
> > >> levels for you (with a warning message). If an existing level is being
> > >> renamed, it's faster to assign (once) directly to levels(). But again,
> > >> be careful to maintain sorted levels if that column is to be used in joins.
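The 'under the hood' workaround, sketched on a plain factor in base R rather than inside a data.table. The new level name is the one from the question; renaming "c" to "aa" is a hypothetical example chosen so that the levels end up unsorted. Re-sorting is done with factor(), because assigning a permuted vector to levels() would relabel the existing values rather than reorder the levels:

```r
f <- factor(c("b", "a", "c"))

# Assigning a value that is not a level gives NA (with a warning).
g <- f
suppressWarnings(g[1] <- "something different")

# Workaround: append the new level first, then assign.
levels(f) <- c(levels(f), "something different")
f[1] <- "something different"

# Renaming an existing level is a single assignment to levels().
# (Hypothetical rename: "c" becomes "aa", which leaves the levels unsorted.)
levels(f)[levels(f) == "c"] <- "aa"

# Restore sorted levels by rebuilding the factor: the values stay
# the same, only the order of the levels changes.
f <- factor(f, levels = sort(levels(f)))
print(f)
```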
> > >>
> > >> Matthew
> > >>
> > >>
> > >> On Thu, 2011-09-08 at 14:27 -0500, Damian Betebenner wrote:
> > >>> Not sure whether the following is an intended behavior with
> > >>> data.table. Perhaps it is something idiosyncratic with factors.
> > >>>
> > >>> It is different from what one gets with data.frame:
> > >>>
> > >>> test.df <- data.frame(X=letters[1:10], Y=rnorm(10))
> > >>> test.dt <- data.table(X=letters[1:10], Y=rnorm(10))
> > >>>
> > >>> test.df$X <- "Something Different"
> > >>> test.dt$X <- "Something Different"
> > >>> Error in `[<-.data.table`(x, j = name, value = value) :
> > >>>   Some or all RHS not present in factor column levels
> > >>>
> > >>> test.df <- data.frame(X=letters[1:10], Y=rnorm(10))
> > >>> test.dt <- data.table(X=letters[1:10], Y=rnorm(10))
> > >>>
> > >>> test.df[["X"]] <- "Something Different"
> > >>> test.dt[["X"]] <- "Something Different"
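For comparison, base data.frame replaces the whole column on $<-, so the column's type simply changes. A sketch in base R; the factor is created explicitly because R 4.0 and later no longer convert strings to factors by default, and seq_len() stands in for rnorm() only to keep the example deterministic:

```r
test.df <- data.frame(X = factor(letters[1:10]), Y = seq_len(10))

# $<- on a data.frame recycles the single string to all 10 rows and
# replaces the factor column with a character column.
test.df$X <- "Something Different"

str(test.df)
```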
> > >>>
> > >>> Damian Betebenner
> > >>> Center for Assessment
> > >>> PO Box 351
> > >>> Dover, NH 03821-0351
> > >>>
> > >>> Phone (office): (603) 516-7900
> > >>> Phone (cell): (857) 234-2474
> > >>> Fax: (603) 516-7910
> > >>>
> > >>> dbetebenner at nciea.org
> > >>> www.nciea.org
> > >>>
> > >>> _______________________________________________
> > >>> datatable-help mailing list
> > >>> datatable-help at lists.r-forge.r-project.org
> > >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help