[datatable-help] Wonder whether there is an easier way to change part of data.table values
Short, Tom
TShort at epri.com
Tue Aug 3 04:28:13 CEST 2010
I've just checked in versions of [<-.data.table and $<-.data.table that
check for the columns adjusted and reset the key if appropriate. This
brings up some incompatibilities:
(*) KEYS -- Before, you could do:
dt$key_column = anything
And it wouldn't change the status of the key. Now, the key will be
nullified.
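
As a rough illustration of the new behaviour, here is a minimal base-R
sketch. It does not use data.table itself: "keyed_df" is a made-up toy
class that stores the key in an attribute, and the method below only
mimics the idea of resetting the key when a key column is assigned. The
checked-in code may differ in detail.

```r
# Toy stand-in for a keyed table: a data.frame with a "key" attribute.
# "keyed_df" is a hypothetical name for this sketch, not data.table API.
keyed_df <- function(df, key) {
  df <- df[order(df[[key]]), , drop = FALSE]  # sort rows by the key column
  rownames(df) <- NULL
  attr(df, "key") <- key
  class(df) <- c("keyed_df", "data.frame")
  df
}

# Reuse the data.frame method, then drop the key if a key column was touched.
`$<-.keyed_df` <- function(x, name, value) {
  old_key <- attr(x, "key")
  res <- `$<-.data.frame`(x, name, value)
  if (name %in% old_key) attr(res, "key") <- NULL
  res
}

dt <- keyed_df(data.frame(A = rep(c("A", "Z"), 5), Z = 1:10,
                          stringsAsFactors = FALSE), key = "A")

dt$Z <- dt$Z * 2L               # non-key column: the key is kept
key_after_z <- attr(dt, "key")

dt$A <- rev(dt$A)               # key column: the key is nullified
key_after_a <- attr(dt, "key")
```

In this sketch key_after_z is still "A", while key_after_a is NULL.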
(*) ASSIGNMENT DIFFERENCE
Before: dt["a"] <- "b" meant change column a.
Now: dt[,"a"] <- "b"
In addition, you can now do
dt[J("a"), "somecol"] <- 33
which assigns 33 to the column "somecol" in the rows where the key
equals "a".
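
To show what key-based assignment in the i position amounts to, here is
a small base-R sketch. It is only an analogy: "keyed_df" and "by_key"
are hypothetical names (by_key loosely plays the role of J()), only the
first key column is used, and both i and j are assumed to be supplied.

```r
# Toy keyed table: a data.frame sorted by `key`, with the key kept in an
# attribute. All names here are hypothetical, not data.table API.
keyed_df <- function(df, key) {
  df <- df[order(df[[key]]), , drop = FALSE]
  rownames(df) <- NULL
  attr(df, "key") <- key
  class(df) <- c("keyed_df", "data.frame")
  df
}

# Hypothetical helper that marks its arguments as key values to look up.
by_key <- function(...) structure(list(...), class = "key_lookup")

# Assumes both i and j are given. A key lookup in i is translated into
# row numbers; everything else is handed to the data.frame method.
`[<-.keyed_df` <- function(x, i, j, value) {
  key <- attr(x, "key")
  if (inherits(i, "key_lookup")) {
    i <- which(x[[key[1]]] %in% unlist(i))
  }
  cl <- oldClass(x)
  class(x) <- "data.frame"
  x[i, j] <- value
  class(x) <- cl
  cols <- if (is.character(j)) j else names(x)[j]
  if (any(cols %in% key)) attr(x, "key") <- NULL  # key column changed
  x
}

dt <- keyed_df(data.frame(A = rep(c("a", "b"), 5), somecol = 1:10,
                          stringsAsFactors = FALSE), key = "A")

# Assign 33 to "somecol" in the rows whose key equals "a".
dt[by_key("a"), "somecol"] <- 33
```

Because "somecol" is not a key column, the key attribute is left in
place after this assignment.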
(*) QUESTIONS
- Do we need a "keep.key" argument for cases where we don't want the key
nullified (the user knows the order is unaffected)? This isn't really
possible for dt$a[1:4] <- something, where there is nowhere to pass an
extra argument.
- Is it a good idea to use data.table-style indexing for the i part of
[<-.data.table? I was skeptical when Branson first asked (preferring
data.frame compatibility), but it makes more sense now that I think
about it.
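
For the first question, one possible shape of a "keep.key" argument is
sketched below in base R. This is speculative: "keyed_df" is a made-up
toy class, the argument is not part of what I checked in, and both i
and j are assumed to be supplied.

```r
# Toy keyed table (hypothetical, not data.table API).
keyed_df <- function(df, key) {
  df <- df[order(df[[key]]), , drop = FALSE]
  rownames(df) <- NULL
  attr(df, "key") <- key
  class(df) <- c("keyed_df", "data.frame")
  df
}

# keep.key = TRUE is the caller's promise that the order is unaffected,
# so the key is left alone even when a key column is assigned.
`[<-.keyed_df` <- function(x, i, j, keep.key = FALSE, value) {
  key <- attr(x, "key")
  cl <- oldClass(x)
  class(x) <- "data.frame"      # hand off to the data.frame method
  x[i, j] <- value
  class(x) <- cl
  cols <- if (is.character(j)) j else names(x)[j]
  if (!keep.key && any(cols %in% key)) attr(x, "key") <- NULL
  x
}

dt <- keyed_df(data.frame(A = rep(c("A", "Z"), 5), Z = 1:10,
                          stringsAsFactors = FALSE), key = "A")

kept <- dt
kept[1:5, "A", keep.key = TRUE] <- "A"   # same values, key retained

dropped <- dt
dropped[1:5, "A"] <- "A"                 # default: key nullified
```

The extra argument fits into the [<- form because it sits between the
indices and the value. The $<- form has no such slot.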
Finally, we should put a warning in the documentation somewhere that
functions that re-arrange or assign values to a data frame may "corrupt"
the key of a data.table. Data.table-aware functions should account for
this, but other functions may not.
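
A small base-R sketch of what "corrupt" means here. It uses a plain
data.frame with a hypothetical "key" attribute rather than a real
data.table, so it only illustrates the failure mode: the assignment
goes through code that knows nothing about the key, and the key claim
is left behind although the rows are no longer sorted.

```r
# Plain data.frame carrying a hypothetical "key" attribute that claims
# the rows are sorted by column A. This is not data.table API.
df <- data.frame(A = rep(c("A", "Z"), each = 5), Z = 1:10,
                 stringsAsFactors = FALSE)
attr(df, "key") <- "A"

# Ordinary data.frame assignment: nothing here knows about the key.
df$A[10] <- "A"

stale_key <- attr(df, "key")        # the attribute survives the assignment
out_of_order <- is.unsorted(df$A)   # but column A is no longer sorted
```

Here stale_key is still "A" while out_of_order is TRUE, which is the
situation a key-aware assignment method is meant to prevent.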
- Tom
> -----Original Message-----
> From: Short, Tom
> Sent: Thursday, July 29, 2010 15:39
> To: 'Branson Owen'; datatable-help at lists.r-forge.r-project.org
> Subject: RE: [datatable-help] Wonder whether there is an
> easier way to changepart of data.table values
>
> Branson,
>
> [<-.data.table and $<-.data.table both need a bit of work.
>
> > DT <- data.table(A = c("A", "Z"), Z = 1:10, key = "A")
> >
>
> You've found the DT$column[index] = new value approach, but
> if you use that on keys, DT may no longer be sorted right:
>
> > DT$A[10] <- "A"
> > DT
> A Z
> [1,] A 1
> [2,] A 3
> [3,] A 5
> [4,] A 7
> [5,] A 9
> [6,] Z 2
> [7,] Z 4
> [8,] Z 6
> [9,] Z 8
> [10,] A 100
>
> Since we now inherit from data.frames, we can just use
> [<-.data.frame. It still has the problem that it won't remove
> the key if the key'd column changes.
>
> >
> > `[<-.data.table` <- `[<-.data.frame`
> > DT[9,"Z"] <- 22
> > DT
> A Z
> [1,] A 1
> [2,] A 3
> [3,] A 5
> [4,] A 7
> [5,] A 9
> [6,] Z 2
> [7,] Z 4
> [8,] Z 6
> [9,] Z 22
> [10,] A 100
>
> I'm not sure we want to be able to do DT[select,] <- something.
>
> Something like the following will work for a simple select:
>
> > DT <- data.table(A = c("A", "Z"), Z = 1:10, key = "A")
> > `[<-.data.table` <- `[<-.data.frame`
> > DT[DT[J("A"), which=TRUE, mult="all"], "Z"] <- 44
> > DT
> A Z
> [1,] A 44
> [2,] A 44
> [3,] A 44
> [4,] A 44
> [5,] A 44
> [6,] Z 2
> [7,] Z 4
> [8,] Z 6
> [9,] Z 8
> [10,] Z 10
>
> The following is equivalent:
>
> > DT$Z[DT[J("A"), which=TRUE, mult="all"]] <- 55
> > DT
> A Z
> [1,] A 55
> [2,] A 55
> [3,] A 55
> [4,] A 55
> [5,] A 55
> [6,] Z 2
> [7,] Z 4
> [8,] Z 6
> [9,] Z 8
> [10,] Z 10
>
> I'd prefer to use as much of [<-.data.frame and
> $<-.data.frame as possible. $<-.data.frame is pretty easy:
>
> > "$<-.data.table" = function (x, name, value) {
> + res <- `$<-.data.frame`(x, name, value)
> + if (any(name %in% key(x)))
> + key(res) <- NULL
> + res
> + }
> > DT <- data.table(A = c("A", "Z"), Z = 1:10, key = "A")
> > DT$Z[3] <- 33
> > key(DT)
> [1] "A"
> > DT$A[10] <- "A"
> > key(DT)
> NULL
> > DT
> A Z
> [1,] A 1
> [2,] A 3
> [3,] A 33
> [4,] A 7
> [5,] A 9
> [6,] Z 2
> [7,] Z 4
> [8,] Z 6
> [9,] Z 8
> [10,] A 10
>
> This doesn't allow x to be a data.table-style select. If we
> want that, I could experiment some.
>
> [<-.data.table is more challenging, but I could take a shot
> on a plane ride next week.
>
> - Tom
>
>
>
> > -----Original Message-----
> > From: datatable-help-bounces at lists.r-forge.r-project.org
> > [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
> > On Behalf Of Branson Owen
> > Sent: Thursday, July 29, 2010 14:39
> > To: datatable-help at lists.r-forge.r-project.org
> > Subject: [datatable-help] Wonder whether there is an easier way to
> > changepart of data.table values
> >
> > I thought I had no more questions, but ... Please take your time to
> > respond; I don't want to take up too much of your time.
> >
> > ** I want to only change values for certain rows of selected columns. **
> >
> > In data.frame, I can do something like:
> >
> > > DF[row index, "column"] = new value.
> >
> > In data.table, this has been disabled even using "with = FALSE"
> >
> > >DT[3,"Z", with = FALSE]
> >
> > Z
> > [1,] 20
> >
> > > DT[3,"Z", with = FALSE] <- 1
> > Error in `[<-.data.table`(`*tmp*`, 3, "Z", with = FALSE, value = 1) :
> > unused argument(s) (with = FALSE)
> >
> >
> > Actually, I found that there is no way I can edit value using [,] in
> > DT. The only way I found to change value is using DT$column[index] =
> > new value
> >
> > This would make the following task difficult:
> >
> > # DOES NOT WORK #
> > > DT[join/select, {
> > columnA <- calculation based on columnB, C, D, ...
> > }]
> > # DOES NOT WORK #
> >
> > It didn't complain, but it doesn't change value at all. I guess this
> > is due to the syntax of with in data.frame because it doesn't work
> > there, either.
> >
> > At this moment, my solution is:
> > > DT[join/select, {
> > DT$columnA[index] <<- calculation based on columnB, C, D, ...
> > }]
> >
> > with the help of DT$columnA[index] and super assign <<-. We also need
> > to either get index by ourselves like DT[select/join, which = T] or
> > store it first. Not sure whether this is the best solution.
> >
> > In DF, it would be
> >
> > > index = using scan
> > > DF[index, columnA] = with(DF, calculation based on columnB,
> > C, D, ...)
> >
> > Note that this doesn't work for DT. At this moment, the only way I
> > found to edit DT is
> > > DT$column[index] = new value
> >
> > I don't think my example is uncommon, but I can't find common solution
> > using data.table. Maybe, I missed something.
> >
> > Any comments will be highly appreciated. Thank you very much again for
> > your help.
> >
> > Best regards,