[datatable-help] Wonder whether there is an easier way to change part of data.table values

Short, Tom TShort at epri.com
Tue Aug 3 04:28:13 CEST 2010


I've just checked in versions of [<-.data.table and $<-.data.table that
check for the columns adjusted and reset the key if appropriate. This
brings up some incompatibilities:

(*) KEYS -- Before, you could do:

dt$key_column = anything

And it wouldn't change the status of the key. Now, the key will be
nullified.
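
To make the key-reset idea concrete, here is a minimal sketch in base R.
It does not use data.table itself: keyed_df is a hypothetical toy class
that keeps its key in a "key" attribute, and the method only illustrates
dropping the key when a key column is assigned. It is not the code that
was checked in.

```r
# Toy illustration only: a data.frame subclass "keyed_df" (hypothetical name)
# that stores its key column names in a "key" attribute.
keyed_df <- function(df, key) {
  ord <- do.call(order, unname(as.list(df[key])))  # sort rows by key columns
  df <- df[ord, , drop = FALSE]
  rownames(df) <- NULL
  attr(df, "key") <- key
  class(df) <- c("keyed_df", "data.frame")
  df
}

# `$<-` method: delegate to the data.frame method, then drop the key
# if the assigned column is one of the key columns.
`$<-.keyed_df` <- function(x, name, value) {
  old_key <- attr(x, "key")
  res <- `$<-.data.frame`(x, name, value)
  attr(res, "key") <- if (name %in% old_key) NULL else old_key
  res
}

d <- keyed_df(data.frame(A = rep(c("A", "Z"), 5), Z = 1:10,
                         stringsAsFactors = FALSE), key = "A")

d$Z[3] <- 33L                   # non-key column: key is kept
key_after_Z <- attr(d, "key")

d$A[10] <- "A"                  # key column: key is nullified
key_after_A <- attr(d, "key")
```

Note that the method cannot tell whether the new values actually break
the sort order; it drops the key whenever a key column is touched.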

(*) ASSIGNMENT DIFFERENCE

Before: dt["a"] <- "b" meant change column a.
Now:    use dt[,"a"] <- "b" to change column a.

Now, you can do

dt[J("a"), "somecol"] <- 33

which assigns 33 to the column "somecol" in the rows where the key is
equal to "a".
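
For comparison, the same effect on a plain data.frame can be sketched in
base R as below. assign_by_key is a hypothetical helper written only for
this illustration (it is not part of data.table), and it uses a simple
vector comparison rather than a keyed lookup.

```r
# Hypothetical helper, not a data.table function: assign `value` to column
# `col` in every row of data frame `x` whose `keycol` equals `keyval`.
assign_by_key <- function(x, keycol, keyval, col, value) {
  rows <- which(x[[keycol]] == keyval)   # rows matched by the key value
  x[rows, col] <- value
  x
}

df <- data.frame(A = rep(c("A", "Z"), each = 5), Z = 1:10,
                 stringsAsFactors = FALSE)

# Set Z to 33 wherever A equals "A"; rows with A == "Z" are untouched.
df <- assign_by_key(df, "A", "A", "Z", 33)
```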

(*) QUESTIONS

- Do we need a "keep.key" argument for cases where we don't want the key
nullified (the user knows the order is unaffected)? This isn't really
possible for dt$a[1:4] <- something.

- Is it a good idea to use data.table-style indexing for the i part of
[<-.data.table? I was skeptical when Branson first asked (preferring
data.frame compatibility), but it makes more sense now that I think
about it.

Finally, we should put a warning in the documentation somewhere that
functions that re-arrange or assign values to a data frame may "corrupt"
the key of a data.table. Data.table-aware functions should account for
this, but other functions may not.
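
One way to detect such a "corrupted" key is to check whether the rows are
still sorted by the key columns. The following is a rough base R sketch;
key_is_valid is a hypothetical helper made up for this example, not an
existing data.table function, and it works on a plain data.frame.

```r
# Hypothetical helper: TRUE if data frame `x` is still sorted by the
# columns named in `key`, i.e. the sort order implied by the key holds.
key_is_valid <- function(x, key) {
  # order() leaves ties in their original order, so a correctly sorted
  # table yields exactly 1, 2, ..., nrow(x).
  ord <- do.call(order, unname(as.list(x[key])))
  identical(ord, seq_len(nrow(x)))
}

df <- data.frame(A = rep(c("A", "Z"), each = 5), Z = 1:10,
                 stringsAsFactors = FALSE)
ok_before <- key_is_valid(df, "A")   # rows are sorted by A

df$A[10] <- "A"                      # last row now sorts before the "Z" rows
ok_after <- key_is_valid(df, "A")
```

A check like this costs a full sort, so it is better suited to debugging
than to running after every assignment.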

- Tom
 

> -----Original Message-----
> From: Short, Tom 
> Sent: Thursday, July 29, 2010 15:39
> To: 'Branson Owen'; datatable-help at lists.r-forge.r-project.org
> Subject: RE: [datatable-help] Wonder whether there is an 
> easier way to change part of data.table values
> 
> Branson,
> 
> [<-.data.table and $<-.data.table both need a bit of work. 
> 
> > DT <- data.table(A = c("A", "Z"), Z = 1:10, key = "A")
> > 
> 
> You've found the DT$column[index] = new value approach, but 
> if you use that on keys, DT may no longer be sorted right:
> 
> > DT$A[10] <- "A"
> > DT
>       A   Z
>  [1,] A   1
>  [2,] A   3
>  [3,] A   5
>  [4,] A   7
>  [5,] A   9
>  [6,] Z   2
>  [7,] Z   4
>  [8,] Z   6
>  [9,] Z   8
> [10,] A 100
> 
> Since we now inherit from data.frames, we can just use 
> [<-.data.frame. It still has the problem that it won't remove 
> the key if the key'd column changes.
> 
> > 
> > `[<-.data.table` <- `[<-.data.frame`
> > DT[9,"Z"] <- 22
> > DT
>       A   Z
>  [1,] A   1
>  [2,] A   3
>  [3,] A   5
>  [4,] A   7
>  [5,] A   9
>  [6,] Z   2
>  [7,] Z   4
>  [8,] Z   6
>  [9,] Z  22
> [10,] A 100
> 
> I'm not sure we want to be able to do DT[select,] <- something.
> 
> Something like the following will work for a simple select:
> 
> > DT <- data.table(A = c("A", "Z"), Z = 1:10, key = "A") 
> > `[<-.data.table` <- `[<-.data.frame`
> > DT[DT[J("A"), which=TRUE, mult="all"], "Z"] <- 44
> > DT
>       A  Z
>  [1,] A 44
>  [2,] A 44
>  [3,] A 44
>  [4,] A 44
>  [5,] A 44
>  [6,] Z  2
>  [7,] Z  4
>  [8,] Z  6
>  [9,] Z  8
> [10,] Z 10
> 
> The following is equivalent:
> 
> > DT$Z[DT[J("A"), which=TRUE, mult="all"]] <- 55
> > DT
>       A  Z
>  [1,] A 55
>  [2,] A 55
>  [3,] A 55
>  [4,] A 55
>  [5,] A 55
>  [6,] Z  2
>  [7,] Z  4
>  [8,] Z  6
>  [9,] Z  8
> [10,] Z 10
> 
> I'd prefer to use as much of [<-.data.frame and 
> $<-.data.frame as possible.  $<-.data.frame is pretty easy:
> 
> > "$<-.data.table" = function (x, name, value) {
> +     res <- `$<-.data.frame`(x, name, value)
> +     if (any(name %in% key(x)))
> +         key(res) <- NULL
> +     res
> + }
> > DT <- data.table(A = c("A", "Z"), Z = 1:10, key = "A")
> > DT$Z[3] <- 33
> > key(DT)
> [1] "A"
> > DT$A[10] <- "A"
> > key(DT)
> NULL
> > DT
>       A  Z
>  [1,] A  1
>  [2,] A  3
>  [3,] A 33
>  [4,] A  7
>  [5,] A  9
>  [6,] Z  2
>  [7,] Z  4
>  [8,] Z  6
>  [9,] Z  8
> [10,] A 10
> 
> This doesn't allow x to be a data.table-style select. If we 
> want that, I could experiment some.
> 
> [<-.data.table is more challenging, but I could take a shot 
> on a plane ride next week.
> 
> - Tom 
> 
>  
> 
> > -----Original Message-----
> > From: datatable-help-bounces at lists.r-forge.r-project.org
> > [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
> > On Behalf Of Branson Owen
> > Sent: Thursday, July 29, 2010 14:39
> > To: datatable-help at lists.r-forge.r-project.org
> > Subject: [datatable-help] Wonder whether there is an easier way to 
> > change part of data.table values
> > 
> > I thought I had no more questions, but ... Please take your time to
> > respond; I don't want to take up too much of your time.
> > 
> > ** I want to only change values for certain rows of selected columns. **
> > 
> > In data.frame, I can do something like:
> > 
> > > DF[row index, "column"] = new value.
> > 
> > In data.table, this has been disabled even using "with = FALSE"
> > 
> > >DT[3,"Z", with = FALSE]
> > 
> >        Z
> > [1,] 20
> > 
> > > DT[3,"Z", with = FALSE] <- 1
> > Error in `[<-.data.table`(`*tmp*`, 3, "Z", with = FALSE, value = 1) :
> >   unused argument(s) (with = FALSE)
> > 
> > 
> > Actually, I found that there is no way to edit values using [,] in
> > DT. The only way I found to change a value is using
> > DT$column[index] = new value
> > 
> > This would make the following task difficult:
> > 
> > # DOES NOT WORK #
> > > DT[join/select, {
> > columnA <- calculation based on columnB, C, D, ...
> > }]
> > # DOES NOT WORK #
> > 
> > It didn't complain, but it doesn't change the value at all. I guess
> > this is due to the syntax of with() in data.frame, because it doesn't
> > work there, either.
> > 
> > At this moment, my solution is:
> > > DT[join/select, {
> > DT$columnA[index] <<- calculation based on columnB, C, D, ...
> > }]
> > 
> > with the help of DT$columnA[index] and superassignment (<<-). We also
> > need to either get the index ourselves, e.g. with
> > DT[select/join, which = T], or store it first. Not sure whether this
> > is the best solution.
> > 
> > In DF, it would be
> > 
> > > index = using scan
> > > DF[index, columnA] = with(DF, calculation based on columnB,
> > C, D, ...)
> > 
> > Note that this doesn't work for DT. At this moment, the only way I 
> > found to edit DT is
> > > DT$column[index] = new value
> > 
> > I don't think my example is uncommon, but I can't find a common
> > solution using data.table. Maybe I missed something.
> > 
> > Any comments will be highly appreciated. Thank you very much again
> > for your help.
> > 
> > Best regards,
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > 

