[datatable-help] mapply cannot modify in place when iterating over list of DTs

Ricardo Saporta saporta at scarletmail.rutgers.edu
Fri Sep 20 20:01:16 CEST 2013


One warning per DT in the list (I added the line breaks):
-Rick
=============================================
Warning messages:

1: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :

  Invalid .internal.selfref detected and fixed by taking a copy of the
whole table so that := can add this new column by reference. At an earlier
point, this data.table has been copied by R (or been created manually using
structure() or similar). Avoid key<-, names<- and attr<- which in R
currently (and oddly) may copy the whole data.table. Use set* syntax
instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named
objects); please upgrade to R>=v3.1.0 if that is biting. If this message
doesn't help, please report to datatable-help so the root cause can be
fixed.

2: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :

  Invalid .internal.selfref detected and fixed by taking a copy of the
whole table so that := can add this new column by reference. At an earlier
point, this data.table has been copied by R (or been created manually using
structure() or similar). Avoid key<-, names<- and attr<- which in R
currently (and oddly) may copy the whole data.table. Use set* syntax
instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named
objects); please upgrade to R>=v3.1.0 if that is biting. If this message
doesn't help, please report to datatable-help so the root cause can be
fixed.
=============================================
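A plausible reading of the warning, offered as a hedged sketch rather than a confirmed diagnosis: the R-level wrapper of `mapply` collects its vector arguments with `dots <- list(...)` before looping, while `lapply` hands `X` straight to its internal loop. If `list()` copies named objects, as the warning says it did before R v3.1.0, each data.table that `mapply` passes to `FUN` would be a copy, which matches the symptoms reported here. The check below only inspects the two function bodies using base R; it does not reproduce the copy itself, and `mapply_src` / `lapply_src` are illustrative names.

```r
# Deparse the R-level bodies of the two apply functions (base R only).
mapply_src <- deparse(body(mapply))
lapply_src <- deparse(body(lapply))

# mapply wraps its vector arguments in list(...) before looping ...
any(grepl("list(...)", mapply_src, fixed = TRUE))

# ... whereas lapply passes X to .Internal(lapply(X, FUN)) without that step.
any(grepl("list(...)", lapply_src, fixed = TRUE))
```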




On Fri, Sep 20, 2013 at 12:49 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> Hi,
>
> What's the warning?
>
> Matthew
>
>
>
> On 20/09/13 14:48, Ricardo Saporta wrote:
>
> I've encountered the following issue when iterating over a list of
> data.tables. The issue is only with mapply, not with lapply.
>
>
> Given a list of data.tables, mapply'ing over the list directly
> cannot modify them in place.
>
> Also, when attempting to add a new column, we get an "Invalid
> .internal.selfref" warning. Modifying an existing column does not issue
> a warning, but still fails to modify in place.
>
> WORKAROUND:
> ----------
> The workaround is to iterate over an index into the list, and then
>   modify each data.table via list.DT[[i]][ .. ]
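To check whether `:=` really changed the table stored in the list, rather than eyeballing a printout of `list.DT`, one option is data.table's `address()`, which reports where an object currently lives in memory. This is an illustrative sketch, not something from the thread: it assumes data.table is installed, recreates a small sample list, applies the index-based workaround to one table, and compares the addresses; `before` and `after` are hypothetical names.

```r
library(data.table)

list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)

# Memory address of the table stored in the list, before modification
before <- address(list.DT$DT1)

# Index-based modification, i.e. the workaround described above
list.DT[["DT1"]][, Col3 := 131:135]

# Address afterwards: identical if := added the column by reference
after <- address(list.DT$DT1)

identical(before, after)
"Col3" %in% names(list.DT$DT1)
```

Because data.table over-allocates column slots, adding a column by reference should not move the table, so a changed address (or a missing column) is a sign that `:=` worked on a copy.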
>
> **Interestingly, this issue occurs with `mapply`, but not `lapply`.**
>
>
> EXAMPLE:
> --------
>   # Given a list of DTs and two lists of vectors,
>   #   we want to add the corresponding vectors as columns to each DT.
>
> ## ---------------- ##
> ##   SAMPLE DATA:   ##
> ## ---------------- ##
>   library(data.table)
>
>   # list of data.tables
>   list.DT <- list(
>     DT1=data.table(Col1=111:115, Col2=121:125),
>     DT2=data.table(Col1=211:215, Col2=221:225)
>     )
>
>   # lists of columns to add
>   list.Col3 <- list(131:135, 231:235)
>   list.Col4 <- list(141:145, 241:245)
>
>
> ## ------------------------------------ ##
> ##   Iterating over the list elements   ##
> ##     adding a new column              ##
> ## ------------------------------------ ##
> ##   Will issue warning and             ##
> ##     will fail to modify in place     ##
> ## ------------------------------------ ##
>   mapply(
>       function(DT, C3, C4)
>           DT[, c("Col3", "Col4") := list(C3, C4)],
>
>       list.DT,  # iterating over the list
>       list.Col3, list.Col4,
>       SIMPLIFY=FALSE
>     )
>
>   ## Note the lack of change
>   list.DT
>
>
> ## ------------------------------------ ##
> ##   Iterating over an index            ##
> ## ------------------------------------ ##
>   mapply(
>       function(i, C3, C4)
>          list.DT[[i]][, c("Col3", "Col4") := list(C3, C4)],
>
>       seq_along(list.DT),   # iterating over an index to the list
>       list.Col3, list.Col4,
>       SIMPLIFY=FALSE
>     )
>
>   ## Note each DT _has_ been modified
>   list.DT
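The same index-based workaround can also be written as a plain `for` loop, which avoids building a result list purely for its side effects. A minimal sketch, assuming data.table is installed; it recreates the sample data so it runs on its own, and `new_cols` is an illustrative name. The new values are computed outside the `[` call so that `j` does not have to resolve the loop variable `i`.

```r
library(data.table)

list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

for (i in seq_along(list.DT)) {
  # Columns to add to the i-th table
  new_cols <- list(list.Col3[[i]], list.Col4[[i]])
  # Index into the list itself so := acts on the table stored in list.DT
  list.DT[[i]][, c("Col3", "Col4") := new_cols]
}

list.DT
```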
>
> ## ------------------------------------ ##
> ##   Iterating over the list elements   ##
> ##     modifying existing columns       ##
> ## ------------------------------------ ##
> ##   No warning issued, but             ##
> ##     will fail to modify in place     ##
> ## ------------------------------------ ##
>   mapply(
>       function(DT, C3, C4)
>          DT[, c("Col3", "Col4") := list(Col3*1e3, Col4*1e4)],
>
>       list.DT,  # iterating over the list
>       list.Col3, list.Col4,
>       SIMPLIFY=FALSE
>     )
>
>   ## Note the lack of change (compare with output from `mapply`)
>   list.DT
>
> ## ------------------------------------ ##
> ##                                      ##
> ##   `lapply` works as expected.        ##
> ##                                      ##
> ## ------------------------------------ ##
>
>   ## NOW WITH lapply
>   lapply(list.DT,
>     function(DT)
>       DT[, newCol := LETTERS[1:5]]
>   )
>
>   ## Note the new column:
>   list.DT
>
>
>
> # ========================== #
>
> ##   NON-WORKAROUNDS   ##
> ##
> ## I also tried all of the following alternatives
> ##   in hopes of being able to iterate over the list
> ##   directly, using `mapply`.
> ## None of these worked.
>
> # (1) Creating the DTs first, then creating the list from them
>     DT1 <- data.table(Col1=111:115, Col2=121:125)
>     DT2 <- data.table(Col1=211:215, Col2=221:225)
>
>     list.DT <- list(DT1=DT1, DT2=DT2)
>
>
> # (2) Same as 1, and using `copy()` in the call to `list()`
>     list.DT <- list(DT1=copy(DT1),
>                     DT2=copy(DT2))
>
> # (3) lapply'ing `copy` and then iterating over that list
>     list.DT <- lapply(list.DT, copy)
>
> # (4) Not naming the list elements
>     list.DT <- list(DT1, DT2)
>     # and also tried
>     list.DT <- list(copy(DT1), copy(DT2))
>
> ## All of the above still failed to modify in place
> ##   (and also issued the same warning if trying to add a column)
> ##   when iterating using mapply
>
>   mapply(function(DT, C3, C4)
>     DT[, c("Col3", "Col4") := list(C3, C4)],
>     list.DT, list.Col3, list.Col4,
>     SIMPLIFY=FALSE)
>
>
> # ========================== #
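A version-independent alternative, sketched here as an assumption rather than something proposed in the thread: `DT[, ... := ...]` returns the table invisibly, so the result of `mapply` can simply be assigned back to `list.DT`. Whether or not `:=` managed to work in place (which depends on the R and data.table versions in use), the list then holds the modified tables. The sketch assumes data.table is installed and recreates the sample data so it runs on its own.

```r
library(data.table)

list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Each call returns the (possibly copied) modified table; assigning the
# collected results back keeps the new columns either way. Names DT1/DT2
# are carried over from list.DT because USE.NAMES defaults to TRUE.
list.DT <- mapply(
  function(DT, C3, C4) DT[, c("Col3", "Col4") := list(C3, C4)],
  list.DT, list.Col3, list.Col4,
  SIMPLIFY = FALSE
)

list.DT
```

The trade-off is that, if an internal copy does happen, each table is copied during that call, so the by-reference benefit of `:=` is lost for that step; the result is still correct.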
>
>
>  Ricardo Saporta
>  Rutgers University, New Jersey
>  e: saporta at rutgers.edu
>
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>