[datatable-help] mapply cannot modify in place when iterating over list of DTs

Matthew Dowle mdowle at mdowle.plus.com
Fri Sep 20 20:18:44 CEST 2013


Does this sentence from the warning help?

" Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and DT2 (R's 
list() used to copy named objects); please upgrade to R>=v3.1.0 if that 
is biting. "
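
If upgrading is not an option, here is a minimal sketch of one possible way around it. It has not been tried in this thread and rests on an assumption: that an extra copy of each table is acceptable. Instead of relying on modification in place, it takes an explicit copy() of each element, adds the columns to that copy, and assigns mapply's result back to the list. The sample data is the same as in the original post.

```r
library(data.table)

# Sample data, as in the original post
list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Assumption: do not rely on modify-in-place at all. copy() returns a
# properly allocated data.table, := adds the columns to that copy, and
# the modified copies returned by mapply replace the original list.
list.DT <- mapply(
  function(DT, C3, C4) copy(DT)[, c("Col3", "Col4") := list(C3, C4)],
  list.DT, list.Col3, list.Col4,
  SIMPLIFY = FALSE
)

# Each element should now carry Col3 and Col4
names(list.DT$DT1)
```

The cost is one full copy per table, so the index-based loop from the original post is probably still the better choice for large tables.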

Matthew

On 20/09/13 19:01, Ricardo Saporta wrote:
> One warning per DT in the list
>   (I added the line breaks)
> -Rick
> =============================================
> Warning messages:
>
> 1: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>
>   Invalid .internal.selfref detected and fixed by taking a copy of the 
> whole table so that := can add this new column by reference. At an 
> earlier point, this data.table has been copied by R (or been created 
> manually using structure() or similar). Avoid key<-, names<- and 
> attr<- which in R currently (and oddly) may copy the whole data.table. 
> Use set* syntax instead to avoid copying: ?set, ?setnames and 
> ?setattr. Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and 
> DT2 (R's list() used to copy named objects); please upgrade to 
> R>=v3.1.0 if that is biting. If this message doesn't help, please 
> report to datatable-help so the root cause can be fixed.
>
> 2: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>
>   Invalid .internal.selfref detected and fixed by taking a copy of the 
> whole table so that := can add this new column by reference. At an 
> earlier point, this data.table has been copied by R (or been created 
> manually using structure() or similar). Avoid key<-, names<- and 
> attr<- which in R currently (and oddly) may copy the whole data.table. 
> Use set* syntax instead to avoid copying: ?set, ?setnames and 
> ?setattr. Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and 
> DT2 (R's list() used to copy named objects); please upgrade to 
> R>=v3.1.0 if that is biting. If this message doesn't help, please 
> report to datatable-help so the root cause can be fixed.
> =============================================
>
>
>
>
> On Fri, Sep 20, 2013 at 12:49 PM, Matthew Dowle 
> <mdowle at mdowle.plus.com> wrote:
>
>
>     Hi,
>
>     What's the warning?
>
>     Matthew
>
>
>
>     On 20/09/13 14:48, Ricardo Saporta wrote:
>>     I've encountered the following issue when iterating over a list of
>>     data.tables.
>>     The issue occurs only with mapply, not with lapply.
>>
>>     Given a list of data.tables, mapply'ing over the list directly
>>     cannot modify the tables in place.
>>
>>     Also, attempting to add a new column gives an "Invalid
>>     .internal.selfref" warning.
>>     Modifying an existing column does not issue a warning, but still
>>     fails to modify in place.
>>
>>     WORKAROUND:
>>     ----------
>>     The workaround is to iterate over an index to the list, then to
>>       modify each data.table via list.of.DTs[[i]][ .. ]
>>
>>     **Interestingly, this issue occurs with `mapply`, but not `lapply`.**
>>
>>     EXAMPLE:
>>     --------
>>       # Given a list of DTs and two lists of vectors,
>>       #   we want to add the corresponding vectors as columns to each DT.
>>
>>     ## ---------------- ##
>>     ##   SAMPLE DATA:   ##
>>     ## ---------------- ##
>>       # list of data.tables
>>       list.DT <- list(
>>         DT1=data.table(Col1=111:115, Col2=121:125),
>>         DT2=data.table(Col1=211:215, Col2=221:225)
>>         )
>>
>>       # lists of columns to add
>>       list.Col3 <- list(131:135, 231:235)
>>       list.Col4 <- list(141:145, 241:245)
>>
>>
>>     ## ------------------------------------ ##
>>     ##   Iterating over the list elements   ##
>>     ##     adding a new column              ##
>>     ## ------------------------------------ ##
>>     ##   Will issue warning and             ##
>>     ##     will fail to modify in place     ##
>>     ## ------------------------------------ ##
>>       mapply (
>>           function(DT, C3, C4)
>>              DT[, c("Col3", "Col4") := list(C3, C4)],
>>           list.DT,  # iterating over the list
>>           list.Col3, list.Col4,
>>           SIMPLIFY=FALSE
>>         )
>>
>>       ## Note the lack of change
>>       list.DT
>>
>>
>>     ## ------------------------------------ ##
>>     ##   Iterating over an index            ##
>>     ## ------------------------------------ ##
>>       mapply (
>>           function(i, C3, C4)
>>              list.DT[[i]] [, c("Col3", "Col4") := list(C3, C4)],
>>           seq_along(list.DT),   # iterating over an index to the list
>>           list.Col3, list.Col4,
>>           SIMPLIFY=FALSE
>>         )
>>
>>       ## Note each DT _has_ been modified
>>       list.DT
>>
>>     ## ------------------------------------ ##
>>     ##   Iterating over the list elements   ##
>>     ##     modifying existing column        ##
>>     ## ------------------------------------ ##
>>     ##   No warning issued, but             ##
>>     ##     will fail to modify in place     ##
>>     ## ------------------------------------ ##
>>       mapply (
>>           function(DT, C3, C4)
>>              DT[, c("Col3", "Col4") := list(Col3*1e3, Col4*1e4)],
>>
>>           list.DT,  # iterating over the list
>>           list.Col3, list.Col4,
>>           SIMPLIFY=FALSE
>>         )
>>
>>       ## Note the lack of change (compare with output from `mapply`)
>>       list.DT
>>
>>     ## ------------------------------------ ##
>>     ##                                      ##
>>     ##   `lapply` works as expected.        ##
>>     ##                                      ##
>>     ## ------------------------------------ ##
>>       ## NOW WITH lapply
>>       lapply(list.DT,
>>         function(DT)
>>           DT[, newCol := LETTERS[1:5]]
>>       )
>>
>>       ## Note the new column:
>>       list.DT
>>
>>
>>
>>     # ========================== #
>>
>>     ##   NON-WORKAROUNDS   ##
>>     ##
>>     ## I also tried all of the following alternatives
>>     ##   in hopes of being able to iterate over the list
>>     ##   directly, using `mapply`.
>>     ## None of these worked.
>>
>>     # (1) Creating the DTs first, then creating the list from them
>>         DT1 <- data.table(Col1=111:115, Col2=121:125)
>>         DT2 <- data.table(Col1=211:215, Col2=221:225)
>>
>>         list.DT <- list(DT1=DT1,DT2=DT2 )
>>
>>
>>     # (2) Same as 1, and using `copy()` in the call to `list()`
>>         list.DT <- list(DT1=copy(DT1),
>>                         DT2=copy(DT2) )
>>
>>     # (3) lapply'ing `copy` and then iterating over that list
>>         list.DT <- lapply(list.DT, copy)
>>
>>     # (4) Not naming the list elements
>>         list.DT <- list(DT1, DT2)
>>         # and tried
>>         list.DT <- list(copy(DT1), copy(DT2))
>>
>>     ## All of the above still failed to modify in place
>>     ##   (and also issued the same warning if trying to add a column)
>>     ##    when iterating using mapply
>>
>>       mapply(function(DT, C3, C4)
>>         DT[, c("Col3", "Col4") := list(C3, C4)],
>>         list.DT, list.Col3, list.Col4,
>>         SIMPLIFY=FALSE)
>>
>>
>>     # ========================== #
>>
>>
>>     Ricardo Saporta
>>     Rutgers University, New Jersey
>>     e: saporta at rutgers.edu
>>
>>
>>
>
>
