[datatable-help] mapply cannot modify in place when iterating over list of DTs

Matthew Dowle mdowle at mdowle.plus.com
Tue Sep 24 03:42:38 CEST 2013


Hi,
Adding columns by reference to a data.table that is a member of a list
of data.tables is really difficult to handle internally.  I had to add a
special case internally to get around list() copying, so that the
binding can change inside the list on the shallow copy when [[ is used.
A for loop is the way to add columns by reference inside a list of
data.tables, and that should work ok using [[.  But doing that via
lapply and mapply is really stretching it.  Even catching user
expectations in this area is difficult.  Ideally we'd catch mapply, yes,
but really data.table likes its tables to be rbindlist()-ed, with
operations then working on a single large data.table.  Should we add
advice to the warning message not to use mapply or lapply to add columns
by reference to a list of data.tables (use a for loop instead)?
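
For illustration, here is a minimal sketch of the for-loop approach,
followed by the rbindlist() alternative.  It reuses list.DT, list.Col3
and list.Col4 from Rick's example further down; the names big, src and
Col5 are made up for this sketch.  It assumes R >= 3.1.0 and a
data.table version whose rbindlist() has the idcol argument, so treat it
as a sketch rather than a guaranteed recipe.

```r
library(data.table)

# Sample data, same shape as in the original report.
list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# (a) for loop over an index, reaching each table with [[ so that
#     := is applied to the table held in the list.
for (i in seq_along(list.DT)) {
  list.DT[[i]][, c("Col3", "Col4") := list(list.Col3[[i]], list.Col4[[i]])]
}

# (b) Alternative: stack the tables into one large data.table and work
#     on that.  idcol = "src" (hypothetical name) records which list
#     element each row came from.
big <- rbindlist(list.DT, idcol = "src")
big[, Col5 := Col1 + Col2]   # Col5 is an illustrative column only
```

With (b) there is only one table to modify, so the question of whether
a list element was copied does not arise.
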
Matthew


On 22/09/13 03:02, Ricardo Saporta wrote:
> Matthew,
>
> I did notice the warning, but something doesn't add up:
>
> If the issue is simply that it is being copied when created, then
> wouldn't we expect the same warning to arise when we try to modify the
> table using either `mapply` or `lapply`? (The latter does not produce a
> warning.)
>
> If, on the other hand, the issue pertains specifically to mapply (which
> I assume it does), then why is it only a problem when we iterate over
> the list directly, whereas iterating indirectly by using an index does
> not produce any warnings?
> While overall this is minor if one is aware of the issue, I think it
> might allow unnoticed bugs to creep into someone's code,
> specifically if a user modifies a list of DTs with mapply without
> realizing that the modifications are not being held.
>
> That being said, I'm not sure how this could even be addressed if the 
> root is in mapply, but is it worth trying to address?
>
> Rick
>
>
> On Fri, Sep 20, 2013 at 2:18 PM, Matthew Dowle
> <mdowle at mdowle.plus.com> wrote:
>
>     Does this sentence from the warning help?
>
>
>     " Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and DT2
>     (R's list() used to copy named objects); please upgrade to
>     R>=v3.1.0 if that is biting. "
>
>     Matthew
>
>
>     On 20/09/13 19:01, Ricardo Saporta wrote:
>>     One warning per DT in the list
>>       (I added the line breaks)
>>     -Rick
>>     =============================================
>>     Warning messages:
>>
>>     1: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>>
>>       Invalid .internal.selfref detected and fixed by taking a copy
>>     of the whole table so that := can add this new column by
>>     reference. At an earlier point, this data.table has been copied
>>     by R (or been created manually using structure() or similar).
>>     Avoid key<-, names<- and attr<- which in R currently (and oddly)
>>     may copy the whole data.table. Use set* syntax instead to avoid
>>     copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
>>     list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to
>>     copy named objects); please upgrade to R>=v3.1.0 if that is
>>     biting. If this message doesn't help, please report to
>>     datatable-help so the root cause can be fixed.
>>
>>     2: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>>
>>       Invalid .internal.selfref detected and fixed by taking a copy
>>     of the whole table so that := can add this new column by
>>     reference. At an earlier point, this data.table has been copied
>>     by R (or been created manually using structure() or similar).
>>     Avoid key<-, names<- and attr<- which in R currently (and oddly)
>>     may copy the whole data.table. Use set* syntax instead to avoid
>>     copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
>>     list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to
>>     copy named objects); please upgrade to R>=v3.1.0 if that is
>>     biting. If this message doesn't help, please report to
>>     datatable-help so the root cause can be fixed.
>>     =============================================
>>
>>
>>
>>
>>     On Fri, Sep 20, 2013 at 12:49 PM, Matthew Dowle
>>     <mdowle at mdowle.plus.com> wrote:
>>
>>
>>         Hi,
>>
>>         What's the warning?
>>
>>         Matthew
>>
>>
>>
>>         On 20/09/13 14:48, Ricardo Saporta wrote:
>>>         I've encountered the following issue iterating over a list
>>>         of data.tables.
>>>         The issue is only with mapply, not with lapply .
>>>
>>>         Given a list of data.table's, mapply'ing over the list directly
>>>         cannot modify in place.
>>>
>>>         Also if attempting to add a new column, we get an "Invalid
>>>         .internal.selfref" warning.
>>>         Modifying an existing column does not issue a warning, but
>>>         still fails to modify-in-place
>>>
>>>         WORKAROUND:
>>>         ----------
>>>         The workaround is to iterate over an index to the list, then to
>>>           modify each data.table via list.of.DTs[[i]][ .. ]
>>>
>>>         **Interestingly, this issue occurs with `mapply`, but not
>>>         `lapply`.**
>>>
>>>         EXAMPLE:
>>>         --------
>>>           # Given a list of DT's and two lists of vectors,
>>>           #   we want to add the corresponding vectors as columns to
>>>           #   the DT.
>>>
>>>         ## ---------------- ##
>>>         ##   SAMPLE DATA:   ##
>>>         ## ---------------- ##
>>>           # list of data.tables
>>>           list.DT <- list(
>>>             DT1=data.table(Col1=111:115, Col2=121:125),
>>>             DT2=data.table(Col1=211:215, Col2=221:225)
>>>           )
>>>
>>>           # lists of columns to add
>>>           list.Col3 <- list(131:135, 231:235)
>>>           list.Col4 <- list(141:145, 241:245)
>>>
>>>
>>>         ## ------------------------------------ ##
>>>         ##   Iterating over the list elements   ##
>>>         ##     adding a new column              ##
>>>         ## ------------------------------------ ##
>>>         ##   Will issue warning and             ##
>>>         ##     will fail to modify in place     ##
>>>         ## ------------------------------------ ##
>>>           mapply (
>>>               function(DT, C3, C4)
>>>                  DT[, c("Col3", "Col4") := list(C3, C4)],
>>>               list.DT,  # iterating over the list
>>>               list.Col3, list.Col4,
>>>               SIMPLIFY=FALSE
>>>             )
>>>
>>>           ## Note the lack of change
>>>           list.DT
>>>
>>>
>>>         ## ------------------------------------ ##
>>>         ##   Iterating over an index            ##
>>>         ## ------------------------------------ ##
>>>           mapply (
>>>               function(i, C3, C4)
>>>                  list.DT[[i]] [, c("Col3", "Col4") := list(C3, C4)],
>>>               seq(list.DT),   # iterating over an index to the list
>>>               list.Col3, list.Col4,
>>>               SIMPLIFY=FALSE
>>>             )
>>>
>>>           ## Note each DT _has_ been modified
>>>           list.DT
>>>
>>>         ## ------------------------------------ ##
>>>         ##   Iterating over the list elements   ##
>>>         ##     modifying existing column        ##
>>>         ## ------------------------------------ ##
>>>         ##   No warning issued, but             ##
>>>         ##     Will fail to modify in place     ##
>>>         ## ------------------------------------ ##
>>>           mapply (
>>>               function(DT, C3, C4)
>>>                  DT[, c("Col3", "Col4") := list(Col3*1e3, Col4*1e4)],
>>>
>>>               list.DT,  # iterating over the list
>>>               list.Col3, list.Col4,
>>>               SIMPLIFY=FALSE
>>>             )
>>>
>>>           ## Note the lack of change (compare with output from `mapply`)
>>>           list.DT
>>>
>>>         ## ------------------------------------ ##
>>>         ##   `lapply` works as expected.        ##
>>>         ## ------------------------------------ ##
>>>           ## NOW WITH lapply
>>>           lapply(list.DT,
>>>             function(DT)
>>>               DT[, newCol := LETTERS[1:5]]
>>>           )
>>>
>>>           ## Note the new column:
>>>           list.DT
>>>
>>>
>>>
>>>         # ========================== #
>>>
>>>         ##   NON-WORKAROUNDS ##
>>>         ##
>>>         ## I also tried all of the following alternatives
>>>         ##   in hopes of being able to iterate over the list
>>>         ##   directly, using `mapply`.
>>>         ## None of these worked.
>>>
>>>         # (1) Creating the DTs First, then creating the list from them
>>>             DT1 <- data.table(Col1=111:115, Col2=121:125)
>>>             DT2 <- data.table(Col1=211:215, Col2=221:225)
>>>
>>>             list.DT <- list(DT1=DT1,DT2=DT2 )
>>>
>>>
>>>         # (2) Same as 1, and using `copy()` in the call to `list()`
>>>             list.DT <- list(DT1=copy(DT1),
>>>                             DT2=copy(DT2) )
>>>
>>>         # (3) lapply'ing `copy` and then iterating over that list
>>>             list.DT <- lapply(list.DT, copy)
>>>
>>>         # (4) Not naming the list elements
>>>             list.DT <- list(DT1, DT2)
>>>             # and tried
>>>             list.DT <- list(copy(DT1), copy(DT2))
>>>
>>>         ## All of the above still failed to modify in place
>>>         ##   (and also issued the same warning if trying to add a column)
>>>         ##    when iterating using mapply
>>>
>>>           mapply(function(DT, C3, C4)
>>>             DT[, c("Col3", "Col4") := list(C3, C4)],
>>>             list.DT, list.Col3, list.Col4,
>>>             SIMPLIFY=FALSE)
>>>
>>>
>>>         # ========================== #
>>>
>>>
>>>         Ricardo Saporta
>>>         Rutgers University, New Jersey
>>>         e: saporta at rutgers.edu
>>>
>>>
>>>
>>>         _______________________________________________
>>>         datatable-help mailing list
>>>         datatable-help at lists.r-forge.r-project.org
>>>         https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>
>
>
>
