[datatable-help] mapply cannot modify in place when iterating over list of DTs

Ricardo Saporta saporta at scarletmail.rutgers.edu
Tue Sep 24 06:15:18 CEST 2013


On Mon, Sep 23, 2013 at 9:42 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> Hi,
> Basically adding columns by reference to a data.table when it's a member
> of a list of data.table, is really difficult to handle internally.  I had
> to special case internally to get around list() copying, so that the
> binding can change inside the list on the shallow copy when [[ is used.  A
> for loop is the way to add columns by reference inside a list of
> data.table, and that should work ok using [[.  But doing that via lapply
> and mapply is really stretching it.
>

That makes sense.  I took a whack at it, but couldn't even come close.
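
For reference, here is a minimal sketch of the for-loop approach described
above, reusing the sample data from the original post further down the thread.
It assumes data.table is installed and that `[[` hands back each table without
a copy, as described above; I have not checked it against every R/data.table
combination.

```r
library(data.table)

# Sample data from the original post
list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Loop over an index and add the columns by reference via [[
for (i in seq_along(list.DT)) {
  list.DT[[i]][, c("Col3", "Col4") := list(list.Col3[[i]], list.Col4[[i]])]
}

# Each table in the list should now carry Col3 and Col4
list.DT
```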



> Even catching user expectations in this area is difficult.  Ideally we'd
> catch mapply, yes,  but really data.table likes to be rbindlist()-ed and
> then ops to work on a single large data.table.
>

Agreed.  In the application where this came up, I am dealing with a list of
tables with different dimensions (hence not rbinding).


> We can add advice to the warning message not to use mapply or lapply to add
> columns by reference to a list of data.table (use a for loop instead)?
>

Perhaps a warning that modifications to the DTs in the list have likely not
stuck, with a suggestion to use rbindlist when possible?
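
As a rough illustration of the rbindlist route, for the case where the tables
do share the same columns (so not the different-dimensions case I mentioned),
the list can be stacked once and the new columns added to the single table.
The name `big.DT` and the `idcol = "source"` argument are assumptions for
illustration; `idcol` needs a data.table version whose rbindlist() supports it.

```r
library(data.table)

# Sample data from the original post
list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Stack the tables; "source" records which list element each row came from
big.DT <- rbindlist(list.DT, idcol = "source")

# One := on the combined table instead of one per list element
big.DT[, c("Col3", "Col4") := list(unlist(list.Col3), unlist(list.Col4))]

big.DT
```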



>
> Matthew
>
>
>
> On 22/09/13 03:02, Ricardo Saporta wrote:
>
> Matthew,
>
>  I did notice the warning, but something doesn't add up:
>
>  If the issue is simply that it is being copied when created, then
> wouldn't we expect the same warning to arise when we try to modify the table
> in place using `mapply` or `lapply`? (The latter does not produce a warning.)
>
>  If, on the other hand, the issue pertains specifically to mapply (which I
> assume it does), then why is it only a problem when we iterate over the
> list directly, whereas iterating indirectly by using an index does not
> produce any warnings?
>
>   While overall this is minor if one is aware of the issue, I think it
> might allow unnoticed bugs to creep into someone's code, specifically
> when a user modifies a list of DTs with mapply without realizing that the
> modifications are not being kept.
>
>  That being said, I'm not sure how this could even be addressed if the
> root is in mapply, but is it worth trying to address?
>
>  Rick
>
>
> On Fri, Sep 20, 2013 at 2:18 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
>>  Does this sentence from the warning help?
>>
>>
>> " Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and DT2 (R's
>> list() used to copy named objects); please upgrade to R>=v3.1.0 if that is
>> biting. "
>>
>>  Matthew
>>
>>
>> On 20/09/13 19:01, Ricardo Saporta wrote:
>>
>> One warning per DT in the list
>>   (I added the line breaks)
>> -Rick
>> =============================================
>>  Warning messages:
>>
>>  1: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>>
>>    Invalid .internal.selfref detected and fixed by taking a copy of the
>> whole table so that := can add this new column by reference. At an earlier
>> point, this data.table has been copied by R (or been created manually using
>> structure() or similar). Avoid key<-, names<- and attr<- which in R
>> currently (and oddly) may copy the whole data.table. Use set* syntax
>> instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
>> list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named
>> objects); please upgrade to R>=v3.1.0 if that is biting. If this message
>> doesn't help, please report to datatable-help so the root cause can be
>> fixed.
>>
>>  2: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>>
>>    Invalid .internal.selfref detected and fixed by taking a copy of the
>> whole table so that := can add this new column by reference. At an earlier
>> point, this data.table has been copied by R (or been created manually using
>> structure() or similar). Avoid key<-, names<- and attr<- which in R
>> currently (and oddly) may copy the whole data.table. Use set* syntax
>> instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
>> list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named
>> objects); please upgrade to R>=v3.1.0 if that is biting. If this message
>> doesn't help, please report to datatable-help so the root cause can be
>> fixed.
>>  =============================================
>>
>>
>>
>>
>> On Fri, Sep 20, 2013 at 12:49 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> What's the warning?
>>>
>>> Matthew
>>>
>>>
>>>
>>> On 20/09/13 14:48, Ricardo Saporta wrote:
>>>
>>>   I've encountered the following issue iterating over a list of
>>> data.tables.
>>> The issue is only with mapply, not with lapply.
>>>
>>>
>>> Given a list of data.table's, mapply'ing over the list directly
>>> cannot modify in place.
>>>
>>>  Also if attempting to add a new column, we get an "Invalid
>>> .internal.selfref" warning.
>>> Modifying an existing column does not issue a warning, but still fails
>>> to modify-in-place
>>>
>>>  WORKAROUND:
>>> ----------
>>> The workaround is to iterate over an index to the list, then to
>>>   modify each data.table via list.of.DTs[[i]][ .. ]
>>>
>>>  **Interestingly, this issue occurs with `mapply`, but not `lapply`.**
>>>
>>>
>>> EXAMPLE:
>>> --------
>>>   # Given a list of DT's and two lists of vectors,
>>>   #   we want to add the corresponding vectors as columns to the DT.
>>>
>>> ## ---------------- ##
>>> ##   SAMPLE DATA:   ##
>>> ## ---------------- ##
>>>   library(data.table)
>>>
>>>   # list of data.tables
>>>   list.DT <- list(
>>>     DT1=data.table(Col1=111:115, Col2=121:125),
>>>     DT2=data.table(Col1=211:215, Col2=221:225)
>>>     )
>>>
>>>    # lists of columns to add
>>>   list.Col3 <- list(131:135, 231:235)
>>>   list.Col4 <- list(141:145, 241:245)
>>>
>>>
>>> ## ------------------------------------ ##
>>> ##   Iterating over the list elements   ##
>>> ##     adding a new column              ##
>>> ## ------------------------------------ ##
>>> ##   Will issue warning and             ##
>>> ##     will fail to modify in place     ##
>>> ## ------------------------------------ ##
>>>   mapply (
>>>       function(DT, C3, C4)
>>>           DT[, c("Col3", "Col4") := list(C3, C4)],
>>>
>>>       list.DT,  # iterating over the list
>>>       list.Col3, list.Col4,
>>>       SIMPLIFY=FALSE
>>>     )
>>>
>>>    ## Note the lack of change
>>>   list.DT
>>>
>>>
>>> ## ------------------------------------ ##
>>> ##   Iterating over an index            ##
>>> ## ------------------------------------ ##
>>>   mapply (
>>>       function(i, C3, C4)
>>>          list.DT[[i]] [, c("Col3", "Col4") := list(C3, C4)],
>>>
>>>       seq_along(list.DT),   # iterating over an index to the list
>>>       list.Col3, list.Col4,
>>>       SIMPLIFY=FALSE
>>>     )
>>>
>>>    ## Note each DT _has_ been modified
>>>   list.DT
>>>
>>> ## ------------------------------------ ##
>>> ##   Iterating over the list elements   ##
>>> ##     modifying existing column        ##
>>> ## ------------------------------------ ##
>>> ##   No warning issued, but             ##
>>> ##     will fail to modify in place     ##
>>> ## ------------------------------------ ##
>>>   mapply (
>>>       function(DT, C3, C4)
>>>          DT[, c("Col3", "Col4") := list(Col3*1e3, Col4*1e4)],
>>>
>>>        list.DT,  # iterating over the list
>>>       list.Col3, list.Col4,
>>>       SIMPLIFY=FALSE
>>>     )
>>>
>>>    ## Note the lack of change (compare with output from `mapply`)
>>>   list.DT
>>>
>>> ## ------------------------------------ ##
>>> ##                                      ##
>>> ##   `lapply` works as expected.        ##
>>> ##                                      ##
>>> ## ------------------------------------ ##
>>>
>>>   ## NOW WITH lapply
>>>   lapply(list.DT,
>>>     function(DT)
>>>       DT[, newCol := LETTERS[1:5]]
>>>   )
>>>
>>>    ## Note the new column:
>>>   list.DT
>>>
>>>
>>>
>>>  # ========================== #
>>>
>>>  ##   NON-WORKAROUNDS   ##
>>> ##
>>> ## I also tried all of the following alternatives
>>> ##   in hopes of being able to iterate over the list
>>> ##   directly, using `mapply`.
>>> ## None of these worked.
>>>
>>>  # (1) Creating the DTs First, then creating the list from them
>>>     DT1 <- data.table(Col1=111:115, Col2=121:125)
>>>     DT2 <- data.table(Col1=211:215, Col2=221:225)
>>>
>>>      list.DT <- list(DT1=DT1,DT2=DT2 )
>>>
>>>
>>>  # (2) Same as 1, and using `copy()` in the call to `list()`
>>>     list.DT <- list(DT1=copy(DT1),
>>>                     DT2=copy(DT2) )
>>>
>>>  # (3) lapply'ing `copy` and then iterating over that list
>>>     list.DT <- lapply(list.DT, copy)
>>>
>>>  # (4) Not naming the list elements
>>>     list.DT <- list(DT1, DT2)
>>>     # and tried
>>>     list.DT <- list(copy(DT1), copy(DT2))
>>>
>>>  ## All of the above still failed to modify in place
>>> ##   (and also issued the same warning if trying to add a column)
>>> ##    when iterating using mapply
>>>
>>>    mapply(function(DT, C3, C4)
>>>     DT[, c("Col3", "Col4") := list(C3, C4)],
>>>     list.DT, list.Col3, list.Col4,
>>>     SIMPLIFY=FALSE)
>>>
>>>
>>>  # ========================== #
>>>
>>>
>>>  Ricardo Saporta
>>>  Rutgers University, New Jersey
>>>  e: saporta at rutgers.edu
>>>
>>>
>>>
>>>  _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>>
>>>
>>
>>
>
>