[datatable-help] mapply cannot modify in place when iterating over list of DTs
Ricardo Saporta
saporta at scarletmail.rutgers.edu
Tue Sep 24 06:15:18 CEST 2013
On Mon, Sep 23, 2013 at 9:42 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Hi,
> Basically, adding columns by reference to a data.table when it's a member
> of a list of data.tables is really difficult to handle internally. I had
> to special-case it internally to get around list() copying, so that the
> binding can change inside the list on the shallow copy when [[ is used. A
> for loop is the way to add columns by reference inside a list of
> data.tables, and that should work OK using [[. But doing that via lapply
> and mapply is really stretching it.
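A minimal sketch of the for-loop approach described above, reusing the sample data from the original report further down the thread. It assumes a current data.table and R >= 3.1.0. Each table is reached through `list.DT[[i]]`, so `:=` acts on the list element itself rather than on an argument that mapply() has passed to a function.

```r
library(data.table)

# Sample data, as in the original report
list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Index into the list with [[ and add the columns by reference;
# no assignment back into list.DT is needed.
for (i in seq_along(list.DT)) {
  list.DT[[i]][, c("Col3", "Col4") := list(list.Col3[[i]], list.Col4[[i]])]
}

list.DT
```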
>
That makes sense. I took a whack at it, but couldn't even come close.
> Even catching user expectations in this area is difficult. Ideally we'd
> catch mapply, yes, but really data.table likes to be rbindlist()-ed and
> then for ops to work on a single large data.table.
>
Agreed. In the application where this came up, I am dealing with a list of
tables with different dims (hence no rbinding).
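Tables with different column sets can often still be stacked. The sketch below assumes a data.table version in which rbindlist() has the `fill` and `idcol` arguments and split() accepts `by` (all added after this thread). The second table is a hypothetical one with fewer rows and a column the first lacks; missing cells become NA.

```r
library(data.table)

list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:213, Col5 = c("a", "b", "c"))  # hypothetical
)

# Stack into one table; idcol records which list element each row came from
big <- rbindlist(list.DT, fill = TRUE, idcol = "src")

# One grouped := on the single large table instead of one per list element
big[, n := .N, by = src]

# Back to a list of tables, one per source, if that is still needed
parts <- split(big, by = "src")
```

Whether this beats the for loop depends on how often the per-table results need to be split apart again.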
> We could add advice to the warning message not to use mapply or lapply to
> add columns by reference to a list of data.tables (use a for loop instead)?
>
Perhaps a warning that modifications to the DTs in the list likely did
not stick, and to use rbindlist when possible?
>
> Matthew
>
>
>
> On 22/09/13 03:02, Ricardo Saporta wrote:
>
> Matthew,
>
> I did notice the warning, but something doesn't add up:
>
> If the issue is simply that it is being copied when created, then
> wouldn't we expect the same warning to arise when we try to modify the table
> using `mapply` or `lapply`? (The latter does not produce a warning.)
>
> If, on the other hand, the issue pertains specifically to mapply (which I
> assume it does), then why is it only a problem when we iterate over the
> list directly, whereas iterating indirectly by using an index does not
> produce any warnings?
>
> While overall this is minor if one is aware of the issue, I think it
> might allow unnoticed bugs to creep into someone's code. Specifically,
> a user could use mapply to modify a list of DTs without realizing that the
> modifications are not being kept.
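One defensive pattern (a sketch, not something proposed in the thread) is to keep the list that mapply() returns rather than relying on the side effect. `:=` returns the table it modified invisibly, so assigning the result back should keep the new columns whether or not a copy was taken along the way. This assumes a current data.table; with SIMPLIFY = FALSE, mapply() keeps the names of the first list.

```r
library(data.table)

list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Assign the mapply() result back instead of trusting modify-in-place
list.DT <- mapply(
  function(DT, C3, C4) DT[, c("Col3", "Col4") := list(C3, C4)],
  list.DT, list.Col3, list.Col4,
  SIMPLIFY = FALSE
)

list.DT
```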
>
> That being said, I'm not sure how this could even be addressed if the
> root is in mapply, but is it worth trying to address?
>
> Rick
>
>
> On Fri, Sep 20, 2013 at 2:18 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
>> Does this sentence from the warning help?
>>
>>
>> " Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and DT2 (R's
>> list() used to copy named objects); please upgrade to R>=v3.1.0 if that is
>> biting. "
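To check whether list() copied a table on a particular R version, one option is to compare memory addresses with data.table's address() helper. This is a sketch; the comparison result is deliberately not stated here because it depends on the R version in use.

```r
library(data.table)

DT1 <- data.table(Col1 = 111:115, Col2 = 121:125)
list.DT <- list(DT1 = DT1)

# TRUE: the list holds the same object as DT1 (no copy was made).
# FALSE: the table was copied on its way into the list.
same.object <- identical(address(DT1), address(list.DT$DT1))
same.object
```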
>>
>> Matthew
>>
>>
>> On 20/09/13 19:01, Ricardo Saporta wrote:
>>
>> One warning per DT in the list
>> (I added the line breaks)
>> -Rick
>> =============================================
>> Warning messages:
>>
>> 1: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>>
>> Invalid .internal.selfref detected and fixed by taking a copy of the
>> whole table so that := can add this new column by reference. At an earlier
>> point, this data.table has been copied by R (or been created manually using
>> structure() or similar). Avoid key<-, names<- and attr<- which in R
>> currently (and oddly) may copy the whole data.table. Use set* syntax
>> instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
>> list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named
>> objects); please upgrade to R>=v3.1.0 if that is biting. If this message
>> doesn't help, please report to datatable-help so the root cause can be
>> fixed.
>>
>> 2: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>>
>> Invalid .internal.selfref detected and fixed by taking a copy of the
>> whole table so that := can add this new column by reference. At an earlier
>> point, this data.table has been copied by R (or been created manually using
>> structure() or similar). Avoid key<-, names<- and attr<- which in R
>> currently (and oddly) may copy the whole data.table. Use set* syntax
>> instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
>> list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named
>> objects); please upgrade to R>=v3.1.0 if that is biting. If this message
>> doesn't help, please report to datatable-help so the root cause can be
>> fixed.
>> =============================================
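The set* alternatives the warning points to can be sketched as follows, assuming a current data.table. Each call modifies DT by reference; the comment on each line names the replacement-function form the warning says may copy the whole table.

```r
library(data.table)

DT <- data.table(Col1 = 111:115, Col2 = 121:125)

setnames(DT, "Col1", "A")                # instead of names(DT)[1] <- "A"
set(DT, i = 1L, j = "Col2", value = 0L)  # low-overhead, loop-able assignment
setattr(DT, "note", "demo")              # instead of attr(DT, "note") <- "demo"
setkey(DT, Col2)                         # instead of key(DT) <- "Col2"

# The table has not been copied by R, so := can add a column by reference
DT[, Col3 := A * 2L]
DT
```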
>>
>>
>>
>>
>> On Fri, Sep 20, 2013 at 12:49 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> What's the warning?
>>>
>>> Matthew
>>>
>>>
>>>
>>> On 20/09/13 14:48, Ricardo Saporta wrote:
>>>
>>> I've encountered the following issue when iterating over a list of
>>> data.tables.
>>> The issue is only with mapply, not with lapply.
>>>
>>>
>>> Given a list of data.tables, mapply'ing over the list directly
>>> cannot modify in place.
>>>
>>> Also, if attempting to add a new column, we get an "Invalid
>>> .internal.selfref" warning.
>>> Modifying an existing column does not issue a warning, but still fails
>>> to modify in place.
>>>
>>> WORKAROUND:
>>> ----------
>>> The workaround is to iterate over an index to the list, then to
>>> modify each data.table via list.of.DTs[[i]][ .. ]
>>>
>>> **Interestingly, this issue occurs with `mapply`, but not `lapply`.**
>>>
>>>
>>> EXAMPLE:
>>> --------
>>> # Given a list of DT's and two lists of vectors,
>>> # we want to add the corresponding vectors as columns to the DT.
>>>
>>> ## ---------------- ##
>>> ## SAMPLE DATA: ##
>>> ## ---------------- ##
>>> # list of data.tables
>>> list.DT <- list(
>>> DT1=data.table(Col1=111:115, Col2=121:125),
>>> DT2=data.table(Col1=211:215, Col2=221:225)
>>> )
>>>
>>> # lists of columns to add
>>> list.Col3 <- list(131:135, 231:235)
>>> list.Col4 <- list(141:145, 241:245)
>>>
>>>
>>> ## ------------------------------------ ##
>>> ## Iterating over the list elements ##
>>> ## adding a new column ##
>>> ## ------------------------------------ ##
>>> ## Will issue warning and ##
>>> ## will fail to modify in place ##
>>> ## ------------------------------------ ##
>>> mapply (
>>> function(DT, C3, C4)
>>> DT[, c("Col3", "Col4") := list(C3, C4)],
>>>
>>> list.DT, # iterating over the list
>>> list.Col3, list.Col4,
>>> SIMPLIFY=FALSE
>>> )
>>>
>>> ## Note the lack of change
>>> list.DT
>>>
>>>
>>> ## ------------------------------------ ##
>>> ## Iterating over an index ##
>>> ## ------------------------------------ ##
>>> mapply (
>>> function(i, C3, C4)
>>> list.DT[[i]] [, c("Col3", "Col4") := list(C3, C4)],
>>>
>>> seq(list.DT), # iterating over an index to the list
>>> list.Col3, list.Col4,
>>> SIMPLIFY=FALSE
>>> )
>>>
>>> ## Note each DT _has_ been modified
>>> list.DT
>>>
>>> ## ------------------------------------ ##
>>> ## Iterating over the list elements ##
>>> ## modifying existing column ##
>>> ## ------------------------------------ ##
>>> ## No warning issued, but ##
>>> ## Will fail to modify in place ##
>>> ## ------------------------------------ ##
>>> mapply (
>>> function(DT, C3, C4)
>>> DT[, c("Col3", "Col4") := list(Col3*1e3, Col4*1e4)],
>>>
>>> list.DT, # iterating over the list
>>> list.Col3, list.Col4,
>>> SIMPLIFY=FALSE
>>> )
>>>
>>> ## Note the lack of change (compare with output from `mapply`)
>>> list.DT
>>>
>>> ## ------------------------------------ ##
>>> ## ##
>>> ## `lapply` works as expected. ##
>>> ## ##
>>> ## ------------------------------------ ##
>>>
>>> ## NOW WITH lapply
>>> lapply(list.DT,
>>> function(DT)
>>> DT[, newCol := LETTERS[1:5]]
>>> )
>>>
>>> ## Note the new column:
>>> list.DT
>>>
>>>
>>>
>>> # ========================== #
>>>
>>> ## NON-WORKAROUNDS ##
>>> ##
>>> ## I also tried all of the following alternatives
>>> ## in hopes of being able to iterate over the list
>>> ## directly, using `mapply`.
>>> ## None of these worked.
>>>
>>> # (1) Creating the DTs first, then creating the list from them
>>> DT1 <- data.table(Col1=111:115, Col2=121:125)
>>> DT2 <- data.table(Col1=211:215, Col2=221:225)
>>>
>>> list.DT <- list(DT1=DT1,DT2=DT2 )
>>>
>>>
>>> # (2) Same as 1, and using `copy()` in the call to `list()`
>>> list.DT <- list(DT1=copy(DT1),
>>> DT2=copy(DT2) )
>>>
>>> # (3) lapply'ing `copy` and then iterating over that list
>>> list.DT <- lapply(list.DT, copy)
>>>
>>> # (4) Not naming the list elements
>>> list.DT <- list(DT1, DT2)
>>> # and tried
>>> list.DT <- list(copy(DT1), copy(DT2))
>>>
>>> ## All of the above still failed to modify in place
>>> ## (and also issued the same warning if trying to add a column)
>>> ## when iterating using mapply
>>>
>>> mapply(function(DT, C3, C4)
>>> DT[, c("Col3", "Col4") := list(C3, C4)],
>>> list.DT, list.Col3, list.Col4,
>>> SIMPLIFY=FALSE)
>>>
>>>
>>> # ========================== #
>>>
>>>
>>> Ricardo Saporta
>>> Rutgers University, New Jersey
>>> e: saporta at rutgers.edu
>>>
>>>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>>
>>>
>>
>>
>
>
>
>
>