[datatable-help] mapply cannot modify in place when iterating over list of DTs
Ricardo Saporta
saporta at scarletmail.rutgers.edu
Sun Sep 22 04:02:40 CEST 2013
Matthew,
I did notice the warning, but something doesn't add up:
If the issue is simply that the table is copied when the list is created, then
wouldn't we expect the same warning to arise whether we modify the table
using `mapply` or `lapply`? (The latter does not produce a warning.)
If, on the other hand, the issue pertains specifically to mapply (which I
assume it does), then why is it only a problem when we iterate over the
list directly, whereas iterating indirectly by using an index does not
produce any warnings?
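One way to probe the direct-versus-index question is to compare memory addresses. The following is only a rough diagnostic sketch: it assumes `data.table::address()` is available in the installed version, and the names `addr.before`, `addr.direct` and `addr.index` are made up for illustration. The result may well differ between R versions, so no particular outcome is claimed here.

```r
library(data.table)

list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)

# Addresses of the tables as they are stored in the list
addr.before <- vapply(list.DT, address, character(1))

# Addresses seen inside the function when iterating over the list directly
addr.direct <- unlist(
  mapply(function(DT) address(DT), list.DT, SIMPLIFY = FALSE)
)

# Addresses seen inside the function when iterating over an index
addr.index <- unlist(
  mapply(function(i) address(list.DT[[i]]), seq_along(list.DT), SIMPLIFY = FALSE)
)

# TRUE means the function saw the very object that lives in the list;
# FALSE means it was handed a copy.
print(data.frame(direct = addr.direct == addr.before,
                 index  = addr.index  == addr.before))
```

If the `direct` column shows FALSE where `index` shows TRUE, that would suggest the copy happens on the way into `mapply` rather than when the list is built.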
While this is minor overall if one is aware of the issue, I think it might
allow unnoticed bugs to creep into someone's code, specifically when a
user modifies a list of DTs with mapply and does not realize that the
modifications are not being held.
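A defensive pattern that avoids relying on the side effect is to keep whatever `mapply` returns and then check that the columns are there. This is only a sketch using the sample data from my original post. The names `expected` and `held` are hypothetical, and on affected R versions the selfref warning may still be printed.

```r
library(data.table)

list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Assign the result back instead of trusting in-place modification.
# `:=` returns the table it actually modified (the original or a copy),
# so the returned list carries the new columns either way.
list.DT <- mapply(
  function(DT, C3, C4)
    DT[, c("Col3", "Col4") := list(C3, C4)],
  list.DT, list.Col3, list.Col4,
  SIMPLIFY = FALSE
)

# Verify that the modification held in every table of the list
expected <- c("Col3", "Col4")
held <- vapply(list.DT, function(DT) all(expected %in% names(DT)), logical(1))
stopifnot(all(held))
```

The cost is that this may work on copies rather than by reference, which gives up part of the benefit of `:=` but makes a silent failure much harder to miss.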
That being said, I'm not sure how this could even be addressed if the root
cause is in mapply, but is it worth trying to address?
Rick
On Fri, Sep 20, 2013 at 2:18 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> Does this sentence from the warning help?
>
>
> " Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and DT2 (R's
> list() used to copy named objects); please upgrade to R>=v3.1.0 if that is
> biting. "
>
> Matthew
>
>
> On 20/09/13 19:01, Ricardo Saporta wrote:
>
> One warning per DT in the list
> (I added the line breaks)
> -Rick
> =============================================
> Warning messages:
>
> 1: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>
> Invalid .internal.selfref detected and fixed by taking a copy of the
> whole table so that := can add this new column by reference. At an earlier
> point, this data.table has been copied by R (or been created manually using
> structure() or similar). Avoid key<-, names<- and attr<- which in R
> currently (and oddly) may copy the whole data.table. Use set* syntax
> instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
> list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named
> objects); please upgrade to R>=v3.1.0 if that is biting. If this message
> doesn't help, please report to datatable-help so the root cause can be
> fixed.
>
> 2: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :
>
> Invalid .internal.selfref detected and fixed by taking a copy of the
> whole table so that := can add this new column by reference. At an earlier
> point, this data.table has been copied by R (or been created manually using
> structure() or similar). Avoid key<-, names<- and attr<- which in R
> currently (and oddly) may copy the whole data.table. Use set* syntax
> instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
> list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named
> objects); please upgrade to R>=v3.1.0 if that is biting. If this message
> doesn't help, please report to datatable-help so the root cause can be
> fixed.
> =============================================
>
>
>
>
> On Fri, Sep 20, 2013 at 12:49 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
>>
>> Hi,
>>
>> What's the warning?
>>
>> Matthew
>>
>>
>>
>> On 20/09/13 14:48, Ricardo Saporta wrote:
>>
>> I've encountered the following issue iterating over a list of
>> data.tables.
>> The issue is only with mapply, not with lapply.
>>
>>
>> Given a list of data.tables, mapply'ing over the list directly
>> cannot modify in place.
>>
>> Also, if attempting to add a new column, we get an "Invalid
>> .internal.selfref" warning.
>> Modifying an existing column does not issue a warning, but still fails to
>> modify in place.
>>
>> WORKAROUND:
>> ----------
>> The workaround is to iterate over an index to the list, then to
>> modify each data.table via list.of.DTs[[i]][ .. ]
>>
>> **Interestingly, this issue occurs with `mapply`, but not `lapply`.**
>>
>>
>> EXAMPLE:
>> --------
>> # Given a list of DTs and two lists of vectors,
>> # we want to add the corresponding vectors as columns to the DT.
>>
>> ## ---------------- ##
>> ## SAMPLE DATA: ##
>> ## ---------------- ##
>> library(data.table)
>>
>> # list of data.tables
>> list.DT <- list(
>> DT1=data.table(Col1=111:115, Col2=121:125),
>> DT2=data.table(Col1=211:215, Col2=221:225)
>> )
>>
>> # lists of columns to add
>> list.Col3 <- list(131:135, 231:235)
>> list.Col4 <- list(141:145, 241:245)
>>
>>
>> ## ------------------------------------ ##
>> ## Iterating over the list elements ##
>> ## adding a new column ##
>> ## ------------------------------------ ##
>> ## Will issue warning and ##
>> ## will fail to modify in place ##
>> ## ------------------------------------ ##
>> mapply (
>> function(DT, C3, C4)
>> DT[, c("Col3", "Col4") := list(C3, C4)],
>>
>> list.DT, # iterating over the list
>> list.Col3, list.Col4,
>> SIMPLIFY=FALSE
>> )
>>
>> ## Note the lack of change
>> list.DT
>>
>>
>> ## ------------------------------------ ##
>> ## Iterating over an index ##
>> ## ------------------------------------ ##
>> mapply (
>> function(i, C3, C4)
>> list.DT[[i]] [, c("Col3", "Col4") := list(C3, C4)],
>>
>> seq_along(list.DT), # iterating over an index to the list
>> list.Col3, list.Col4,
>> SIMPLIFY=FALSE
>> )
>>
>> ## Note each DT _has_ been modified
>> list.DT
>>
>> ## ------------------------------------ ##
>> ## Iterating over the list elements ##
>> ## modifying existing column ##
>> ## ------------------------------------ ##
>> ## No warning issued, but ##
>> ## Will fail to modify in place ##
>> ## ------------------------------------ ##
>> mapply (
>> function(DT, C3, C4)
>> DT[, c("Col3", "Col4") := list(Col3*1e3, Col4*1e4)],
>>
>> list.DT, # iterating over the list
>> list.Col3, list.Col4,
>> SIMPLIFY=FALSE
>> )
>>
>> ## Note the lack of change (compare with the value returned by `mapply` above)
>> list.DT
>>
>> ## ------------------------------------ ##
>> ## ##
>> ## `lapply` works as expected. ##
>> ## ##
>> ## ------------------------------------ ##
>>
>> ## NOW WITH lapply
>> lapply(list.DT,
>> function(DT)
>> DT[, newCol := LETTERS[1:5]]
>> )
>>
>> ## Note the new column:
>> list.DT
>>
>>
>>
>> # ========================== #
>>
>> ## NON-WORKAROUNDS ##
>> ##
>> ## I also tried all of the following alternatives
>> ## in hopes of being able to iterate over the list
>> ## directly, using `mapply`.
>> ## None of these worked.
>>
>> # (1) Creating the DTs First, then creating the list from them
>> DT1 <- data.table(Col1=111:115, Col2=121:125)
>> DT2 <- data.table(Col1=211:215, Col2=221:225)
>>
>> list.DT <- list(DT1=DT1,DT2=DT2 )
>>
>>
>> # (2) Same as 1, and using `copy()` in the call to `list()`
>> list.DT <- list(DT1=copy(DT1),
>> DT2=copy(DT2) )
>>
>> # (3) lapply'ing `copy` and then iterating over that list
>> list.DT <- lapply(list.DT, copy)
>>
>> # (4) Not naming the list elements
>> list.DT <- list(DT1, DT2)
>> # and tried
>> list.DT <- list(copy(DT1), copy(DT2))
>>
>> ## All of the above still failed to modify in place
>> ## (and also issued the same warning if trying to add a column)
>> ## when iterating using mapply
>>
>> mapply(function(DT, C3, C4)
>> DT[, c("Col3", "Col4") := list(C3, C4)],
>> list.DT, list.Col3, list.Col4,
>> SIMPLIFY=FALSE)
>>
>>
>> # ========================== #
>>
>>
>> Ricardo Saporta
>> Rutgers University, New Jersey
>> e: saporta at rutgers.edu
>>
>>
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>>
>
>