<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 23, 2013 at 9:42 PM, Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div><br>
Hi,<br>
Basically, adding columns by reference to a data.table when it's a
member of a list of data.tables is really difficult to handle
internally. I had to add a special case internally to get around list()
copying, so that the binding can change inside the list on the
shallow copy when [[ is used. A for loop is the way to add
columns by reference inside a list of data.tables, and that should
work OK using [[. But doing that via lapply and mapply is really
stretching it. </div></div></blockquote><div><br></div><div>That makes sense. I took a whack at it, but couldn't even come close. </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div>Even catching user expectations in this area is
difficult. Ideally we'd catch mapply, yes, but really data.table
prefers the tables to be rbindlist()-ed so that operations work on a single large
data.table. </div></div></blockquote><div><br></div><div>Agreed. In the application where this came up, I am dealing with a list of tables with different dimensions (hence not rbinding).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div>Could we add advice to the warning message not to use
mapply or lapply to add columns by reference to a list of
data.tables (use a for loop instead)?</div></div></blockquote><div><br></div><div>Perhaps a warning that modifications to the DTs in the list are likely not to have stuck, and a suggestion to use rbindlist when possible?</div>
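<div><br></div><div>For reference, here is a minimal sketch of the for-loop approach, reusing the sample data from my original post further down the thread. It assumes an R version in which list() does not copy its arguments (the case the warning text mentions), and I have not checked it against every data.table version:</div>

```r
library(data.table)

# Sample data, as in the original post
list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Loop over an index and reach each table through [[ ]],
# so that := is applied to the data.table held by the list
# rather than to a function argument as in mapply.
for (i in seq_along(list.DT)) {
  list.DT[[i]][, c("Col3", "Col4") := list(list.Col3[[i]], list.Col4[[i]])]
}

# Inspect the result; each table should now carry Col3 and Col4
lapply(list.DT, names)
```

<div>This is essentially the index workaround from my first message, written as a plain loop rather than as mapply over seq(list.DT).</div>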
<div> </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div><span class="HOEnZb"><font color="#888888"><br>
Matthew</font></span><div><div class="h5"><br>
<br>
<br>
On 22/09/13 03:02, Ricardo Saporta wrote:<br>
</div></div></div><div><div class="h5">
<blockquote type="cite">
<div dir="ltr">Matthew,
<div><br>
</div>
<div>I did notice the warning, but something doesn't add up: </div>
<div><br>
</div>
<div>If the issue is simply that it is being copied when
created, then wouldn't we expect the same warning to arise when
we try to modify the table using `mapply` or `lapply`? (The
latter does not produce a warning.) </div>
<div><br>
</div>
<div>If, on the other hand, the issue pertains specifically to
mapply (which I assume it does), then why is it only a problem
when we iterate over the list directly, whereas iterating
indirectly by using an index does not produce any warnings? </div>
<div> </div>
<div class="gmail_extra">
<div>
<div style="color:rgb(34,34,34);font-size:13px;font-family:arial,sans-serif">
<div style="font-size:13px">While this is minor overall
if one is aware of the issue, I think it might allow
unnoticed bugs to creep into someone's code,
specifically if the user modifies a list of DTs with mapply
without realizing that the modifications are not
being kept. </div>
<div style="font-size:13px"><br>
</div>
<div style="font-size:13px">That being said, I'm not sure
how this could even be addressed if the root is in
mapply, but is it worth trying to address? </div>
<div style="font-size:13px">
<br>
</div>
<div style="font-size:13px">Rick</div>
<div style="font-size:13px"><br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">On Fri, Sep 20, 2013 at 2:18 PM,
Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Does this sentence from the warning help?
<div><br>
<br>
" Also, in R&lt;v3.1.0, list(DT1,DT2) copied the
entire DT1 and DT2 (R's list() used to copy named
objects); please upgrade to R&gt;=v3.1.0 if that is
biting. "<br>
<br>
</div>
<span><font color="#888888"> Matthew</font></span>
<div>
<div><br>
<br>
On 20/09/13 19:01, Ricardo Saporta wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite">
<div dir="ltr">One warning per DT in the list
<div> (I added the line breaks)
<div>-Rick</div>
<div>=============================================</div>
<div>
<div>Warning messages:</div>
<div><br>
</div>
<div>1: In `[.data.table`(DT, ,
`:=`(c("Col3", "Col4"), list(C3, C4))) :</div>
<div><br>
</div>
<div> Invalid .internal.selfref detected
and fixed by taking a copy of the whole
table so that := can add this new column
by reference. At an earlier point, this
data.table has been copied by R (or been
created manually using structure() or
similar). Avoid key<-, names<- and
attr<- which in R currently (and oddly)
may copy the whole data.table. Use set*
syntax instead to avoid copying: ?set,
?setnames and ?setattr. Also, in
R&lt;v3.1.0, list(DT1,DT2) copied the
entire DT1 and DT2 (R's list() used to
copy named objects); please upgrade to
R&gt;=v3.1.0 if that is biting. If this
message doesn't help, please report to
datatable-help so the root cause can be
fixed.</div>
<div><br>
</div>
<div>2: In `[.data.table`(DT, ,
`:=`(c("Col3", "Col4"), list(C3, C4))) :</div>
<div><br>
</div>
<div> Invalid .internal.selfref detected
and fixed by taking a copy of the whole
table so that := can add this new column
by reference. At an earlier point, this
data.table has been copied by R (or been
created manually using structure() or
similar). Avoid key<-, names<- and
attr<- which in R currently (and oddly)
may copy the whole data.table. Use set*
syntax instead to avoid copying: ?set,
?setnames and ?setattr. Also, in
R&lt;v3.1.0, list(DT1,DT2) copied the
entire DT1 and DT2 (R's list() used to
copy named objects); please upgrade to
R&gt;=v3.1.0 if that is biting. If this
message doesn't help, please report to
datatable-help so the root cause can be
fixed.</div>
</div>
<div>=============================================<br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<br>
<div class="gmail_quote">On Fri, Sep 20, 2013
at 12:49 PM, Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div><br>
Hi,<br>
<br>
What's the warning?<br>
<br>
Matthew
<div>
<div><br>
<br>
<br>
On 20/09/13 14:48, Ricardo Saporta
wrote:<br>
</div>
</div>
</div>
<blockquote type="cite">
<div>
<div>
<div dir="ltr">
<div>
<div>I've encountered the
following issue iterating
over a list of data.tables. </div>
<div>The issue is only with
mapply, not with lapply.</div>
<div><br>
</div>
<div> </div>
<div>Given a list of
data.tables, mapply'ing
over the list directly </div>
<div>cannot modify in place. </div>
<div><br>
</div>
<div>Also if attempting to add
a new column, we get an
"Invalid .internal.selfref"
warning. </div>
<div>Modifying an existing
column does not issue a
warning, but still fails to
modify-in-place</div>
<div><br>
</div>
<div>WORKAROUND: </div>
<div>----------</div>
<div>The workaround is to
iterate over an index to the
list, then to </div>
<div> modify each data.table
via list.of.DTs[[i]][ .. ]</div>
<div><br>
</div>
<div>**Interestingly, this
issue occurs with `mapply`,
but not `lapply`.**</div>
<div><br>
</div>
<div> </div>
<div>EXAMPLE:</div>
<div>-------- </div>
<div> # Given a list of DT's
and two lists of vectors, </div>
<div> # we want to add the
corresponding vectors as
columns to the DT. </div>
<div><br>
</div>
<div>## ---------------- ##</div>
<div>## SAMPLE DATA: ##</div>
<div>## ---------------- ##</div>
<div> # list of data.tables</div>
<div> list.DT <- list(</div>
<div>
DT1=data.table(Col1=111:115,
Col2=121:125),</div>
<div>
DT2=data.table(Col1=211:215,
Col2=221:225)</div>
<div> )</div>
<div><br>
</div>
<div> # lists of columns to
add</div>
<div> list.Col3 <-
list(131:135, 231:235)</div>
<div> list.Col4 <-
list(141:145, 241:245)</div>
<div><br>
</div>
<div><br>
</div>
<div>##
------------------------------------
##</div>
<div>## Iterating over the
list elements ##</div>
<div>## adding a new
column ##</div>
<div>##
------------------------------------
##</div>
<div>## Will issue warning
and ##</div>
<div>## will fail to
modify in place ##</div>
<div>##
------------------------------------
##</div>
<div> mapply (</div>
<div> function(DT, C3,
C4)</div>
<div> DT[, c("Col3",
"Col4") := list(C3, C4)],</div>
<div> </div>
<div> list.DT, #
iterating over the list</div>
<div> list.Col3,
list.Col4,</div>
<div> SIMPLIFY=FALSE</div>
<div> ) </div>
<div><br>
</div>
<div> ## Note the lack of
change</div>
<div> list.DT</div>
<div><br>
</div>
<div><br>
</div>
<div>##
------------------------------------
##</div>
<div>## Iterating over an
index ##</div>
<div>##
------------------------------------
##</div>
<div> mapply (</div>
<div> function(i, C3, C4)</div>
<div> list.DT[[i]] [,
c("Col3", "Col4") :=
list(C3, C4)],</div>
<div> </div>
<div> seq_along(list.DT), #
iterating over an index to
the list</div>
<div> list.Col3,
list.Col4,</div>
<div> SIMPLIFY=FALSE</div>
<div> )</div>
<div><br>
</div>
<div> ## Note each DT _has_
been modified</div>
<div> list.DT</div>
<div><br>
</div>
<div>##
------------------------------------
##</div>
<div>## Iterating over the
list elements ##</div>
<div>## modifying existing
column ##</div>
<div>##
------------------------------------
##</div>
<div>## No warning issued,
but ##</div>
<div>## Will fail to
modify in place ##</div>
<div>##
------------------------------------
##</div>
<div> mapply (</div>
<div> function(DT, C3,
C4)</div>
<div> DT[, c("Col3",
"Col4") := list(Col3*1e3,
Col4*1e4)],</div>
<div><br>
</div>
<div> list.DT, #
iterating over the list</div>
<div> list.Col3,
list.Col4,</div>
<div> SIMPLIFY=FALSE</div>
<div> ) </div>
<div><br>
</div>
<div> ## Note the lack of
change (compare with output
from `mapply`)</div>
<div> list.DT</div>
<div><br>
</div>
<div>##
------------------------------------
##</div>
<div>##
##</div>
<div>## `lapply` works as
expected. ##</div>
<div>##
##</div>
<div>##
------------------------------------
##</div>
<div> </div>
<div> ## NOW WITH lapply</div>
<div> lapply(list.DT, </div>
<div> function(DT)</div>
<div> DT[, newCol :=
LETTERS[1:5]]</div>
<div> )</div>
<div><br>
</div>
<div> ## Note the new
column: </div>
<div> list.DT</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>#
==========================
# </div>
<div><br>
</div>
<div>## NON-WORKAROUNDS
## </div>
<div>##</div>
<div>## I also tried all of
the following alternatives</div>
<div>## in hopes of being
able to iterate over the
list </div>
<div>## directly, using
`mapply`. </div>
<div>## None of these worked. </div>
<div><br>
</div>
<div># (1) Creating the DTs
First, then creating the
list from them</div>
<div> DT1 <-
data.table(Col1=111:115,
Col2=121:125)</div>
<div> DT2 <-
data.table(Col1=211:215,
Col2=221:225)</div>
<div><br>
</div>
<div> list.DT <-
list(DT1=DT1,DT2=DT2 )</div>
<div><br>
</div>
<div><br>
</div>
<div># (2) Same as 1, and
using `copy()` in the call
to `list()`</div>
<div> list.DT <-
list(DT1=copy(DT1), </div>
<div>
DT2=copy(DT2) )</div>
<div><br>
</div>
<div># (3) lapply'ing `copy`
and then iterating over that
list</div>
<div> list.DT <-
lapply(list.DT, copy)</div>
<div><br>
</div>
<div># (4) Not naming the list
elements</div>
<div> list.DT <-
list(DT1, DT2)</div>
<div> # and tried</div>
<div> list.DT <-
list(copy(DT1), copy(DT2))</div>
<div><br>
</div>
<div>## All of the above still
failed to modify in place</div>
<div>## (and also issued the
same warning if trying to
add a column)</div>
<div>## when iterating
using mapply</div>
<div><br>
</div>
<div> mapply(function(DT, C3,
C4)</div>
<div> DT[, c("Col3",
"Col4") := list(C3, C4)],</div>
<div> list.DT, list.Col3,
list.Col4,</div>
<div> SIMPLIFY=FALSE)</div>
<div><br>
</div>
<div><br>
</div>
<div>#
==========================
# </div>
</div>
<div><br>
</div>
<br clear="all">
<div>
<div style="color:rgb(34,34,34);font-size:13px;font-family:arial,sans-serif">
<div style="font-size:13px">Ricardo
Saporta</div>
<div style="font-size:13px">
Rutgers University, New
Jersey<br>
</div>
<div style="font-size:13px"><span style="font-size:13px">e: </span><a href="mailto:saporta@rutgers.edu" style="color:rgb(17,85,204);font-size:13px" target="_blank">saporta@rutgers.edu</a></div>
<div><br>
</div>
</div>
</div>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
datatable-help mailing list
<a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></pre>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div></div>