<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Matthew,<br>
<br>
Sorry for not filing earlier; the behavior isn't a major annoyance
because my data.tables are fairly small this time around. <br>
<br>
The reason I'm using data.tables in a list, odd as that might seem,
is that I'm harvesting many external data files that I eventually
want to combine into one data.table. Before I can rbind()
everything, though, I run lots of validation and cleaning tasks on
the harvested files using lapply() and some indexing magic. The
combination of data.table() and lapply() makes the syntax <i>really</i>
efficient.<br>
<br>
I'm afraid I can't offer further input on a possible workaround, as
the alternatives you listed below all sound good to me! Hopefully
others on the list can contribute.<br>
<br>
Best, --Mel.<br>
<div class="moz-cite-prefix"><br>
<br>
On 8/15/2012 4:30 AM, Matthew Dowle wrote:<br>
</div>
<blockquote
cite="mid:f6727a2b15befa65dd4dda732d1aef45.squirrel@webmail.plus.net"
type="cite">
<pre wrap="">
Hi,

That's interesting, thanks. I'm delighted the warning came up and that no
crash happened. This is exactly what .internal.selfref was designed to
catch. list() itself appears to be copying its NAM(2)-ed inputs. If you
run the following, the pointer addresses should show that:

X=data.table(a=1:3)
.Internal(inspect(X))
.Internal(inspect(list(X))) # list() copies X

The problem isn't just the copy: when R makes that copy, it collapses the
over-allocated vector of column pointers (which data.table carefully
created) down to just the columns in use. That causes a problem for := if
it's then asked to add a column by reference, because there are no free
slots left.

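To see the over-allocation described above, here is a minimal sketch assuming a current data.table release (not the 1.8.2 used in this thread). The exact number of spare slots depends on the datatable.alloccol option, so no counts are shown:

```r
library(data.table)

DT <- data.table(a = 1:3)

# length() counts the column pointers in use; truelength() counts the
# slots allocated, including the spare ones that := can fill in place
length(DT)
truelength(DT)
slots_before <- truelength(DT)

# Adding a column by reference uses one spare slot, so no reallocation
# is needed and the allocated size stays the same
DT[, b := 4:6]
length(DT)
truelength(DT) == slots_before
```
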
Three possible dev solutions spring to mind:
1. Try again to return data.table as NAM(0) rather than NAM(2) [there's
already a feature request (FR) for that]. This assumes that list() only
copies NAM(2) inputs.
2. Add a new function to data.table (reflist()?) that doesn't copy
data.table inputs but otherwise works the same as base::list.
3. Get even fancier inside [.data.table and inspect its caller. If that's
L[[i]], then update L's pointer to the (new) re-over-allocated column
pointer vector. The copy by list() would still happen, but at least the
column would be added, and the next column added by reference after that
would then work without a warning.

Please file a bug report with a link to this thread; that way you'll get
automatic updates when its status changes. Option 2 is the most likely.

Is a list() of data.tables really needed? Could it be one data.table with
an extra first column, or perhaps an environment of data.tables?

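A minimal sketch of the environment alternative, assuming a current data.table release; the environment name tables and the loop are illustrative only, not an existing data.table API. Storing a data.table in an environment and retrieving it with $ or [[ does not copy it, so := can add columns in place without any reassignment:

```r
library(data.table)

# Hypothetical container: an environment instead of a list
tables <- new.env()
tables$dt1 <- data.table(a = 1:3,   b = 4:6,   c = 7:9)
tables$dt2 <- data.table(a = 10:12, b = 13:15, c = 16:18)

# := adds column d by reference to the table held in the environment
tables$dt1[, d := 4:6]

# Apply the same by-reference step to every stored table, looked up by name
for (nm in ls(tables)) {
  tables[[nm]][, n := .N]
}
```

When the tables are ready to be combined, rbindlist(mget(ls(tables), envir = tables)) would stack them into one data.table.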
The more significant problem, then, is that a list column containing
data.tables is likely copying all of those data.tables, regardless of
whether := is later used to add a column by reference to the embedded
tables.

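A minimal sketch of the single-table alternative for a harvest, clean, then combine workflow like the one described at the top of this thread, assuming a data.table release recent enough that rbindlist() has an idcol argument. The input tables and the clean_table() helper are hypothetical stand-ins for harvested files and real validation steps. Each cleaned table is returned and the lapply() result is reassigned, so nothing depends on := reaching a table nested inside a list:

```r
library(data.table)

# Hypothetical stand-ins for harvested files
harvested <- list(
  data.table(a = 1:3,   b = c(4, NA, 6)),
  data.table(a = 10:12, b = c(13, 14, NA))
)

# Hypothetical cleaning step: flag missing b, then fill it with 0
clean_table <- function(dt) {
  dt <- copy(dt)                # work on an explicit copy; originals stay untouched
  dt[, b_missing := is.na(b)]   # record which rows were missing
  dt[is.na(b), b := 0]          # fill the gaps by reference on the copy
  dt                            # return the cleaned table
}

# Reassign the result rather than relying on in-place changes to list elements
cleaned <- lapply(harvested, clean_table)

# One data.table with an extra first column saying which input each row came from
combined <- rbindlist(cleaned, idcol = "file")
```
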
Matthew
</pre>
<blockquote type="cite">
<pre wrap="">Hello,
I just noticed an odd behavior with lists of data.tables:
dt1 <- data.table(a=1:3, b=4:6, c=7:9)
dt2 <- data.table(a=10:12, b=13:15, c=16:18)
# Combine in a list
myList <- list(dt1, dt2)
# Add a new column to the first data.table: this doesn't work
myList[[1]][, d := 4:6]
# a b c d
# 1: 1 4 7 4
# 2: 2 5 8 5
# 3: 3 6 9 6
# Warning message:
# In `[.data.table`(myList[[1]], , `:=`(d, 4:6)) :
#   Invalid .internal.selfref detected and fixed by taking a copy of the
#   whole table, so that := can add this new column by reference. At an
#   earlier point, this data.table has been copied by R. Avoid key<-,
#   names<- and attr<- which in R currently (and oddly) all copy the whole
#   data.table. Use set* syntax instead to avoid copying: setkey(),
#   setnames() and setattr(). If this message doesn't help, please report
#   to datatable-help so the root cause can be fixed.
myList[[1]]
# a b c
# 1: 1 4 7
# 2: 2 5 8
# 3: 3 6 9
# I need to reassign -- this works
myList[[1]] <- myList[[1]][, d := 4:6]
myList[[1]]
# a b c d
# 1: 1 4 7 4
# 2: 2 5 8 5
# 3: 3 6 9 6
# On the other hand, this works with no problem
setcolorder(myList[[1]], 4:1)
myList[[1]]
# d c b a
# 1: 4 7 4 1
# 2: 5 8 5 2
# 3: 6 9 6 3
Is this normal behavior? It seems a bit odd to me.
Here is my session:
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics utils datasets grDevices methods base
other attached packages:
[1] foreign_0.8-50 RJDBC_0.2-0 DBI_0.2-5
[4] XLConnect_0.2-0 XLConnectJars_0.2-0 rJava_0.9-3
[7] data.table_1.8.2 rj_1.1.0-4
loaded via a namespace (and not attached):
[1] rj.gd_1.1.0-1 tools_2.15.1
Thanks very much for this fantastic package!
--Mel.
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
E-mail <a class="moz-txt-link-abbreviated" href="mailto:mel@mbacou.com">mel@mbacou.com</a> <a class="moz-txt-link-rfc2396E" href="mailto:mel@mbacou.com">&lt;mailto:mel@mbacou.com&gt;</a>
Visit harvestchoice.org <a class="moz-txt-link-rfc2396E" href="http://www.harvestchoice.org/">&lt;http://www.harvestchoice.org/&gt;</a>
_______________________________________________
datatable-help mailing list
<a class="moz-txt-link-abbreviated" href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a>
<a class="moz-txt-link-freetext" href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a>
</pre>
</blockquote>
</blockquote>
<br>
</body>
</html>