[datatable-help] := construct doesn't seem to work in lists of data.tables

Matthew Dowle mdowle at mdowle.plus.com
Wed Aug 15 10:30:16 CEST 2012


Hi,

That's interesting, thanks. I'm delighted the warning came up and that no
crash happened. This is just what .internal.selfref was designed to catch.

list() itself appears to be copying its NAM(2)-ed inputs. If you run the
following, the pointer addresses in the output should show that.

    X=data.table(a=1:3)
    .Internal(inspect(X))
    .Internal(inspect(list(X)))   # list() copies X

The problem isn't just the copy, but that when R does that copy it
collapses the over-allocated vector of column vector pointers (that
data.table carefully created) down to just the columns in use. That causes
a problem for := if it's then asked to add a column by reference, because
there are no free slots left.
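
A minimal sketch of the over-allocation described above, assuming a current
data.table release where truelength() is exported: length() counts the
columns in use, while truelength() reports how many column-pointer slots were
allocated. The exact slot count depends on the datatable.alloccol option and
the package version, so no specific number is shown.

```r
library(data.table)

X <- data.table(a = 1:3)

# Columns actually in use
n_used <- length(X)

# Column-pointer slots allocated; the spare slots are what let := add a
# column by reference without reallocating the pointer vector
n_slots <- truelength(X)

cat("columns in use:", n_used, "  slots allocated:", n_slots, "\n")

# Adding a column by reference uses one of the spare slots in place
X[, b := 4:6]
length(X)
```

If a copy collapses the vector so that the slot count equals the column
count, := has nowhere to put the new column pointer, which is the situation
the warning below reports.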

Three possible dev solutions spring to mind:

1. Try again to return data.table as NAM(0) not NAM(2) [there's already an
FR for that], assuming that list() only copies NAM(2) inputs.

2. Add a new function to data.table (reflist()?) that doesn't copy
data.table inputs but works the same as base::list otherwise.

3. Get even more fancy inside [.data.table and inspect its caller. If
that's L[[i]], then update L's pointer to the (new) re-over-allocated
column pointer vector. The copy by list() would still happen, but at least
the column would be added. The next column added by reference after that
would then work without a warning.

Please file a bug report, with a link to this thread. That way you'll get
automatic updates when the status changes. Option 2 is most likely.

Is a list() of data.tables really needed? Could it be one data.table with an
extra first column, or perhaps an environment of data.tables?
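
Both alternatives can be sketched as follows, assuming a recent data.table
(the idcol argument of rbindlist() is not available in 1.8.2). The column
name "id" and the environment name "tables" are illustrative choices, not
anything data.table requires.

```r
library(data.table)

dt1 <- data.table(a = 1:3, b = 4:6, c = 7:9)
dt2 <- data.table(a = 10:12, b = 13:15, c = 16:18)

# Alternative 1: one data.table with an extra first column ("id", a
# hypothetical name) recording which original table each row came from.
# With an unnamed input list, idcol holds the integer positions 1 and 2.
DT <- rbindlist(list(dt1, dt2), idcol = "id")
DT[id == 1L, d := 4:6]   # rows that came from dt2 get NA in column d

# Alternative 2: an environment of data.tables. Binding an object in an
# environment does not copy it, so := can add the column by reference.
tables <- new.env()
tables$dt1 <- data.table(a = 1:3, b = 4:6, c = 7:9)
tables$dt2 <- data.table(a = 10:12, b = 13:15, c = 16:18)
tables$dt1[, d := 4:6]
```

The single-table form also lets grouped operations (by = id) run across all
the former tables at once, which a list would need lapply() for.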

The more significant problem, then, is that a list column containing
data.tables is likely copying all those data.tables, regardless of whether
or not := is then used to add a column by reference to those embedded tables.

Matthew


> Hello,
>
> I just noticed an odd behavior with lists of data.tables:
>
> dt1 <- data.table(a=1:3, b=4:6, c=7:9)
> dt2 <- data.table(a=10:12, b=13:15, c=16:18)
>
> # Combine in a list
> myList <- list(dt1, dt2)
>
> # Adding a new column to first data.table -- this doesn't work
> myList[[1]][, d := 4:6]
> #    a b c d
> # 1: 1 4 7 4
> # 2: 2 5 8 5
> # 3: 3 6 9 6
> # Warning message:
> # In `[.data.table`(myList[[1]], , `:=`(d, 4:6)) :
> #   Invalid .internal.selfref detected and fixed by taking a copy of the
> #   whole table, so that := can add this new column by reference. At an
> #   earlier point, this data.table has been copied by R. Avoid key<-,
> #   names<- and attr<- which in R currently (and oddly) all copy the whole
> #   data.table. Use set* syntax instead to avoid copying: setkey(),
> #   setnames() and setattr(). If this message doesn't help, please report
> #   to datatable-help so the root cause can be fixed.
>
> myList[[1]]
> #    a b c
> # 1: 1 4 7
> # 2: 2 5 8
> # 3: 3 6 9
>
> # I need to reassign -- this works
> myList[[1]] <- myList[[1]][, d := 4:6]
>
> myList[[1]]
> #    a b c d
> # 1: 1 4 7 4
> # 2: 2 5 8 5
> # 3: 3 6 9 6
>
> # But on the other hand this works no problem
> setcolorder(myList[[1]], 4:1)
> myList[[1]]
> #    d c b a
> # 1: 4 7 4 1
> # 2: 5 8 5 2
> # 3: 6 9 6 3
>
> Is this normal behavior? It seems a bit odd to me.
>
> Here is my session:
>
>  > sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  utils     datasets  grDevices methods base
>
> other attached packages:
> [1] foreign_0.8-50      RJDBC_0.2-0         DBI_0.2-5
> [4] XLConnect_0.2-0     XLConnectJars_0.2-0 rJava_0.9-3
> [7] data.table_1.8.2    rj_1.1.0-4
>
> loaded via a namespace (and not attached):
> [1] rj.gd_1.1.0-1 tools_2.15.1
>
>
> Thanks very much for this fantastic package!
>
> --Mel.
>
> Melanie BACOU
> International Food Policy Research Institute
> Agricultural Economist, HarvestChoice
> E-mail mel at mbacou.com <mailto:mel at mbacou.com>
> Visit harvestchoice.org <http://www.harvestchoice.org/>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>



