[datatable-help] Copying a data.table (subject was ':= unclarity and possible bug?')

Matthew Dowle mdowle at mdowle.plus.com
Fri Aug 5 14:37:26 CEST 2011


That is indeed odd. Please file a bug.report(). Option B was what was intended.

I see why you prefer A, but it is B so that compound syntax works; e.g.

    DT[i, done:=TRUE][, sum(done)]

Compound syntax is also why := doesn't return the number of rows updated. 
Turn on verbose output if you want the 'x rows updated' message.
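A minimal sketch of the compound syntax described above, assuming a current data.table release, where := modifies the table by reference and returns it invisibly. The done column and the x > 5 condition are hypothetical, chosen only for illustration. (Separately, setting options(datatable.verbose = TRUE) makes data.table print diagnostic messages; the exact wording varies by version.)

```r
library(data.table)

DT <- data.table(x = 1:10, y = 1:10)
DT[, done := FALSE]                             # add a logical column in place

# := updates DT by reference and returns DT (invisibly), so a second [...]
# can be chained directly onto the first one
n_done <- DT[x > 5, done := TRUE][, sum(done)]  # flag rows 6..10, then count the flags
print(n_done)
```

If := returned nothing (option A) or a row count, the second [...] would have no table to operate on, which is why the chain depends on B.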

However, even once := returns the updated DT as intended, the usual 
copy-on-write semantics won't apply: out and DT will be the same table, not 
two copies (and that is deliberate).

The correct way to copy a data.table is (now):

    out <- data.table(DT[,z:=10])

See the (new) examples in ?setkey. Does that work OK in this case?

So, if you really want to copy a (potentially very huge) data.table, you 
still can. I've just (deliberately) made it harder for you, and for myself, 
to copy one by accident.

Thinking about it, perhaps we need a new copy() function, or even 
duplicate(), since data.table() is a bit too heavy if all you need is a 
plain copy.
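A minimal sketch of the difference between aliasing and copying. It assumes a current data.table release; later releases did add a copy() function along the lines suggested above, so it is used here instead of data.table(). The names DT_alias and snap are hypothetical.

```r
library(data.table)

DT <- data.table(x = 1:10, y = 1:10)

DT_alias <- DT          # plain assignment: no copy, both names refer to one table
DT_alias[, z := 1:10]   # adds z by reference, so DT gains z as well

snap <- copy(DT)        # explicit deep copy into new memory
snap[, w := 0L]         # only the copy gets the new column

print(names(DT))        # x, y, z: no w
print(names(snap))      # x, y, z, w
```

Because the copy has to be requested explicitly, a huge table is never duplicated just because it was assigned to a second name.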

Note that force() in a function body doesn't create a local copy of a 
data.table either, even when the function goes on to modify it. That is 
deliberate, too. If you really need a copy, then you must explicitly copy to 
a new variable name AND use data.table() to explicitly allocate (potentially 
a huge amount of) new memory.
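A minimal sketch of the point about function bodies, assuming a current data.table release (with copy() available). The functions add_flag() and add_flag_safely() and the flag column are hypothetical, for illustration only.

```r
library(data.table)

add_flag <- function(d) {
  force(d)             # only evaluates the promise; it does NOT copy the table
  d[, flag := TRUE]    # so this modifies the caller's table by reference
  invisible(NULL)
}

add_flag_safely <- function(d) {
  d <- copy(d)         # explicit copy into a new local object
  d[, flag := TRUE]    # modifies only the local copy
  d
}

DT <- data.table(x = 1:3)
add_flag(DT)
print(names(DT))       # the caller's DT now has a flag column

DT2 <- data.table(x = 1:3)
res <- add_flag_safely(DT2)
print(names(DT2))      # unchanged
print(names(res))      # has flag
```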

It isn't data.table itself, per se, that avoids copying; it's the functions 
that operate on it. So, setkey has been changing DT by reference since 1.6.2, 
and now := does too.
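A short sketch of both functions working by reference, assuming a current data.table release. The id, v and v2 columns are hypothetical; note that neither call is assigned back to DT.

```r
library(data.table)

DT <- data.table(id = c(3L, 1L, 2L), v = c("c", "a", "b"))

setkey(DT, id)           # sorts DT in place and marks it keyed on id
DT[, v2 := toupper(v)]   # adds a column to DT in place

print(key(DT))           # "id"
print(DT)
```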

Matthew


"Chris Neff" <caneff at gmail.com> wrote in message 
news:CAAuY0RUVJjw1PJ7J-nET8yvx8nbXTfnLXs8p1gwd7-a9vGL+jA at mail.gmail.com...
Now that I've played with := for a little bit, what is the rationale
for the following?

> DT <- data.table(x=1:10, y=1:10)
> out <- DT[, z:=1:10]
> out
       x  y
 [1,]  1  1
 [2,]  2  2
 [3,]  3  3
 [4,]  4  4
 [5,]  5  5
 [6,]  6  6
 [7,]  7  7
 [8,]  8  8
 [9,]  9  9
[10,] 10 10
> DT
       x  y  z
 [1,]  1  1  1
 [2,]  2  2  2
 [3,]  3  3  3
 [4,]  4  4  4
 [5,]  5  5  5
 [6,]  6  6  6
 [7,]  7  7  7
 [8,]  8  8  8
 [9,]  9  9  9
[10,] 10 10 10


I would have expected the return from DT[, z:=1:10] to be either A)
nothing, which is what I think is the preferred thing if you are
really trying to drill home the idea of in place assignment, or B) the
newly updated version of DT with z in it (but I think that muddles
what := does).  Why does it return what it does?
