[datatable-help] Copying a data.table (subject was ':= unclarity and possible bug?')

Chris Neff caneff at gmail.com
Fri Aug 5 14:57:42 CEST 2011


On 5 August 2011 08:37, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> That is indeed odd.  Please file a bug.report().  Intended was option B.

Done: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1496&group_id=240&atid=975

> I see why you prefer A,  but it is B so that compound syntax works; e.g.
> DT[i,done:=TRUE][,sum(done)]

Ah, that makes sense.  If someone absent-mindedly does
DT <- DT[i, z:=1:10], are they incurring any copying there? I'm guessing
not, because <- is by reference.
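
One way to check this is to compare the object's memory address before and
after the reassignment. The following is only a minimal sketch: it assumes a
current data.table release in which address() is exported (it may not exist
in the 1.6.x version discussed here), and it uses an empty i for simplicity.
The last line tries the compound syntax from the quote above.

```r
library(data.table)

DT <- data.table(x = 1:10, y = 1:10)
before <- address(DT)      # memory address of the data.table object

# := adds z in place and returns DT itself, so <- only rebinds the name
DT <- DT[, z := 1:10]
after <- address(DT)

same_object <- identical(before, after)     # TRUE would mean no copy was made

# Compound syntax: the first [ ] updates in place, the second queries the result
n_done <- DT[, done := x > 5][, sum(done)]
```

If same_object comes back TRUE, the reassignment did not allocate a new
table.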

>
> Compound syntax is also why it doesn't return the number of rows updated by
> :=.  Verbosity returns the 'x rows updated' message.
>
> However, even if it worked, the usual copy-on-write semantics would still
> not work (and that is deliberate).
>
> The correct way to copy a data.table is (now) :
>
>    out <- data.table(DT[,z:=10])
>
> As per (new) examples in ?setkey.   Does that work ok in this case?

Unexpected, but acceptable and reasonable in the context of the
package.  I saw the examples in ?setkey, and those work for me in terms
of making a copy.  I wasn't really even interested in making a copy; I
was just doing things like DT[,z:=10] and was perplexed by the output.

However, playing a bit more I've found another weird thing with copy
vs. reference:

DT <- data.table(x=1:10, y=1:10)
DT2 <- DT          # DT2 is a reference to DT at the moment
DT2[, z := 1:10]   # Both DT and DT2 have z as 1:10
DT2$z <- 2:11      # DT2 now becomes a copy of DT with updated z column;
                   # DT$z is still 1:10

Is it intentional that DT2$z should convert a reference to a copy?
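
For anyone reproducing this, here is a minimal sketch of a check. The
identical() comparisons and the variable names shared_before, dt_z and dt2_z
are my additions for illustration, not part of the package.

```r
library(data.table)

DT  <- data.table(x = 1:10, y = 1:10)
DT2 <- DT                  # second name bound to the same object
DT2[, z := 1:10]           # := updates in place, so DT gains z as well
shared_before <- identical(DT$z, DT2$z)

DT2$z <- 2:11              # base-R replacement syntax: DT2 becomes a separate object
dt_z  <- DT$z              # not affected by the replacement above
dt2_z <- DT2$z             # carries the new values
```

Comparing dt_z with dt2_z afterwards shows whether the two names still refer
to the same data.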

> So, if you really want to copy a (potentially very huge) data.table then
> I've (deliberately) made it harder for you (and for me) to copy (often by
> accident).  But, being able to copy is still possible if you need to.

Yeah I actually appreciate the awareness it gives you about these things.

> Thinking about it, perhaps we need a new copy() function, or even
> duplicate(),  since data.table() is a bit too heavy if all you need is a
> mere copy.

+1 to this.
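
Note for later readers: a copy() function along these lines does exist in
subsequent data.table releases. A minimal sketch, assuming such a release is
installed (judging by the quoted text, it was not yet available in the
version discussed in this thread):

```r
library(data.table)

DT  <- data.table(x = 1:10, y = 1:10)
DT2 <- copy(DT)            # explicit copy; new memory is allocated here
DT2[, z := 10]             # modifies only the copy

dt_cols  <- names(DT)      # columns of the original
dt2_cols <- names(DT2)     # columns of the copy, now including z
```

The copy stays explicit and visible in the code, which fits the goal of
making accidental copies harder.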

> Note that force() in a function body doesn't force local copies of
> data.tables, either, even on copy-on-write.  That is deliberate, too.  If
> you really need a copy,  then you really must explicitly copy to a new
> variable name AND use data.table() to explicitly create (potentially a huge
> amount of) new memory.
>
> It isn't actually data.table itself per se, that doesn't copy,  it's the
> functions that operate on it.   So, setkey has been changing DT by reference
> since 1.6.2,  and now := does too.
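
To illustrate the point about functions, here is a minimal sketch. add_flag
is a hypothetical helper invented for this example, and the behaviour assumes
a data.table version in which := updates by reference.

```r
library(data.table)

# Hypothetical helper: adds a column to whatever data.table it is given
add_flag <- function(d) {
  d[, flag := TRUE]        # := modifies the caller's object, not a local copy
  invisible(d)
}

DT <- data.table(x = 1:10, y = 1:10)
add_flag(DT)               # result is not assigned back to DT
has_flag <- "flag" %in% names(DT)   # checks whether the caller's DT was changed
```
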
>
> Matthew
>
>
> "Chris Neff" <caneff at gmail.com> wrote in message
> news:CAAuY0RUVJjw1PJ7J-nET8yvx8nbXTfnLXs8p1gwd7-a9vGL+jA at mail.gmail.com...
> Now that I've played with := for a little bit, what is the rationale
> for the following?
>
>> DT <- data.table(x=1:10, y=1:10)
>> out <- DT[, z:=1:10]
>> out
>       x  y
>  [1,]  1  1
>  [2,]  2  2
>  [3,]  3  3
>  [4,]  4  4
>  [5,]  5  5
>  [6,]  6  6
>  [7,]  7  7
>  [8,]  8  8
>  [9,]  9  9
> [10,] 10 10
>> DT
>       x  y  z
>  [1,]  1  1  1
>  [2,]  2  2  2
>  [3,]  3  3  3
>  [4,]  4  4  4
>  [5,]  5  5  5
>  [6,]  6  6  6
>  [7,]  7  7  7
>  [8,]  8  8  8
>  [9,]  9  9  9
> [10,] 10 10 10
>
>
> I would have expected the return from DT[, z:=1:10] to be either A)
> nothing, which is what I think is the preferred thing if you are
> really trying to drill home the idea of in place assignment, or B) the
> newly updated version of DT with z in it (but I think that muddles
> what := does).  Why does it return what it does?