[datatable-help] Copying a data.table (subject was ':= unclarity and possible bug?')
Chris Neff
caneff at gmail.com
Fri Aug 5 14:57:42 CEST 2011
On 5 August 2011 08:37, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> That is indeed odd. Please file a bug.report(). Intended was option B.
Done: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1496&group_id=240&atid=975
> I see why you prefer A, but it is B so that compound syntax works; e.g.
> DT[i,done:=TRUE][,sum(done)]
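A minimal sketch of the compound syntax follows. It assumes a current data.table release in which := returns the updated table invisibly (option B). The column names x and done are illustrative only.

```r
library(data.table)

DT <- data.table(x = 1:10)
DT[, done := FALSE]                  # add a flag column by reference

# := returns DT itself (invisibly), so a second [...] can be chained on it
n_done <- DT[x > 7, done := TRUE][, sum(done)]
print(n_done)                        # 3 rows have x > 7
```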
Ah, that makes sense. If someone absent-mindedly does DT <- DT[i, z:=1:10],
are they incurring any copying there? I'm guessing not, because <- is by
reference.
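One way to check this is sketched below, assuming a current data.table release. data.table's address() reports where an object lives in memory, so an unchanged address after the reassignment suggests that no copy was made. R's <- binds a name to an existing object rather than copying it.

```r
library(data.table)

DT <- data.table(x = 1:10, y = 1:10)
before <- address(DT)

# := modifies DT in place and returns that same object (invisibly),
# so assigning the result back to DT just rebinds the name to the same table
DT <- DT[, z := 1:10]

stopifnot(identical(address(DT), before))
```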
>
> Compound syntax is also why it doesn't return the number of rows updated by
> :=. Verbose mode reports the 'x rows updated' message instead.
>
> However, even if it worked, the usual copy-on-write semantics would still
> not work (and that is deliberate).
>
> The correct way to copy a data.table is (now):
>
> out <- data.table(DT[,z:=10])
>
> As per the (new) examples in ?setkey. Does that work OK in this case?
Unexpected, but acceptable and reasonable in the context of the
package. I saw the examples in ?setkey and those work for me in terms
of making a copy. I really wasn't even interested in making a copy; I
was just doing things like DT[,z:=10] and was perplexed by the output.
However, playing a bit more, I've found another weird thing about copy
vs. reference:
DT <- data.table(x=1:10, y=1:10)
DT2 <- DT        # DT2 is a reference to DT at the moment
DT2[, z := 1:10] # Both DT and DT2 have z as 1:10
DT2$z <- 2:11    # DT2 now becomes a copy of DT with updated z column; DT$z is still 1:10
Is it intentional that assigning with DT2$z <- turns a reference into a copy?
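The behaviour described above can be reproduced with a few assertions. This is a sketch assuming a current data.table release, in which := adds the column to the single object both names point at, while $<- on a data.table assigns into a copy (an assumption about current internals, not something stated in this thread).

```r
library(data.table)

DT  <- data.table(x = 1:10, y = 1:10)
DT2 <- DT                # no copy yet: both names refer to one table

DT2[, z := 1:10]         # added by reference, so DT sees z as well
stopifnot(identical(DT$z, 1:10))

DT2$z <- 2:11            # assumed: $<- works on a copy, which is then bound to DT2
stopifnot(identical(DT2$z, 2:11), identical(DT$z, 1:10))
```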
> So, if you really want to copy a (potentially very huge) data.table, then
> I've (deliberately) made it harder for you (and myself) to copy (often by
> accident). But copying is still possible if you need to.
Yeah, I actually appreciate the awareness it gives you about these things.
> Thinking about it, perhaps we need a new copy() function, or even
> duplicate(), since data.table() is a bit too heavy if all you need is a
> mere copy.
+1 to this.
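For reference, later data.table releases do provide a copy() function along these lines. A minimal sketch assuming such a release: copy() makes a deep copy, so later := calls on the copy leave the original untouched. The name DT3 is illustrative only.

```r
library(data.table)

DT  <- data.table(x = 1:10, y = 1:10)
DT3 <- copy(DT)          # deep copy: new memory, independent of DT

DT3[, z := 1:10]         # modifies only the copy
stopifnot(!("z" %in% names(DT)), identical(DT3$z, 1:10))
```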
> Note that force() in a function body doesn't force local copies of
> data.tables either, even where copy-on-write would normally apply. That is
> deliberate, too. If you really need a copy, then you must explicitly copy to
> a new variable name AND use data.table() to explicitly create (potentially a
> huge amount of) new memory.
>
> It isn't data.table itself, per se, that doesn't copy; it's the
> functions that operate on it. So, setkey has been changing DT by reference
> since 1.6.2, and now := does too.
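A small sketch of what this means inside a function follows, assuming a current data.table release. The helper add_flag() is hypothetical and exists only for illustration. force() does not give it a private copy, so both := and setkey() change the caller's table.

```r
library(data.table)

# Hypothetical helper, used only for illustration.
add_flag <- function(d) {
  force(d)              # forces the promise, but does not copy a data.table
  d[, flag := TRUE]     # := adds the column to the caller's table by reference
  setkey(d, x)          # setkey also sorts and keys the caller's table in place
  invisible(NULL)
}

DT <- data.table(x = c(3L, 1L, 2L))
add_flag(DT)
stopifnot("flag" %in% names(DT), identical(key(DT), "x"), identical(DT$x, 1:3))
```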
>
> Matthew
>
>
> "Chris Neff" <caneff at gmail.com> wrote in message
> news:CAAuY0RUVJjw1PJ7J-nET8yvx8nbXTfnLXs8p1gwd7-a9vGL+jA at mail.gmail.com...
> Now that I've played with := for a little bit, what is the rationale
> for the following?
>
>> DT <- data.table(x=1:10, y=1:10)
>> out <- DT[, z:=1:10]
>> out
>        x  y
>   [1,]  1  1
>   [2,]  2  2
>   [3,]  3  3
>   [4,]  4  4
>   [5,]  5  5
>   [6,]  6  6
>   [7,]  7  7
>   [8,]  8  8
>   [9,]  9  9
>  [10,] 10 10
>> DT
>        x  y  z
>   [1,]  1  1  1
>   [2,]  2  2  2
>   [3,]  3  3  3
>   [4,]  4  4  4
>   [5,]  5  5  5
>   [6,]  6  6  6
>   [7,]  7  7  7
>   [8,]  8  8  8
>   [9,]  9  9  9
>  [10,] 10 10 10
>
>
> I would have expected the return from DT[, z:=1:10] to be either A)
> nothing, which is what I think is preferable if you are
> really trying to drive home the idea of in-place assignment, or B) the
> newly updated version of DT with z in it (but I think that muddles
> what := does). Why does it return what it does?
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>