No subject


Mon Oct 17 11:22:36 CEST 2011


reference.  Anything in between is confusing.

How about this - add a new argument to data.table(), say max.cols.  max.cols
defaults to a couple of orders of magnitude above the initial number of
columns.  data.table allocates enough memory for max.cols column pointers.
If you try to add more than max.cols columns, it is either an error, or it
creates a copy and produces a warning.
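
To make the idea concrete, here is a rough sketch in plain R. It is not
data.table code: new_table(), add_col() and the on.overflow switch are
hypothetical names made up for illustration, and an environment stands in
for the pre-allocated vector of column pointers. It only shows the intended
semantics: adds within max.cols happen by reference, and going past max.cols
is either an error or a copy with a warning.

```r
# Hypothetical sketch only; none of these names exist in data.table.
# An environment models a table whose column slots are allocated up front.
new_table <- function(..., max.cols = NULL) {
  cols <- list(...)                       # columns must be named
  # Assumed default: a couple of orders of magnitude above the initial count.
  if (is.null(max.cols)) max.cols <- 100L * length(cols)
  tbl <- new.env()
  tbl$slots <- vector("list", max.cols)   # pre-allocated column slots
  tbl$names <- character(max.cols)
  tbl$ncol  <- 0L
  for (nm in names(cols)) add_col(tbl, nm, cols[[nm]])
  tbl
}

add_col <- function(tbl, name, value, on.overflow = c("error", "copy")) {
  on.overflow <- match.arg(on.overflow)
  if (tbl$ncol >= length(tbl$slots)) {
    if (on.overflow == "error") stop("cannot add more than max.cols columns")
    warning("max.cols exceeded; copying into a larger table")
    bigger <- new.env()                   # the caller's table is left untouched
    bigger$slots <- c(tbl$slots, vector("list", length(tbl$slots)))
    bigger$names <- c(tbl$names, character(length(tbl$names)))
    bigger$ncol  <- tbl$ncol
    tbl <- bigger
  }
  tbl$ncol <- tbl$ncol + 1L
  tbl$slots[[tbl$ncol]] <- value          # fill the next free slot in place
  tbl$names[tbl$ncol]   <- name
  invisible(tbl)
}

DT1 <- new_table(x = 1, y = 1, max.cols = 3L)
DT2 <- DT1              # same object, no copy
add_col(DT2, "z", 2)    # within max.cols, so DT1 sees the new column too
```

With this model, whether an add is by reference depends only on max.cols,
which is fixed when the table is created, not on what was deleted earlier.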

On Fri, Oct 28, 2011 at 1:10 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

> Interesting one. Adding columns is a bit different to deleting and
> modifying columns. Here's how it works. Could make changes, could
> document it, or both, what do people think?
>
> Just like data.frame there is a list vector holding pointers to the
> column vectors. A delete column op is done with a memmove to budge up
> the column pointers above the column by one place. That leaves a gap at
> the end. The length attribute of that vector (ncol(DT)) is then
> decremented and the spare 4 bytes (or 8 on 64bit) are left unused at the
> end.
>
> An add column can't be fully by reference because the list vector is
> full. A new list vector has to be allocated, one slot larger, the old
> pointers memcpy'd over, and the last spot assigned the pointer to the
> new column vector.  This copying is negligible because it's a small list
> of pointers fitting well within one page. [Unless, there are many 1000's
> of columns, which is why it's done as efficiently as possible using
> memcpy].
>
> Aside : There is a little-known (I guess) distinction between length and
> truelength in R internals. Base R doesn't use it, but we could in
> data.table. A delete column sets length but leaves truelength one
> larger. When the next add column comes along, it could just do the budge
> up and insert the column. That may not be so advantageous for a small
> number of columns, but the same logic could work for insert() and
> delete()ing rows.  Of course, this would mean whether a visible copy is
> taken or not depends on what happened previously, rather than the
> syntax. That's something we've disliked before, in the same way we
> dislike drop=TRUE behaviour and so dropped drop. One way to approach
> this might be to advise ":= add *may* not copy. Best to assume it
> doesn't; use copy()". If you get in the habit of "DT2=copy(DT)" then
> that'll take a deep copy at the time and you're safe.
>
> To illustrate the partial copy (maybe "shallow copy" is a better term),
> consider the following :
>
> > DT = data.table(1:2,3:4)
> > DT2=DT
> > DT2[,y:=10L]
>     V1 V2  y
> [1,]  1  3 10
> [2,]  2  4 10
> > DT
>     V1 V2
> [1,]  1  3
> [2,]  2  4
> > DT2
>     V1 V2  y
> [1,]  1  3 10
> [2,]  2  4 10
> > DT2[1,V1:=99L]
>     V1 V2  y
> [1,] 99  3 10
> [2,]  2  4 10
> > DT
>     V1 V2
> [1,] 99  3
> [2,]  2  4
> >
>
> Matthew
>
>
> On Thu, 2011-10-27 at 11:46 -0700, Muhammad Waliji wrote:
> > I think this is a bug.  DT.2 <- DT.1 doesn't seem to make a copy in
> > all cases.
> >
> >
> > > DT.1 <- data.table(x=1, y=1)
> > > DT.2 <- DT.1
> > >
> > > # Both DT.1 and DT.2 are changed.
> > > DT.2[, y := NULL]
> >      x
> > [1,] 1
> > > DT.1
> >      x
> > [1,] 1
> > > DT.2
> >      x
> > [1,] 1
> > >
> > > # Only DT.2 is changed
> > > DT.2[, y := x]
> >      x y
> > [1,] 1 1
> > > DT.1
> >      x
> > [1,] 1
> > > DT.2
> >      x y
> > [1,] 1 1
> >
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>
