No subject


Mon Oct 17 11:22:36 CEST 2011


r a new reference. Anything in between is confusing.
<div>
How about this: add a new argument to data.table(), say max.cols. max.cols defaults to a couple of orders of magnitude above the initial number of columns, and data.table allocates enough memory for max.cols column pointers. If you try to add more than max.cols columns, it is either an error, or it creates a copy and produces a warning.
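A minimal pure-R model of the max.cols idea, for illustration only. `new_table()`, `add_col()` and the `on_full` switch are hypothetical names, not data.table functions, and R's copy-on-modify means nothing here is truly done by reference. The model shows only the bookkeeping: slots are pre-allocated up to max.cols, and adding past that limit either errors or reallocates with a warning. Doubling the capacity on overflow is an assumption; the proposal doesn't specify a growth policy.

```r
# Hypothetical model of the proposed max.cols argument; not data.table's API.
# A table is a list of pre-allocated "column pointer" slots plus a used count.
new_table <- function(cols, max.cols = 100L * length(cols)) {
  stopifnot(length(cols) <= max.cols)
  slots <- vector("list", max.cols)      # room for max.cols columns up front
  slots[seq_along(cols)] <- cols         # fill the first ncol slots
  list(slots = slots, n = length(cols), names = names(cols))
}

add_col <- function(tab, name, value, on_full = c("copy", "error")) {
  on_full <- match.arg(on_full)
  if (tab$n == length(tab$slots)) {
    # All max.cols slots are used: either refuse, or reallocate and warn.
    if (on_full == "error") stop("cannot add column: max.cols reached")
    warning("max.cols reached; reallocating, so the result is a copy")
    tab$slots <- c(tab$slots, vector("list", length(tab$slots)))  # assumed: double
  }
  tab$n <- tab$n + 1L
  tab$slots[[tab$n]] <- value            # use the next free slot
  tab$names[tab$n] <- name
  tab
}

tab <- new_table(list(a = 1:2, b = 3:4), max.cols = 3L)
tab <- add_col(tab, "c", 5:6)            # fits: the third slot was spare
```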

<div class="gmail_quote">
On Fri, Oct 28, 2011 at 1:10 AM, Matthew Dowle (mdowle at mdowle.plus.com) wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Interesting one. Adding columns is a bit different from deleting and modifying columns. Here's how it works. We could make changes, document it, or both; what do people think?

Just like data.frame, there is a list vector holding pointers to the column vectors. A delete-column operation is done with a memmove that budges up the column pointers above the deleted column by one place. That leaves a gap at the end. The length of that vector (ncol(DT)) is then decremented, and the spare 4 bytes (8 on 64-bit) are left unused at the end.
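The delete path can be modelled in plain R. This is an illustration, not data.table's C code: `delete_col()` is a hypothetical name, R copies the list when it is modified, and the "pointers" are just list elements. It shifts the entries above column `j` down one place (the memmove step), clears the now-spare last slot, and reports the new visible length alongside the unchanged allocated size.

```r
# Illustrative model of deleting column j from a list of column "pointers".
# len is the visible length (ncol); the allocated size is length(ptrs).
delete_col <- function(ptrs, len, j) {
  stopifnot(j >= 1L, j <= len, len <= length(ptrs))
  if (j < len) {
    ptrs[j:(len - 1L)] <- ptrs[(j + 1L):len]   # budge the later pointers down
  }
  ptrs[len] <- list(NULL)                      # spare slot left unused at the end
  list(ptrs = ptrs, length = len - 1L, allocated = length(ptrs))
}

res <- delete_col(list(1:2, 3:4, 5:6), len = 3L, j = 1L)
res$length     # 2
res$allocated  # 3
```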

An add-column operation can't be done fully by reference because the list vector is full. A new list vector, one slot larger, has to be allocated, the old pointers memcpy'd over, and the last slot assigned the pointer to the new column vector. This copying is negligible because it's a small list of pointers fitting well within one page (unless there are many thousands of columns, which is why it's done as efficiently as possible, using memcpy).
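As a rough check on "a small list of pointers fitting well within one page", the arithmetic below uses R's `.Machine$sizeof.pointer` (4 on 32-bit builds, 8 on 64-bit). The 4096-byte page size is an assumption (common, but platform-dependent), and the R list header is ignored.

```r
ptr_bytes  <- .Machine$sizeof.pointer   # bytes per column pointer on this build
page_bytes <- 4096                      # assumed page size
page_bytes %/% ptr_bytes                # column pointers per page: 512 on 64-bit
ncols <- c(10, 100, 1000, 10000)
ncols * ptr_bytes                       # bytes of old pointers copied per add-column
```

On a 64-bit build, 1000 columns take 8000 bytes, about two pages, so the copy only grows beyond a page or two once there are thousands of columns.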

Aside: there is a little-known (I guess) distinction between length and truelength in R internals. Base R doesn't use truelength, but we could in data.table. A delete column sets length but leaves truelength one larger. When the next add column comes along, it could just do the budge up and insert the column. That may not be so advantageous for a small number of columns, but the same logic could work for insert()ing and delete()ing rows. Of course, this would mean that whether a visible copy is taken or not depends on what happened previously, rather than on the syntax. That's something we've disliked before, in the same way we dislike drop=TRUE behaviour and so dropped drop. One way to approach this might be to advise ":= add *may* not copy. Best to assume it doesn't; use copy()". If you get in the habit of "DT2=copy(DT)", that'll take a deep copy at the time and you're safe.
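A sketch of the truelength idea as a plain-R model, under stated assumptions: the table is a list of slots whose length plays the role of truelength, `len` is the visible length, new columns are appended at the end, and `add_col_tl()` is a hypothetical name. When an earlier delete left a spare slot, the new column goes in place; otherwise a larger list is built. The `copied` flag makes the concern above visible: whether a copy happens depends on history, not on the syntax of the call.

```r
# Illustrative model: length(tab$slots) plays the role of truelength,
# tab$len is the visible length (ncol). Appends one column at the end.
add_col_tl <- function(tab, newcol) {
  if (tab$len < length(tab$slots)) {
    tab$slots[[tab$len + 1L]] <- newcol      # reuse the spare slot: no reallocation
    tab$copied <- FALSE
  } else {
    tab$slots <- c(tab$slots, list(newcol))  # full: allocate one slot larger
    tab$copied <- TRUE
  }
  tab$len <- tab$len + 1L
  tab
}

after_delete <- list(slots = list(1:2, 3:4, NULL), len = 2L)  # truelength 3, length 2
a <- add_col_tl(after_delete, 5:6)   # fills the spare slot
b <- add_col_tl(a, 7:8)              # no spare slot left, so reallocates
c(a$copied, b$copied)                # FALSE TRUE
```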

To illustrate the partial copy (maybe "shallow copy" is the better term), consider the following:

> DT = data.table(1:2,3:4)
> DT2=DT
> DT2[,y:=10L]
     V1 V2  y
[1,]  1  3 10
[2,]  2  4 10
> DT
     V1 V2
[1,]  1  3
[2,]  2  4
> DT2
     V1 V2  y
[1,]  1  3 10
[2,]  2  4 10
> DT2[1,V1:=99L]
     V1 V2  y
[1,] 99  3 10
[2,]  2  4 10
> DT
     V1 V2
[1,] 99  3
[2,]  2  4
>
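Following the advice above to take a deep copy, the same two steps run on a `copy()` leave the original untouched. This uses data.table's `copy()` and `:=`; the assertions describe the expected result with a deep copy, not the 2011 session shown above.

```r
library(data.table)

DT  <- data.table(V1 = 1:2, V2 = 3:4)
DT2 <- copy(DT)        # deep copy: new column vectors and a new pointer list
DT2[, y := 10L]        # adds y to DT2 only
DT2[1, V1 := 99L]      # changes DT2's V1 only
DT                     # still V1 = 1:2, V2 = 3:4, with no y column
```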

Matthew

On Thu, 2011-10-27 at 11:46 -0700, Muhammad Waliji wrote:
> I think this is a bug. DT.2 <- DT.1 doesn't seem to make a copy in
> all cases.
>
>
> > DT.1 <- data.table(x=1, y=1)
> > DT.2 <- DT.1
> >
> > # Both DT.1 and DT.2 are changed.
> > DT.2[, y := NULL]
>      x
> [1,] 1
> > DT.1
>      x
> [1,] 1
> > DT.2
>      x
> [1,] 1
> >
> > # Only DT.2 is changed
> > DT.2[, y := x]
>      x y
> [1,] 1 1
> > DT.1
>      x
> [1,] 1
> > DT.2
>      x y
> [1,] 1 1
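For contrast, an ordinary data.frame under the same first step: base R's `<-` just binds a second name, and `$<-` then copies on modify, so the original keeps both columns. data.table's `:=` deliberately skips that copy, which is why DT.1 also lost y in the first case above.

```r
df1 <- data.frame(x = 1, y = 1)
df2 <- df1          # binds a second name; no copy is made yet
df2$y <- NULL       # copy-on-modify: df2 gets its own copy without y
names(df1)          # "x" "y"
names(df2)          # "x"
```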
</blockquote></div></div>
