<div>Steve,</div><div><br></div>These are good examples of confusing statements. <div>In some cases, people might prefer to use them intentionally for certain purposes </div><div>(though even then, it would detract from the readability or maintainability of programs).</div>

<div>On the other side of the coin, they are masking program errors.</div><div>It is a mistake that R overlooked such usability issues (i.e., programmer usability).</div><div>And two wrongs do not make a right.</div><div>

<br><div>I wouldn&#39;t go so far as to say that R should have been</div><div>a typed language, but I do strongly believe that R libraries can be made</div><div>more user- or developer-friendly (still using the command line).</div>

<div>In places where you suspect that, with 80-90%</div><div>probability, the user or programmer might be doing something unexpected,</div><div>just issue an appropriate warning.</div><div><br></div><div>
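<div>To make the suggestion concrete, here is a minimal sketch of the kind of check I have in mind. The helper name strict_add is hypothetical (it is not part of base R or data.table), and exempting length-1 vectors is my own assumption about what counts as intentional recycling.</div>

```r
# Hypothetical helper, for illustration only: not part of base R or data.table.
# Adds two vectors, but warns whenever the shorter one would be recycled,
# even when the lengths divide evenly and base R would stay silent.
strict_add <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  # Assumption: recycling a length-1 value is almost always intentional,
  # so only warn when both operands are longer than that.
  if (nx != ny && nx != 1L && ny != 1L) {
    warning(sprintf("recycling a length-%d vector against a length-%d vector",
                    min(nx, ny), max(nx, ny)),
            call. = FALSE)
  }
  x + y
}
```

<div>With this, strict_add(1:3, 1:6) still returns the recycled sum but also raises a warning, while strict_add(1:3, 1L) stays silent.</div>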

<br><div class="gmail_quote">On Fri, May 6, 2011 at 10:48 AM, Steve Lianoglou <span dir="ltr">&lt;<a href="mailto:mailinglist.honeypot@gmail.com">mailinglist.honeypot@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hi Steve,<br>

<br>

As (another :-) aside -- make sure you use &quot;reply-all&quot; when replying<br>

to messages from this (and pretty much all other R-related) mailing<br>

lists, otherwise your mail goes straight to the person, and not back<br>

to the list.<br>

<br>

Other comments in line:<br>

<br>

On Fri, May 6, 2011 at 10:29 AM, Steve Harman &lt;<a href="mailto:stvharman@gmail.com">stvharman@gmail.com</a>&gt; wrote:<br>

&gt; Steve, this works.<br>

<br>

Great! Glad to hear it.<br>

<div class="im"><br>

&gt; However, this discussion shows that we need some error or<br>

&gt; at least warning messages in this case.<br>

<br>

</div>For this particular case, I&#39;d respectfully have to disagree.<br>

<div class="im"><br>

&gt; It is important to pay attention to user (in this case programmer)<br>

&gt; experience and facilitate recovery from<br>

&gt; mistakes by providing the user with meaningful and timely messages.<br>

&gt; thanks for all your help,<br>

<br>

</div>I would argue that what happened to you is actually &quot;expected behavior.&quot;<br>

<br>

You&#39;ll find that in many contexts, if &quot;R&quot; thinks it can figure out<br>

what you intended to do with two vectors that aren&#39;t the same length,<br>

it will try to be smart and do it.<br>

<br>

For instance, this is similar to what happened to you -- notice how<br>

TRUE is recycled to be as long as the first column here:<br>

<br>

R&gt; data.frame(id=letters[1:5], huh=TRUE)<br>

  id  huh<br>

1  a TRUE<br>

2  b TRUE<br>

3  c TRUE<br>

4  d TRUE<br>

5  e TRUE<br>

<br>

Perhaps more strangely, but still &quot;R-correct&quot; (note no warning):<br>

<br>

R&gt; 1:3 + 1:6 ## == c(1:3,1:3) + 1:6<br>

[1] 2 4 6 5 7 9<br>

<br>

R thinks this is strange, but still does &quot;something&quot; for you (but<br>

gives a warning since the length of the 2nd vector isn&#39;t a multiple of the length of the first):<br>

<br>

R&gt; 1:3 + 1:7<br>

[1] 2 4 6 5 7 9 8<br>

Warning message:<br>

In 1:3 + 1:7 :<br>

  longer object length is not a multiple of shorter object length<br>

<br>

Oftentimes I actually take advantage of the situation that happened<br>

to you to expand a result into several rows (instead of just into 1)<br>

when doing split/summarize/merge stuff with data.table&#39;s [,<br>

by=&#39;something&#39;] mojo.<br>

<br>

My 2 cents,<br>

<font color="#888888"><br>

-steve<br>

</font><div><div></div><div class="h5"><br>

&gt; On Fri, May 6, 2011 at 9:44 AM, Steve Harman &lt;<a href="mailto:stvharman@gmail.com">stvharman@gmail.com</a>&gt; wrote:<br>

&gt;&gt;<br>

&gt;&gt; Thanks, I&#39;ll try it today and let you know.<br>

&gt;&gt;<br>

&gt;&gt; On Fri, May 6, 2011 at 12:22 AM, Steve Lianoglou<br>

&gt;&gt; &lt;<a href="mailto:mailinglist.honeypot@gmail.com">mailinglist.honeypot@gmail.com</a>&gt; wrote:<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Hi,<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; As an aside -- in the future, please provide some data in a form that<br>

&gt;&gt;&gt; we can just copy and paste from your email into an R session so that<br>

&gt;&gt;&gt; we can get a working object up quickly.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; For example:<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; R&gt; dt &lt;- data.table(coursecode=c(NA, NA, NA, 101, 102, 101, 102, 103),<br>

&gt;&gt;&gt;  student_id=c(1, 1, 1, 1, 1, 2, 2, 2),<br>

&gt;&gt;&gt;  key=&#39;student_id&#39;)<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; On Thu, May 5, 2011 at 10:54 PM, Steve Harman &lt;<a href="mailto:stvharman@gmail.com">stvharman@gmail.com</a>&gt;<br>

&gt;&gt;&gt; wrote:<br>

&gt;&gt;&gt; &gt; Hello<br>

&gt;&gt;&gt; &gt;<br>

&gt;&gt;&gt; &gt; I have a data table called dt in which each student can have multiple<br>

&gt;&gt;&gt; &gt; records (created using data.table)<br>

&gt;&gt;&gt; &gt;<br>

&gt;&gt;&gt; &gt; coursecode    student_id<br>

&gt;&gt;&gt; &gt; ----------------    ----------------<br>

&gt;&gt;&gt; &gt; NA               1<br>

&gt;&gt;&gt; &gt; NA               1<br>

&gt;&gt;&gt; &gt; NA               1<br>

&gt;&gt;&gt; &gt; ....                1<br>

&gt;&gt;&gt; &gt; ....                1<br>

&gt;&gt;&gt; &gt; NA                2<br>

&gt;&gt;&gt; &gt; 101               2<br>

&gt;&gt;&gt; &gt; 102               2<br>

&gt;&gt;&gt; &gt; NA                2<br>

&gt;&gt;&gt; &gt; 103                2<br>

&gt;&gt;&gt; &gt;<br>

&gt;&gt;&gt; &gt; I am trying to group by student id and concatenate the coursecode<br>

&gt;&gt;&gt; &gt; strings in<br>

&gt;&gt;&gt; &gt; student records. This string is mostly NA but it can also be real<br>

&gt;&gt;&gt; &gt; course code<br>

&gt;&gt;&gt; &gt; (because of messy real life data coursecode was not always entered)<br>

&gt;&gt;&gt; &gt; There are 999999 records.<br>

&gt;&gt;&gt; &gt;<br>

&gt;&gt;&gt; &gt; So, I thought I would get results like<br>

&gt;&gt;&gt; &gt;<br>

&gt;&gt;&gt; &gt; 1 NA NA NA .....<br>

&gt;&gt;&gt; &gt; 2 NA 101 102 NA 123 ....<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; What type of object are you expecting that result to be?<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; &gt; However, as seen below, it  brings me a result with 999999 rows<br>

&gt;&gt;&gt; &gt; and it fails to concatenate the coursecode&#39;s.<br>

&gt;&gt;&gt; &gt;<br>

&gt;&gt;&gt; &gt;&gt;  codes &lt;- dt[,paste(coursecode),by=student_id]<br>

&gt;&gt;&gt; &gt;&gt; codes<br>

&gt;&gt;&gt; &gt;      student_id V1<br>

&gt;&gt;&gt; &gt;  [1,]          1 NA<br>

&gt;&gt;&gt; &gt;  [2,]          1 NA<br>

&gt;&gt;&gt; &gt;  [3,]          1 NA<br>

&gt;&gt;&gt; &gt;  [4,]          1 NA<br>

&gt;&gt;&gt; &gt;  [5,]          1 NA<br>

&gt;&gt;&gt; &gt;  [6,]          1 NA<br>

&gt;&gt;&gt; &gt;  [7,]          1 NA<br>

&gt;&gt;&gt; &gt;  [8,]          1 NA<br>

&gt;&gt;&gt; &gt;  [9,]          1 NA<br>

&gt;&gt;&gt; &gt; [10,]          1 NA<br>

&gt;&gt;&gt; &gt; First 10 rows of 999999 printed.<br>

&gt;&gt;&gt; &gt;<br>

&gt;&gt;&gt; &gt; If I repeat the same example for a numeric attribute and use some math<br>

&gt;&gt;&gt; &gt; aggregation functions such as sum, mean, etc., then the number of rows<br>

&gt;&gt;&gt; &gt; returned is correct, it is indeed equal to the number of students.<br>

&gt;&gt;&gt; &gt;<br>

&gt;&gt;&gt; &gt; I was wondering if the problem is with NA&#39;s or with the use of paste<br>

&gt;&gt;&gt; &gt; as the aggregation function. I can alternatively use RMySQL with MySQL<br>

&gt;&gt;&gt; &gt; to concatenate those strings but I would like to use data.table if<br>

&gt;&gt;&gt; &gt; possible.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; What if you try this (using my `dt` example from above):<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; R&gt; dt[, paste(coursecode, collapse=&quot;,&quot;), by=student_id]<br>

&gt;&gt;&gt;     student_id               V1<br>

&gt;&gt;&gt; [1,]          1 NA,NA,NA,101,102<br>

&gt;&gt;&gt; [2,]          2      101,102,103<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Note that each element in the $V1 column is a character vector of<br>

&gt;&gt;&gt; length 1 and not individual course codes.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Without using the `collapse` argument to your call to paste, you just<br>

&gt;&gt;&gt; get a character vector which is the same length as you passed in, eg:<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; R&gt; paste(c(&#39;A&#39;, &#39;B&#39;, NA, &#39;C&#39;))<br>

&gt;&gt;&gt; [1] &quot;A&quot;  &quot;B&quot;  &quot;NA&quot; &quot;C&quot;<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; vs.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; R&gt; paste(c(&#39;A&#39;, &#39;B&#39;, NA, &#39;C&#39;), collapse=&quot;,&quot;)<br>

&gt;&gt;&gt; [1] &quot;A,B,NA,C&quot;<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; HTH,<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; -steve<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; --<br>

&gt;&gt;&gt; Steve Lianoglou<br>

&gt;&gt;&gt; Graduate Student: Computational Systems Biology<br>

&gt;&gt;&gt;  | Memorial Sloan-Kettering Cancer Center<br>

&gt;&gt;&gt;  | Weill Medical College of Cornell University<br>

&gt;&gt;&gt; Contact Info: <a href="http://cbio.mskcc.org/~lianos/contact" target="_blank">http://cbio.mskcc.org/~lianos/contact</a><br>

&gt;&gt;<br>

&gt;<br>

&gt;<br>

<br>

<br>

<br>

</div></div>--<br>

<div><div></div><div class="h5">Steve Lianoglou<br>

Graduate Student: Computational Systems Biology<br>

 | Memorial Sloan-Kettering Cancer Center<br>

 | Weill Medical College of Cornell University<br>

Contact Info: <a href="http://cbio.mskcc.org/~lianos/contact" target="_blank">http://cbio.mskcc.org/~lianos/contact</a><br>

</div></div></blockquote></div><br></div></div>