<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p="urn:schemas-microsoft-com:office:powerpoint" xmlns:a="urn:schemas-microsoft-com:office:access" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema" xmlns:b="urn:schemas-microsoft-com:office:publisher" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:odc="urn:schemas-microsoft-com:office:odc" xmlns:oa="urn:schemas-microsoft-com:office:activation" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:q="http://schemas.xmlsoap.org/soap/envelope/" xmlns:rtc="http://microsoft.com/officenet/conferencing" xmlns:D="DAV:" xmlns:Repl="http://schemas.microsoft.com/repl/" xmlns:mt="http://schemas.microsoft.com/sharepoint/soap/meetings/" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ppda="http://www.passport.com/NameSpace.xsd" xmlns:ois="http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir="http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:dsp="http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc="http://schemas.microsoft.com/data/udc" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sub="http://schemas.microsoft.com/sharepoint/soap/2002/1/alerts/" xmlns:ec="http://www.w3.org/2001/04/xmlenc#" xmlns:sp="http://schemas.microsoft.com/sharepoint/" xmlns:sps="http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:udcs="http://schemas.microsoft.com/data/udc/soap" xmlns:udcxf="http://schemas.microsoft.com/data/udc/xmlfile" xmlns:udcp2p="http://schemas.microsoft.com/data/udc/parttopart" xmlns:wf="http://schemas.microsoft.com/sharepoint/soap/workflow/" 
xmlns:dsss="http://schemas.microsoft.com/office/2006/digsig-setup" xmlns:dssi="http://schemas.microsoft.com/office/2006/digsig" xmlns:mdssi="http://schemas.openxmlformats.org/package/2006/digital-signature" xmlns:mver="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns:mrels="http://schemas.openxmlformats.org/package/2006/relationships" xmlns:spwp="http://microsoft.com/sharepoint/webpartpages" xmlns:ex12t="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:ex12m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:pptsl="http://schemas.microsoft.com/sharepoint/soap/SlideLibrary/" xmlns:spsl="http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService" xmlns:Z="urn:schemas-microsoft-com:" xmlns:st="" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Steve H, <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>As an R user, I sometimes make fundamental mistakes (like forgetting to pass the collapse argument to the paste function when I want to collapse).<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>However, R is a powerful language. It assumes the user knows what he or she is doing unless something is almost certainly wrong. (Steve L provided some examples. This is like the 80-90% you mentioned, but it’s probably more in the 95%-99% range.) In my opinion, it is unrealistic for you to make what are really programming mistakes on your part (for what you INTENDED; if you INTENDED something else, it would not be a mistake) and then expect the software to be able to read your INTENT. 
<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I am not a great programmer, but having worked with software that prints out too many warnings (or worse, that will not let you do some things because the programmers decided a user would be unlikely to want to do them), I prefer R’s approach.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Regarding the recycling note recently posted: yes, that may be a nice option. (But will you need a third option: “don’t print out recycling warnings for vectors of length 1”? That’s usually done intentionally.)<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Regards,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Joe V.<o:p></o:p></span></p><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> datatable-help-bounces@r-forge.wu-wien.ac.at [mailto:datatable-help-bounces@r-forge.wu-wien.ac.at] <b>On Behalf Of </b>Steve Harman<br><b>Sent:</b> Friday, May 06, 2011 2:05 PM<br><b>To:</b> Steve Lianoglou<br><b>Cc:</b> datatable-help@r-forge.wu-wien.ac.at<br><b>Subject:</b> 
Re: [datatable-help] using paste function while grouping gives strange results<o:p></o:p></span></p></div><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal>Steve,<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><p class=MsoNormal>These are good examples of confusing statements. <o:p></o:p></p><div><p class=MsoNormal>In some cases, people might prefer to use them intentionally for certain purposes <o:p></o:p></p></div><div><p class=MsoNormal>(though even then, it would detract from the readability or maintainability of programs).<o:p></o:p></p></div><div><p class=MsoNormal>On the other side of the coin, they are masking program errors.<o:p></o:p></p></div><div><p class=MsoNormal>It is a mistake that R overlooked such usability issues (i.e., programmer usability).<o:p></o:p></p></div><div><p class=MsoNormal>And two wrongs will not make a right.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal>I wouldn't go so far as to say that R should have been<o:p></o:p></p></div><div><p class=MsoNormal>a typed language, but I do strongly believe that R libraries can be made<o:p></o:p></p></div><div><p class=MsoNormal>more user or developer friendly (still using the command line).<o:p></o:p></p></div><div><p class=MsoNormal>In the places where you suspect that, with 80-90%<o:p></o:p></p></div><div><p class=MsoNormal>probability, the user or programmer might be doing something unexpected,<o:p></o:p></p></div><div><p class=MsoNormal>just issue a warning.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal>On Fri, May 6, 2011 at 10:48 AM, Steve Lianoglou <<a href="mailto:mailinglist.honeypot@gmail.com">mailinglist.honeypot@gmail.com</a>> wrote:<o:p></o:p></p><p class=MsoNormal>Hi Steve,<br><br>As (another :-) aside -- make sure you use "reply-all" when replying<br>to messages from this (and pretty much all other 
R-related) mailing<br>lists, otherwise your mail goes straight to the person, and not back<br>to the list.<br><br>Other comments inline:<br><br>On Fri, May 6, 2011 at 10:29 AM, Steve Harman <<a href="mailto:stvharman@gmail.com">stvharman@gmail.com</a>> wrote:<br>> Steve, this works.<br><br>Great! Glad to hear it.<o:p></o:p></p><div><p class=MsoNormal style='margin-bottom:12.0pt'><br>> However, this discussion shows that we need some error or<br>> at least warning messages in this case.<o:p></o:p></p></div><p class=MsoNormal>For this particular case, I'd respectfully have to disagree.<o:p></o:p></p><div><p class=MsoNormal style='margin-bottom:12.0pt'><br>> It is important to pay attention to user (in this case programmer)<br>> experience and facilitate recovery from<br>> mistakes by providing the user with meaningful and timely messages.<br>> thanks for all your help,<o:p></o:p></p></div><p class=MsoNormal>I would argue that what happened to you is actually "expected behavior."<br><br>You'll find that in many contexts, if "R" thinks it can figure out<br>what you intended to do with two vectors that aren't the same length,<br>it will try to be smart and do it.<br><br>For instance, this is similar to what happened to you -- notice how<br>TRUE is recycled to be as long as the first column here:<br><br>R> data.frame(id=letters[1:5], huh=TRUE)<br> id huh<br>1 a TRUE<br>2 b TRUE<br>3 c TRUE<br>4 d TRUE<br>5 e TRUE<br><br>Perhaps more strangely, but still "R-correct" (note no warning):<br><br>R> 1:3 + 1:6 ## == c(1:3,1:3) + 1:6<br>[1] 2 4 6 5 7 9<br><br>R thinks this is strange, but still does "something" for you (but<br>gives a warning, since the length of the 2nd vector isn't a multiple<br>of the length of the first):<br><br>R> 1:3 + 1:7<br>[1] 2 4 6 5 7 9 8<br>Warning message:<br>In 1:3 + 1:7 :<br> longer object length is not a multiple of shorter object length<br><br>Oftentimes I actually take advantage of the situation that happened<br>to you to expand a result into several rows (instead of just 
into 1)<br>when doing split/summarize/merge stuff with data.table's [,<br>by='something'] mojo.<br><br>My 2 cents,<br><span style='color:#888888'><br>-steve</span><o:p></o:p></p><div><div><p class=MsoNormal style='margin-bottom:12.0pt'><br>> On Fri, May 6, 2011 at 9:44 AM, Steve Harman <<a href="mailto:stvharman@gmail.com">stvharman@gmail.com</a>> wrote:<br>>><br>>> Thanks, I'll try it today and let you know.<br>>><br>>> On Fri, May 6, 2011 at 12:22 AM, Steve Lianoglou<br>>> <<a href="mailto:mailinglist.honeypot@gmail.com">mailinglist.honeypot@gmail.com</a>> wrote:<br>>>><br>>>> Hi,<br>>>><br>>>> As an aside -- in the future, please provide some data in a form that<br>>>> we can just copy and paste from your email into an R session so that<br>>>> we can get a working object up quickly.<br>>>><br>>>> For example:<br>>>><br>>>> R> dt <- data.table(coursecode=c(NA, NA, NA, 101, 102, 101, 102, 103),<br>>>> student_id=c(1, 1, 1, 1, 1, 2, 2, 2),<br>>>> key='student_id')<br>>>><br>>>> On Thu, May 5, 2011 at 10:54 PM, Steve Harman <<a href="mailto:stvharman@gmail.com">stvharman@gmail.com</a>><br>>>> wrote:<br>>>> > Hello<br>>>> ><br>>>> > I have a data table called dt in which each student can have multiple<br>>>> > records (created using data.table)<br>>>> ><br>>>> > coursecode student_id<br>>>> > ---------------- ----------------<br>>>> > NA 1<br>>>> > NA 1<br>>>> > NA 1<br>>>> > .... 1<br>>>> > .... 1<br>>>> > NA 2<br>>>> > 101 2<br>>>> > 102 2<br>>>> > NA 2<br>>>> > 103 2<br>>>> ><br>>>> > I am trying to group by student id and concatenate the coursecode<br>>>> > strings in<br>>>> > student records. 
This string is mostly NA but it can also be real<br>>>> > course code<br>>>> > (because of messy real life data coursecode was not always entered)<br>>>> > There are 999999 records.<br>>>> ><br>>>> > So, I thought I would get results like<br>>>> ><br>>>> > 1 NA NA NA .....<br>>>> > 2 NA 101 102 NA 123 ....<br>>>><br>>>> What type of object are you expecting that result to be?<br>>>><br>>>> > However, as seen below, it brings me a result with 999999 rows<br>>>> > and it fails to concatenate the coursecode's.<br>>>> ><br>>>> >> codes <- dt[,paste(coursecode),by=student_id]<br>>>> >> codes<br>>>> > student_id V1<br>>>> > [1,] 1 NA<br>>>> > [2,] 1 NA<br>>>> > [3,] 1 NA<br>>>> > [4,] 1 NA<br>>>> > [5,] 1 NA<br>>>> > [6,] 1 NA<br>>>> > [7,] 1 NA<br>>>> > [8,] 1 NA<br>>>> > [9,] 1 NA<br>>>> > [10,] 1 NA<br>>>> > First 10 rows of 999999 printed.<br>>>> ><br>>>> > If I repeat the same example for a numeric attribute and use some math<br>>>> > aggregation functions such as sum, mean, etc., then the number of rows<br>>>> > returned is correct, it is indeed equal to the number of students.<br>>>> ><br>>>> > I was wondering if the problem is with NA's or with the use of paste<br>>>> > as the aggregation function. 
I can alternatively use RMySQL with MySQL<br>>>> > to concatenate those strings but I would like to use data.table if<br>>>> > possible.<br>>>><br>>>> What if you try this (using my `dt` example from above):<br>>>><br>>>> R> dt[, paste(coursecode, collapse=","), by=student_id]<br>>>> student_id V1<br>>>> [1,] 1 NA,NA,NA,101,102<br>>>> [2,] 2 101,102,103<br>>>><br>>>> Note that each element in the $V1 column is a character vector of<br>>>> length 1 and not individual course codes.<br>>>><br>>>> Without using the `collapse` argument to your call to paste, you just<br>>>> get a character vector which is the same length as you passed in, eg:<br>>>><br>>>> R> paste(c('A', 'B', NA, 'C'))<br>>>> [1] "A" "B" "NA" "C"<br>>>><br>>>> vs.<br>>>><br>>>> R> paste(c('A', 'B', NA, 'C'), collapse=",")<br>>>> [1] "A,B,NA,C"<br>>>><br>>>> HTH,<br>>>><br>>>> -steve<br>>>><br>>>> --<br>>>> Steve Lianoglou<br>>>> Graduate Student: Computational Systems Biology<br>>>> | Memorial Sloan-Kettering Cancer Center<br>>>> | Weill Medical College of Cornell University<br>>>> Contact Info: <a href="http://cbio.mskcc.org/~lianos/contact" target="_blank">http://cbio.mskcc.org/~lianos/contact</a><br>>><br>><br>><br><br><br><o:p></o:p></p></div></div><p class=MsoNormal>--<o:p></o:p></p><div><div><p class=MsoNormal>Steve Lianoglou<br>Graduate Student: Computational Systems Biology<br> | Memorial Sloan-Kettering Cancer Center<br> | Weill Medical College of Cornell University<br>Contact Info: <a href="http://cbio.mskcc.org/~lianos/contact" target="_blank">http://cbio.mskcc.org/~lianos/contact</a><o:p></o:p></p></div></div></div><p class=MsoNormal><o:p> </o:p></p></div></div></div></body></html>