<div dir="ltr">In fact, you should be able to skip the function altogether and just use: <div><br></div><div> db[ (has_url), str_match_all(text, url_pattern), by=id]<br></div><div><br></div><div><br></div><div>(and now, my apologies to all for the email clutter)</div>
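<div><br></div><div>A minimal sketch of that one-liner on toy data, assuming data.table and stringr are installed. The `db` rows and `url_pattern` below are hypothetical stand-ins, since the real ones never appear in this thread. Wrapping the matches in `unlist()` and naming the column `url` are my own additions: `str_match_all()` returns a list of match matrices, and flattening it keeps the result a plain character column.</div>
<pre>
```r
library(data.table)
library(stringr)

# Hypothetical stand-in: the real url_pattern is not shown in the thread.
url_pattern <- "https?://\\S+"

# Hypothetical toy posts standing in for `db`.
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.com and http://b.com", "no link here", "http://c.com"),
  has_url = c(TRUE, FALSE, TRUE)
)

# Each group is one post; unlist() flattens its match matrix into a vector,
# and by = id repeats that post's id once per URL found.
urls <- db[(has_url), list(url = unlist(str_match_all(text, url_pattern))), by = id]
print(urls)
```
</pre>
<div>Because `by` (rather than `keyby`) keeps groups in order of first appearance, the rows come out as post 1's two URLs followed by post 3's single URL.</div>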
<div>good night</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 27, 2013 at 2:41 AM, Ricardo Saporta <span dir="ltr"><<a href="mailto:saporta@scarletmail.rutgers.edu" target="_blank">saporta@scarletmail.rutgers.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Sorry, I probably should have elaborated (it's late here, in NJ).<div><br></div><div>The error you are seeing most likely comes from your getUrls function: it assigns a vector of several ids to a data.frame whose number of rows (one per URL found) does not match, so `R` cannot recycle the ids correctly. </div>
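<div><br></div><div>A small base-R reproduction of that recycling failure, on made-up data (the URLs and ids are hypothetical). It also shows a quieter hazard: when the number of URLs happens to be a multiple of the number of ids, `$<-` recycles silently and tags URLs with the wrong post.</div>
<pre>
```r
# Two posts (ids 1 and 2) that produced three URLs between them.
urls3 <- data.frame(urls = c("http://a.com", "http://b.com", "http://c.com"))
ids <- c(1L, 2L)

# 3 rows is not a multiple of 2 ids, so the assignment is refused.
msg <- tryCatch({
  urls3$id <- ids
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# With 4 URLs, the 2 ids recycle silently as 1, 2, 1, 2: no error,
# but URLs end up attributed to the wrong post.
urls4 <- data.frame(urls = c("http://a.com", "http://b.com",
                             "http://c.com", "http://d.com"))
urls4$id <- ids
print(urls4)
```
</pre>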
<div><br></div><div>If you instead break it down by id, then each call assigns only one id, and R can recycle it across that post's URLs without issue. </div><div><br></div><div>good luck! </div><div>Rick</div><div>
<br>
</div></div><div class="gmail_extra"><div class="im"><br clear="all"><div><div style="color:rgb(34,34,34);font-size:13px;font-family:arial,sans-serif"><div style="font-size:13px">Ricardo Saporta</div><div style="font-size:13px">
Graduate Student, Data Analytics</div><div style="font-size:13px"><span style="font-size:13px">Rutgers University, New Jersey</span></div><div style="font-size:13px"><span style="font-size:13px">e: </span><a href="mailto:saporta@rutgers.edu" style="color:rgb(17,85,204);font-size:13px" target="_blank">saporta@rutgers.edu</a></div>
<div><br></div></div></div>
<br><br></div><div><div class="h5"><div class="gmail_quote">On Fri, Sep 27, 2013 at 2:37 AM, Ricardo Saporta <span dir="ltr"><<a href="mailto:saporta@scarletmail.rutgers.edu" target="_blank">saporta@scarletmail.rutgers.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi there, <div><br></div><div>Try adding `by=id`, as in: </div><div><br></div><div> <span style="font-family:arial,sans-serif;font-size:13px">a <- db[(has_url), getUrls(text, id), by=id]</span></div>
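<div><br></div><div>A minimal sketch of that call on toy data, reusing the getUrls() function from the original question. The `db` rows and `url_pattern` are hypothetical stand-ins, and the sketch assumes data.table and stringr are installed and R &gt;= 4.0 (so `data.frame()` keeps the URLs as character rather than factor).</div>
<pre>
```r
library(data.table)
library(stringr)

# Hypothetical stand-in: the real url_pattern is not shown in the thread.
url_pattern <- "https?://\\S+"

# getUrls() as posted in the original question.
getUrls <- function(text, id) {
  matches <- str_match_all(text, url_pattern)
  a <- data.frame(urls = unlist(matches))
  a$id <- id
  a
}

# Hypothetical toy posts standing in for `db`.
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.com and http://b.com", "no link here", "http://c.com"),
  has_url = c(TRUE, FALSE, TRUE)
)

# With by = id, each getUrls() call sees exactly one post, so `a$id <- id`
# recycles a single id across however many URLs that post contains.
a <- db[(has_url), getUrls(text, id), by = id]
print(a)
```
</pre>
<div>Since `by = id` already prepends an `id` column, the result will likely carry `id` twice (once from the grouping, once from getUrls()); dropping the `a$id <- id` line inside the function avoids the duplicate.</div>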
<div>
<span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">Also, no need for "</span><span style="font-family:arial,sans-serif;font-size:13px">has_url == T"</span></div>
<div><span style="font-family:arial,sans-serif;font-size:13px">instead, use </span></div><div><span style="font-family:arial,sans-serif;font-size:13px"> (</span><span style="font-family:arial,sans-serif;font-size:13px">has_url) </span></div>
<div><span style="font-family:arial,sans-serif;font-size:13px">if the variable is already logical. (The extra `== T` comparison just slows things down ;) </span></div><div><span style="font-family:arial,sans-serif;font-size:13px"><br>
</span></div><div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div></div><div class="gmail_extra"><br clear="all"><div><div style="color:rgb(34,34,34);font-size:13px;font-family:arial,sans-serif">
</div></div>
<br><br><div class="gmail_quote"><div><div>On Thu, Sep 26, 2013 at 11:16 PM, Stian Håklev <span dir="ltr"><<a href="mailto:shaklev@gmail.com" target="_blank">shaklev@gmail.com</a>></span> wrote:<br></div>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>
<div dir="ltr">I'm trying to run a function that returns a data frame on every row fulfilling a certain criterion; the idea is then to take the list of data frames and rbindlist them together into a totally separate data.table. (I'm extracting several URL links from each forum post, and tagging them with the forum post they came from.) <div>
<br></div><div>I tried doing this with a data.table</div><div><br></div><div>a <- db[has_url == T, getUrls(text, id)]</div><div><br></div><div>and get the message</div><div><br></div><div><div>Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L, 1L, 2L, 4L, : </div>
<div> replacement has 11007 rows, data has 29787 </div></div><div><br></div><div>This is because some rows have several URLs... However, I don't care that these row lengths don't match; I still want these rows :) I thought j would just let me execute arbitrary R code in the context of the rows, with the columns available as variable names, etc. </div>
<div><br></div><div>Here's the function it's running, but that shouldn't be relevant</div><div><br></div><div><div>getUrls <- function(text, id) {</div><div> matches <- str_match_all(text, url_pattern)</div>
<div> a <- data.frame(urls=unlist(matches))</div><div> a$id <- id</div><div> a</div><div>}</div><div><br></div><div><br></div><div>Thanks, and thanks for an amazing package - data.table has made my life so much easier. It should be part of base, I think.</div>
<div>Stian Haklev, University of Toronto</div></div><span><font color="#888888"><div><div><br></div>-- <br><a href="http://reganmian.net/blog" target="_blank">http://reganmian.net/blog</a> -- Random Stuff that Matters<br>
</div></font></span></div>
<br></div></div>_______________________________________________<br>
datatable-help mailing list<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br></blockquote></div><br></div>
</blockquote></div><br></div></div></div>
</blockquote></div><br></div></div>