<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix"><br>
That was my thought too. I don't know what str_match_all is, but
given the unlist() in getUrls(), it seems to return a list.
Rather than calling unlist(), leave the result as a list, and data.table should
happily make a `list` column in which each cell is itself a vector.
In fact, each cell can be anything at all: an embedded
data.table, a function definition, or any other type of object.<br>
You might need list(list(str_match_all(...))) in j to do that.<br>
<br>
Or what Rick has suggested here might work first time. It's hard
to visualise it without a small reproducible example, so we're
having to make educated guesses.<br>
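<br>
For what it's worth, here is a rough sketch of both ideas on made-up data.
Everything in it is an assumption: the toy db, the url_pattern, and that
str_match_all comes from stringr. It only shows the shape of the two
calls and is not a tested fix for the real data.<br>
<pre>
library(data.table)
library(stringr)   # assumption: str_match_all comes from stringr

# Hypothetical toy data; the real db and url_pattern were not posted
db &lt;- data.table(
  id      = 1:3,
  text    = c("see http://a.example and http://b.example",
              "no links here",
              "http://c.example"),
  has_url = c(TRUE, FALSE, TRUE)
)
url_pattern &lt;- "https?://\\S+"   # assumed pattern, no capture groups

# Idea 1: keep the matches as a list, which gives a list column
# (one cell per post, each cell holding that post's matches)
wide &lt;- db[(has_url), list(id = id, urls = str_match_all(text, url_pattern))]

# Idea 2: group by id, so each call sees one post and one id,
# and the id is repeated once per URL found in that post
long &lt;- db[(has_url),
           list(url = unlist(str_match_all(text, url_pattern))),
           by = id]
</pre>
With the second form there should be no need for rbindlist afterwards,
since the result is already a single table of id and url.<br>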
<br>
Many thanks for the kind words about data.table.<br>
<br>
Matthew<br>
<br>
<br>
On 27/09/13 07:44, Ricardo Saporta wrote:<br>
</div>
<blockquote
cite="mid:CAE7Aa4R-_DOkzC3JuJ-nbMFSQ5GLTij5rX3jEHHyjp8wL_YCwg@mail.gmail.com"
type="cite">
<div dir="ltr">In fact, you should be able to skip the function
altogether and just use:
<div><br>
</div>
<div> db[ (has_url), str_match_all(text, url_pattern), by=id]<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>(and now, my apologies to all for the email clutter)</div>
<div>good night</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Sep 27, 2013 at 2:41 AM,
Ricardo Saporta <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:saporta@scarletmail.rutgers.edu"
target="_blank">saporta@scarletmail.rutgers.edu</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">sorry, I probably should have elaborated
(it's late here, in NJ)
<div><br>
</div>
<div>The error you are seeing is most likely coming from
your getUrls function: it assigns a vector of several
ids to a data.frame with a different number of rows, and `R` cannot
recycle it correctly. </div>
<div><br>
</div>
<div>If you instead break it down by id, then each call
assigns only one id, and R will be able to
recycle it appropriately, without issue. </div>
<div><br>
</div>
<div>good luck! </div>
<div>Rick</div>
<div>
<br>
</div>
</div>
<div class="gmail_extra">
<div class="im"><br clear="all">
<div>
<div
style="color:rgb(34,34,34);font-size:13px;font-family:arial,sans-serif">
<div style="font-size:13px">Ricardo Saporta</div>
<div style="font-size:13px">
Graduate Student, Data Analytics</div>
<div style="font-size:13px"><span
style="font-size:13px">Rutgers University, New
Jersey</span></div>
<div style="font-size:13px"><span
style="font-size:13px">e: </span><a
moz-do-not-send="true"
href="mailto:saporta@rutgers.edu"
style="color:rgb(17,85,204);font-size:13px"
target="_blank">saporta@rutgers.edu</a></div>
<div><br>
</div>
</div>
</div>
<br>
<br>
</div>
<div>
<div class="h5">
<div class="gmail_quote">On Fri, Sep 27, 2013 at
2:37 AM, Ricardo Saporta <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:saporta@scarletmail.rutgers.edu"
target="_blank">saporta@scarletmail.rutgers.edu</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr">Hi there,
<div><br>
</div>
<div>Try inserting a `by=id` in </div>
<div><br>
</div>
<div> <span
style="font-family:arial,sans-serif;font-size:13px">a
<- db[(has_url), getUrls(text, id),
by=id]</span></div>
<div>
<span
style="font-family:arial,sans-serif;font-size:13px"><br>
</span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px">Also,
no need for "</span><span
style="font-family:arial,sans-serif;font-size:13px">has_url
== T"</span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px">instead,
use </span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px">
(</span><span
style="font-family:arial,sans-serif;font-size:13px">has_url) </span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px">if
the variable is already logical.
(Otherwise, you are just slowing things
down ;) </span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px"><br>
</span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px"><br>
</span></div>
</div>
<div class="gmail_extra"><br clear="all">
<br>
<br>
<div class="gmail_quote">
<div>
<div>On Thu, Sep 26, 2013 at 11:16 PM,
Stian Håklev <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:shaklev@gmail.com"
target="_blank">shaklev@gmail.com</a>&gt;</span>
wrote:<br>
</div>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div>
<div>
<div dir="ltr">I'm trying to run a
function that returns a data frame
on every row fulfilling a certain
criterion - the idea is then to
take the list of data frames and
rbindlist them together into a
totally separate data.table. (I'm
extracting several URL links from
each forum post, and tagging them
with the forum post they came
from).
<div>
<br>
</div>
<div>I tried doing this with a
data.table</div>
<div><br>
</div>
<div>a <- db[has_url == T,
getUrls(text, id)]</div>
<div><br>
</div>
<div>and get the message</div>
<div><br>
</div>
<div>
<div>Error in
`$<-.data.frame`(`*tmp*`,
"id", value = c(1L, 6L, 1L, 2L,
4L, : </div>
<div> replacement has 11007 rows,
data has 29787 </div>
</div>
<div><br>
</div>
<div>Because some rows have several
URLs... However, I don't care that
these row counts don't match, I
still want these rows :) I thought
j would just let me execute
arbitrary R code with the columns
available as variables, etc. </div>
<div><br>
</div>
<div>Here's the function it's
running, but that shouldn't be
relevant</div>
<div><br>
</div>
<div>
<div>getUrls <- function(text,
id) {</div>
<div> matches <-
str_match_all(text, url_pattern)</div>
<div> a <-
data.frame(urls=unlist(matches))</div>
<div> a$id <- id</div>
<div> a</div>
<div>}</div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks, and thanks for an
amazing package - data.table has
made my life so much easier. It
should be part of base, I think.</div>
<div>Stian Haklev, University of
Toronto</div>
</div>
<span><font color="#888888">
<div>
<div><br>
</div>
-- <br>
<a moz-do-not-send="true"
href="http://reganmian.net/blog"
target="_blank">http://reganmian.net/blog</a>
-- Random Stuff that Matters<br>
</div>
</font></span></div>
<br>
</div>
</div>
_______________________________________________<br>
datatable-help mailing list<br>
<a moz-do-not-send="true"
href="mailto:datatable-help@lists.r-forge.r-project.org"
target="_blank">datatable-help@lists.r-forge.r-project.org</a><br>
<a moz-do-not-send="true"
href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help"
target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>