<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix"><br>
Stian,<br>
<br>
      datatable-help isn't really the place for this kind of question. It's a
      very good question, but it belongs on S.O., where you can edit it in
      response to comments. datatable-help is more for discussion of future
      developments, notices, things that aren't allowed on S.O., etc.<br>
<br>
This was your example : <br>
<br>
> a <- c(1,2,3)<br>
> b <- c(2,3,4)<br>
> dt <- data.table(names=c("Stian", "Christian", "John"),
numbers=list(a,b, NULL))<br>
<br>
The output of that is :<br>
<br>
> dt<br>
names numbers<br>
1: Stian 1,2,3<br>
2: Christian 2,3,4<br>
3: John <br>
<br>
      Are you possibly mistaken about how list columns are printed? Those
      commas are just how the cells are displayed; the values in the
      `numbers` column aren't strings. `numbers` is a list column where each
      item is a vector.<br>
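      <br>
      One way to check this for yourself, as a minimal sketch that rebuilds the `dt` from your example (it uses = rather than the arrow for assignment, which behaves the same at top level):<br>
<pre>
```r
library(data.table)

a = c(1, 2, 3)
b = c(2, 3, 4)
dt = data.table(names = c("Stian", "Christian", "John"),
                numbers = list(a, b, NULL))

class(dt$numbers)          # the column itself is a list
dt$numbers[[1]]            # one cell: a numeric vector, not a string
sapply(dt$numbers, class)  # type of each cell
lengths(dt$numbers)        # how many items each cell holds
```
</pre>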
<br>
      To get the output you asked for, it's just:<br>
<br>
> dt[,unlist(numbers),by=names]<br>
names V1<br>
1: Stian 1<br>
2: Stian 2<br>
3: Stian 3<br>
4: Christian 2<br>
5: Christian 3<br>
6: Christian 4<br>
> <br>
<br>
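      If you want the column called Number rather than V1, something like the following should work. It is only a sketch on the same example data; `long` and `long2` are names made up here, and the second form is an alternative that avoids grouping by repeating each name once per item in its cell:<br>
<pre>
```r
library(data.table)

a = c(1, 2, 3)
b = c(2, 3, 4)
dt = data.table(names = c("Stian", "Christian", "John"),
                numbers = list(a, b, NULL))

# Same query as above, then rename the default V1 column by reference
long = dt[, unlist(numbers), by = names]
setnames(long, "V1", "Number")

# Alternative without by: repeat each name once per item in its cell
long2 = dt[, list(names = rep(names, lengths(numbers)),
                  Number = unlist(numbers))]
```
</pre>
      <br>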
If I've misunderstood, then please start again with a new
question on S.O.<br>
<br>
<a href="http://stackoverflow.com/questions/tagged/data.table">http://stackoverflow.com/questions/tagged/data.table</a><br>
<br>
Thanks,<br>
Matthew<br>
<br>
<br>
<br>
<br>
On 27/09/13 18:25, Ricardo Saporta wrote:<br>
</div>
<blockquote
cite="mid:CAE7Aa4Qd3oK6PP5JzxEo8=7RYk-8=2m8S03c7-k24FnefNBN=g@mail.gmail.com"
type="cite">
      <div dir="ltr">hm... not sure about `j` (sorry, I haven't taken a
        close look at your code), but my comment was to point out that
        these two statements are different:
<div><br>
<div> DT [ TRUE, ] </div>
<div> DT [ .(TRUE), ]<br>
</div>
</div>
<div><br>
</div>
<div>The first one is giving you the whole data.table </div>
<div> DT[TRUE, ] is the same as DT</div>
<div>(since TRUE is getting recycled)</div>
<div><br>
</div>
<div>The second one is giving you all rows within DT where the
first column of the key has a value of TRUE. </div>
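        <div><br>
        </div>
        <div>A small sketch of the difference, using a made-up table (the `flag` and `x` columns are hypothetical, not from your data):</div>
<pre>
```r
library(data.table)

DT = data.table(flag = c(TRUE, FALSE, TRUE), x = 1:3)
setkey(DT, flag)

all_rows = DT[TRUE, ]      # logical i: the single TRUE is recycled over every row
true_rows = DT[.(TRUE), ]  # list i: a join on the key column `flag`

nrow(all_rows)
nrow(true_rows)
```
</pre>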
<div><br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div
style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">
<div style="font-size:13px">
Ricardo Saporta</div>
<div style="font-size:13px">Graduate Student, Data Analytics</div>
<div style="font-size:13px"><span style="font-size:13px">Rutgers
University, New Jersey</span></div>
<div style="font-size:13px"><span style="font-size:13px">e: </span><a
moz-do-not-send="true" href="mailto:saporta@rutgers.edu"
style="color:rgb(17,85,204);font-size:13px"
target="_blank">saporta@rutgers.edu</a></div>
<div><br>
</div>
</div>
</div>
<br>
<br>
<div class="gmail_quote">On Fri, Sep 27, 2013 at 12:20 PM, Stian
Håklev <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:shaklev@gmail.com" target="_blank">shaklev@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div class="im">
<div>> system.time( db[T, matches :=
str_match_all(text, url_pattern)] )</div>
<div> user system elapsed </div>
</div>
<div> 19.610 0.475 20.304 </div>
<div>> system.time( db[.(T), matches :=
str_match_all(text, url_pattern)] )</div>
<div>Error in `[.data.table`(db, .(T), `:=`(matches,
str_match_all(text, url_pattern))) : </div>
<div> All items in j=list(...) should be atomic vectors
or lists. If you are trying something like
j=list(.SD,newcol=mean(colA)) then use := by group
instead (much quicker), or cbind or merge afterwards.</div>
<div>Timing stopped at: 6.339 0.043 6.403 </div>
</div>
<div class="HOEnZb">
<div class="h5">
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Fri, Sep 27, 2013 at 11:48
AM, Ricardo Saporta <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:saporta@scarletmail.rutgers.edu"
target="_blank">saporta@scarletmail.rutgers.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Hi Stian,
<div><br>
</div>
<div>Try the following two and look at the
difference: </div>
<div><br>
</div>
<div>
<div
style="font-family:arial,sans-serif;font-size:13px">
<div
style="font-family:arial;font-size:small">
<div>
<div
style="font-family:arial,sans-serif;font-size:13px"> db[T,
matches := str_match_all(text,
url_pattern)]</div>
</div>
<div>
<div>
<div
style="font-family:arial,sans-serif;font-size:13px"> db[.(T),
matches := str_match_all(text,
url_pattern)]</div>
</div>
</div>
<div
style="font-family:arial,sans-serif;font-size:13px"><br>
</div>
<div
style="font-family:arial,sans-serif;font-size:13px">;) </div>
<div
style="font-family:arial,sans-serif;font-size:13px"><br>
</div>
</div>
</div>
</div>
<div>
<div>
<div class="gmail_extra">
<br>
<br>
<div class="gmail_quote">On Fri, Sep 27,
2013 at 11:21 AM, Stian Håklev <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:shaklev@gmail.com"
target="_blank">shaklev@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr">I really appreciate all
your help - amazingly supportive
community. I could probably figure
out a "brute-force" way of doing
things, but since I'm going to be
writing a lot of R in the future
too, I always want to find the
"correct" way of doing it, which
both looks clear, and is quick. (I
come from a background in Ruby, and
am always interested in writing very
clear and DRY (do not repeat
yourself) code, but I find I still
spend a lot of time in R struggling
with various data formats - lists,
nested lists, vectors, matrices,
different forms of apply/ddply/for
loops etc).
<div>
<br>
</div>
<div>Anyway, a few different points.</div>
<div><br>
</div>
<div>I tried db[has_url,], but got
"object has_url not found"</div>
<div><br>
</div>
<div>I then tried setkey(db,
"has_url"), and using this, but
somehow it was a lot slower than
what I used to do (I repeated a
few times). Not sure if I'm doing
it wrong. (Not important - even 15
sec is totally fine, I'll only run
this once. But good to understand
the underlying principles).</div>
<div><br>
</div>
<div>
<div>setkey(db, "has_url")</div>
<div>> system.time( db[T,
matches := str_match_all(text,
url_pattern)] )</div>
<div> user system elapsed </div>
<div> 17.514 0.334 17.847 </div>
<div>> system.time( db[has_url
== T, matches :=
str_match_all(text,
url_pattern)] )</div>
<div> user system elapsed </div>
<div> 5.943 0.040 5.984 </div>
</div>
<div><br>
</div>
<div>The second point was how to get
out the matches. The idea was that
you have a text field which might
contain several urls, which I want
to extract, but I need each URL
tagged with the row it came from
(so I can link it back to
properties of the post and author,
look at whether certain students
are more likely to post certain
kinds of URLs etc).</div>
<div><br>
</div>
                            <div>Instead of a function, you'll
                              see above that I rewrote it to use
                              :=, which creates a new column
                              that holds a list. That worked
                              wonderfully, but now how do I get
                              these "out" of this data.table
                              and into a new one?</div>
<div><br>
</div>
<div>Made-up example data:</div>
<div>
<div>a <- c(1,2,3)</div>
<div>b <- c(2,3,4)</div>
<div>dt <-
data.table(names=c("Stian",
"Christian", "John"),
numbers=list(a,b, NULL))</div>
<div><br>
</div>
<div>Now my goal is to have a new
data.table that looks like this</div>
<div>
<div>
<div>Name <span
style="white-space:pre-wrap">
</span>Number</div>
<div>Stian <span
style="white-space:pre-wrap">
</span>1</div>
<div>Stian <span
style="white-space:pre-wrap">
</span>2</div>
<div>Stian <span
style="white-space:pre-wrap">
</span>3</div>
<div>Christian <span
style="white-space:pre-wrap">
</span>2</div>
<div>Christian <span
style="white-space:pre-wrap">
</span>3</div>
<div>Christian <span
style="white-space:pre-wrap">
</span>4</div>
</div>
<div><br>
</div>
</div>
<div>Again, I'm sure I could do
this with a for() or lapply? But
I'd love to see the most elegant
solution.</div>
<div><br>
</div>
<div>Note that this:</div>
<div><br>
</div>
<div>
<div>
<div>getUrls <-
function(text, id) {</div>
<div> matches <-
str_match_all(text,
url_pattern)</div>
</div>
<div>
data.frame(urls=unlist(matches),
id=id)</div>
<div>}</div>
<div><br>
</div>
<div>system.time( a <-
db[(has_url), getUrls(text,
id), by=id] )</div>
<div><br>
</div>
<div>Works perfectly, the result
is</div>
<div>
<table
style="font-family:Times"
border="1">
<tbody>
<tr>
<th><br>
</th>
<th>id</th>
<th>urls</th>
<th>id</th>
</tr>
<tr>
<td align="right">1</td>
<td align="right">16</td>
<td><a
moz-do-not-send="true"
href="https://class.coursera.org/aboriginaled-001/forum/thread?thread_id=166"
target="_blank">https://class.coursera.org/aboriginaled-001/forum/thread?thread_id=166</a></td>
<td align="right">16</td>
</tr>
<tr>
<td align="right">2</td>
<td align="right">24</td>
<td><a
moz-do-not-send="true"
href="http://www.youtube.com/watch?v=JUiGF4TGI9w" target="_blank">http://www.youtube.com/watch?v=JUiGF4TGI9w</a></td>
<td align="right">
24</td>
</tr>
<tr>
<td align="right">3</td>
<td align="right">44</td>
<td><a
moz-do-not-send="true"
href="http://www.cbc.ca/revisionquest/blog/2010/07/21/july-21-july-24-the-metis-keeping-it-riel/"
target="_blank">http://www.cbc.ca/revisionquest/blog/2010/07/21/july-21-july-24-the-metis-keeping-it-riel/</a></td>
<td align="right">44</td>
</tr>
<tr>
<td align="right">4</td>
<td align="right">61</td>
<td><a
moz-do-not-send="true"
href="http://www.support-native-american-art.com/Native-American-Medicine-Wheels.html"
target="_blank">http://www.support-native-american-art.com/Native-American-Medicine-Wheels.html</a></td>
<td align="right">61</td>
</tr>
<tr>
<td align="right">5</td>
<td align="right">75</td>
<td><a
moz-do-not-send="true"
href="http://indigenousfoundations.arts.ubc.ca/home/government-policy/the-residential-school-system.html"
target="_blank">http://indigenousfoundations.arts.ubc.ca/home/government-policy/the-residential-school-system.html</a></td>
<td align="right">75</td>
</tr>
<tr>
<td align="right">6</td>
<td align="right">75</td>
<td><a
moz-do-not-send="true"
href="https://www.facebook.com/photo.php?fbid=10151324672623754"
target="_blank">https://www.facebook.com/photo.php?fbid=10151324672623754</a></td>
<td align="right">75</td>
</tr>
</tbody>
</table>
</div>
<div><br>
</div>
<div>which is exactly what I was
looking for. So I've really
reached my goal, but I'm
curious about the other method
as well.</div>
</div>
<div>
<br>
</div>
<div>Thanks!<span><font
color="#888888"><br>
Stian</font></span></div>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">
<div>On Fri, Sep 27, 2013 at 8:48
AM, Matthew Dowle <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:mdowle@mdowle.plus.com"
target="_blank">mdowle@mdowle.plus.com</a>></span>
wrote:<br>
</div>
<div>
<div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF"
text="#000000">
<div><br>
That was my thought
too. I don't know what
str_match_all is, but
given the unlist() in
getUrls(), it seems to
return a list. Rather
than unlist(), leave it
as list, and data.table
should happily make a
`list` column where each
cell is itself a
vector. In fact each
cell can be anything at
all, even embedded
data.table, function
definitions, or any type
of object.<br>
You might need a
list(list(str_match_all(...)))
in j to do that.<br>
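                                      <br>
                                      For instance, here is a sketch with made-up forum posts. Base R's regmatches() and gregexpr() stand in for str_match_all, and the `db` contents and the simplified `url_pattern` are assumptions, not your data:<br>
<pre>
```r
library(data.table)

db = data.table(id = 1:3,
                text = c("see http://a.example and http://b.example",
                         "no links here",
                         "just http://c.example"))
url_pattern = "https?://[^ ]+"  # simplified pattern, for illustration only

db[, has_url := grepl(url_pattern, text)]

# The right-hand side is a list with one character vector per selected row,
# so `matches` becomes a list column; unselected rows are left as NULL
db[(has_url), matches := regmatches(text, gregexpr(url_pattern, text))]

# Flatten back to one row per URL, tagged with the id it came from
urls = db[(has_url), unlist(matches), by = id]
```
</pre>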
<br>
Or what Rick has
suggested here might
work first time. It's
hard to visualise it
without a small
reproducible example, so
we're having to make
educated guesses.<br>
<br>
Many thanks for the kind
words about data.table.<span><font
color="#888888"><br>
<br>
Matthew</font></span>
<div>
<div><br>
<br>
<br>
On 27/09/13 07:44,
Ricardo Saporta
wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote
type="cite">
<div dir="ltr">In
fact, you should
be able to skip
the function
altogether and
just use:
<div><br>
</div>
<div> db[
(has_url),
str_match_all(text,
url_pattern),
by=id]<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>(and now, my
apologies to all
for the email
clutter)</div>
<div>good night</div>
<div
class="gmail_extra"><br>
<div
class="gmail_quote">On
Fri, Sep 27,
2013 at 2:41
AM, Ricardo
Saporta <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:saporta@scarletmail.rutgers.edu"
target="_blank">saporta@scarletmail.rutgers.edu</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div dir="ltr">sorry,
I probably
should have
elaborated
(it's late
here, in NJ)
<div><br>
</div>
                                              <div>The error you are seeing
                                                is most likely coming from
                                                your getUrls function: you are
                                                adding several ids to a
                                                data.frame with a different
                                                number of rows, and `R` cannot
                                                recycle them correctly. </div>
<div><br>
</div>
                                              <div>If you instead break it
                                                down by id, then each time you
                                                are only assigning one id and
                                                R will be able to recycle it
                                                appropriately, without
                                                issue. </div>
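                                              <div><br>
                                              </div>
                                              <div>Roughly what happens, as a sketch on made-up data (getWords is a hypothetical stand-in for your function; it splits text into words instead of matching URLs):</div>
<pre>
```r
library(data.table)

db = data.table(id = 1:3, text = c("a b", "c", "d e"))

# Returns one row per word, tagged with whatever id it is given
getWords = function(text, id) {
  data.frame(word = unlist(strsplit(text, " ")), id = id,
             stringsAsFactors = FALSE)
}

# Without by: 5 words but 3 ids, so data.frame() cannot recycle id
no_by = tryCatch(db[, getWords(text, id)], error = function(e) "error")

# With by = id: each call sees a single id, which recycles cleanly
with_by = db[, getWords(text, id), by = id]
```
</pre>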
<div><br>
</div>
<div>good
luck! </div>
<div>Rick</div>
<div> <br>
</div>
</div>
<div
class="gmail_extra">
<div><br
clear="all">
<br>
<br>
</div>
<div>
<div>
<div
class="gmail_quote">On
Fri, Sep 27,
2013 at 2:37
AM, Ricardo
Saporta <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:saporta@scarletmail.rutgers.edu"
target="_blank">saporta@scarletmail.rutgers.edu</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div dir="ltr">Hi
there,
<div><br>
</div>
<div>Try
inserting a
`by=id` in </div>
<div><br>
</div>
<div> <span
style="font-family:arial,sans-serif;font-size:13px">a <-
db[(has_url),
getUrls(text,
id), by=id]</span></div>
<div> <span
style="font-family:arial,sans-serif;font-size:13px"><br>
</span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px">Also,
no need for "</span><span
style="font-family:arial,sans-serif;font-size:13px">has_url == T"</span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px">instead,
use </span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px">
(</span><span
style="font-family:arial,sans-serif;font-size:13px">has_url) </span></div>
                                                      <div><span
style="font-family:arial,sans-serif;font-size:13px">if
                                                          the variable
                                                          is already
                                                          logical.
                                                          (Otherwise,
                                                          you are just
                                                          slowing things
                                                          down ;) </span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px"><br>
</span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px"><br>
</span></div>
</div>
<div
class="gmail_extra"><br
clear="all">
<br>
<br>
<div
class="gmail_quote">
<div>
<div>On Thu,
Sep 26, 2013
at 11:16 PM,
Stian Håklev <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:shaklev@gmail.com" target="_blank">shaklev@gmail.com</a>></span>
wrote:<br>
</div>
</div>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div>
<div>
                                                          <div dir="ltr">I'm
                                                          trying to run a function on
                                                          every row fulfilling a certain
                                                          criterion, which returns
                                                          a data frame - the idea is
                                                          then to take the list of
                                                          data frames and rbindlist
                                                          them together for a totally
                                                          separate data.table.
                                                          (I'm extracting several URL
                                                          links from each forum
                                                          post, and tagging them
                                                          with the forum post they came
                                                          from).
<div> <br>
</div>
<div>I tried
doing this
with a
data.table</div>
<div><br>
</div>
<div>a <-
db[has_url ==
T,
getUrls(text,
id)]</div>
<div><br>
</div>
<div>and get
the message</div>
<div><br>
</div>
<div>
<div>Error in
`$<-.data.frame`(`*tmp*`,
"id", value =
c(1L, 6L, 1L,
2L, 4L, : </div>
<div>
replacement
has 11007
rows, data has
29787 </div>
</div>
<div><br>
</div>
                                                          <div>Because
                                                          some rows have several
                                                          URLs... However, I
                                                          don't care that these
                                                          row counts don't match, I
                                                          still want these rows :)
                                                          I thought `j` would just let
                                                          me execute arbitrary R
                                                          code with the columns
                                                          available as variables, etc. </div>
<div><br>
</div>
<div>Here's
the function
it's running,
but that
shouldn't be
relevant</div>
<div><br>
</div>
<div>
<div>getUrls
<-
function(text,
id) {</div>
<div> matches
<-
str_match_all(text,
url_pattern)</div>
<div> a <-
data.frame(urls=unlist(matches))</div>
<div> a$id
<- id</div>
<div> a</div>
<div>}</div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks,
and thanks for
an amazing
package -
data.table has
made my life
so much
easier. It
should be part
of base, I
think.</div>
<div>Stian
Haklev,
University of
Toronto</div>
</div>
<span><font
color="#888888">
<div>
<div><br>
</div>
-- <br>
<a
moz-do-not-send="true"
href="http://reganmian.net/blog" target="_blank">http://reganmian.net/blog</a>
-- Random
Stuff that
Matters<br>
</div>
</font></span></div>
<br>
</div>
</div>
_______________________________________________<br>
datatable-help
mailing list<br>
<a
moz-do-not-send="true"
href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a><br>
<a
moz-do-not-send="true"
href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help"
target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
<div>
<div><br>
<br clear="all">
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>