[datatable-help] Using data.table to run a function on every row
Stian Håklev
shaklev at gmail.com
Fri Sep 27 05:16:11 CEST 2013
I'm trying to run a function on every row that fulfils a certain criterion.
The function returns a data frame, and the idea is then to take the list of
data frames and rbindlist them together into a totally separate data.table.
(I'm extracting several URL links from each forum post, and tagging them with
the forum post they came from.)
I tried doing this with a data.table
a <- db[has_url == T, getUrls(text, id)]
and get the message
Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L, 1L, 2L, 4L, :
replacement has 11007 rows, data has 29787
Because some rows have several URLs... However, I don't care that these
row counts don't match, I still want these rows :) I thought j would just
let me execute arbitrary R code in the context of each row, with the column
names available as variables, etc.
Here's the function it's running, but that shouldn't be relevant
getUrls <- function(text, id) {
  matches <- str_match_all(text, url_pattern)
  a <- data.frame(urls=unlist(matches))
  a$id <- id
  a
}
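For illustration, here is a minimal sketch of one way to get per-post evaluation. Without `by`, j is evaluated once with `text` and `id` as whole columns, which is why `a$id <- id` sees mismatched lengths. Grouping by `id` runs j once per post instead. The `url_pattern` and the toy `db` below are stand-ins (the real ones are not shown in this post), and the sketch assumes `id` is unique per post.

```r
library(data.table)
library(stringr)

# Hypothetical stand-ins: the real url_pattern and db are not shown here.
url_pattern <- "https?://\\S+"
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.example and http://b.example",
              "no links here",
              "one link: https://c.example/page"),
  has_url = c(TRUE, FALSE, TRUE)
)

# With by = id, j runs once per post, so `text` is a single string and
# the result may contribute any number of rows for that post.
# Assumes id is unique per row.
urls <- db[has_url == TRUE,
           list(url = unlist(str_match_all(text, url_pattern))),
           by = id]
print(urls)
```

The result is already a single data.table with one row per URL, tagged with the `id` of the post it came from, so no separate rbindlist step is needed in this variant.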
Thanks, and thanks for an amazing package - data.table has made my life so
much easier. It should be part of base, I think.
Stian Haklev, University of Toronto
--
http://reganmian.net/blog -- Random Stuff that Matters