[datatable-help] Using data.table to run a function on every row

Stian Håklev shaklev at gmail.com
Fri Sep 27 05:16:11 CEST 2013


I'm trying to run a function on every row that meets a certain criterion.
The function returns a data frame; the idea is then to take the list of data
frames and rbindlist them together into a totally separate data.table. (I'm
extracting several URL links from each forum post and tagging them with
the forum post they came from.)

I tried doing this with a data.table:

a <- db[has_url == TRUE, getUrls(text, id)]

and got this error:

Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L, 1L, 2L, 4L,  :
  replacement has 11007 rows, data has 29787

This happens because some rows have several URLs. However, I don't care that
the row counts don't match; I still want these rows :) I thought j would just
let me execute arbitrary R code with the columns of the selected rows
available as variables, etc.

Here's the function it's running, though that shouldn't be relevant:

getUrls <- function(text, id) {
  matches <- str_match_all(text, url_pattern)
  a <- data.frame(urls=unlist(matches))
  a$id <- id
  a
}
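
For illustration, here is a minimal sketch of one way the row-count mismatch might be avoided, assuming the goal is one output row per URL tagged with its post id. The url_pattern and the toy db below are hypothetical stand-ins (the real ones are not shown here). Without by, j sees the whole text and id columns at once, so id has one element per post while the data frame has one row per URL. Grouping by id evaluates j once per post instead, and each group may return any number of rows.

```r
library(data.table)
library(stringr)

# Hypothetical pattern and toy data; the real url_pattern and db are not shown.
url_pattern <- "https?://\\S+"
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.example and http://b.example",
              "no links here",
              "one link: https://c.example"),
  has_url = c(TRUE, FALSE, TRUE)
)

# With by = id, j runs once per post. Each group can yield a different
# number of rows, and data.table attaches the group's id to each of them.
urls <- db[has_url == TRUE,
           list(urls = unlist(str_match_all(text, url_pattern))),
           by = id]

print(urls)
```

This sketch assumes id is unique per post; if it is not, grouping on a row counter such as by = seq_len(nrow(db)) before filtering would be another option.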


Thanks, and thanks for an amazing package - data.table has made my life so
much easier. It should be part of base, I think.
Stian Haklev, University of Toronto

-- 
http://reganmian.net/blog -- Random Stuff that Matters