[datatable-help] Using data.table to run a function on every row
Ricardo Saporta
saporta at scarletmail.rutgers.edu
Fri Sep 27 08:41:19 CEST 2013
Sorry, I probably should have elaborated (it's late here in NJ).
The error you are seeing is most likely coming from your getUrls function:
without a `by`, it receives the whole `text` and `id` columns at once, so it
tries to assign 11007 ids to a data.frame with 29787 rows (one per URL), and
`R` cannot recycle them correctly.
If you instead break it down by id, then each call assigns only one id, and
R will be able to recycle it across that post's URLs without issue.
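
To make the difference concrete, here is a minimal sketch on toy data. The
contents of `db` and the `url_pattern` regex are assumptions made up for
illustration (neither is shown in the thread), and `stringsAsFactors = FALSE`
is added only so the sketch behaves the same on older R versions:

```r
library(data.table)
library(stringr)

# Hypothetical pattern and toy data; the real url_pattern and db are not shown
url_pattern <- "https?://\\S+"
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.example and http://b.example",
              "no links here",
              "one link: http://c.example"),
  has_url = c(TRUE, FALSE, TRUE)
)

getUrls <- function(text, id) {
  matches <- str_match_all(text, url_pattern)
  a <- data.frame(urls = unlist(matches), stringsAsFactors = FALSE)
  a$id <- id   # recycles only if length(id) divides nrow(a)
  a
}

# Without `by`: j sees both selected rows at once (2 ids, 3 URLs),
# so the id assignment inside getUrls fails; capture the message
err <- tryCatch(
  db[(has_url), getUrls(text, id)],
  error = function(e) conditionMessage(e)
)

# With `by = id`: each call sees one post and one id, which recycles cleanly
a <- db[(has_url), getUrls(text, id), by = id]
```

As far as I can tell, the grouped result carries `id` twice (once from `by`
and once from the function itself), so you may want to drop the `a$id <- id`
line from the function once you are grouping by id anyway.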
Good luck!
Rick
Ricardo Saporta
Graduate Student, Data Analytics
Rutgers University, New Jersey
e: saporta at rutgers.edu
On Fri, Sep 27, 2013 at 2:37 AM, Ricardo Saporta <
saporta at scarletmail.rutgers.edu> wrote:
> Hi there,
>
> Try adding `by=id`, as in:
>
> a <- db[(has_url), getUrls(text, id), by=id]
>
> Also, there is no need for "has_url == T";
> instead, use
> (has_url)
> if the variable is already logical. (Otherwise, you are just slowing
> things down ;)
>
>
>
> Ricardo Saporta
> Graduate Student, Data Analytics
> Rutgers University, New Jersey
> e: saporta at rutgers.edu
>
>
>
> On Thu, Sep 26, 2013 at 11:16 PM, Stian Håklev <shaklev at gmail.com> wrote:
>
>> I'm trying to run a function on every row fulfilling a certain criterion,
>> which returns a data frame - the idea is then to take the list of data
>> frames and rbindlist them together for a totally separate data.table. (I'm
>> extracting several URL links from each forum post, and tagging them with
>> the forum post they came from).
>>
>> I tried doing this with a data.table
>>
>> a <- db[has_url == T, getUrls(text, id)]
>>
>> and get the message
>>
>> Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L, 1L, 2L, 4L, :
>> replacement has 11007 rows, data has 29787
>>
>> Because some rows have several URLs... However, I don't care that these
>> row counts don't match, I still want these rows :) I thought j would just
>> let me execute arbitrary R code with the columns available as variable
>> names, etc.
>>
>> Here's the function it's running, but that shouldn't be relevant
>>
>> getUrls <- function(text, id) {
>>   matches <- str_match_all(text, url_pattern)
>>   a <- data.frame(urls=unlist(matches))
>>   a$id <- id
>>   a
>> }
>>
>>
>> Thanks, and thanks for an amazing package - data.table has made my life
>> so much easier. It should be part of base, I think.
>> Stian Haklev, University of Toronto
>>
>> --
>> http://reganmian.net/blog -- Random Stuff that Matters
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>