[datatable-help] Using data.table to run a function on every row

Ricardo Saporta saporta at scarletmail.rutgers.edu
Fri Sep 27 08:41:19 CEST 2013


Sorry, I probably should have elaborated (it's late here in NJ).

The error you are seeing is most likely coming from your getUrls function:
without a `by`, it receives every matching id at once and tries to assign
them to a data.frame with a different number of rows (one row per URL), so
`R` cannot recycle the ids correctly.

If you instead break it down by id, then each call assigns only one id,
and R will be able to recycle it appropriately, without issue.
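
To make the recycling point concrete, here is a minimal sketch. The toy `db`, the `url_pattern` regex, and the use of base `regmatches()` in place of `str_match_all()` are my own assumptions for illustration, not your actual data or pattern. It drops the function's own id column in the grouped call, since `by=id` already supplies one.

```r
library(data.table)

# Hypothetical stand-ins for the original data and pattern (assumptions).
url_pattern <- "https?://[^[:space:]]+"
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.example and http://b.example",
              "no links here",
              "just http://c.example"),
  has_url = c(TRUE, FALSE, TRUE)
)

# Same shape as the getUrls from the thread, but using base regmatches()
# instead of stringr::str_match_all() to keep the sketch dependency-light.
getUrls <- function(text, id) {
  matches <- regmatches(text, gregexpr(url_pattern, text))
  a <- data.frame(urls = unlist(matches), stringsAsFactors = FALSE)
  a$id <- id
  a
}

# Without by: text and id are whole columns (2 values here), but the
# data.frame has 3 rows (one per URL), so `a$id <- id` cannot recycle.
res_all <- tryCatch(db[(has_url), getUrls(text, id)], error = function(e) e)

# With by = id: each call sees one text and one id, so the single id
# recycles over however many URLs that post contains.
res_by <- db[(has_url), getUrls(text, id)[, "urls", drop = FALSE], by = id]
print(res_by)
```

Here `res_all` ends up holding the caught error, while `res_by` has one row per URL, tagged with the id of the post it came from.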

good luck!
Rick


Ricardo Saporta
Graduate Student, Data Analytics
Rutgers University, New Jersey
e: saporta at rutgers.edu



On Fri, Sep 27, 2013 at 2:37 AM, Ricardo Saporta <
saporta at scarletmail.rutgers.edu> wrote:

> Hi there,
>
> Try inserting a `by=id` in
>
>    a <- db[(has_url), getUrls(text, id), by=id]
>
> Also, no need for "has_url == T".
> Instead, use
>   (has_url)
> if the variable is already logical.  (Otherwise, you are just slowing
> things down ;)
>
>
> On Thu, Sep 26, 2013 at 11:16 PM, Stian Håklev <shaklev at gmail.com> wrote:
>
>> I'm trying to run a function on every row fulfilling a certain criterion,
>> which returns a data frame - the idea is then to take the list of data
>> frames and rbindlist them together for a totally separate data.table. (I'm
>> extracting several URL links from each forum post, and tagging them with
>> the forum post they came from).
>>
>> I tried doing this with a data.table
>>
>> a <- db[has_url == T, getUrls(text, id)]
>>
>> and get the message
>>
>> Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L, 1L, 2L, 4L,  :
>>   replacement has 11007 rows, data has 29787
>>
>> Because some rows have several URLs... However, I don't care that these
>> row counts don't match, I still want these rows :) I thought `j` would just
>> let me execute arbitrary R code in the context of the rows, with the
>> columns available as variable names, etc.
>>
>> Here's the function it's running, but that shouldn't be relevant
>>
>> getUrls <- function(text, id) {
>>   matches <- str_match_all(text, url_pattern)
>>   a <- data.frame(urls=unlist(matches))
>>   a$id <- id
>>   a
>> }
>>
>>
>> Thanks, and thanks for an amazing package - data.table has made my life
>> so much easier. It should be part of base, I think.
>> Stian Haklev, University of Toronto
>>
>> --
>> http://reganmian.net/blog -- Random Stuff that Matters
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>