[datatable-help] [SOLVED] Something strange with scoping(?) when data.table is used inside a package I'm developing.

Matthew Dowle mdowle at mdowle.plus.com
Thu Nov 4 14:36:36 CET 2010


Yes, that was an off-list fix for importing data.table (the Imports field in
DESCRIPTION). There is still likely a problem with the Depends field of
DESCRIPTION, though. If you need to Depend rather than Import, let me
know and I'll try again to fix it.
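For reference, here is a minimal sketch of importing data.table from a package instead of listing it under Depends. It assumes a typical package layout and is not a copy of the off-list fix itself.

In the package's `DESCRIPTION`:

```
Imports: data.table
```

and in its `NAMESPACE`:

```
import(data.table)
```

The fix above was made for this Imports-plus-`import()` setup, so a Depends-only package may still hit the problem.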

Note also that data.table queries don't work from a browser prompt (bug
#1131).

Matthew


> I just `svn up`'ed my data.table package to see if I can dig into the
> code and I noticed that this issue was previously reported and fixed
> in 1.5.1.
>
> Seems like I stumbled on the same issue ... sorry, I didn't see this
> in the mailing list.
>
> Thanks,
> -steve
>
> On Tue, Nov 2, 2010 at 6:20 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> More info on my problem:
>>
>> I'm loading my package, and passing an object to a function defined
>> (and exported) in my library that does the fast dt[, SOMETHING,
>> by='entrez.id']. I call this function, and it bombs when I try to
>> access "things" (columns) of the data.table in my SOMETHING
>> expression.
>>
>> Given the same object, I then execute the lines of the failing function
>> one by one (without calling the function itself), and they run
>> normally.
>>
>> I wonder if this is relevant:
>>
>> (I) After the function bombs, this is the result of traceback() --
>> is it normal for the deepest call to actually be to `[.data.frame`?
>>
>> R> traceback()
>> 4: `[.data.frame`(x, i, j)
>> 3: `[.data.table`(dt, , list(seqnames = seqnames[1], strand = strand[1],
>>       start = min(start), end = max(end)), by = "entrez.id")
>> 2: dt[, list(seqnames = seqnames[1], strand = strand[1], start = min(start),
>>       end = max(end)), by = "entrez.id"]
>> 1: annotatedTxBounds(annotated)
>>
>> (II) Bioconductor packages use the S4 system. I've defined some
>> conversion functions on objects in my package so that I can convert
>> them to data.tables in the "expected way", a la:
>>
>> R> my.dt <- as(MyOwnClass, 'data.table')
>>
>> In order for me to do that, I've had to S4-ize the S3 data.table class,
>> like so:
>>
>> setOldClass("data.table")
>> (I also had `setOldClass(c('data.table', 'data.frame'))`, both have
>> the same error)
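A minimal, self-contained sketch of the S4 coercion setup described above. The class `MyOwnClass` and its three slots are hypothetical stand-ins for the real Bioconductor object. The only real APIs used are `setOldClass()`, `setClass()`, `setAs()` and `as()` from the methods package, plus `data.table::data.table()`.

```r
library(methods)

# Register data.table's S3 class vector with S4, as in the message above.
setOldClass(c("data.table", "data.frame"))

# Hypothetical S4 class standing in for MyOwnClass.
setClass("MyOwnClass",
         slots = c(entrez.id = "character", start = "integer", end = "integer"))

# Coercion method so that as(obj, "data.table") returns a data.table.
setAs("MyOwnClass", "data.table", function(from) {
  data.table::data.table(entrez.id = from@entrez.id,
                         start = from@start,
                         end = from@end)
})

obj <- new("MyOwnClass",
           entrez.id = c("100037417", "100037417", "100113380"),
           start = c(22639026L, 22639103L, 28059152L),
           end = c(22639102L, 22639210L, 28059247L))
my.dt <- as(obj, "data.table")
print(my.dt)
```

In this sketch the coercion only builds the object. Whether a later `my.dt[...]` call is dispatched as a data.table query still depends on how the calling package imports data.table.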
>>
>> So ... it has something to do with my function as it's run from within
>> my package's environment -- but I don't know what to do about it.
>>
>> I tried adding `import(data.table)` into NAMESPACE file as well, but no
>> dice.
>>
>> Thanks,
>> -steve
>>
>> On Tue, Nov 2, 2010 at 3:14 PM, Steve Lianoglou
>> <mailinglist.honeypot at gmail.com> wrote:
>>> Hi,
>>>
>>> Sorry for what is about to be a vaguely described problem I'm having,
>>> but here goes.
>>>
>>> I'm developing some R/bioconductor packages and have been using
>>> data.table for a few things in them quite happily. Although my
>>> packages are written to be properly installed, as I develop with them,
>>> I just source their "R" directories to make my life easier so I can
>>> easily modify them and update my R environment when I find something
>>> wrong (w/o having to restart R, then call library(MyPackage), etc ..)
>>>
>>> So, I just went through my package to make sure my NAMESPACE stuff
>>> is kosher -- that I export the classes, methods, and functions I need
>>> to export.
>>>
>>> In my DESCRIPTION file, data.table is listed in the "Depends" section.
>>>
>>> I'm only mentioning this because now that I've successfully done all
>>> that, I installed my package and am now using it by calling
>>> library(MyPackage). Now there is something strange happening with my
>>> data.table stuff.
>>>
>>> The column names of my data.table are no longer recognized in my j
>>> functions. For instance:
>>>
>>> R> library(data.table)
>>> R> df <- structure(list(
>>>      seqnames = c("chr22", "chr22", "chr22", "chr22", "chr22",
>>>                   "chr22", "chr22", "chr22", "chr22", "chr22"),
>>>      start = c(22639026L, 22639103L, 22639574L, 22643475L, 22643596L,
>>>                28059152L, 15897460L, 15905763L, 15908214L, 15917963L),
>>>      end = c(22639102L, 22639210L, 22639749L, 22643595L, 22644748L,
>>>              28059247L, 15898234L, 15905890L, 15908316L, 15919682L),
>>>      width = c(77L, 108L, 176L, 121L, 1153L, 96L, 775L, 128L, 103L, 1720L),
>>>      strand = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L),
>>>                         .Label = c("+", "-", "*"), class = "factor"),
>>>      exon.anno = structure(c(5L, 1L, 1L, 1L, 4L, 3L, 3L, 3L, 3L, 3L),
>>>                            .Label = c("cds", "overlap", "utr", "utr3", "utr5"),
>>>                            class = "factor"),
>>>      symbol = c("DDTL", "DDTL", "DDTL", "DDTL", "DDTL", "SNORD125",
>>>                 "CECR7", "CECR7", "CECR7", "CECR7"),
>>>      entrez.id = c("100037417", "100037417", "100037417", "100037417",
>>>                    "100037417", "100113380", "100130418", "100130418",
>>>                    "100130418", "100130418")),
>>>    .Names = c("seqnames", "start", "end", "width", "strand",
>>>               "exon.anno", "symbol", "entrez.id"),
>>>    row.names = c(NA, -10L), class = "data.frame")
>>> R> dt <- data.table(df, key='entrez.id')
>>>
>>> Now, something like this should work (and in fact does when I have a
>>> clean environment like you would by just starting R and pasting the
>>> above code):
>>>
>>> R> bounds <- dt[, list(start=min(start), end=max(end)), by='entrez.id']
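For comparison, here is a minimal sketch of the same grouped query in a clean session. It uses a hypothetical cut-down copy of the example data: five of the ten rows, and only the columns the query needs. Each gene's span is computed as the minimum start and the maximum end (`max(end)`, as in the version of the query shown in the traceback).

```r
library(data.table)

# Hypothetical cut-down copy of the example data (three genes).
dt <- data.table(
  entrez.id = c("100037417", "100037417", "100113380", "100130418", "100130418"),
  start     = c(22639026L, 22639103L, 28059152L, 15897460L, 15905763L),
  end       = c(22639102L, 22639210L, 28059247L, 15898234L, 15905890L),
  key = "entrez.id"
)

# One row per gene: smallest start and largest end within each group.
bounds <- dt[, list(start = min(start), end = max(end)), by = "entrez.id"]
print(bounds)
```

Because `dt` is keyed on `entrez.id`, the groups come out in sorted key order.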
>>>
>>> But when the "bowels" of my code in my package are running this (only
>>> when it's attached with library(MyPackage)), I'm now getting this
>>> error:
>>>   Error in min(start) : invalid 'type' (closure) of argument
>>>
>>> If I try to use the .SD object in the same place, I also get another
>>> error:
>>>
>>> R> bounds2 <- dt[, {
>>>  .sd <- .SD[1]
>>>  .sd$start <- min(start)
>>>  .sd$end <- max(end)
>>>  .sd
>>> }, by='entrez.id']
>>>
>>> (The code here is simplified, but assume I need to use .SD -- I want
>>> to get the rest of the columns in the dt data.table w/o referencing
>>> them explicitly)
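Here is a sketch of the `.SD` variant run in a clean session, again on a hypothetical cut-down copy of the data with four columns. `.SD` holds each group's non-grouping columns. So `.SD[1]` keeps the first row of every other column (here just `symbol`), and the code then overwrites `start` and `end` with the group's bounds.

```r
library(data.table)

# Hypothetical cut-down data: symbol is the "rest of the columns" carried by .SD.
dt <- data.table(
  entrez.id = c("100037417", "100037417", "100113380", "100130418", "100130418"),
  symbol    = c("DDTL", "DDTL", "SNORD125", "CECR7", "CECR7"),
  start     = c(22639026L, 22639103L, 28059152L, 15897460L, 15905763L),
  end       = c(22639102L, 22639210L, 28059247L, 15898234L, 15905890L),
  key = "entrez.id"
)

bounds2 <- dt[, {
  .sd <- .SD[1]            # first row of all non-grouping columns
  .sd$start <- min(start)  # replace with the group's smallest start
  .sd$end <- max(end)      # ... and its largest end
  .sd
}, by = "entrez.id"]
print(bounds2)
```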
>>>
>>> The error when the code is run from within my package is:
>>>
>>>   Error in `[.data.frame`(x, i, j) : object '.SD' not found
>>>
>>> Even though it works in a "clean" R environment.
>>>
>>> Can anyone take a stab at why this might be happening? I'm at a bit of
>>> a loss.
>>>
>>> For what it's worth, this is the sessionInfo of my R environment when
>>> my package is installed (my package is called GenomicFeaturesX). Most
>>> of the packages in "other attached packages" are from Bioconductor.
>>>
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>>  [1] BSgenome.Hsapiens.UCSC.hg18_1.3.16 BSgenome_1.18.0
>>>  [3] Biostrings_2.18.0                  doMC_1.2.1
>>>  [5] multicore_0.1-3                    foreach_1.3.0
>>>  [7] codetools_0.2-2                    iterators_1.0.3
>>>  [9] GenomicFeaturesX_0.2               data.table_1.5
>>> [11] GenomicFeatures_1.2.0              GenomicRanges_1.2.1
>>> [13] IRanges_1.8.2
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] annotate_1.28.0      AnnotationDbi_1.12.0 Biobase_2.10.0
>>>  [4] biomaRt_2.6.0        DBI_0.2-5            RCurl_1.4-3
>>>  [7] RSQLite_0.9-2        rtracklayer_1.10.2   tools_2.12.0
>>> [10] XML_3.2-0            xtable_1.5-6
>>>
>>> Thanks,
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>  | Memorial Sloan-Kettering Cancer Center
>>>  | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>
>>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>



