[datatable-help] [SOLVED] Something strange with scoping(?) when data.table is used inside a package I'm developing.

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Nov 2 23:29:24 CET 2010


I just `svn up`'ed my data.table package to see if I can dig into the
code and I noticed that this issue was previously reported and fixed
in 1.5.1.

Seems like I stumbled on the same issue ... sorry, I didn't see this
in the mailing list.
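
For anyone landing on this thread with the same symptom: the traceback quoted below ends in `[.data.frame`, which suggests the j expression was evaluated as ordinary R code in the calling scope rather than against the table's columns. Under that assumption, `start` resolves to the function stats::start(), which would explain the "invalid 'type' (closure)" message. A minimal base R sketch of that reading (the small `df` here is made up for illustration, and data.table is not used at all):

```r
# Illustrative toy data; not the data from the original report.
df <- data.frame(entrez.id = c("a", "a", "b"),
                 start = c(10L, 5L, 7L),
                 end = c(20L, 30L, 9L))

# Evaluated as plain R code, `start` is not a column: it is found on the
# search path as the function stats::start(), so min() is handed a closure.
msg <- tryCatch(min(start), error = function(e) conditionMessage(e))
print(msg)

# Evaluated with the columns in scope, the same expression works.
ok <- with(df, min(start))
print(ok)
```

This only shows the scoping effect; it says nothing about how data.table itself decides which method to use.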

Thanks,
-steve

On Tue, Nov 2, 2010 at 6:20 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> More info on my problem:
>
> I'm loading my package, and passing an object to a function defined
> (and exported) in my library that does the fast dt[, SOMETHING,
> by='entrez.id']. I call this function, and it bombs when I try to
> access "things" (columns) of the data.table in my SOMETHING
> expression.
>
> Given the same object, I then execute the lines of the failing
> function one by one (without calling the function itself), and they
> execute normally.
>
> I wonder if this is relevant:
>
> (I) After the function bombs, this is the result of traceback() --
> is it normal for the deepest call to actually be to `[.data.frame`?
>
> R> traceback()
> 4: `[.data.frame`(x, i, j)
> 3: `[.data.table`(dt, , list(seqnames = seqnames[1], strand = strand[1],
>       start = min(start), end = max(end)), by = "entrez.id")
> 2: dt[, list(seqnames = seqnames[1], strand = strand[1], start = min(start),
>       end = max(end)), by = "entrez.id"]
> 1: annotatedTxBounds(annotated)
>
> (II) Bioconductor packages use the S4 system. I've defined some
> conversion functions on objects in my package so that I can convert
> them to data.table's in the "expected way", a la:
>
> R> my.dt <- as(MyOwnClass, 'data.table')
>
> In order for me to do that, I've had to S4-ize the S3 data.table class, like so:
>
> setOldClass("data.table")
> (I also had `setOldClass(c('data.table', 'data.frame'))`, both have
> the same error)
>
> So ... it has something to do with my function as it's run from within
> my package's environment -- but I don't know what to do about it.
>
> I tried adding `import(data.table)` into NAMESPACE file as well, but no dice.
>
> Thanks,
> -steve
>
> On Tue, Nov 2, 2010 at 3:14 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi,
>>
>> Sorry for what is about to be a vaguely described problem I'm having,
>> but here goes.
>>
>> I'm developing some R/Bioconductor packages and have been using
>> data.table for a few things in them quite happily. Although my
>> packages are written to be properly installed, as I develop with them,
>> I just source their "R" directories to make my life easier so I can
>> easily modify them and update my R environment when I find something
>> wrong (w/o having to restart R, then call library(MyPackage), etc ..)
>>
>> So, I just went through my package to make sure my NAMESPACE stuffs
>> are kosher -- that I export the classes, methods, and functions I need
>> to export.
>>
>> In my DESCRIPTION file, data.table is listed in the "Depends" section.
>>
>> I'm only mentioning this because now that I've successfully done all
>> that, I installed my package and am now using it by calling
>> library(MyPackage). Now there is something strange happening with my
>> data.table stuff.
>>
>> The column names of my data.table are no longer recognized in my j
>> functions. For instance:
>>
>> R> library(data.table)
>> R> df <- structure(list(seqnames = c("chr22", "chr22", "chr22", "chr22",
>> "chr22", "chr22", "chr22", "chr22", "chr22", "chr22"), start = c(22639026L,
>> 22639103L, 22639574L, 22643475L, 22643596L, 28059152L, 15897460L,
>> 15905763L, 15908214L, 15917963L), end = c(22639102L, 22639210L,
>> 22639749L, 22643595L, 22644748L, 28059247L, 15898234L, 15905890L,
>> 15908316L, 15919682L), width = c(77L, 108L, 176L, 121L, 1153L,
>> 96L, 775L, 128L, 103L, 1720L), strand = structure(c(1L, 1L, 1L,
>> 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("+", "-", "*"), class = "factor"),
>>    exon.anno = structure(c(5L, 1L, 1L, 1L, 4L, 3L, 3L, 3L, 3L,
>>    3L), .Label = c("cds", "overlap", "utr", "utr3", "utr5"), class =
>> "factor"),
>>    symbol = c("DDTL", "DDTL", "DDTL", "DDTL", "DDTL", "SNORD125",
>>    "CECR7", "CECR7", "CECR7", "CECR7"), entrez.id = c("100037417",
>>    "100037417", "100037417", "100037417", "100037417", "100113380",
>>    "100130418", "100130418", "100130418", "100130418")), .Names =
>> c("seqnames",
>> "start", "end", "width", "strand", "exon.anno", "symbol", "entrez.id"
>> ), row.names = c(NA, -10L), class = "data.frame")
>> R> dt <- data.table(df, key='entrez.id')
>>
>> Now, something like this should work (and in fact does when I have a
>> clean environment like you would by just starting R and pasting the
>> above code):
>>
>> R> bounds <- dt[, list(start=min(start), end=max(end)), by='entrez.id']
>>
>> But when the "bowels" of my code in my package are running this (only
>> when it's attached with library(MyPackage)), I'm now getting this
>> error:
>>   Error in min(start) : invalid 'type' (closure) of argument
>>
>> If I try to use the .SD object in the same place, I also get another error:
>>
>> R> bounds2 <- dt[, {
>>  .sd <- .SD[1]
>>  .sd$start <- min(start)
>>  .sd$end <- max(end)
>>  .sd
>> }, by='entrez.id']
>>
>> (The code here is simplified, but assume I need to use .SD -- I want
>> to get the rest of the columns in the dt data.table w/o referencing
>> them explicitly)
>>
>> The error when the code is run from within my package is:
>>
>>   Error in `[.data.frame`(x, i, j) : object '.SD' not found
>>
>> Even though it works in a "clean" R environment.
>>
>> Can anyone take a stab at why this might be happening? I'm at a bit of a loss.
>>
>> For what it's worth, this is the sessionInfo of my R environment when
>> my package is installed (my package is called GenomicFeaturesX). Most
>> of the packages in "other attached packages" are from Bioconductor.
>>
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>>  [1] BSgenome.Hsapiens.UCSC.hg18_1.3.16 BSgenome_1.18.0
>>  [3] Biostrings_2.18.0                  doMC_1.2.1
>>  [5] multicore_0.1-3                    foreach_1.3.0
>>  [7] codetools_0.2-2                    iterators_1.0.3
>>  [9] GenomicFeaturesX_0.2               data.table_1.5
>> [11] GenomicFeatures_1.2.0              GenomicRanges_1.2.1
>> [13] IRanges_1.8.2
>>
>> loaded via a namespace (and not attached):
>>  [1] annotate_1.28.0      AnnotationDbi_1.12.0 Biobase_2.10.0
>>  [4] biomaRt_2.6.0        DBI_0.2-5            RCurl_1.4-3
>>  [7] RSQLite_0.9-2        rtracklayer_1.10.2   tools_2.12.0
>> [10] XML_3.2-0            xtable_1.5-6
>>
>> Thanks,
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>
>
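
As a cross-check on what the grouped call in the quoted message is meant to return, the same per-gene bounds can be computed without data.table. This is only a sketch in base R on a three-row subset of the example data, not the package's code; tapply() stands in here for the by='entrez.id' grouping:

```r
# Three rows taken from the example data in the quoted message.
df <- data.frame(
  entrez.id = c("100037417", "100037417", "100113380"),
  start = c(22639026L, 22639103L, 28059152L),
  end = c(22639102L, 22639210L, 28059247L),
  stringsAsFactors = FALSE
)

# Smallest start and largest end per entrez.id.
starts <- tapply(df$start, df$entrez.id, min)
ends <- tapply(df$end, df$entrez.id, max)

# Both results are ordered by the same sorted group labels.
bounds <- data.frame(entrez.id = names(starts),
                     start = as.vector(starts),
                     end = as.vector(ends),
                     stringsAsFactors = FALSE)
print(bounds)
```

Comparing this against the data.table result is a quick way to tell a scoping problem apart from a problem in the j expression itself.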




