[datatable-help] Something strange with scoping(?) when data.table is used inside a package I'm developing.

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Nov 2 23:20:34 CET 2010


More info on my problem:

I'm loading my package, and passing an object to a function defined
(and exported) in my package that does the fast dt[, SOMETHING,
by='entrez.id']. I call this function, and it bombs when I try to
access "things" (columns) of the data.table in my SOMETHING
expression.

Given the same object, I then execute the lines of the function that
is failing one by one (without calling the function itself), and they
execute normally.

I wonder if this is relevant:

(I). After the function bombs, this is the result of traceback() --
is it normal for the deepest call to actually be to `[.data.frame`?

R> traceback()
4: `[.data.frame`(x, i, j)
3: `[.data.table`(dt, , list(seqnames = seqnames[1], strand = strand[1],
       start = min(start), end = max(end)), by = "entrez.id")
2: dt[, list(seqnames = seqnames[1], strand = strand[1], start = min(start),
       end = max(end)), by = "entrez.id"]
1: annotatedTxBounds(annotated)
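
The traceback ends in `[.data.frame`, and that method evaluates j in the
calling frame rather than among the columns. Below is a minimal sketch of
what that would imply, using only base R and a made-up two-row
data.frame (toy values, not the data from this thread). With no local
variable called start, the name resolves to the function stats::start,
which would be consistent with the "invalid 'type' (closure)" error
quoted further down. This illustrates the symptom only; it does not
explain why the data.frame method is reached in the first place.

```r
# Toy data.frame; the values are made up for illustration.
df <- data.frame(start = c(5L, 3L), end = c(9L, 7L))

# `[.data.frame` evaluates j in the caller's frame, so `start` here is
# looked up as an ordinary variable and finds the function stats::start.
msg <- tryCatch(
  df[, list(start = min(start))],
  error = function(e) conditionMessage(e)
)
print(msg)

# With a plain data.frame, columns have to be reached through the object.
lo <- min(df$start)
print(lo)
```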

(II) Bioconductor packages use the S4 system. I've defined some
conversion functions on objects in my package so that I can convert
them to data.tables in the "expected way", a la:

R> my.dt <- as(MyOwnClass, 'data.table')

In order for me to do that, I've had to S4-ize the S3 data.table class, like so:

setOldClass("data.table")
(I also tried `setOldClass(c('data.table', 'data.frame'))`; both give
the same error)
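
A minimal sketch of the kind of S4 registration and coercion described
above. The slots of MyOwnClass here are hypothetical stand-ins, since
the real class from the package is not shown in this message, and the
coercion body is only one plausible way to build the table.

```r
library(methods)
library(data.table)

# Register the S3 class with the S4 system so it can be a target of as().
setOldClass(c("data.table", "data.frame"))

# Hypothetical class: these slot names are assumptions for illustration.
setClass("MyOwnClass",
         representation(seqnames = "character",
                        start = "integer",
                        end = "integer"))

# Coercion method that makes as(obj, "data.table") work.
setAs("MyOwnClass", "data.table", function(from) {
  data.table(seqnames = from@seqnames, start = from@start, end = from@end)
})

obj <- new("MyOwnClass",
           seqnames = c("chr22", "chr22"),
           start = c(10L, 20L),
           end = c(15L, 30L))
my.dt <- as(obj, "data.table")
print(class(my.dt))
```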

So ... it has something to do with my function as it's run from within
my package's environment -- but I don't know what to do about it.

I tried adding `import(data.table)` to my NAMESPACE file as well, but no dice.
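
For concreteness, the package metadata described in this thread would
look roughly like the fragment below. Only the data.table entries and
the annotatedTxBounds name come from the messages; everything else in
the real files is omitted, and the "#" lines are labels for this listing
rather than contents of either file.

```
# DESCRIPTION (relevant field only)
Depends: data.table

# NAMESPACE (relevant directives only)
import(data.table)
export(annotatedTxBounds)
```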

Thanks,
-steve

On Tue, Nov 2, 2010 at 3:14 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> Sorry for what is about to be a vaguely described problem I'm having,
> but here goes.
>
> I'm developing some R/Bioconductor packages and have been using
> data.table for a few things in them quite happily. Although my
> packages are written to be properly installed, as I develop with them,
> I just source their "R" directories to make my life easier so I can
> easily modify them and update my R environment when I find something
> wrong (w/o having to restart R, then call library(MyPackage), etc ..)
>
> So, I just went through my package to make sure my NAMESPACE stuffs
> are kosher -- that I export the classes, methods, and functions I need
> to export.
>
> In my DESCRIPTION file, data.table is listed in the "Depends" section.
>
> I'm only mentioning this because now that I've successfully done all
> that, I installed my package and am now using it by calling
> library(MyPackage). Now there is something strange happening with my
> data.table stuff.
>
> The column names of my data.table are no longer recognized in my j
> functions. For instance:
>
> R> library(data.table)
> R> df <- structure(list(seqnames = c("chr22", "chr22", "chr22", "chr22",
> "chr22", "chr22", "chr22", "chr22", "chr22", "chr22"), start = c(22639026L,
> 22639103L, 22639574L, 22643475L, 22643596L, 28059152L, 15897460L,
> 15905763L, 15908214L, 15917963L), end = c(22639102L, 22639210L,
> 22639749L, 22643595L, 22644748L, 28059247L, 15898234L, 15905890L,
> 15908316L, 15919682L), width = c(77L, 108L, 176L, 121L, 1153L,
> 96L, 775L, 128L, 103L, 1720L), strand = structure(c(1L, 1L, 1L,
> 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("+", "-", "*"), class = "factor"),
>    exon.anno = structure(c(5L, 1L, 1L, 1L, 4L, 3L, 3L, 3L, 3L,
>    3L), .Label = c("cds", "overlap", "utr", "utr3", "utr5"), class =
> "factor"),
>    symbol = c("DDTL", "DDTL", "DDTL", "DDTL", "DDTL", "SNORD125",
>    "CECR7", "CECR7", "CECR7", "CECR7"), entrez.id = c("100037417",
>    "100037417", "100037417", "100037417", "100037417", "100113380",
>    "100130418", "100130418", "100130418", "100130418")), .Names =
> c("seqnames",
> "start", "end", "width", "strand", "exon.anno", "symbol", "entrez.id"
> ), row.names = c(NA, -10L), class = "data.frame")
> R> dt <- data.table(df, key='entrez.id')
>
> Now, something like this should work (and in fact does in a clean
> environment, like the one you would get by just starting R and pasting
> the above code):
>
> R> bounds <- dt[, list(start=min(start), end=max(end)), by='entrez.id']
>
> But when the "bowels" of my code in my package are running this (only
> when it's attached with library(MyPackage)), I'm now getting this
> error:
>   Error in min(start) : invalid 'type' (closure) of argument
>
> If I try to use the .SD object in the same place, I also get another error:
>
> R> bounds2 <- dt[, {
>  .sd <- .SD[1]
>  .sd$start <- min(start)
>  .sd$end <- max(end)
>  .sd
> }, by='entrez.id']
>
> (The code here is simplified, but assume I need to use .SD -- I want
> to get the rest of the columns in the dt data.table w/o referencing
> them explicitly)
>
> The error when the code is run from within my package is:
>
>   Error in `[.data.frame`(x, i, j) : object '.SD' not found
>
> Even though it works in a "clean" R environment.
>
> Can anyone take a stab at why this might be happening? I'm at a bit of a loss.
>
> For what it's worth, this is the sessionInfo of my R environment when
> my package is installed (my package is called GenomicFeaturesX). Most
> of the packages in "other attached packages" are from Bioconductor.
>
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] BSgenome.Hsapiens.UCSC.hg18_1.3.16 BSgenome_1.18.0
>  [3] Biostrings_2.18.0                  doMC_1.2.1
>  [5] multicore_0.1-3                    foreach_1.3.0
>  [7] codetools_0.2-2                    iterators_1.0.3
>  [9] GenomicFeaturesX_0.2               data.table_1.5
> [11] GenomicFeatures_1.2.0              GenomicRanges_1.2.1
> [13] IRanges_1.8.2
>
> loaded via a namespace (and not attached):
>  [1] annotate_1.28.0      AnnotationDbi_1.12.0 Biobase_2.10.0
>  [4] biomaRt_2.6.0        DBI_0.2-5            RCurl_1.4-3
>  [7] RSQLite_0.9-2        rtracklayer_1.10.2   tools_2.12.0
> [10] XML_3.2-0            xtable_1.5-6
>
> Thanks,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

