[datatable-help] Something strange with scoping(?) when data.table is used inside a package I'm developing.

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Nov 2 20:14:12 CET 2010


Hi,

Sorry for what is about to be a vaguely described problem I'm having,
but here goes.

I'm developing some R/Bioconductor packages and have been using
data.table for a few things in them quite happily. Although my
packages are written to be properly installed, while developing
I just source their "R" directories so I can easily modify them and
update my R environment when I find something wrong (without having
to restart R, then call library(MyPackage), etc.).
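
A minimal sketch of that kind of sourcing workflow, for concreteness. The
helper name source_dir and the path "MyPackage/R" are made up for
illustration and are not part of any package:

```r
# Hypothetical helper: evaluate every .R file found in a package's "R"
# directory into the given environment (the global environment by default).
source_dir <- function(path, envir = globalenv()) {
  files <- sort(list.files(path, pattern = "\\.[Rr]$", full.names = TRUE))
  for (f in files) {
    sys.source(f, envir = envir)
  }
  invisible(files)
}

# Example call (placeholder path):
# source_dir("MyPackage/R")
```

Note that functions defined this way live in the environment they were
sourced into, not in a package namespace, so name lookup can differ from
what happens after a real install followed by library(MyPackage).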

So, I just went through my package to make sure my NAMESPACE stuff
is kosher -- that I export the classes, methods, and functions I need
to export.

In my DESCRIPTION file, data.table is listed in the "Depends" section.

I'm only mentioning this because now that I've successfully done all
that, I installed my package and am now using it by calling
library(MyPackage). Now there is something strange happening with my
data.table stuff.

The column names of my data.table are no longer recognized in my j
expressions. For instance:

R> library(data.table)
R> df <- structure(list(seqnames = c("chr22", "chr22", "chr22", "chr22",
"chr22", "chr22", "chr22", "chr22", "chr22", "chr22"), start = c(22639026L,
22639103L, 22639574L, 22643475L, 22643596L, 28059152L, 15897460L,
15905763L, 15908214L, 15917963L), end = c(22639102L, 22639210L,
22639749L, 22643595L, 22644748L, 28059247L, 15898234L, 15905890L,
15908316L, 15919682L), width = c(77L, 108L, 176L, 121L, 1153L,
96L, 775L, 128L, 103L, 1720L), strand = structure(c(1L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("+", "-", "*"), class = "factor"),
    exon.anno = structure(c(5L, 1L, 1L, 1L, 4L, 3L, 3L, 3L, 3L,
    3L), .Label = c("cds", "overlap", "utr", "utr3", "utr5"), class = "factor"),
    symbol = c("DDTL", "DDTL", "DDTL", "DDTL", "DDTL", "SNORD125",
    "CECR7", "CECR7", "CECR7", "CECR7"), entrez.id = c("100037417",
    "100037417", "100037417", "100037417", "100037417", "100113380",
    "100130418", "100130418", "100130418", "100130418")), .Names = c("seqnames",
"start", "end", "width", "strand", "exon.anno", "symbol", "entrez.id"
), row.names = c(NA, -10L), class = "data.frame")
R> dt <- data.table(df, key='entrez.id')

Now, something like this should work (and in fact does when I have a
clean environment, like the one you get by just starting R and pasting
the above code):

R> bounds <- dt[, list(start=min(start), end=max(end)), by='entrez.id']
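
A smaller, self-contained version of the same kind of grouped query may be
easier to paste into a clean session. The table `small` and its values are
invented for illustration; only the shape of the call matches the one above:

```r
library(data.table)

# Toy table with made-up values: two ids, a few intervals each
small <- data.table(
  entrez.id = c("a", "a", "b"),
  start     = c(10L, 5L, 20L),
  end       = c(15L, 30L, 25L),
  key       = "entrez.id"
)

# One row per id: the smallest start and the largest end within each group
bounds <- small[, list(start = min(start), end = max(end)), by = "entrez.id"]
print(bounds)
```

In a clean session, `start` and `end` inside j are looked up as columns of
each group, so this should give start 5, end 30 for "a" and start 20, end 25
for "b".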

But when the "bowels" of my code in my package are running this (only
when it's attached with library(MyPackage)), I'm now getting this
error:
   Error in min(start) : invalid 'type' (closure) of argument
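
That message is what base R produces when min() is handed a function rather
than a vector, so it is at least consistent with `start` resolving to the
function stats::start instead of the column. A quick check of that reading,
independent of data.table:

```r
# `stats::start` is a function (a closure), not a column of any table.
# Passing it to min() triggers the same error message as above.
msg <- tryCatch(
  min(stats::start),
  error = function(e) conditionMessage(e)
)
print(msg)
# [1] "invalid 'type' (closure) of argument"
```

This only shows where the message can come from; it does not by itself
explain why the column lookup fails inside the installed package.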

If I try to use the .SD object in the same place, I also get another error:

R> bounds2 <- dt[, {
  .sd <- .SD[1]
  .sd$start <- min(start)
  .sd$end <- max(end)
  .sd
}, by='entrez.id']

(The code here is simplified, but assume I need to use .SD -- I want
to get the rest of the columns in the dt data.table w/o referencing
them explicitly)

The error when the code is run from within my package is:

   Error in `[.data.frame`(x, i, j) : object '.SD' not found

Even though it works in a "clean" R environment.

Can anyone take a stab at why this might be happening? I'm at a bit of a loss.

For what it's worth, this is the sessionInfo of my R environment when
my package is installed (my package is called GenomicFeaturesX). Most
of the packages in "other attached packages" are from Bioconductor.

R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg18_1.3.16 BSgenome_1.18.0
 [3] Biostrings_2.18.0                  doMC_1.2.1
 [5] multicore_0.1-3                    foreach_1.3.0
 [7] codetools_0.2-2                    iterators_1.0.3
 [9] GenomicFeaturesX_0.2               data.table_1.5
[11] GenomicFeatures_1.2.0              GenomicRanges_1.2.1
[13] IRanges_1.8.2

loaded via a namespace (and not attached):
 [1] annotate_1.28.0      AnnotationDbi_1.12.0 Biobase_2.10.0
 [4] biomaRt_2.6.0        DBI_0.2-5            RCurl_1.4-3
 [7] RSQLite_0.9-2        rtracklayer_1.10.2   tools_2.12.0
[10] XML_3.2-0            xtable_1.5-6

Thanks,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

