[datatable-help] [SOLVED] Something strange with scoping(?) when data.table is used inside a package I'm developing.

Matthew Dowle mdowle at mdowle.plus.com
Thu Nov 4 15:32:35 CET 2010


When cedta (Calling Environment Data.Table Aware) runs, it finds the
namespace calling data.table without a problem. The names returned by
getNamespaceImports() on that namespace will then include "data.table" if
that package Imports it, marking it as a data.table-aware package. So far
so good. For Depends, though, .Depends is (I think) not in the namespace
environment but in the corresponding package environment for that
namespace. The nut is getting from the namespace to the package environment.

I tried a more robust version of the following, but it wasn't very neat
and it didn't work. It might just be a case of trying again with fresh
eyes:

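  # te: the namespace that cedta found for the calling function
  name = getNamespaceName(te)
  # look up the attached package environment, e.g. "package:MyPackage"
  pkg = as.environment(paste("package:",name,sep=""))
  "data.table" %in% get(".Depends",envir=pkg)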

Also, R changed in this area fairly recently (.required was renamed to
.Depends).
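For comparison, here is a minimal sketch of a different route to the same check, assuming it is acceptable to read the installed DESCRIPTION file instead of looking up .Depends in the package environment. The helper names depNames() and isDataTableAwareNs() are hypothetical; they are not part of data.table or of cedta. The Imports test uses getNamespaceImports() as described above, while the Depends test swaps in utils::packageDescription(). Because it reads DESCRIPTION, it only covers installed packages, not code sourced straight from an R directory.

```r
# Hypothetical helpers (not part of data.table): a sketch of an
# Imports/Depends check that avoids the package environment entirely.

# Turn a DESCRIPTION dependency field such as
# "R (>= 2.10.0), data.table (>= 1.5), methods" into bare package names.
depNames <- function(field) {
  if (is.null(field) || is.na(field)) return(character(0))
  pkgs <- trimws(strsplit(field, ",")[[1]])        # split on commas, trim
  pkgs <- sub("[[:space:]]*\\(.*\\)$", "", pkgs)   # drop version specs
  pkgs[nzchar(pkgs)]                               # discard empty entries
}

# Does the package owning namespace `ns` Import or Depend on data.table?
isDataTableAwareNs <- function(ns) {
  name <- unname(getNamespaceName(ns))
  if (name == "data.table") return(TRUE)
  # Imports: the list returned by getNamespaceImports() is named by package
  if ("data.table" %in% names(getNamespaceImports(ns))) return(TRUE)
  # Depends: read the field from the installed DESCRIPTION file
  # (NA if the field is absent) rather than looking for .Depends
  deps <- utils::packageDescription(name, fields = "Depends")
  "data.table" %in% depNames(deps)
}
```

For example, isDataTableAwareNs(asNamespace("stats")) should return FALSE, since stats neither imports nor depends on data.table.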



> Thanks for the reply ... I actually moved it over to Imports
> anyway ... but still, it's probably a good thing to fix at some point
> :-)
>
> It's kind of weird though ... do you know what "the nut" of the problem
> is?
> -steve
>
> On Thu, Nov 4, 2010 at 9:36 AM, Matthew Dowle <mdowle at mdowle.plus.com>
> wrote:
>>
>> Yes, that was an off-list fix for importing data.table (the Imports field
>> in DESCRIPTION). There is likely still a problem with the Depends field of
>> DESCRIPTION, though. If you need to Depend rather than Import, let me
>> know and I'll try again to fix it.
>>
>> Note also that data.table queries don't work from a browser prompt (bug
>> #1131).
>>
>> Matthew
>>
>>
>>> I just `svn up`'ed my data.table package to see if I could dig into the
>>> code, and I noticed that this issue was previously reported and fixed
>>> in 1.5.1.
>>>
>>> Seems like I stumbled on the same issue ... sorry, I didn't see this
>>> on the mailing list.
>>>
>>> Thanks,
>>> -steve
>>>
>>> On Tue, Nov 2, 2010 at 6:20 PM, Steve Lianoglou
>>> <mailinglist.honeypot at gmail.com> wrote:
>>>> More info on my problem:
>>>>
>>>> I'm loading my package and passing an object to a function defined
>>>> (and exported) in my library that does the fast dt[, SOMETHING,
>>>> by='entrez.id']. I call this function, and it bombs when I try to
>>>> access "things" (columns) of the data.table in my SOMETHING
>>>> expression.
>>>>
>>>> Given the same object, I then execute the lines of the failing
>>>> function one by one (without calling the function), and they execute
>>>> normally.
>>>>
>>>> I wonder if this is relevant:
>>>>
>>>> (I) After the function bombs, this is the result of traceback() --
>>>> is it normal for the deepest call to actually be to `[.data.frame`?
>>>>
>>>> R> traceback()
>>>> 4: `[.data.frame`(x, i, j)
>>>> 3: `[.data.table`(dt, , list(seqnames = seqnames[1], strand = strand[1],
>>>>       start = min(start), end = max(end)), by = "entrez.id")
>>>> 2: dt[, list(seqnames = seqnames[1], strand = strand[1], start = min(start),
>>>>       end = max(end)), by = "entrez.id"]
>>>> 1: annotatedTxBounds(annotated)
>>>>
>>>> (II) Bioconductor packages use the S4 system. I've defined some
>>>> conversion functions on objects in my package so that I can convert
>>>> them to data.tables in the "expected way", à la:
>>>>
>>>> R> my.dt <- as(MyOwnClass, 'data.table')
>>>>
>>>> In order to do that, I've had to S4-ize the S3 data.table class,
>>>> like so:
>>>>
>>>> setOldClass("data.table")
>>>> (I also tried `setOldClass(c('data.table', 'data.frame'))`; both give
>>>> the same error.)
>>>>
>>>> So ... it has something to do with my function as it's run from within
>>>> my package's environment -- but I don't know what to do about it.
>>>>
>>>> I tried adding `import(data.table)` to the NAMESPACE file as well, but
>>>> no dice.
>>>>
>>>> Thanks,
>>>> -steve
>>>>
>>>> On Tue, Nov 2, 2010 at 3:14 PM, Steve Lianoglou
>>>> <mailinglist.honeypot at gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Sorry for what is about to be a vaguely described problem I'm having,
>>>>> but here goes.
>>>>>
>>>>> I'm developing some R/Bioconductor packages and have been using
>>>>> data.table quite happily for a few things in them. Although my
>>>>> packages are written to be properly installed, while developing them
>>>>> I just source their "R" directories to make my life easier, so I can
>>>>> modify them and update my R environment when I find something
>>>>> wrong (w/o having to restart R, then call library(MyPackage), etc.).
>>>>>
>>>>> So, I just went through my package to make sure my NAMESPACE stuff
>>>>> is kosher -- that I export the classes, methods, and functions I
>>>>> need to export.
>>>>>
>>>>> In my DESCRIPTION file, data.table is listed in the "Depends"
>>>>> section.
>>>>>
>>>>> I'm only mentioning this because, having successfully done all
>>>>> that, I installed my package and am now using it by calling
>>>>> library(MyPackage). Now something strange is happening with my
>>>>> data.table code.
>>>>>
>>>>> The column names of my data.table are no longer recognized in my j
>>>>> expressions. For instance:
>>>>>
>>>>> R> library(data.table)
>>>>> R> df <- structure(list(seqnames = c("chr22", "chr22", "chr22",
>>>>> "chr22",
>>>>> "chr22", "chr22", "chr22", "chr22", "chr22", "chr22"), start =
>>>>> c(22639026L,
>>>>> 22639103L, 22639574L, 22643475L, 22643596L, 28059152L, 15897460L,
>>>>> 15905763L, 15908214L, 15917963L), end = c(22639102L, 22639210L,
>>>>> 22639749L, 22643595L, 22644748L, 28059247L, 15898234L, 15905890L,
>>>>> 15908316L, 15919682L), width = c(77L, 108L, 176L, 121L, 1153L,
>>>>> 96L, 775L, 128L, 103L, 1720L), strand = structure(c(1L, 1L, 1L,
>>>>> 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("+", "-", "*"), class =
>>>>> "factor"),
>>>>>    exon.anno = structure(c(5L, 1L, 1L, 1L, 4L, 3L, 3L, 3L, 3L,
>>>>>    3L), .Label = c("cds", "overlap", "utr", "utr3", "utr5"), class =
>>>>> "factor"),
>>>>>    symbol = c("DDTL", "DDTL", "DDTL", "DDTL", "DDTL", "SNORD125",
>>>>>    "CECR7", "CECR7", "CECR7", "CECR7"), entrez.id = c("100037417",
>>>>>    "100037417", "100037417", "100037417", "100037417", "100113380",
>>>>>    "100130418", "100130418", "100130418", "100130418")), .Names =
>>>>> c("seqnames",
>>>>> "start", "end", "width", "strand", "exon.anno", "symbol", "entrez.id"
>>>>> ), row.names = c(NA, -10L), class = "data.frame")
>>>>> R> dt <- data.table(df, key='entrez.id')
>>>>>
>>>>> Now, something like this should work (and in fact does in a clean
>>>>> environment, like the one you get by just starting R and pasting the
>>>>> above code):
>>>>>
>>>>> R> bounds <- dt[, list(start=min(start), end=max(end)), by='entrez.id']
>>>>>
>>>>> But when the "bowels" of my package run this (only when it's
>>>>> attached with library(MyPackage)), I get this error:
>>>>>   Error in min(start) : invalid 'type' (closure) of argument
>>>>>
>>>>> If I try to use the .SD object in the same place, I get another
>>>>> error:
>>>>>
>>>>> R> bounds2 <- dt[, {
>>>>>  .sd <- .SD[1]
>>>>>  .sd$start <- min(start)
>>>>>  .sd$end <- max(end)
>>>>>  .sd
>>>>> }, by='entrez.id']
>>>>>
>>>>> (The code here is simplified, but assume I need to use .SD -- I want
>>>>> to get the rest of the columns in the dt data.table w/o referencing
>>>>> them explicitly)
>>>>>
>>>>> The error when the code is run from within my package is:
>>>>>
>>>>>   Error in `[.data.frame`(x, i, j) : object '.SD' not found
>>>>>
>>>>> Even though it works in a "clean" R environment.
>>>>>
>>>>> Can anyone take a stab at why this might be happening? I'm at a bit
>>>>> of a loss.
>>>>>
>>>>> For what it's worth, this is the sessionInfo of my R environment when
>>>>> my package is installed (my package is called GenomicFeaturesX). Most
>>>>> of the packages in "other attached packages" are from Bioconductor.
>>>>>
>>>>> R version 2.12.0 (2010-10-15)
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>>  [1] BSgenome.Hsapiens.UCSC.hg18_1.3.16 BSgenome_1.18.0
>>>>>  [3] Biostrings_2.18.0                  doMC_1.2.1
>>>>>  [5] multicore_0.1-3                    foreach_1.3.0
>>>>>  [7] codetools_0.2-2                    iterators_1.0.3
>>>>>  [9] GenomicFeaturesX_0.2               data.table_1.5
>>>>> [11] GenomicFeatures_1.2.0              GenomicRanges_1.2.1
>>>>> [13] IRanges_1.8.2
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>  [1] annotate_1.28.0      AnnotationDbi_1.12.0 Biobase_2.10.0
>>>>>  [4] biomaRt_2.6.0        DBI_0.2-5            RCurl_1.4-3
>>>>>  [7] RSQLite_0.9-2        rtracklayer_1.10.2   tools_2.12.0
>>>>> [10] XML_3.2-0            xtable_1.5-6
>>>>>
>>>>> Thanks,
>>>>> -steve
>>>>>
>>>>> --
>>>>> Steve Lianoglou
>>>>> Graduate Student: Computational Systems Biology
>>>>>  | Memorial Sloan-Kettering Cancer Center
>>>>>  | Weill Medical College of Cornell University
>>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>>
>>
>
>
>
>



