[datatable-help] [SOLVED] Something strange with scoping(?) when data.table is used inside a package I'm developing.

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Nov 4 15:11:45 CET 2010


Thanks for the reply ... I actually moved it over to the Imports
anyway .. but still, it's probably a good thing to fix at some point
:-)

It's kind of weird though ... do you know what "the nut" of the problem is?
-steve
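
A minimal sketch of how the `invalid 'type' (closure) of argument` error quoted further down can arise. It assumes (the thread does not confirm this) that the j expression ends up being evaluated without the table's columns in scope, as the `[.data.frame` frame in the traceback hints. In that case `start` would resolve to the function stats::start rather than to the column. The helpers `eval_outside` and `eval_inside` are hypothetical names, and base R's eval() on a plain list stands in for data.table's own machinery:

```r
# Toy stand-in for the table: a plain list with a `start` column.
cols <- list(start = c(22639026L, 22639103L, 15897460L))

# The j expression, captured unevaluated.
j <- quote(min(start))

# Hypothetical helper: evaluate j WITHOUT the columns in scope.
# `start` is then looked up on the search path and found as a function,
# so min() fails; the error message is returned as a string.
eval_outside <- function(expr) {
  tryCatch(eval(expr, globalenv()), error = function(e) conditionMessage(e))
}

# Hypothetical helper: evaluate j WITH the columns in scope.
# The list is searched first, so `start` is the integer column.
eval_inside <- function(expr, data) {
  eval(expr, data, globalenv())
}

msg <- eval_outside(j)
value <- eval_inside(j, cols)

print(is.function(start))  # TRUE as long as nothing masks stats::start
print(msg)
print(value)
```

The only difference between the two calls is where `start` is looked up, which would be consistent with the same line working when pasted at the prompt and failing when run from the installed package.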

On Thu, Nov 4, 2010 at 9:36 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Yes that was an off list fix for importing data.table (Imports field in
> DESCRIPTION). There is still likely a problem with the Depends field of
> DESCRIPTION, though. If you need to Depend rather than Import then let me
> know and I'll try again to fix it.
>
> Note also that data.table queries don't work from a browser prompt (bug
> #1131).
>
> Matthew
>
>
>> I just `svn up`'ed my data.table package to see if I can dig into the
>> code and I noticed that this issue was previously reported and fixed
>> in 1.5.1.
>>
>> Seems like I stumbled on the same issue ... sorry, I didn't see this
>> in the mailing list.
>>
>> Thanks,
>> -steve
>>
>> On Tue, Nov 2, 2010 at 6:20 PM, Steve Lianoglou
>> <mailinglist.honeypot at gmail.com> wrote:
>>> More info on my problem:
>>>
>>> I'm loading my package, and passing an object to a function defined
>>> (and exported) in my library that does the fast dt[, SOMETHING,
>>> by='entrez.id']. I call this function, and it bombs when I try to
>>> access "things" (columns) of the data.table in my SOMETHING
>>> expression.
>>>
>>> Given the same object, I then execute the lines of the function that
>>> is failing one by one (I don't call the function), and they execute
>>> normally.
>>>
>>> I wonder if this is relevant:
>>>
>>> (I). After the function bombs, this is the result of traceback() --
>>> is it normal for the deepest call to actually be to `[.data.frame`?
>>>
>>> R> traceback()
>>> 4: `[.data.frame`(x, i, j)
>>> 3: `[.data.table`(dt, , list(seqnames = seqnames[1], strand = strand[1],
>>>       start = min(start), end = max(end)), by = "entrez.id")
>>> 2: dt[, list(seqnames = seqnames[1], strand = strand[1],
>>>       start = min(start), end = max(end)), by = "entrez.id"]
>>> 1: annotatedTxBounds(annotated)
>>>
>>> (II) Bioconductor packages use the S4 system. I've defined some
>>> conversion functions on objects in my package so that I can convert
>>> them to data.table's in the "expected way", a la:
>>>
>>> R> my.dt <- as(MyOwnClass, 'data.table')
>>>
>>> In order for me to do that, I've had to S4-ize the S3 data.table class,
>>> like so:
>>>
>>> setOldClass("data.table")
>>> (I also had `setOldClass(c('data.table', 'data.frame'))`, both have
>>> the same error)
>>>
>>> So ... it has something to do with my function as it's run from within
>>> my package's environment -- but I don't know what to do about it.
>>>
>>> I tried adding `import(data.table)` into NAMESPACE file as well, but no
>>> dice.
>>>
>>> Thanks,
>>> -steve
>>>
>>> On Tue, Nov 2, 2010 at 3:14 PM, Steve Lianoglou
>>> <mailinglist.honeypot at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Sorry for what is about to be a vaguely described problem I'm having,
>>>> but here goes.
>>>>
>>>> I'm developing some R/Bioconductor packages and have been using
>>>> data.table for a few things in them quite happily. Although my
>>>> packages are written to be properly installed, as I develop with them,
>>>> I just source their "R" directories to make my life easier so I can
>>>> easily modify them and update my R environment when I find something
>>>> wrong (w/o having to restart R, then call library(MyPackage), etc ..)
>>>>
>>>> So, I just went through my package to make sure my NAMESPACE stuffs
>>>> are kosher -- that I export the classes, methods, and functions I need
>>>> to export.
>>>>
>>>> In my DESCRIPTION file, data.table is listed in the "Depends" section.
>>>>
>>>> I'm only mentioning this because now that I've successfully done all
>>>> that, I installed my package and am now using it by calling
>>>> library(MyPackage). Now there is something strange happening with my
>>>> data.table stuff.
>>>>
>>>> The column names of my data.table are no longer recognized in my j
>>>> functions. For instance:
>>>>
>>>> R> library(data.table)
>>>> R> df <- structure(list(seqnames = c("chr22", "chr22", "chr22",
>>>> "chr22",
>>>> "chr22", "chr22", "chr22", "chr22", "chr22", "chr22"), start =
>>>> c(22639026L,
>>>> 22639103L, 22639574L, 22643475L, 22643596L, 28059152L, 15897460L,
>>>> 15905763L, 15908214L, 15917963L), end = c(22639102L, 22639210L,
>>>> 22639749L, 22643595L, 22644748L, 28059247L, 15898234L, 15905890L,
>>>> 15908316L, 15919682L), width = c(77L, 108L, 176L, 121L, 1153L,
>>>> 96L, 775L, 128L, 103L, 1720L), strand = structure(c(1L, 1L, 1L,
>>>> 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("+", "-", "*"), class =
>>>> "factor"),
>>>>    exon.anno = structure(c(5L, 1L, 1L, 1L, 4L, 3L, 3L, 3L, 3L,
>>>>    3L), .Label = c("cds", "overlap", "utr", "utr3", "utr5"), class =
>>>> "factor"),
>>>>    symbol = c("DDTL", "DDTL", "DDTL", "DDTL", "DDTL", "SNORD125",
>>>>    "CECR7", "CECR7", "CECR7", "CECR7"), entrez.id = c("100037417",
>>>>    "100037417", "100037417", "100037417", "100037417", "100113380",
>>>>    "100130418", "100130418", "100130418", "100130418")), .Names =
>>>> c("seqnames",
>>>> "start", "end", "width", "strand", "exon.anno", "symbol", "entrez.id"
>>>> ), row.names = c(NA, -10L), class = "data.frame")
>>>> R> dt <- data.table(df, key='entrez.id')
>>>>
>>>> Now, something like this should work (and in fact does when I have a
>>>> clean environment like you would by just starting R and pasting the
>>>> above code):
>>>>
>>>> R> bounds <- dt[, list(start=min(start), end=max(end)), by='entrez.id']
>>>>
>>>> But when the "bowels" of my code in my package are running this (only
>>>> when it's attached with library(MyPackage)), I'm now getting this
>>>> error:
>>>>   Error in min(start) : invalid 'type' (closure) of argument
>>>>
>>>> If I try to use the .SD object in the same place, I get another
>>>> error:
>>>>
>>>> R> bounds2 <- dt[, {
>>>>  .sd <- .SD[1]
>>>>  .sd$start <- min(start)
>>>>  .sd$end <- max(end)
>>>>  .sd
>>>> }, by='entrez.id']
>>>>
>>>> (The code here is simplified, but assume I need to use .SD -- I want
>>>> to get the rest of the columns in the dt data.table w/o referencing
>>>> them explicitly)
>>>>
>>>> The error when the code is run from within my package is:
>>>>
>>>>   Error in `[.data.frame`(x, i, j) : object '.SD' not found
>>>>
>>>> Even though it works in a "clean" R environment.
>>>>
>>>> Can anyone take a stab at why this might be happening? I'm at a bit of
>>>> a loss.
>>>>
>>>> For what it's worth, this is the sessionInfo of my R environment when
>>>> my package is installed (my package is called GenomicFeaturesX). Most
>>>> of the packages in "other attached packages" are from Bioconductor.
>>>>
>>>> R version 2.12.0 (2010-10-15)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>>  [1] BSgenome.Hsapiens.UCSC.hg18_1.3.16 BSgenome_1.18.0
>>>>  [3] Biostrings_2.18.0                  doMC_1.2.1
>>>>  [5] multicore_0.1-3                    foreach_1.3.0
>>>>  [7] codetools_0.2-2                    iterators_1.0.3
>>>>  [9] GenomicFeaturesX_0.2               data.table_1.5
>>>> [11] GenomicFeatures_1.2.0              GenomicRanges_1.2.1
>>>> [13] IRanges_1.8.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>>  [1] annotate_1.28.0      AnnotationDbi_1.12.0 Biobase_2.10.0
>>>>  [4] biomaRt_2.6.0        DBI_0.2-5            RCurl_1.4-3
>>>>  [7] RSQLite_0.9-2        rtracklayer_1.10.2   tools_2.12.0
>>>> [10] XML_3.2-0            xtable_1.5-6
>>>>
>>>> Thanks,
>>>> -steve
>>>>
>>>> --
>>>> Steve Lianoglou
>>>> Graduate Student: Computational Systems Biology
>>>>  | Memorial Sloan-Kettering Cancer Center
>>>>  | Weill Medical College of Cornell University
>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>
>>>
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>
>



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
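
For reference, the Imports-based setup discussed in this thread might look roughly like the following NAMESPACE file. This is only a sketch under assumptions: the exported function name is taken from the traceback quoted above, exporting it this way is a guess at how the package is organised, and per the thread the import only behaves as expected with a data.table version that contains the fix (reported above as 1.5.1). The matching DESCRIPTION entry is shown as a comment because it lives in a separate file:

```r
# NAMESPACE (sketch; names other than data.table are assumptions)

# Make data.table's functions and methods visible inside the package
# namespace, as tried earlier in the thread.
import(data.table)

# Export the function that runs the dt[, ..., by = 'entrez.id'] query.
export(annotatedTxBounds)

# The DESCRIPTION file would then carry a line such as:
#   Imports: data.table
# in place of listing data.table under Depends.
```

With this arrangement the package no longer relies on data.table being attached to the search path, which is the Depends behaviour the thread reports as still problematic.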

