[datatable-help] use of data.table in an S4 class

Matthew Forrest matthew.forrest at senckenberg.de
Wed Jan 20 19:18:04 CET 2016


Hi all,

I want to use data.table either as a data element (slot) of an S4 class 
or as a superclass (i.e. the new class inherits from data.table using 
"contains").  The reason is that I want to build an S4 class 
with a data.table as its main data component, but with some 
rather complex meta-data (specifically, more S4 classes) associated.  I 
then want to operate on this data.table (mostly with ":=") inside some 
functions.

The second option (using data.table as a superclass) looks perfect: if 
it worked, I would be able to treat the new S4 class exactly as a 
data.table.  One could pass an object of the new class into a 
function which operates on it using ":=", 
and then (by the reference goodness of data.table) the object 
would still be modified outside the function.  But when 
data.table is used as a superclass, the normal operations just don't 
work.  See my issue on GitHub (with a simple code snippet to 
demonstrate) here:

https://github.com/Rdatatable/data.table/issues/1504

Maybe this can work, which would be fantastic, but let's see.
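
For comparison, here is a minimal sketch of the by-reference behaviour described above, on a plain data.table rather than an S4 subclass. The helper name add_sum and the object name plain are hypothetical, chosen only for illustration.

```r
library(data.table)

# hypothetical helper: adds a column in place using ":="
add_sum <- function(dt, new.col.name, cols.to.add) {
  dt[, (new.col.name) := rowSums(.SD), .SDcols = cols.to.add]
  invisible(dt)
}

plain <- data.table(a = 1:10, b = 11:20)

# no assignment needed: the caller's table is modified by reference
add_sum(plain, "c1", c("a", "b"))
names(plain)
```

After the call, names(plain) includes "c1" even though the result was never assigned back. This is the behaviour one would hope an S4 class containing data.table could keep.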


Then there is the idea of using data.table as a regular slot.  The problem is 
that accessing the data.table slot in the S4 object and modifying it 
(either inside a function or using a class method) results in exactly the kind 
of copying that data.table works so hard to avoid!  Disaster! For an 
example, run this code:

library(data.table)

# simple test object
setClass("TestObj",
          slots = c(id = "character",
                    dt = "data.table"
          )
)

# define a method
setGeneric(name="testMethod",
            def=function(theObject,new.col.name, cols.to.add)
            {
              standardGeneric("testMethod")
            }
)
setMethod(f="testMethod",
           signature="TestObj",
           definition=function(theObject,new.col.name, cols.to.add)
           {
             theObject@dt <- theObject@dt[, paste(new.col.name) := rowSums(.SD),
                                          .SDcols = cols.to.add]
             return(theObject)
           }
)

# create a TestObj
lala <- new("TestObj", id = "test", dt = data.table(a=1:10, b=11:20))

# accessing the data.table slot results in a copy :-(
lala@dt <- lala@dt[, c1 := a + b]

# using a method also makes a copy :'-(
testMethod(lala, new.col.name = "c2",  cols.to.add = c("a","b"))
lala <- testMethod(lala, new.col.name = "c2",  cols.to.add = c("a","b"))


So you can see the problem.  I want to use a data.table as part of an S4 
class, and process it in keeping with data.table principles, but I can't 
find a way.  It is possible that I can just suck up the performance cost 
of the copy, but some of my data.tables are pretty large, so that might 
not be viable.
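
One way to check whether the slot assignment really moves the data is to compare memory addresses before and after. This is only a sketch: it assumes data.table's address() helper is a reasonable proxy for "was the table copied", and the names obj, addr.before and addr.after are hypothetical. The class definition repeats the one above so that the snippet runs on its own.

```r
library(data.table)

# same class definition as in the example above
setClass("TestObj",
         slots = c(id = "character",
                   dt = "data.table"))

obj <- new("TestObj", id = "test", dt = data.table(a = 1:10, b = 11:20))

# address of the data.table held in the slot, before and after the update
addr.before <- address(obj@dt)
obj@dt <- obj@dt[, c1 := a + b]
addr.after <- address(obj@dt)

# FALSE here would indicate that the slot now holds a different object
identical(addr.before, addr.after)
```

The result may depend on the R and data.table versions, so it is best treated as a diagnostic rather than a guarantee either way.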

Any help greatly appreciated!

Thanks,

Matt




-- 
Dr Matthew Forrest
Biodiversity and Climate Research Centre (BiK-F)
Visiting address: Georg-Voigt-Straße 14-16, room 3.04, D-60325 Frankfurt am Main
Postal address: Senckenberganlage 25, D-60325 Frankfurt am Main
Tel.: +49-69-7542-1867
Fax: +49-69-7542-7904
E-mail: matthew.forrest at senckenberg.de
Homepage: http://www.bik-f.de/root/index.php?page_id=709

Senckenberg Gesellschaft für Naturforschung
Rechtsfähiger Verein gemäß § 22 BGB
Senckenberganlage 25
60325 Frankfurt

Direktorium: Prof. Dr. Dr. h.c. Volker Mosbrugger, Prof. Dr. Andreas Mulch, Stephanie Schwedhelm, Prof. Dr. Katrin Böhning-Gaese, Prof. Dr. Uwe Fritz,  PD Dr. Ingrid Kröncke
Präsidentin: Dr. h.c. Beate Heraeus
Aufsichtsbehörde: Magistrat der Stadt Frankfurt am Main (Ordnungsamt)


