[datatable-help] use of data.table in an S4 class
Matthew Forrest
matthew.forrest at senckenberg.de
Wed Jan 20 19:18:04 CET 2016
Hi all,
I want to use data.table as either a data element (slot) of an S4 class
or as a superclass (i.e. the new class inherits from data.table using
"contains"). The reason for this is that I want to build an S4 class
with a data.table as its main data component, but with some rather
complex meta-data (specifically more S4 classes) associated. I then
want to operate on this data.table (mostly with ":=") inside some
functions.
The second option (using data.table as a superclass) looks perfect. If
it worked, I would be able to treat the new S4 class exactly as a
data.table. One could pass an object of the derived class into a
function which could operate on it using ":=", and then (by the
reference goodness of data.table) the object would still be modified
outside the function. But when data.table is used as a superclass, the
normal operations just don't work. See my issue flagged on github (with
a simple code snippet to demonstrate) here:
https://github.com/Rdatatable/data.table/issues/1504
Maybe this can work, which would be fantastic, but let's see.
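For reference, this is the by-reference behaviour I mean, shown with a plain data.table rather than an S4 subclass. It is only a minimal sketch; the function name add_sum and the object d are made up for illustration.

```r
library(data.table)

# hypothetical helper: adds a column by reference, no copy and no return value needed
add_sum <- function(dt) {
  dt[, c1 := a + b]
  invisible(dt)
}

d <- data.table(a = 1:3, b = 4:6)
add_sum(d)   # result deliberately not assigned
names(d)     # the new column is visible outside the function
```

This is what I would like an object inheriting from data.table to do as well.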
Then there is the idea of using a data.table as a regular slot. The
problem is that accessing the data.table slot in the S4 object and
modifying it (either inside a function or using a class method) results
in the type of copying that data.table works so hard to avoid! Disaster!
For an example, run this code:
library(data.table)

# simple test object
setClass("TestObj",
         slots = c(id = "character",
                   dt = "data.table"))

# define a method
setGeneric(name = "testMethod",
           def = function(theObject, new.col.name, cols.to.add) {
             standardGeneric("testMethod")
           })

setMethod(f = "testMethod",
          signature = "TestObj",
          definition = function(theObject, new.col.name, cols.to.add) {
            # add a column holding the row sums of the selected columns
            theObject@dt <- theObject@dt[, paste(new.col.name) :=
                                           rowSums(.SD), .SDcols = cols.to.add]
            return(theObject)
          })

# create a TestObj
lala <- new("TestObj", id = "test", dt = data.table(a = 1:10, b = 11:20))

# accessing the data.table slot results in a copy :-(
lala@dt <- lala@dt[, c1 := a + b]

# using a method also makes a copy :'-(
testMethod(lala, new.col.name = "c2", cols.to.add = c("a", "b"))
lala <- testMethod(lala, new.col.name = "c2", cols.to.add = c("a", "b"))
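One way to look at what happens to the slot is to compare its memory address before and after the assignment. This is a rough sketch that assumes data.table's address() function is a good enough indicator; whether the two addresses differ may depend on the R and data.table versions, so no expected result is shown. The names before and after are just for illustration.

```r
library(data.table)

setClass("TestObj", slots = c(id = "character", dt = "data.table"))
lala <- new("TestObj", id = "test", dt = data.table(a = 1:10, b = 11:20))

before <- address(lala@dt)           # address of the slot before the update
lala@dt <- lala@dt[, c1 := a + b]    # same update as in the example above
after <- address(lala@dt)            # address of the slot after the update

# FALSE here would mean the slot now points at a different object
identical(before, after)
```

base::tracemem() on the slot could serve the same purpose, provided R was built with memory profiling enabled.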
So you can see the problem. I want to use a data.table as part of an S4
class, and process it in keeping with data.table principles, but I can't
find a way. It is possible that I can just suck up the performance cost
of the copy, but some of my data.tables are pretty large so that might
not be viable.
Any help greatly appreciated!
Thanks,
Matt
--
Dr Matthew Forrest
Biodiversity and Climate Research Centre (BiK-F)
Visiting address: Georg-Voigt-Straße 14-16, room 3.04, D-60325 Frankfurt am Main
Postal address: Senckenberganlage 25, D-60325 Frankfurt am Main
Tel.: +49-69-7542-1867
Fax: +49-69-7542-7904
E-mail: matthew.forrest at senckenberg.de
Homepage: http://www.bik-f.de/root/index.php?page_id=709
Senckenberg Gesellschaft für Naturforschung
Rechtsfähiger Verein gemäß § 22 BGB
Senckenberganlage 25
60325 Frankfurt
Direktorium: Prof. Dr. Dr. h.c. Volker Mosbrugger, Prof. Dr. Andreas Mulch, Stephanie Schwedhelm, Prof. Dr. Katrin Böhning-Gaese, Prof. Dr. Uwe Fritz, PD Dr. Ingrid Kröncke
Präsidentin: Dr. h.c. Beate Heraeus
Aufsichtsbehörde: Magistrat der Stadt Frankfurt am Main (Ordnungsamt)