[datatable-help] Is there any overhead to converting back and forth from a data.table to a data.frame?

Steve Lianoglou lianoglou.steve at gene.com
Mon Apr 7 20:50:00 CEST 2014


+1 on exporting setDF
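
For anyone who wants to check the copy-versus-reference difference described in the quoted message below, here is a minimal sketch. It assumes data.table is installed and uses its address() function to compare memory addresses. The helper is the same body as the quoted setDF, renamed setDF_sketch (a hypothetical local name) so it cannot mask anything data.table itself exports, with an explicit invisible(x) added at the end.

```r
library(data.table)

# Hypothetical local helper: same steps as the quoted setDF, renamed to
# avoid clashing with any function of that name exported by data.table.
setDF_sketch <- function(x) {
    if (!is.data.table(x))
        stop("x must be a data.table")
    setattr(x, "row.names", .set_row_names(nrow(x)))
    setattr(x, "class", "data.frame")
    setattr(x, "sorted", NULL)
    setattr(x, ".internal.selfref", NULL)
    invisible(x)
}

dat <- data.table(x = 1:5, y = 6:10)
addr_before <- address(dat)

# as.data.frame() dispatches to the data.table method, which copies.
df_copy <- as.data.frame(dat)
copied <- address(df_copy) != addr_before

# The by-reference version changes attributes on the same object.
setDF_sketch(dat)
in_place <- address(dat) == addr_before
```

If the description in the quoted message holds, both copied and in_place should be TRUE, and dat should now be a plain data.frame.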

On Mon, Apr 7, 2014 at 11:25 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> as.data.frame is an S3 generic with a data.table method, and it is definitely
> faster than data.frame(), but it still does copy(.). data.frame(.) would also
> convert strings to factors by default (if stringsAsFactors=TRUE).
>
> The most efficient way to convert a data.table to a data.frame would be to do
> things by reference (in place). The code is already available in
> as.data.frame; just remove the copy(.):
>
> # convert data.table to data.frame by reference
> setDF <- function(x) {
>     if (!is.data.table(x))
>         stop("x must be a data.table")
>     setattr(x, "row.names", .set_row_names(nrow(x)))
>     setattr(x, "class", "data.frame")
>     setattr(x, "sorted", NULL)
>     setattr(x, ".internal.selfref", NULL)
> }
>
> Now you have a function that converts a data.table to a data.frame by
> reference.
>
> require(data.table)
> dat <- data.table(x=1:5, y=6:10)
> setDF(dat) # dat is now a data.frame
>
> We should probably export this function as well, like setDT, so that users
> can switch between the two as they wish without a performance hit?
>
>
> Arun
>
> From: Chris Neff caneff at gmail.com
> Reply: Chris Neff caneff at gmail.com
> Date: April 7, 2014 at 5:32:47 PM
> To: datatable-help at lists.r-forge.r-project.org
> datatable-help at lists.r-forge.r-project.org
> Subject:  [datatable-help] Is there any overhead to converting back and
> forth from a data.table to a data.frame?
>
> I prefer data.tables for all the code processing I do.  But others on my
> team using my functions aren't comfortable with data.tables, so most of the
> libraries I write end with
>
>  return(data.frame(DT))
>
> Is there any copying or other overhead happening there? Since it inherits
> from data.frame, I think the answer is no.
>
> Now, if I have a function that does such a return, but I wrap that itself in
> a data.table call:
>
> data.table(func_that_returns_df())
>
> Is there any inefficiency there?  Is there a difference between data.table()
> and as.data.table() here?
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

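On the reverse direction raised in the original question (data.frame back to data.table), a rough sketch using setDT, which the quoted message mentions. It assumes setDT converts in place without copying the columns. func_that_returns_df is a hypothetical stand-in for the function in the question. I have not measured data.table() against as.data.table() here.

```r
library(data.table)

# Hypothetical stand-in for a function that returns a plain data.frame.
func_that_returns_df <- function() {
    data.frame(x = c(1.5, 2.5, 3.5), y = c("a", "b", "c"),
               stringsAsFactors = FALSE)
}

df <- func_that_returns_df()
col_addr_before <- address(df$x)   # address of the column vector itself

setDT(df)                          # df becomes a data.table by reference

is_dt <- is.data.table(df)
# If no column was copied, the column vector is still at the same address.
cols_reused <- address(df$x) == col_addr_before
```

Comparing column addresses rather than the address of df itself is deliberate: the point of interest is whether the column data gets copied.
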


-- 
Steve Lianoglou
Computational Biologist
Genentech

