[datatable-help] Is there any overhead to converting back and forth from a data.table to a data.frame?
Chris Neff
caneff at gmail.com
Mon Apr 7 20:29:04 CEST 2014
I would appreciate such a function, yes. Thanks for the explanation.
On Mon, Apr 7, 2014 at 2:25 PM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> as.data.frame is an S3 generic with a data.table method, and it is definitely
> faster than data.frame(). But it still makes a full copy (via copy()).
> data.frame() would also convert strings to factors by default (when
> stringsAsFactors=TRUE).
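A minimal sketch of the copy described above, assuming the data.table package is installed (the objects `DT` and `DF` are illustrative). `as.data.frame()` dispatches to the data.table method and returns a separate data.frame, leaving the original data.table untouched. `address()` is data.table's exported helper that reports where an object lives in memory, so differing addresses show that a new object was created.

```r
library(data.table)

DT <- data.table(x = 1:5, y = letters[1:5])

# as.data.frame() dispatches to the data.table method and builds a new object
DF <- as.data.frame(DT)

is.data.table(DT)            # TRUE: the original is still a data.table
class(DF)                    # "data.frame"
address(DT) == address(DF)   # FALSE: DF is a different object in memory
```
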
>
> The most efficient way to convert a data.table to a data.frame is to do it
> by reference (in place). The code is already available in
> as.data.frame; just remove the copy():
>
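> # convert data.table to data.frame by reference
> setDF <- function(x) {
>   if (!is.data.table(x))
>     stop("x must be a data.table")
>   # reset row.names to the compact integer form used by data.frame
>   setattr(x, "row.names", .set_row_names(nrow(x)))
>   # replace the class vector, leaving a plain data.frame
>   setattr(x, "class", "data.frame")
>   # drop data.table-specific attributes (key and self-reference pointer)
>   setattr(x, "sorted", NULL)
>   setattr(x, ".internal.selfref", NULL)
> }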
>
> Now you have a function that converts a data.table to a data.frame *by
> reference*.
>
> require(data.table)
> dat <- data.table(x=1:5, y=6:10)
> setDF(dat) # dat is now a data.frame
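One way to check that the conversion really happens in place is to compare the object's memory address before and after the call. This sketch repeats the setDF() definition from above so that it runs on its own; `address()` is data.table's exported helper, and the comparison is only an illustration. Later data.table releases export their own setDF(), which this definition would mask.

```r
library(data.table)

# setDF() as defined in the message above: it only changes attributes
# in place via setattr(), so no copy of the data is made
setDF <- function(x) {
  if (!is.data.table(x))
    stop("x must be a data.table")
  setattr(x, "row.names", .set_row_names(nrow(x)))
  setattr(x, "class", "data.frame")
  setattr(x, "sorted", NULL)
  setattr(x, ".internal.selfref", NULL)
}

dat <- data.table(x = 1:5, y = 6:10)
before <- address(dat)

setDF(dat)                  # no assignment needed: dat is modified in place
after <- address(dat)

class(dat)                  # "data.frame"
identical(before, after)    # TRUE: same object in memory, nothing was copied
```
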
>
> Perhaps we should export this function as well, like setDT, so that users
> can switch between the two as they like without a performance penalty?
>
>
> Arun
>
> From: Chris Neff caneff at gmail.com
> Reply: Chris Neff caneff at gmail.com
> Date: April 7, 2014 at 5:32:47 PM
> To: datatable-help at lists.r-forge.r-project.org
> Subject: [datatable-help] Is there any overhead to converting back and
> forth from a data.table to a data.frame?
>
> I prefer data.tables for all the code processing I do. But others on my
> team using my functions aren't comfortable with data.tables, so most of the
> libraries I write end with
>
> return(data.frame(DT))
>
> Is there any copying or other overhead happening there? Since it inherits
> from data.frame, I think the answer is no.
>
> Now, suppose I have a function that does such a return, and I wrap the
> call itself in data.table():
>
> data.table(func_that_returns_df())
>
> Is there any inefficiency there? Is there a difference between
> data.table() and as.data.table() here?
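For the reverse direction asked about here, a sketch contrasting `as.data.table()`, which returns a new data.table and leaves its input as it was, with `setDT()`, which converts a data.frame to a data.table by reference. `func_that_returns_df()` is a hypothetical stand-in for the function in the question.

```r
library(data.table)

# Hypothetical stand-in for the question's function: returns a plain data.frame
func_that_returns_df <- function() {
  data.frame(x = 1:5, y = 6:10)
}

df <- func_that_returns_df()

# as.data.table() builds a new data.table; df itself stays a data.frame
DT1 <- as.data.table(df)
is.data.table(DT1)   # TRUE
is.data.table(df)    # FALSE

# setDT() converts df to a data.table by reference, no assignment needed
setDT(df)
is.data.table(df)    # TRUE
```
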