[datatable-help] Is there any overhead to converting back and forth from a data.table to a data.frame?

Kevin Ushey kevinushey at gmail.com
Mon Apr 7 20:40:07 CEST 2014


I agree; this would be very useful.

On Mon, Apr 7, 2014 at 11:29 AM, Chris Neff <caneff at gmail.com> wrote:
> I would appreciate such a function, yes. Thanks for the explanation.
>
>
> On Mon, Apr 7, 2014 at 2:25 PM, Arunkumar Srinivasan <aragorn168b at gmail.com>
> wrote:
>>
>> as.data.frame is an S3 generic with a data.table method, and it is
>> definitely faster than data.frame(). But it still does copy(.), so the
>> data is copied. data.frame(.) would also convert strings to factors by
>> default (i.e. when stringsAsFactors=TRUE).
>>
>> The most efficient way to convert data.table to data.frame would be to do
>> things by reference (in place). The code is already available in
>> as.data.frame, just remove the copy(.):
>>
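>> # convert data.table to data.frame by reference (requires data.table)
>> setDF <- function(x) {
>>     if (!is.data.table(x))
>>         stop("x must be a data.table")
>>     # give x compact integer row names 1..nrow(x)
>>     setattr(x, "row.names", .set_row_names(nrow(x)))
>>     # drop the "data.table" class, leaving a plain data.frame
>>     setattr(x, "class", "data.frame")
>>     # remove the key and data.table's internal self-reference
>>     setattr(x, "sorted", NULL)
>>     setattr(x, ".internal.selfref", NULL)
>> }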
>>
>> Now you have a function that converts a data.table to a data.frame by
>> reference.
>>
>> require(data.table)
>> dat <- data.table(x=1:5, y=6:10)
>> setDF(dat) # dat is now a data.frame
>>
>> Probably we should export this function as well, like setDT, so that users
>> can switch between the two as they desire without a performance hit?
>>
>>
>> Arun
>>
>> From: Chris Neff caneff at gmail.com
>> Reply: Chris Neff caneff at gmail.com
>> Date: April 7, 2014 at 5:32:47 PM
>> To: datatable-help at lists.r-forge.r-project.org
>> Subject: [datatable-help] Is there any overhead to converting back and
>> forth from a data.table to a data.frame?
>>
>> I prefer data.tables for all the code processing I do.  But others on my
>> team using my functions aren't comfortable with data.tables, so most of the
>> libraries I write end with
>>
>>  return(data.frame(DT))
>>
>> Is there any copying or other overhead happening there? Since it inherits
>> from data.frame, I think the answer is no.
>>
>> Now, if I have a function that does such a return, but I wrap that itself
>> in a data.table call:
>>
>> data.table(func_that_returns_df())
>>
>> Is there any inefficiency there?  Is there a difference between
>> data.table() and as.data.table() here?
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

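The round trip discussed in the thread can be sketched as follows. This is only an illustration, and it assumes an installed data.table release that exports setDF alongside setDT (the thread above only proposes exporting it); the object names df, dt and df2 are made up for the example.

```r
library(data.table)

# Start from a plain data.frame, as a caller unfamiliar with data.table would.
df <- data.frame(x = 1:5, y = 6:10)

# setDT() converts the object to a data.table by reference; no reassignment needed.
setDT(df)
stopifnot(is.data.table(df))

# setDF() converts it back to a plain data.frame, again by reference.
setDF(df)
stopifnot(!is.data.table(df), identical(class(df), "data.frame"))

# as.data.frame() instead returns a new object and leaves its input untouched.
dt <- data.table(x = 1:5, y = 6:10)
df2 <- as.data.frame(dt)
stopifnot(is.data.table(dt), !is.data.table(df2))
```

Because the by-reference functions modify their argument, a library function that ends with setDF(DT) should only do so on a table it created itself, not on one passed in by the caller.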
