[datatable-help] Auto-convert characters to factors when settings keys?

mdowle at mdowle.plus.com
Tue May 25 19:10:00 CEST 2010


I'm not up to speed with S4 either. Hm.

I don't mind the dtbl approach, other than that it gets ugly when
compounding statements:

   dt[where,select,by][having][join,...][where][join,...]...

translates to

   dtbl(dtbl(dtbl(dtbl(dtbl(dt,where,select,by),having),join),where),join)

Wouldn't be a problem if we could allow both and let the user choose.
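
To make the nesting concrete, here is a rough base-R sketch. The dtbl() below is hypothetical: it is a minimal stand-in written for this example (only i, j and a single unquoted by column), operating on a plain data frame, and is not an existing data.table function. It assumes an unnamed j result is called V1.

```r
# Hypothetical dtbl(): a minimal base-R stand-in, not part of data.table.
dtbl <- function(x, i, j, by) {
  caller <- parent.frame()
  if (!missing(i)) {
    # "where": evaluate i with the columns of x in scope
    keep <- eval(substitute(i), x, caller)
    x <- x[which(keep), , drop = FALSE]
  }
  if (missing(j)) return(x)
  jexpr <- substitute(j)
  if (missing(by)) return(data.frame(V1 = eval(jexpr, x, caller)))
  # "by": split on one unquoted column name and evaluate j per group
  byname <- deparse(substitute(by))
  groups <- split(x, x[[byname]], drop = TRUE)
  vals <- lapply(groups, function(g) eval(jexpr, g, caller))
  out <- data.frame(names(groups), unlist(vals, use.names = FALSE),
                    stringsAsFactors = FALSE)
  names(out) <- c(byname, "V1")
  out
}

df <- data.frame(a = c("u", "u", "v", "v", "v"), x = c(1, 2, 3, 4, 5))

# With data.table the compound form would read: dt[, sum(x), by = a][V1 > 3]
# With the function form, each further step wraps the previous call:
res <- dtbl(dtbl(df, j = sum(x), by = a), V1 > 3)
```

Two steps are still readable, but each extra [where] or [join] adds another layer of parentheses on the left, which is the ugliness I mean.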


> I don't know much about S4, but I think it'd be tough in practice to
> make it work. If a package didn't have a namespace or the user loads
> functions with "source", those functions would end up dispatching on
> [.data.table, and you'd probably have confusing errors.
>
> I've thought about a lite version of data.table that operated on data
> frames. You'd have to get rid of [.data.table and use some alias to
> [.data.table (pick dtbl which acts like a twisted combination of "with"
> and "subset"), so instead of:
>
>   dt[, sum(x), by = a]
>
> It'd be:
>
>   dtbl(dt, j = sum(x), by = a)
>
> An advantage of this approach is that it helps newcomers and eases folks
> into more complicated indexing/merging/subsetting possible with data
> tables.
>
> You could still have setkey operate on data frames. A big downside is
> that it'd be hard to keep the key in sync if the user re-orders the data
> frame. One option is to have a very light [.data.table that just does
> [.data.frame and zaps the key.
>
> - Tom
>
>
>> -----Original Message-----
>> From: datatable-help-bounces at lists.r-forge.r-project.org
>> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
>> On Behalf Of mdowle at mdowle.plus.com
>> Sent: Tuesday, May 25, 2010 11:23
>> To: Steve Lianoglou
>> Cc: datatable-help at lists.r-forge.r-project.org
>> Subject: Re: [datatable-help] Auto-convert characters to
>> factors when settings keys?
>>
>> Perhaps radical suggestion ... should/could we move to S4?
>> Only user code, or packages that call data.table would see
>> the [.data.table signature, otherwise the object would appear
>> to be a data.frame to packages that require data.frame.  Not
>> sure if this is possible or not - I did try once.
>>
>> Also tried to inherit data.table from data.frame in S3 using
>> class(df)=c("data.table","data.frame"). As long as the
>> package that only works with data.frame uses is.data.frame()
>> or inherits(), and not class(x)=="data.frame", then it gets
>> past that point but then [.data.table is still dispatched
>> rather than [.data.frame.  I'm thinking S4 signatures might be
>> another option if they can be local. Advantage being that no
>> class conversion would then be needed.
>>
>> Above is from memory a few years back so may well be wrong.
>> Plus R has moved on in the meantime.
>>
>> Tom,  do you have a good example to hand where the
>> data.table<=>data.frame conversion causes grief (syntax as
>> well as speed) that we could use for testing ?
>>
>> This also links to the not yet implemented .SDF object. The
>> way I was thinking to do that wouldn't allow use of both .SD
>> and .SDF in the same j, but it would be fast. Would be nicer
>> not to need .SDF though if S4 works.
>>
>>
>> > On Tue, May 25, 2010 at 9:39 AM, Short, Tom <TShort at epri.com> wrote:
>> >
>> >>> I guess I don't understand why you'd want to make setdf and setdt
>> >>> instead of using the as.data.frame/as.data.table functions?
>> >>>
>> >>> Isn't the as.* more idiomatic S3-OOized R?
>> >>
>> >> You're right, it's not very standard for R, but if I have a 9-GB data
>> >> table, I may not want the memory consumption of keeping both copies
>> >> around. Because it's nonstandard, maybe it shouldn't be in
>> >> data.table, but it's worth discussing.
>> >
>> > Ahh ... good point.
>> >
>> > I see their use case now ;-)
>> >
>> > +1 for having something like these so people who know what they're
>> > doing can use them.
>> >
>> > --
>> > Steve Lianoglou
>> > Graduate Student: Computational Systems Biology  | Memorial
>> > Sloan-Kettering Cancer Center  | Weill Medical College of Cornell
>> > University Contact Info: http://cbio.mskcc.org/~lianos/contact
>> >
>>
>
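
A small base-R sketch of two points from the quoted messages: the S3 inheritance behaviour described above (is.data.frame() and inherits() pass, class(x)=="data.frame" does not, and [ still dispatches on the subclass), and Tom's idea of a very light [ method that just does [.data.frame and zaps the key. The class name "mytable" and the "key" attribute are assumptions made for this example; they are not how data.table stores anything.

```r
# "mytable" is a hypothetical stand-in class; "key" is a hypothetical attribute.
df <- data.frame(a = c(3, 1, 2), b = c("z", "x", "y"))
class(df) <- c("mytable", "data.frame")
attr(df, "key") <- "a"

calls <- 0
`[.mytable` <- function(x, ...) {
  calls <<- calls + 1          # record that the subclass method was dispatched
  out <- NextMethod()          # hand the real work to [.data.frame
  attr(out, "key") <- NULL     # zap the key: row order may have changed
  out
}

is.data.frame(df)              # checks like this one still pass
inherits(df, "data.frame")     # so does this
class(df) == "data.frame"      # length-2 logical, so a naive test on it breaks

sub <- df[c(2, 3, 1), ]        # still goes through [.mytable first
```

This only shows the S3 side. Whether S4 signatures could keep the method local to user code is a separate question that the sketch does not address.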
