<br><br><div class="gmail_quote">On 20 July 2011 10:42, Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com">mdowle@mdowle.plus.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
Thanks, makes sense. Yes, as.data.frame.data.table currently removes the<br>
'sorted' attribute, which is all a key is. I suppose that line could be<br>
removed so the key would be left on the data.frame. You would then need<br>
to change the class back to data.table at the end of the function, though,<br>
and make sure you didn't change the order of the rows otherwise that key<br>
would be invalid.<br>
<br>
However, packages I use, use other packages I don't use directly and know<br>
nothing about. I don't see the issue. Disk space? Memory space? The<br>
banner?<br></blockquote><div><br></div><div>Behaving nicely in a build environment that is more complicated than a normal R thing. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
There is also this related FR :<br>
<a href="https://r-forge.r-project.org/tracker/index.php?func=detail&aid=984&group_id=240&atid=978" target="_blank">https://r-forge.r-project.org/tracker/index.php?func=detail&aid=984&group_id=240&atid=978</a><br>
<br>
Just to check you know that the result of j in data.table can happily be a<br>
data.frame? So if your user is using data.table to call your function, he<br>
won't mind. If he's passing the entire data.table to your function, then<br>
he's not going to be wanting to retain the key anyway. You're returning<br>
some statistical result to him (not the original data back) so why does the<br>
key make sense to retain?<br>
<br></blockquote><div><br></div><div>Well, I explicitly crafted an example where I return the entire data frame. Admittedly, in this dumb example I ruined the ordering, so the key would be lost anyway. But I think I have cases where I want to take an entire data.(table|frame), do some processing, and return the full data.(table|frame) as it was.</div>
<div><br></div><div> Noticing that strictly, your MyFunc 'returned' two columns, so it might be</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
written like this :<br>
<br>
MyFunc <- function(numerator,denominator)<br>
{<br>
o = order(numerator)<br>
data.frame(numerator[o], cumsum((numerator/denominator)[o]))<br>
}<br>
<br>
Then the user can decide if he wants to cbind it to his data.frame, or<br>
fast assign it into a data.table, or by group, or whatever. That seems<br>
to me to be up to your user. Perhaps, the job of MyFunc is to return its<br>
output given the input (and that's all).<br></blockquote><div><br></div><div><br></div><div>I think my issues come more from inexperience and unease with some of the data.table idioms. When you lay it all out like that it becomes crystal clear, though, and I think refactoring my code is the right approach. I'm just not in the data.table mindset yet, I guess.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><font color="#888888">
Matthew<br>
</font><div><div></div><div class="h5"><br>
<br>
> Mainly it is that I am writing some library functions that I and a few<br>
> others may be using. I don't want those functions to have to depend on<br>
> data.table because I don't want it to need to be installed for a purpose<br>
> that has nothing to do with it. But I use data.tables as input. Here is a<br>
> pseudo example<br>
><br>
> MyFunc <- function(data, numerator.var, denominator.var)<br>
> {<br>
> data <- data[order(data[,numerator.var]), ]<br>
> data$metric <- data[, numerator.var] / data[, denominator.var]<br>
> data$cum.metric <- cumsum(data$metric)<br>
><br>
> return(data)<br>
> }<br>
><br>
> I make this example to show that I need to preserve the whole data<br>
> variable<br>
> the whole way through and return a modified version. If I do<br>
><br>
> data <- as.data.frame(data)<br>
><br>
> as the first line of that function, then I lose the keys in a potential<br>
> data.table that is passed in. If I use<br>
><br>
> data <- as.data.table(data)<br>
><br>
> and change the subsetting to be data.table compliant, then I am forcing<br>
> someone to have a whole package loaded for something that can be done in<br>
> the<br>
> base language fine. There must be an agnostic way to do this. Apparently<br>
> subset doesn't do it either if keys get lost.<br>
><br>
> -Chris<br>
><br>
> On 20 July 2011 08:48, Matthew Dowle <<a href="mailto:mdowle@mdowle.plus.com">mdowle@mdowle.plus.com</a>> wrote:<br>
><br>
>><br>
>> Hi Chris,<br>
>><br>
>> If you're writing a package and don't want to worry if someone passes<br>
>> your<br>
>> package a data.table, then don't worry; just use data.frame syntax and<br>
>> your non-datatable-aware package will work fine.<br>
>><br>
>> If you're writing your own code you're in control of, just embrace the<br>
>> data.table ;)<br>
>><br>
>> If you're writing a function in an environment which is data.table<br>
>> aware,<br>
>> but you want your function to accept either data.frame or data.table,<br>
>> then<br>
>> at the beginning of your function just do :<br>
>><br>
>> f = function(x) {<br>
>> x = as.data.table(x)<br>
>> # proceed with data.table syntax<br>
>> }<br>
>><br>
>> or<br>
>><br>
>> f = function(x) {<br>
>> x = as.data.frame(x)<br>
>> # proceed with data.frame syntax<br>
>> }<br>
>><br>
>> Some of the CRAN packages that depend on data.table are doing that, I<br>
>> think.<br>
>><br>
>> In R itself it is common practice to coerce arguments to a common type<br>
>> and<br>
>> then proceed with the appropriate syntax for that type. Consider that<br>
>> matrix syntax is different syntax to data.frame syntax. You often see<br>
>> as.classiwant() at the beginning of functions, or switches depending on<br>
>> the type of object.<br>
>><br>
>> Remember that is.data.frame() is TRUE for both data.frame and<br>
>> data.table,<br>
>> but is.data.table() is TRUE only for data.table. as.data.table() does<br>
>> nothing if x is already a data.table, and is an efficient class change<br>
>> if<br>
>> x is a data.frame. Is efficiency the issue?<br>
>><br>
>> Does that help? If not, more info about the problem will be needed<br>
>> please.<br>
>><br>
>> Matthew<br>
>><br>
>><br>
>> > I'm used to seeing the column names at the bottom of the column too,<br>
>> but<br>
>> > that is only if the data.table is long enough. My example was too<br>
>> short<br>
>> > for<br>
>> > that, so I made the same sort of mistake you did :(<br>
>> ><br>
>> > Okay, that is a way, but is it a good way? Not sure...<br>
>> ><br>
>> > 2011/7/20 Timothée Carayol <<a href="mailto:timothee.carayol@gmail.com">timothee.carayol@gmail.com</a>><br>
>> ><br>
>> >> Sorry my mistake -- subset does return a data.table.<br>
>> >> (I was using as an example a data.table with 100 rows, and stupidly<br>
>> >> using<br>
>> >> the fact that it printed the whole thing rather than the 10 first<br>
>> rows<br>
>> >> only<br>
>> >> as my criterion for whether it worked or not, forgetting that<br>
>> >> print.data.table does print up to 100 rows. I feel a bit stupid.)<br>
>> >><br>
>> >> Why doesn't it work for you if that is the case?<br>
>> >><br>
>> >> DF <- data.frame(a=1:200, b=1:10)<br>
>> >> DT <- as.data.table(DF)<br>
>> >> subDT <- subset(DT, select=a)<br>
>> >> class(subDT)<br>
>> >> subDF <- subset(DF, select=a)<br>
>> >> class(subDF)<br>
>> >> identical(as.data.frame(DT), DF)<br>
>> >><br>
>> >><br>
>> >><br>
>> >> On Wed, Jul 20, 2011 at 12:50 PM, Chris Neff <<a href="mailto:caneff@gmail.com">caneff@gmail.com</a>><br>
>> wrote:<br>
>> >><br>
>> >>> Yeah I realized that myself.<br>
>> >>><br>
>> >>> Another one: the function "with" doesn't seem to do what I want...<br>
>> but<br>
>> >>> at<br>
>> >>> least it is consistent!<br>
>> >>><br>
>> >>><br>
>> >>> 2011/7/20 Timothée Carayol <<a href="mailto:timothee.carayol@gmail.com">timothee.carayol@gmail.com</a>><br>
>> >>><br>
>> >>>> Sorry --<br>
>> >>>><br>
>> >>>> subset() was a poor idea, as it will return a data.frame even if<br>
>> the<br>
>> >>>> argument is a data.table..<br>
>> >>>><br>
>> >>>><br>
>> >>>><br>
>> >>>> 2011/7/20 Timothée Carayol <<a href="mailto:timothee.carayol@gmail.com">timothee.carayol@gmail.com</a>><br>
>> >>>><br>
>> >>>>> Hi--<br>
>> >>>>><br>
>> >>>>> You can use the subset() command with the select= option; not sure<br>
>> >>>>> it's<br>
>> >>>>> the best solution, though.<br>
>> >>>>><br>
>> >>>>> Timothee<br>
>> >>>>><br>
>> >>>>><br>
>> >>>>> On Wed, Jul 20, 2011 at 12:26 PM, Chris Neff <<a href="mailto:caneff@gmail.com">caneff@gmail.com</a>><br>
>> >>>>> wrote:<br>
>> >>>>><br>
>> >>>>>> I have a function where I pass a data frame and some variable<br>
>> names<br>
>> >>>>>> to<br>
>> >>>>>> calculate statistics on. However, I am at a loss as to how to<br>
>> write<br>
>> >>>>>> it<br>
>> >>>>>> correctly so that both data.frame and data.table work with it. If<br>
>> I<br>
>> >>>>>> have:<br>
>> >>>>>><br>
>> >>>>>> DF = data.frame(x=1:10,y=2:11,z=3:12)<br>
>> >>>>>><br>
>> >>>>>> DT = data.table(DF)<br>
>> >>>>>><br>
>> >>>>>> var.names = c("x","y")<br>
>> >>>>>><br>
>> >>>>>><br>
>> >>>>>> I can do the following things to subset:<br>
>> >>>>>><br>
>> >>>>>> DT[,var.names,with=FALSE]<br>
>> >>>>>> DF[,var.names]<br>
>> >>>>>><br>
>> >>>>>><br>
>> >>>>>> but of course DT[,var.names] won't give me back what I want, and<br>
>> >>>>>> DF[,var.names,with=FALSE] returns an error because with doesn't<br>
>> >>>>>> exist there.<br>
>> >>>>>> So how do I do this?<br>
>> >>>>>><br>
>> >>>>>> Thanks,<br>
>> >>>>>> -Chris<br>
>> >>>>>><br>
>> >>>>>><br>
>> >>>>>><br>
>> >>>>>> _______________________________________________<br>
>> >>>>>> datatable-help mailing list<br>
>> >>>>>> <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
>> >>>>>><br>
>> >>>>>><br>
>> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
>> >>>>>><br>
>> >>>>>><br>
>> >>>>><br>
>> >>>><br>
>> >>><br>
>> >><br>
>> ><br>
>><br>
>><br>
>><br>
><br>
<br>
<br>
</div></div></blockquote></div><br>