[datatable-help] Variable labels suggestion
Matthew Dowle
mdowle at mdowle.plus.com
Tue Aug 9 14:42:46 CEST 2011
Agreed, but that seems easier said than done. How does a global R issue such
as this get addressed? I, for one, do not relish posting to r-devel.
"Joseph Voelkel" <jgvcqa at rit.edu> wrote in message
news:70EFCDD908F9264785FA08EC3A4713202C8E982996 at ex02mail01.ad.rit.edu...
> This seems to be outside the scope of data.table. It is really a global R
> issue, and one that should be addressed at that level (for example,
> natural addition of these attributes to data frames (and of course data
> tables :) ), with easy usage in functions such as plot).
>
> -----Original Message-----
> From: datatable-help-bounces at r-forge.wu-wien.ac.at
> [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On Behalf Of Bacou,
> Melanie
> Sent: Friday, July 29, 2011 12:16 AM
> To: 'Griffith Rees'; mdowle at mdowle.plus.com
> Cc: datatable-help at r-forge.wu-wien.ac.at
> Subject: Re: [datatable-help] Variable labels suggestion
>
> Griff, Matt,
>
> I agree that codebook support or more generally support for maintaining
> meta-data is very poor in R. I also use Hmisc and end up maintaining my
> codebook in separate files. Often times I need to carry over not just
> variable labels, but also units, type, category, etc.
>
> I'm forced to use inefficient and wordy procedures, the likes of:
>
> ## Add variable labels and units from codebook file (usually some dump
> ## from STATA)
> i <- 1
> for (x in names(df)) {
> label(df[, x]) <- codebook [i, "varName"]
> units(df[, x]) <- codebook [i, "varUnit"]
> type(df[, x]) <- codebook [i, "varType"]
> i <- i + 1
> }
>
> [...some variable recoding...]
>
> ## Save codebook to CSV
> codebook <- data.frame(names(df), label(df), sapply(df, units), sapply(df,
> type))
> names(codebook) <- c("varCode", "varName", "varUnit", "varType")
> write.csv(codebook, file="codebook.csv")
>
> Any optimization for data.table that would facilitate read/write of
> meta-data would make a lot of sense.
>
> --Mel.
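
A minimal sketch of the codebook round trip described above, using only base R attributes on a plain data.frame. The codebook values and the column names `age` and `inc` are made up for illustration, and matching on `varCode` instead of row position is an assumption about how the codebook is keyed; this is not the Hmisc approach from the message.

```r
# Hypothetical codebook, one row per variable (values are illustrative only).
codebook <- data.frame(
  varCode = c("inc", "age"),
  varName = c("Household income", "Age of respondent"),
  varUnit = c("USD", "years"),
  stringsAsFactors = FALSE
)
df <- data.frame(age = c(31, 45), inc = c(52000, 61000))

# Attach label and unit to each column, looking the variable up by its
# code so the codebook does not have to be in column order.
for (x in names(df)) {
  k <- match(x, codebook$varCode)
  attr(df[[x]], "label") <- codebook$varName[k]
  attr(df[[x]], "units") <- codebook$varUnit[k]
}

# Rebuild a codebook from the attributes now stored on the columns.
rebuilt <- data.frame(
  varCode = names(df),
  varName = unname(vapply(df, attr, character(1), which = "label")),
  varUnit = unname(vapply(df, attr, character(1), which = "units")),
  stringsAsFactors = FALSE
)
```

Looking rows up with match() avoids the separate counter `i`, at the cost of assuming every column name appears in the codebook.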
>
>
>
>
> -----Original Message-----
> From: datatable-help-bounces at r-forge.wu-wien.ac.at
> [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On Behalf Of
> Griffith Rees
> Sent: Thursday, July 28, 2011 6:56 PM
> To: mdowle at mdowle.plus.com
> Cc: datatable-help at r-forge.wu-wien.ac.at
> Subject: Re: [datatable-help] Variable labels suggestion
>
> Indeed, making such labels useful is highly dependent on their
> ability to be used with functions like toLatex. I think the first step
> would be to provide a way of adding labels and then consider functions
> that could help use them in formatting contexts, but kind of leave the
> last mile up to users for the time being. If it catches on, people
> will start to write wrappers that do the extra work.
>
> For example: the mtable function, which is what I primarily use to
> format tables for latex, can be used with the relabel function (also
> from the memisc package) to replace variable names in tables (see the
> relabel example in:
> http://www.oga-lab.net/RGM2/func.php?rd_id=memisc:mtable). A method
> which returns those labels appropriately could be called directly when
> mtable is used. It's not the prettiest solution, but it's a start.
>
> Obviously there's a mindshare aspect to this: the more people use
> data.table and find variable labels useful, the more likely they are
> to alter other packages to allow them to take advantage of those
> labels. The way to accrue that advantage is to make it simple but
> useful initially, and then wrappers can be added to make better use of
> it. Obviously, the prior art in the Hmisc package failed to garner
> enough mindshare for it to be used in other contexts, and data.table
> succeeds here by retaining interoperability with everything else.
>
> I know the first thing I would probably do: write a wrapper around
> read.dta which would read a stata file and return a data.table with
> the stata labels.
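
A rough sketch of the label-copying step such a wrapper would need. It relies on foreign::read.dta() returning a data frame that carries a "var.labels" attribute; the helper name `attach_var_labels` is hypothetical, and a hand-built data frame stands in for a real Stata file so the example stays self-contained.

```r
# Hypothetical helper: copy the frame-level "var.labels" attribute onto
# each column as a per-column "label" attribute.
attach_var_labels <- function(df, labels = attr(df, "var.labels")) {
  stopifnot(is.data.frame(df), length(labels) == ncol(df))
  for (j in seq_along(df)) {
    attr(df[[j]], "label") <- labels[[j]]
  }
  df
}

# Stand-in for the result of foreign::read.dta(), which needs a real .dta file.
survey <- data.frame(q1 = c(1, 2, 3), q2 = c("a", "b", "c"))
attr(survey, "var.labels") <- c("Age in years", "Region code")

labelled <- attach_var_labels(survey)
```

Converting the result to a data.table would be one further step on top of this; whether the column attributes survive that conversion is not checked here.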
>
> just an idea. Oh and an optimized data.table save format as well but
> that's icing ;)
>
> -griff
>
> On Thu, Jul 28, 2011 at 8:11 PM, Matthew Dowle <mdowle at mdowle.plus.com>
> wrote:
>>
>> The toLatex aspect struck a chord. I sometimes embed the string 'PCT'
>> into the column name and then call gsub("PCT", "\\%", x, fixed = TRUE)
>> just before output to latex. Maybe a label would be more robust and
>> could allow more complex
>> latex expressions in the column heading. Long column names with spaces
>> are ok, but that may make it cumbersome to follow the advice to use
>> names not positions in j expressions. But how would the latex output
>> command know to use the labels rather than the names? And would
>> data.table need to know about column labels to carry them through
>> subsets and joins etc?
>>
>> Matthew
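
On the question of carrying labels through subsets, a small base R illustration may help. It uses a plain data.frame, not data.table, whose behaviour is not tested here, and the attribute name "label" and the column name `pct_change` are assumptions chosen for the example.

```r
df <- data.frame(pct_change = c(1.5, 2.0, 3.25))
attr(df$pct_change, "label") <- "Change (\\%)"

# Extracting the whole column returns the vector with its attributes intact.
whole_column <- df[["pct_change"]]

# Row subsetting goes through `[` on the underlying vector, which drops
# custom attributes such as "label".
row_subset <- df[1:2, , drop = FALSE]

label_on_column <- attr(whole_column, "label")
label_on_subset <- attr(row_subset$pct_change, "label")
```

With a bare attribute, `label_on_subset` comes back as NULL, so something has to know about labels for them to survive: either the container or a class on the column with its own `[` method.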
>>
>>
>> On Thu, 2011-07-28 at 13:51 -0400, Chris Neff wrote:
>>> I think this is definitely out of the scope of data.table.
>>>
>>> On 28 July 2011 13:43, Tom Short <tshort.rlists at gmail.com> wrote:
>>> On Thu, Jul 28, 2011 at 8:26 AM, Griffith Rees
>>> <griffith.rees at sociology.ox.ac.uk> wrote:
>>> > I think this page quite succinctly describes this issue:
>>> > http://www.statmethods.net/input/variablelables.html
>>>
>>>
>>> It would be easy to add to data.table. You could also add support
>>> outside of data.table by writing label.data.table and similar
>>> functions. Actually using the labels for useful things is more
>>> difficult. I often find it useful just to use more verbose variable
>>> names that include spaces as follows:
>>>
>>> > dt <- data.table(`My first column` = 1:3,
>>> +                  `A character column` = letters[1:3],
>>> +                  check.names = FALSE)
>>> > str(dt)
>>> Classes 'data.table' and 'data.frame': 3 obs. of 2 variables:
>>> $ My first column : int 1 2 3
>>> $ A character column: Factor w/ 3 levels "a","b","c": 1 2 3
>>>
>>> That way, columns look better with automatic plotting and with lattice
>>> or ggplot legends.
>>>
>>> - Tom
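
A sketch of what support "outside of data.table" could look like as an S3 generic. The generic name `var_label` is hypothetical, chosen so it does not clash with Hmisc's label(). Because a data.table also inherits from data.frame, a data.frame method would be expected to dispatch for it as well, though only a plain data.frame is exercised here.

```r
# Hypothetical generic for reading variable labels.
var_label <- function(x, ...) UseMethod("var_label")

# For a single vector: return its "label" attribute, or NA if none is set.
var_label.default <- function(x, ...) {
  value <- attr(x, "label", exact = TRUE)
  if (is.null(value)) NA_character_ else as.character(value)[1]
}

# For a data frame: one label per column, named by column.
var_label.data.frame <- function(x, ...) {
  vapply(x, var_label, character(1))
}

df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
attr(df$a, "label") <- "My first column"

labels <- var_label(df)
```

Columns without a label come back as NA, so a plotting or formatting wrapper can fall back to the column name.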
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>>
>>
>>
>>
>
>
>
> --
> Griffith Rees
> Sociology DPhil Candidate
> Oxford University
> CABDyN Complexity Centre
> http://www.cabdyn.ox.ac.uk