[datatable-help] Variable labels suggestion
Joseph Voelkel
jgvcqa at rit.edu
Fri Jul 29 16:25:31 CEST 2011
This seems to be outside the scope of data.table. It is really a global R issue, and one that should be addressed at that level (for example, natural addition of these attributes to data frames (and of course data tables :) ), with easy usage in functions such as plot).
-----Original Message-----
From: datatable-help-bounces at r-forge.wu-wien.ac.at [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On Behalf Of Bacou, Melanie
Sent: Friday, July 29, 2011 12:16 AM
To: 'Griffith Rees'; mdowle at mdowle.plus.com
Cc: datatable-help at r-forge.wu-wien.ac.at
Subject: Re: [datatable-help] Variable labels suggestion
Griff, Matt,
I agree that codebook support, or more generally support for maintaining meta-data, is very poor in R. I also use Hmisc and end up maintaining my codebook in separate files. Oftentimes I need to carry over not just variable labels, but also units, type, category, etc.
I'm forced to use inefficient and wordy procedures, the likes of:
## Add variable labels and units from codebook file (usually some dump from STATA)
i <- 1
for (x in names(df)) {
  label(df[, x]) <- codebook[i, "varName"]
  units(df[, x]) <- codebook[i, "varUnit"]
  type(df[, x]) <- codebook[i, "varType"]
  i <- i + 1
}
[...some variable recoding...]
## Save codebook to CSV
codebook <- data.frame(names(df), label(df), sapply(df, units), sapply(df, type))
names(codebook) <- c("varCode", "varName", "varUnit", "varType")
write.csv(codebook, file="codebook.csv")
Any optimization for data.table that would facilitate read/write of meta-data would make a lot of sense.
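For illustration only, here is a minimal base-R sketch of the codebook round trip described above. It assumes the metadata is stored as plain column attributes named "label" and "units"; the helpers apply_codebook and extract_codebook are hypothetical names, not functions from Hmisc or data.table, and the example data are made up.

```r
# Hypothetical helper: attach codebook entries to columns as plain attributes.
apply_codebook <- function(df, codebook) {
  # Match codebook rows to columns by name rather than by position.
  idx <- match(names(df), codebook$varCode)
  for (j in seq_along(df)) {
    if (is.na(idx[j])) next  # column is not described in the codebook
    attr(df[[j]], "label") <- codebook$varName[idx[j]]
    attr(df[[j]], "units") <- codebook$varUnit[idx[j]]
  }
  df
}

# Hypothetical helper: rebuild a codebook data frame from column attributes.
extract_codebook <- function(df) {
  # Return NA when a column carries no such attribute.
  get_attr <- function(col, key) {
    value <- attr(col, key, exact = TRUE)
    if (is.null(value)) NA_character_ else as.character(value)
  }
  data.frame(
    varCode = names(df),
    varName = vapply(df, get_attr, character(1), key = "label", USE.NAMES = FALSE),
    varUnit = vapply(df, get_attr, character(1), key = "units", USE.NAMES = FALSE),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example data and codebook.
df <- data.frame(wt = c(61.2, 70.5), ht = c(1.70, 1.82))
codebook <- data.frame(
  varCode = c("wt", "ht"),
  varName = c("Body weight", "Height"),
  varUnit = c("kg", "m"),
  stringsAsFactors = FALSE
)

df <- apply_codebook(df, codebook)
roundtrip <- extract_codebook(df)
```

Plain attributes like these are not kept by ordinary vector subsetting, which is the carry-through question raised elsewhere in this thread.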
--Mel.
-----Original Message-----
From: datatable-help-bounces at r-forge.wu-wien.ac.at [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On Behalf Of Griffith Rees
Sent: Thursday, July 28, 2011 6:56 PM
To: mdowle at mdowle.plus.com
Cc: datatable-help at r-forge.wu-wien.ac.at
Subject: Re: [datatable-help] Variable labels suggestion
Indeed, making such labels useful is highly dependent on their
ability to be used with functions like toLatex. I think the first step
would be to provide a way of adding labels and then consider functions
that could help use them in formatting contexts, but kind of leave the
last mile up to users for the time being. If it catches on, people
will start to write wrappers that do the extra work.
For example: the mtable function, which is what I primarily use to
format tables for latex, can be used with the relabel function (also
from the memisc package) to replace variable names in tables (see the
relabel example in:
http://www.oga-lab.net/RGM2/func.php?rd_id=memisc:mtable). A method
which returns those labels appropriately could be called directly when
mtable is used. It's not the prettiest solution, but it's a start.
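As a rough sketch of what such a label-returning method might look like, the base-R snippet below collects each column's "label" attribute and falls back to the column name when none is set. The helper name label_or_name and the attribute name "label" are assumptions made for illustration; this is not the memisc or data.table API.

```r
# Hypothetical helper: use each column's "label" attribute as its heading,
# falling back to the column name when no label is set.
label_or_name <- function(df) {
  vapply(names(df), function(nm) {
    lab <- attr(df[[nm]], "label", exact = TRUE)
    if (is.null(lab)) nm else as.character(lab)
  }, character(1), USE.NAMES = FALSE)
}

# Made-up example: only one column carries a label.
df <- data.frame(gdp = c(1.2, 3.4), pop = c(10L, 20L))
attr(df$gdp, "label") <- "GDP growth (\\%)"  # LaTeX-escaped percent sign

# Relabel a copy just before handing it to a formatting function,
# so the short names stay available for everyday work.
out <- df
names(out) <- label_or_name(df)
headings <- names(out)
```

Keeping the short names on the working object and swapping in labels only on an output copy avoids having to type long names in expressions.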
Obviously there's a mindshare aspect to this: the more people who use
data.table and find variable labels useful, the more likely they are
to alter other packages to allow them to take advantage of those
labels. The way to accrue that advantage is to make it simple but
useful initially, and then wrappers can be added to make better use of
it. Obviously, the prior art in the Hmisc package failed to garner
enough mindshare for it to be used in other contexts, and data.table
succeeds here by retaining interoperability with everything else.
I know the first thing I would probably do: write a wrapper around
read.dta which would read a stata file and return a data.table with
the stata labels.
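A sketch of the core of such a wrapper, under stated assumptions: foreign::read.dta stores Stata variable labels in a data-frame-level attribute called "var.labels", and the hypothetical helper spread_var_labels below copies them onto the individual columns. To stay self-contained, the example fakes the read.dta result instead of reading a file, and it stops at a data frame rather than converting to a data.table.

```r
# Hypothetical helper: copy Stata variable labels from the data-frame-level
# "var.labels" attribute (as set by foreign::read.dta) onto each column.
spread_var_labels <- function(df) {
  var_labels <- attr(df, "var.labels")
  if (is.null(var_labels)) return(df)  # nothing to copy
  for (j in seq_len(min(length(var_labels), ncol(df)))) {
    # Skip variables that have an empty label.
    if (nzchar(var_labels[j])) attr(df[[j]], "label") <- var_labels[j]
  }
  df
}

# Stand-in for the result of foreign::read.dta("survey.dta"); the file name
# and the data are made up for illustration.
raw <- data.frame(age = c(34L, 51L), inc = c(21000, 38500))
attr(raw, "var.labels") <- c("Age in years", "Annual income")

labelled <- spread_var_labels(raw)
income_label <- attr(labelled$inc, "label")
```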
Just an idea. Oh, and an optimized data.table save format as well, but
that's icing ;)
-griff
On Thu, Jul 28, 2011 at 8:11 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> The toLatex aspect struck a chord. I sometimes embed the string 'PCT'
> into the column name and then gsub("PCT","\\%",x,fixed=TRUE) just before output to
> latex. Maybe a label would be more robust and could allow more complex
> latex expressions in the column heading. Long column names with spaces
> are ok, but that may make it cumbersome to follow the advice to use
> names not positions in j expressions. But how would the latex output
> command know to use the labels rather than the names? And would
> data.table need to know about column labels to carry them through
> subsets and joins etc?
>
> Matthew
>
>
> On Thu, 2011-07-28 at 13:51 -0400, Chris Neff wrote:
>> I think this is definitely out of the scope of data.table.
>>
>> On 28 July 2011 13:43, Tom Short <tshort.rlists at gmail.com> wrote:
>> On Thu, Jul 28, 2011 at 8:26 AM, Griffith Rees
>> <griffith.rees at sociology.ox.ac.uk> wrote:
>> > I think this page quite succinctly describes this issue:
>> > http://www.statmethods.net/input/variablelables.html
>>
>>
>> It would be easy to add to data.table. You could also add support
>> outside of data.table by writing label.data.table and similar
>> functions. Actually using the labels for useful things is more
>> difficult. I often find it useful just to use more verbose variable
>> names that include spaces as follows:
>>
>> > dt <- data.table(`My first column` = 1:3, `A character column` = letters[1:3], check.names = FALSE)
>> > str(dt)
>> Classes 'data.table' and 'data.frame': 3 obs. of 2 variables:
>> $ My first column : int 1 2 3
>> $ A character column: Factor w/ 3 levels "a","b","c": 1 2 3
>>
>> That way, columns look better with automatic plotting and with lattice
>> or ggplot legends.
>>
>> - Tom
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>>
>
>
>
--
Griffith Rees
Sociology DPhil Candidate
Oxford University
CABDyN Complexity Centre
http://www.cabdyn.ox.ac.uk