[datatable-help] Variable labels suggestion

Griffith Rees griffith.rees at sociology.ox.ac.uk
Fri Jul 29 00:56:07 CEST 2011


Indeed, how useful such labels are depends heavily on their
ability to be used with functions like toLatex. I think the first step
would be to provide a way of adding labels and then consider functions
that could help use them in formatting contexts, but kind of leave the
last mile up to users for the time being. If it catches on, people
will start to write wrappers that do the extra work.
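
To make that first step concrete, here is a minimal sketch of what adding
and reading labels could look like. It assumes labels are stored as a
"label" attribute on each column; set_labels and get_labels are
hypothetical names, not existing data.table functions, and I use a plain
data.frame so that it runs without any packages.

```r
# Hypothetical helpers (not part of data.table): store one label per
# column as a "label" attribute on the column vector itself.
set_labels <- function(x, ...) {
  new <- list(...)
  unknown <- setdiff(names(new), names(x))
  if (length(unknown) > 0L) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  for (nm in names(new)) attr(x[[nm]], "label") <- new[[nm]]
  x
}

# Return a named character vector of labels, falling back to the
# column name when a column has no label.
get_labels <- function(x) {
  vapply(names(x), function(nm) {
    lab <- attr(x[[nm]], "label", exact = TRUE)
    if (is.null(lab)) nm else as.character(lab)
  }, character(1))
}

df <- data.frame(pct_unemp = c(4.1, 5.3), region = c("N", "S"))
df <- set_labels(df, pct_unemp = "Unemployment (\\%)")
labs <- get_labels(df)
```

The fallback to the column name means a formatting function could always
ask for labels and still get something sensible for unlabelled columns.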

For example: the mtable function, which is what I primarily use to
format tables for latex, can be used with the relabel function (also
from the memisc package) to replace variable names in tables (see the
relabel example in:
http://www.oga-lab.net/RGM2/func.php?rd_id=memisc:mtable). A method
which returns those labels appropriately could be called directly when
mtable is used. It's not the prettiest solution, but it's a start.
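
As a rough illustration of the idea (without memisc, so this is not the
mtable/relabel route itself), the sketch below renames the rows of a
coefficient table from a name-to-label lookup. apply_labels is a
hypothetical helper, and the labels vector is written by hand here; in
practice it would come from whatever method returns the stored labels.

```r
# Hypothetical helper: replace term names that have a label and leave
# everything else (e.g. "(Intercept)") untouched.
apply_labels <- function(terms, labels) {
  hit <- terms %in% names(labels)
  terms[hit] <- unname(labels[terms[hit]])
  terms
}

# Hand-written lookup, standing in for labels stored with the data
labels <- c(wt = "Weight (1000 lbs)", hp = "Gross horsepower")

fit <- lm(mpg ~ wt + hp, data = mtcars)
tab <- coef(summary(fit))
rownames(tab) <- apply_labels(rownames(tab), labels)
```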

Obviously there's a mindshare aspect to this: the more people who use
data.table and find variable labels useful, the more likely they are
to alter other packages to allow them to take advantage of those
labels. The way to accrue that advantage is to make it simple but
useful initially, and then wrappers can be added to make better use of
it. Obviously, the prior art in the Hmisc package failed to garner
enough mindshare for it to be used in other contexts, and data.table
succeeds here by retaining interoperability with everything else.

I know the first thing I would probably do: write a wrapper around
read.dta which would read a stata file and return a data.table with
the stata labels.
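
Something along these lines, as a sketch only. read.dta (foreign package)
returns a data frame with a "var.labels" attribute; the rest is
assumption: read_dta_labelled and attach_var_labels are hypothetical
names, the per-column "label" attribute is just one possible storage
convention, and I am assuming column attributes survive as.data.table.

```r
# Copy Stata variable labels from the data-frame-level "var.labels"
# attribute (as set by foreign::read.dta) onto each column.
attach_var_labels <- function(df) {
  labs <- attr(df, "var.labels", exact = TRUE)
  if (is.null(labs)) return(df)
  for (j in seq_along(df)) {
    if (j <= length(labs) && nzchar(labs[j])) {
      attr(df[[j]], "label") <- labs[j]
    }
  }
  df
}

# Hypothetical wrapper; needs the foreign and data.table packages
# installed, but only when it is actually called.
read_dta_labelled <- function(file, ...) {
  df <- attach_var_labels(foreign::read.dta(file, ...))
  data.table::as.data.table(df)  # assumed to keep column attributes
}

# Stand-in for read.dta output, so the example runs without a .dta file
fake <- data.frame(inc = c(10, 20), age = c(30L, 40L))
attr(fake, "var.labels") <- c("Household income", "")
out <- attach_var_labels(fake)
```

Empty Stata labels are skipped, so unlabelled variables simply carry no
"label" attribute.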

Just an idea. Oh, and an optimized data.table save format as well, but
that's icing ;)

-griff

On Thu, Jul 28, 2011 at 8:11 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> The toLatex aspect struck a chord. I sometimes embed the string 'PCT'
> into the column name and then gsub("PCT", "\\%", x, fixed = TRUE) just
> before output to latex. Maybe a label would be more robust and could
> allow more complex
> latex expressions in the column heading.  Long column names with spaces
> are ok, but that may make it cumbersome to follow the advice to use
> names not positions in j expressions.  But how would the latex output
> command know to use the labels rather than the names? And would
> data.table need to know about column labels to carry them through
> subsets and joins etc?
>
> Matthew
>
>
> On Thu, 2011-07-28 at 13:51 -0400, Chris Neff wrote:
>> I think this is definitely out of the scope of data.table.
>>
>> On 28 July 2011 13:43, Tom Short <tshort.rlists at gmail.com> wrote:
>>         On Thu, Jul 28, 2011 at 8:26 AM, Griffith Rees
>>         <griffith.rees at sociology.ox.ac.uk> wrote:
>>         > I think this page quite succinctly describes this issue:
>>         > http://www.statmethods.net/input/variablelables.html
>>
>>
>>         It would be easy to add to data.table. You could also add support
>>         outside of data.table by writing label.data.table and similar
>>         functions. Actually using the labels for useful things is more
>>         difficult. I often find it useful just to use more verbose variable
>>         names that include spaces as follows:
>>
>>         > dt <- data.table(`My first column` = 1:3,
>>         +                  `A character column` = letters[1:3],
>>         +                  check.names = FALSE)
>>         > str(dt)
>>         Classes 'data.table' and 'data.frame':  3 obs. of  2 variables:
>>          $ My first column   : int  1 2 3
>>          $ A character column: Factor w/ 3 levels "a","b","c": 1 2 3
>>
>>         That way, columns look better with automatic plotting and with
>>         lattice or ggplot legends.
>>
>>         - Tom
>>
>>         _______________________________________________
>>         datatable-help mailing list
>>         datatable-help at lists.r-forge.r-project.org
>>         https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help



-- 
Griffith Rees
Sociology DPhil Candidate
Oxford University
CABDyN Complexity Centre
http://www.cabdyn.ox.ac.uk

