[datatable-help] Coercion to character

Matthew Dowle mdowle at mdowle.plus.com
Sun Apr 15 19:20:13 CEST 2012


I thought I'd added something to FAQ 2.17 about that, but it seems not.
Will add, thanks. Maybe I only wrote it up in a comment when closing
the related feature request. It's deliberately different: my guess
is that most people, most of the time, now want characters left as
characters and keep setting stringsAsFactors to FALSE. I think the default
for data.frame was TRUE as a hangover from old versions of R, before the
global string cache was added.

It's not set in stone though, so it could be changed. In particular there
could be a global default, like we've done for other arguments, so you could
change the default if need be.

It won't cause a compatibility issue (same as the other differences in FAQ
2.17) or any issues down the road as far as I can think, but let me know if
you think of anything.

Matthew

On Sun, 2012-04-15 at 04:40 -0500, Damian Betebenner wrote:
> I started having character vectors popping up in places I never had before, but upon further investigation that turned out to be an issue with my own setup, not data.table.
> 
> With regard to characters (and data.table's ability to handle them as a key now), I did notice that data.table and data.frame use different defaults for stringsAsFactors:
> 
> DF <- data.frame(X=letters[1:10], Y=rnorm(10))
> sapply(DF, class)
> 
>         X         Y 
>  "factor" "numeric"
> 
> DT <- data.table(X=letters[1:10], Y=rnorm(10)) 
> sapply(DT, class)
> 
> > DT <- data.table(X=rep(letters[1:10], each=2), Y=rnorm(20)) 
> > sapply(DT, class)
>           X           Y 
> "character"   "numeric"
> 
> 
> Will this inconsistency cause problems down the road?
> 
> Thanks for all your help,
> 
> Damian
> 
> 
> Damian Betebenner
> Center for Assessment
> PO Box 351
> Dover, NH   03821-0351
>  
> Phone (office): (603) 516-7900
> Phone (cell): (857) 234-2474
> Fax: (603) 516-7910
> 
> dbetebenner at nciea.org
> www.nciea.org
> 
> 
> 
> 
> -----Original Message-----
> From: Matthew Dowle [mailto:mdowlenoreply at virginmedia.com] On Behalf Of Matthew Dowle
> Sent: Thursday, April 12, 2012 5:50 PM
> To: Damian Betebenner
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Coercian to character
> 
> It shouldn't coerce. What makes you think it does?
> 
> > DT = data.table(a=factor(c("a","b","b","c")),b=1:4)
> > DT[,sum(b),by=a]
>      a V1
> [1,] a  1
> [2,] b  5
> [3,] c  4
> > str(DT[,sum(b),by=a])
> Classes ‘data.table’ and 'data.frame':	3 obs. of  2 variables:
>  $ a : Factor w/ 3 levels "a","b","c": 1 2 3
>  $ V1: int  1 5 4
> 
> 
> 
> On Thu, 2012-04-12 at 14:57 -0500, Damian Betebenner wrote:
> > Data tablers
> > 
> > Does data.table now coerce factors to character variables when doing 
> > by summaries?
> > 
> > If so, is there any way to not allow this coercion?
> > 
> > Thanks,
> > 
> > Damian Betebenner