[datatable-help] New function fread() in v1.8.7

patricknic patricknic at gmail.com
Sat Jan 5 22:05:43 CET 2013


I hit a snag reading some imperfect data. I'm not sure what the file was
exported from, but some lines contain consecutive quotation marks (i.e., a
character field already contained quotation marks before it was written to
a text file). I'm not sure whether this is a known issue. A reproducible example:

text <- paste(rep(c('a,b,c,d,e,f\na,b,c,"d",e,f\na,b,c,""d"",e,f'), 10000),
collapse="\n")
f <- tempfile()
writeLines(text, f)

df <- read.table(f, sep=",")
dt <- fread(f, sep=",", header=FALSE)


No error for read.table, but I get this error for fread: 

Error in fread(f, sep = ",", header = FALSE) : 
  Expected sep (',') but 'd' ends field 4 on line 30 when detecting types:
a,b,c,""d
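
To see which lines are affected before calling fread, something like the following base R sketch may help. It only searches for two quotation marks in a row, so it would also flag a legitimately empty quoted field; `sample_file`, `lines` and `bad` are names made up for this illustration.

```r
# Small stand-in for the problem file: one clean line, one normally quoted
# field, and one field wrapped in doubled quotation marks.
sample_file <- tempfile()
writeLines(c('a,b,c,d,e,f', 'a,b,c,"d",e,f', 'a,b,c,""d"",e,f'), sample_file)

lines <- readLines(sample_file)

# fixed = TRUE makes grep treat '""' as a literal string, not a regex.
bad <- grep('""', lines, fixed = TRUE)

bad          # indices of lines containing consecutive quotation marks
lines[bad]   # the offending lines themselves
```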


This also gave me an idea for a suggestion: text replacement in readfile.c.
(I'm no C programmer, so I don't know if this would be more trouble than
it's worth. Also, not sure if it is in your project scope.) An R mock-up
(still using fread) of this would be something like:

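freadWrapper <- function(input, eliminate='"', ...) {
  A <- readLines(input)                    # read the file passed in, not the global f
  B <- gsub(eliminate, "", A, fixed=TRUE)  # remove the characters literally (no regex)
  C <- paste(B, collapse="\n")             # rebuild one newline-separated string
  fread(C, ...)                            # fread treats input containing "\n" as text
}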
freadWrapper(f, sep=",", stringsAsFactors=FALSE, header=FALSE)
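
A variant of the same idea, as a rough sketch only: write the cleaned lines to a second temporary file and hand that path to fread, rather than pasting everything into one string. `cleanCopy` is a hypothetical helper name, not part of data.table, and the fread call is left commented out so the snippet runs with base R alone.

```r
# Hypothetical helper (not part of data.table): write a copy of `input`
# with every occurrence of `eliminate` removed, and return the new path.
cleanCopy <- function(input, eliminate = '"') {
  lines <- readLines(input)
  cleaned <- gsub(eliminate, "", lines, fixed = TRUE)  # literal match, not a regex
  out <- tempfile(fileext = ".csv")
  writeLines(cleaned, out)
  out
}

# Three-line stand-in for the problem file.
src <- tempfile()
writeLines(c('a,b,c,d,e,f', 'a,b,c,"d",e,f', 'a,b,c,""d"",e,f'), src)

cleaned <- cleanCopy(src)
readLines(cleaned)   # every line should now be free of quotation marks

# With data.table installed, the cleaned file can then be read as usual:
# dt <- data.table::fread(cleaned, sep = ",", header = FALSE)
```

Note that this strips every quotation mark, so it is only appropriate when no field relies on quoting to protect an embedded separator.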





--
View this message in context: http://r.789695.n4.nabble.com/New-function-fread-in-v1-8-7-tp4653745p4654754.html
Sent from the datatable-help mailing list archive at Nabble.com.

