[datatable-help] Reading corrupt csv and replace wrong value

Allan Engelhardt allane at cybaea.com
Sat Jun 18 01:11:14 CEST 2011


Probably easier to change it outside of R, e.g.

perl -pe 's{0742076391\?39524}{whatever}g' file > newfile

but you may want to check that it really is a '?' character and not just 
printed that way.
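
A rough sketch of such a check, assuming a Unix-like shell with a grep that supports -o and -a (GNU and BSD grep do), plus od and mktemp. The two-line sample file is made up for illustration; you would point the commands at your real file instead. od -c prints the raw bytes, so a genuine question mark shows up as '?', while a non-ASCII byte that was only displayed as '?' shows up as an octal code.

```shell
# Hypothetical two-line sample standing in for the real file
sample=$(mktemp)
printf '1;0,5\n2;0,0742076391?39524\n' > "$sample"

# Line number of the suspect value (-F: literal string, -a: treat input as text)
hit_line=$(grep -n -a -F '0742076391' "$sample" | cut -d: -f1)
echo "found on line $hit_line"

# Raw bytes of the value: a genuine question mark is printed as '?',
# a non-ASCII byte would appear as an octal code instead
bytes=$(LC_ALL=C grep -o -a '0742076391.\{1,4\}39524' "$sample" | od -c)
echo "$bytes"

rm -f "$sample"
```

LC_ALL=C makes grep match byte by byte, so the pattern still matches if the odd character is not valid in your locale's encoding.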

You could of course write this in R along the lines of

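con_in  <- file("file", open = "r")
con_out <- file("newfile", open = "w")
while (length(line <- readLines(con_in, 1L)) > 0) {
   line <- sub("0,0742076391?39524", "whatever", line, fixed = TRUE)
   writeLines(line, con_out)
}
close(con_in); close(con_out)

where con_in and con_out are connections to the input and output files
(in itself is a reserved word in R, so it cannot be used as a variable name).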

HTH

Allan

On 16/06/11 22:40, DanMik wrote:
> I'm fairly new to R.
>
> I have a huge CSV file, of 400.000+ K, and it now looks like one of the
> values is corrupt (it contains a ?, so one value becomes
> "0,0742076391?39524").
> Because of the size I can't edit it in a text editor, and the file took
> several days to create (many calculations).
>
> When I read the file, it can't be converted to numbers because of this one
> value, which I found with scan() and whose coordinates I have found.
>
> I'm reading the file with:
>
> x <- read.csv2("filename.csv", stringsAsFactors = FALSE)
>
> Can I read the file with everything as numeric, and replace non-numeric
> values with 0?
>
> Or somehow correct this one value?
>
> I have tried first reading the file, then setting the value to 0, and then
> using as.matrix and afterwards as.numeric. This just creates a lot of NAs.
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Reading-corrupt-csv-and-replace-wrong-value-tp3603848p3603848.html
> Sent from the datatable-help mailing list archive at Nabble.com.


More information about the datatable-help mailing list