[datatable-help] Fwd: fread: Handling NAs with ",," not working?

Akhil Behl akhil at igidr.ac.in
Thu Jan 3 21:50:11 CET 2013


So, here is a `head' of my dataset. Note the `,,' in the 2nd last column.

02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60
02-FEB-2009,09:55:04:987,26022009,2800,PE,108.75,200,111,50,11700,1450,,2865.60
02-FEB-2009,09:55:04:939,26022009,3100,CE,31.1,3000,36.55,200,3500,5250,,2865.60
02-FEB-2009,09:55:04:989,26022009,2600,PE,52.05,500,57,400,16050,1150,,2865.60
02-FEB-2009,09:55:04:981,26022009,3000,CE,56.25,2000,67,150,21500,13750,,2865.60
02-FEB-2009,09:55:04:991,26022009,2900,CE,81,1000,100,100,18100,4550,1000,2865.60
02-FEB-2009,09:55:04:953,26022009,2800,CE,150,50,159.7,5000,13400,15500,,2865.60
02-FEB-2009,09:55:04:987,26022009,2700,PE,72.15,3000,79,50,19200,5100,,2865.60
02-FEB-2009,09:55:04:615,26022009,2450,CE,256.9,500,678,500,500,500,,2865.60
02-FEB-2009,09:55:04:894,26022009,3300,CE,6,7000,10.8,2000,7000,2550,,2865.60

The documentation says that ",," should be read as "". Instead, the
function throws an error that I cannot understand. See here:

R> library(data.table)
data.table 1.8.7  For help type: help("data.table")

R> tt <- fread("sample.csv", verbose=TRUE)

Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Starting format detection on line 30 (the last non blank line in the first 30)
Detected sep as ',' and 13 columns
Type codes: 3300320200002
Found first row with 13 fields occuring on line 1 (either column names
or first row of data)
Error in fread("sample.csv", verbose = TRUE) : Unexpected character (
02-F) ending field 12 of line 1

Using na.strings="" does not work either. But I guess that should not
have made a difference anyway?
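
For comparison, base R's read.csv() treats an empty field in a numeric
column as missing. A minimal sketch, assuming only base R and using two
rows from the head above inline rather than the attached file:

```r
# Two rows copied from the head above: the first has an empty 12th field,
# the second has a value (1000) in that field.
rows <- c(
  "02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60",
  "02-FEB-2009,09:55:04:991,26022009,2900,CE,81,1000,100,100,18100,4550,1000,2865.60"
)

# The file has no header row, so ask for default column names V1..V13.
tt_base <- read.csv(text = rows, header = FALSE)

# Blank fields in a numeric column come back as NA.
print(tt_base$V12)
```

This is only a point of comparison for what I expected from fread, not a
statement about how fread is meant to behave.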

Then I opened the file in GVim and converted all `,,' to `,NA,' and
re-read the file. This time it works.
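
The same substitution can be sketched in R instead of GVim.
fill_empty_fields is a hypothetical helper name (not part of data.table),
and it only handles empty fields that sit between two commas, not ones at
the start or end of a line:

```r
# Hypothetical helper: replace empty fields (",,") with ",NA,".
fill_empty_fields <- function(lines) {
  # gsub() does not revisit overlapping matches, so a run such as ",,,"
  # needs a second pass to fill every empty field.
  lines <- gsub(",,", ",NA,", lines, fixed = TRUE)
  lines <- gsub(",,", ",NA,", lines, fixed = TRUE)
  lines
}

# Two rows copied from the head above, used inline for illustration.
rows <- c(
  "02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60",
  "02-FEB-2009,09:55:04:991,26022009,2900,CE,81,1000,100,100,18100,4550,1000,2865.60"
)
fixed <- fill_empty_fields(rows)
print(fixed)
```

For a whole file, the same helper could be applied to the result of
readLines() and written back out with writeLines().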

R> tt <- fread("sample-with-NA.csv", verbose=TRUE)

Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Starting format detection on line 30 (the last non blank line in the first 30)
Detected sep as ',' and 13 columns
Type codes: 3300320200002
Found first row with 13 fields occuring on line 1 (either column names
or first row of data)
The first data row has some non character fields. Treating as a data
row and using default column names.
Count of eol after pos: 101
Subtracted 1 for last eol and any trailing empty lines, leaving 100 data rows
   0.000s (  6%) Memory map (quicker if you rerun)
   0.000s ( 40%) Format detection
   0.000s (  7%) Count rows (wc -l)
   0.000s (  2%) Allocation of 100x13 result (xMB) in RAM
   0.000s ( 41%) Reading data
   0.000s (  0%) Bumping column type midread and coercing data already read
   0.000s (  3%) Changing na.strings to NA
   0.001s        Total

I've attached a 100 row sample.csv and a sample-with-NA.csv here for
you to replicate the issue.

Maybe I am just missing something. Can you explain?

Thanks a lot!

--
ASB.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.csv
Type: text/csv
Size: 7823 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130104/f187f30f/attachment-0002.csv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample-with-NA.csv
Type: text/csv
Size: 7953 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130104/f187f30f/attachment-0003.csv>

