[datatable-help] Fwd: fread: Handling NAs with ",," not working?

Matthew Dowle mdowle at mdowle.plus.com
Fri Jan 4 10:38:12 CET 2013


Great, thanks for this. I have reproduced and fixed it, and will commit
later tonight. The bug was in the part of the code that loops through the
header row to test whether it contains column names (that is, whether
every field is character).
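
As a rough illustration only (this is not the actual C code inside fread),
the header test described above can be sketched in R. The helper name
looks_like_header and the rule that empty fields are simply skipped are
assumptions made for this sketch:

```r
# Hypothetical helper; not part of data.table.
# Returns TRUE when no non-empty field in the line parses as a number,
# i.e. the line plausibly holds column names rather than data.
looks_like_header <- function(line, sep = ",") {
  # strsplit() keeps interior empty fields such as the one in "a,,b"
  fields <- strsplit(line, sep, fixed = TRUE)[[1]]
  # Assumption: an empty field says nothing either way, so skip it
  nonempty <- fields[nzchar(fields)]
  is_number <- !is.na(suppressWarnings(as.numeric(nonempty)))
  length(nonempty) > 0 && !any(is_number)
}
```

With the first row of the sample file this returns FALSE, because fields
such as 26022009 and 2500 are numeric, so the row would be treated as data.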

On 03.01.2013 20:50, Akhil Behl wrote:
> So, here is a `head' of my dataset. Note the `,,' in the second-to-last
> column.
>
> 02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60
> 02-FEB-2009,09:55:04:987,26022009,2800,PE,108.75,200,111,50,11700,1450,,2865.60
> 02-FEB-2009,09:55:04:939,26022009,3100,CE,31.1,3000,36.55,200,3500,5250,,2865.60
> 02-FEB-2009,09:55:04:989,26022009,2600,PE,52.05,500,57,400,16050,1150,,2865.60
> 02-FEB-2009,09:55:04:981,26022009,3000,CE,56.25,2000,67,150,21500,13750,,2865.60
> 02-FEB-2009,09:55:04:991,26022009,2900,CE,81,1000,100,100,18100,4550,1000,2865.60
> 02-FEB-2009,09:55:04:953,26022009,2800,CE,150,50,159.7,5000,13400,15500,,2865.60
> 02-FEB-2009,09:55:04:987,26022009,2700,PE,72.15,3000,79,50,19200,5100,,2865.60
> 02-FEB-2009,09:55:04:615,26022009,2450,CE,256.9,500,678,500,500,500,,2865.60
> 02-FEB-2009,09:55:04:894,26022009,3300,CE,6,7000,10.8,2000,7000,2550,,2865.60
>
> The documentation says that ",," should be read as "". Instead, the
> function throws an error that I cannot understand. See here:
>
> R> library(data.table)
> data.table 1.8.7  For help type: help("data.table")
>
> R> tt <- fread("sample.csv", verbose=TRUE)
>
> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
> Starting format detection on line 30 (the last non blank line in the first 30)
> Detected sep as ',' and 13 columns
> Type codes: 3300320200002
> Found first row with 13 fields occuring on line 1 (either column names or first row of data)
> Error in fread("sample.csv", verbose = TRUE) : Unexpected character (
> 02-F) ending field 12 of line 1
>
> Using na.strings="" does not work either. But I guess that should not
> have made a difference anyway?
>
> Then I opened the file in GVim and converted all `,,' to `,NA,' and
> re-read the file. This time it works.
>
> R> tt <- fread("sample-with-NA.csv", verbose=TRUE)
>
> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
> Starting format detection on line 30 (the last non blank line in the first 30)
> Detected sep as ',' and 13 columns
> Type codes: 3300320200002
> Found first row with 13 fields occuring on line 1 (either column names or first row of data)
> The first data row has some non character fields. Treating as a data row and using default column names.
> Count of eol after pos: 101
> Subtracted 1 for last eol and any trailing empty lines, leaving 100 data rows
>    0.000s (  6%) Memory map (quicker if you rerun)
>    0.000s ( 40%) Format detection
>    0.000s (  7%) Count rows (wc -l)
>    0.000s (  2%) Allocation of 100x13 result (xMB) in RAM
>    0.000s ( 41%) Reading data
>    0.000s (  0%) Bumping column type midread and coercing data already read
>    0.000s (  3%) Changing na.strings to NA
>    0.001s        Total
>
> I've attached a 100 row sample.csv and a sample-with-NA.csv here for
> you to replicate the issue.
>
> Maybe, it is just that I am missing something. Can you explain?
>
> Thanks a lot!
>
> --
> ASB.
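
For anyone still on 1.8.7, the GVim workaround described in the quoted
message can also be scripted in R. This is only a sketch: the function name
fill_empty_fields is made up for the example, the file names are the ones
from the report, and it only handles empty fields between two commas (not
an empty first or last field):

```r
# Hypothetical helper; not part of data.table.
fill_empty_fields <- function(lines) {
  # A comma directly followed by another comma marks an empty field.
  # The lookahead leaves the second comma in place, so runs such as
  # ",,," are filled field by field.
  gsub(",(?=,)", ",NA", lines, perl = TRUE)
}

# File names as used in the report; skipped if the input is not present.
if (file.exists("sample.csv")) {
  writeLines(fill_empty_fields(readLines("sample.csv")), "sample-with-NA.csv")
}
```

The rewritten file can then be passed to fread in the same way as
sample-with-NA.csv above.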
