[datatable-help] Fwd: fread: Handling NAs with ",," not working?

Matthew Dowle mdowle at mdowle.plus.com
Fri Jan 4 22:07:51 CET 2013


OK, this is now fixed and committed (commit 786), with your example added as tests.
Thanks again.
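
For illustration, the header check described in the quoted message below (treat the first row as column names only if every field is character) might look roughly like this in R. This is only a sketch: fread itself is implemented in C, the function names here are made up, and treating an empty field as missing rather than as character is an assumption about the intended behaviour.

```r
# Hypothetical helper (not part of data.table): TRUE where a field is
# empty or parses as a number.
is_numeric_or_empty <- function(x) {
  x == "" | !is.na(suppressWarnings(as.numeric(x)))
}

# Hypothetical heuristic: the first row is taken as column names only
# when every field is character, i.e. none is numeric or empty.
# Assumption: an empty field (",,") counts as missing, not as an error.
# Note: strsplit() drops a trailing empty field; that does not matter
# for the rows in this thread, where the empty field is field 12 of 13.
looks_like_header <- function(line, sep = ",") {
  fields <- strsplit(line, sep, fixed = TRUE)[[1]]
  !any(is_numeric_or_empty(fields))
}

first_row <- "02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60"
is_header <- looks_like_header(first_row)
```

With the first row of sample.csv, `is_header` is FALSE, so the row would be treated as data and default column names used.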

On 04.01.2013 09:38, Matthew Dowle wrote:
> Great, thanks for this. I reproduced it, fixed it, and will commit
> later tonight. The bug was in the part of the code that loops through
> the header row to test whether it contains column names (that is,
> whether every field is character).
>
> On 03.01.2013 20:50, Akhil Behl wrote:
>> So, here is a `head' of my dataset. Note the `,,' in the
>> second-to-last column.
>>
>> 02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60
>> 02-FEB-2009,09:55:04:987,26022009,2800,PE,108.75,200,111,50,11700,1450,,2865.60
>> 02-FEB-2009,09:55:04:939,26022009,3100,CE,31.1,3000,36.55,200,3500,5250,,2865.60
>> 02-FEB-2009,09:55:04:989,26022009,2600,PE,52.05,500,57,400,16050,1150,,2865.60
>> 02-FEB-2009,09:55:04:981,26022009,3000,CE,56.25,2000,67,150,21500,13750,,2865.60
>> 02-FEB-2009,09:55:04:991,26022009,2900,CE,81,1000,100,100,18100,4550,1000,2865.60
>> 02-FEB-2009,09:55:04:953,26022009,2800,CE,150,50,159.7,5000,13400,15500,,2865.60
>> 02-FEB-2009,09:55:04:987,26022009,2700,PE,72.15,3000,79,50,19200,5100,,2865.60
>> 02-FEB-2009,09:55:04:615,26022009,2450,CE,256.9,500,678,500,500,500,,2865.60
>> 02-FEB-2009,09:55:04:894,26022009,3300,CE,6,7000,10.8,2000,7000,2550,,2865.60
>>
>> The documentation says that ",," should be read as "". But instead
>> the function throws an error (one I cannot understand). See here:
>>
>> R> library(data.table)
>> data.table 1.8.7  For help type: help("data.table")
>>
>> R> tt <- fread("sample.csv", verbose=TRUE)
>>
>> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
>> Starting format detection on line 30 (the last non blank line in the first 30)
>> Detected sep as ',' and 13 columns
>> Type codes: 3300320200002
>> Found first row with 13 fields occuring on line 1 (either column names or first row of data)
>> Error in fread("sample.csv", verbose = TRUE) : Unexpected character 
>> (
>> 02-F) ending field 12 of line 1
>>
>> Using na.strings="" does not work either. But I guess that should
>> not have made a difference anyway?
>>
>> Then I opened the file in GVim and converted all `,,' to `,NA,' and
>> re-read the file. This time it works.
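
The same substitution can be done from R instead of an editor. The following is a minimal sketch of that workaround, not a data.table feature: `fill_empty_fields` is a made-up helper name, and it only handles empty fields that sit between two commas, not ones at the very start or end of a line.

```r
# Hypothetical helper: make empty fields explicit by writing NA between
# adjacent separators. The lookahead leaves the second comma unconsumed,
# so a run such as ",,," becomes ",NA,NA,".
fill_empty_fields <- function(lines) {
  gsub(",(?=,)", ",NA", lines, perl = TRUE)
}

row <- "02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60"
fixed <- fill_empty_fields(row)

# To rewrite the whole file before reading it (file names as in this thread):
# writeLines(fill_empty_fields(readLines("sample.csv")), "sample-with-NA.csv")
```

Here `fixed` carries `1100,NA,2865.60` in place of `1100,,2865.60`; the rest of the row is unchanged.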
>>
>> R> tt <- fread("sample-with-NA.csv", verbose=TRUE)
>>
>> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
>> Starting format detection on line 30 (the last non blank line in the first 30)
>> Detected sep as ',' and 13 columns
>> Type codes: 3300320200002
>> Found first row with 13 fields occuring on line 1 (either column names or first row of data)
>> The first data row has some non character fields. Treating as a data row and using default column names.
>> Count of eol after pos: 101
>> Subtracted 1 for last eol and any trailing empty lines, leaving 100 data rows
>>    0.000s (  6%) Memory map (quicker if you rerun)
>>    0.000s ( 40%) Format detection
>>    0.000s (  7%) Count rows (wc -l)
>>    0.000s (  2%) Allocation of 100x13 result (xMB) in RAM
>>    0.000s ( 41%) Reading data
>>    0.000s (  0%) Bumping column type midread and coercing data already read
>>    0.000s (  3%) Changing na.strings to NA
>>    0.001s        Total
>>
>> I've attached a 100 row sample.csv and a sample-with-NA.csv here for
>> you to replicate the issue.
>>
>> Maybe it is just that I am missing something. Can you explain?
>>
>> Thanks a lot!
>>
>> --
>> ASB.


