[datatable-help] Fwd: fread: Handling NAs with ", , " not working?

Akhil Behl akhil at igidr.ac.in
Fri Jan 4 22:28:39 CET 2013


Thank you. :)

On Sat, Jan 5, 2013 at 2:37 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Ok this is fixed and committed (786) with your example as tests.
> Thanks again.
>
>
> On 04.01.2013 09:38, Matthew Dowle wrote:
>>
>> Great, thanks for this. I reproduced it, fixed it, and will commit later
>> tonight.  The bug was in the part of the code that loops through the header
>> row to test whether it contains column names (i.e. whether every field is
>> character).
>>
>> On 03.01.2013 20:50, Akhil Behl wrote:
>>>
>>> So, here is a `head' of my dataset. Note the `,,' in the second-to-last
>>> column.
>>>
>>> 02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60
>>> 02-FEB-2009,09:55:04:987,26022009,2800,PE,108.75,200,111,50,11700,1450,,2865.60
>>> 02-FEB-2009,09:55:04:939,26022009,3100,CE,31.1,3000,36.55,200,3500,5250,,2865.60
>>> 02-FEB-2009,09:55:04:989,26022009,2600,PE,52.05,500,57,400,16050,1150,,2865.60
>>> 02-FEB-2009,09:55:04:981,26022009,3000,CE,56.25,2000,67,150,21500,13750,,2865.60
>>> 02-FEB-2009,09:55:04:991,26022009,2900,CE,81,1000,100,100,18100,4550,1000,2865.60
>>> 02-FEB-2009,09:55:04:953,26022009,2800,CE,150,50,159.7,5000,13400,15500,,2865.60
>>> 02-FEB-2009,09:55:04:987,26022009,2700,PE,72.15,3000,79,50,19200,5100,,2865.60
>>> 02-FEB-2009,09:55:04:615,26022009,2450,CE,256.9,500,678,500,500,500,,2865.60
>>> 02-FEB-2009,09:55:04:894,26022009,3300,CE,6,7000,10.8,2000,7000,2550,,2865.60
>>>
>>> The documentation says that ",," should be read as "". Instead, the
>>> function throws an error that I cannot understand. See here:
>>>
>>> R> library(data.table)
>>> data.table 1.8.7  For help type: help("data.table")
>>>
>>> R> tt <- fread("sample.csv", verbose=TRUE)
>>>
>>> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
>>> Starting format detection on line 30 (the last non blank line in the
>>> first 30)
>>> Detected sep as ',' and 13 columns
>>> Type codes: 3300320200002
>>> Found first row with 13 fields occuring on line 1 (either column names
>>> or first row of data)
>>> Error in fread("sample.csv", verbose = TRUE) : Unexpected character (
>>> 02-F) ending field 12 of line 1
>>>
>>> Using na.strings="" does not work either. But I guess that should not
>>> have made a difference anyway?
>>>
>>> Then I opened the file in GVim and converted all `,,' to `,NA,' and
>>> re-read the file. This time it works.
>>>
>>> R> tt <- fread("sample-with-NA.csv", verbose=TRUE)
>>>
>>> Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
>>> Starting format detection on line 30 (the last non blank line in the
>>> first 30)
>>> Detected sep as ',' and 13 columns
>>> Type codes: 3300320200002
>>> Found first row with 13 fields occuring on line 1 (either column names
>>> or first row of data)
>>> The first data row has some non character fields. Treating as a data
>>> row and using default column names.
>>> Count of eol after pos: 101
>>> Subtracted 1 for last eol and any trailing empty lines, leaving 100 data
>>> rows
>>>    0.000s (  6%) Memory map (quicker if you rerun)
>>>    0.000s ( 40%) Format detection
>>>    0.000s (  7%) Count rows (wc -l)
>>>    0.000s (  2%) Allocation of 100x13 result (xMB) in RAM
>>>    0.000s ( 41%) Reading data
>>>    0.000s (  0%) Bumping column type midread and coercing data already
>>> read
>>>    0.000s (  3%) Changing na.strings to NA
>>>    0.001s        Total
>>>
>>> I've attached a 100 row sample.csv and a sample-with-NA.csv here for
>>> you to replicate the issue.
>>>
>>> Maybe I am just missing something. Can you explain?
>>>
>>> Thanks a lot!
>>>
>>> --
>>> ASB.
>
>
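
For anyone still on a version without the fix: the GVim workaround quoted
above (turning `,,' into `,NA,') can also be scripted in R. This is only a
minimal sketch under a few assumptions: the helper name fill_empty is made up
for illustration, the two rows are copied from the sample in the quoted
message, and only empty fields between two commas are handled (not empty
fields at the very start or end of a line). Parsing is done with base R's
read.csv so the sketch does not depend on any particular data.table version.

```r
# Two rows copied from the quoted sample; field 12 is empty in the first row.
lines <- c(
  "02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60",
  "02-FEB-2009,09:55:04:991,26022009,2900,CE,81,1000,100,100,18100,4550,1000,2865.60"
)

# fill_empty is a hypothetical helper, not part of data.table.
# It inserts NA after every comma that is directly followed by another comma.
# The lookahead does not consume the second comma, so runs such as ",,,"
# are filled as well.
fill_empty <- function(x) gsub(",(?=,)", ",NA", x, perl = TRUE)

fixed <- fill_empty(lines)

# Parse the repaired lines; "NA" is read as a missing value by default.
tt <- read.csv(text = fixed, header = FALSE, stringsAsFactors = FALSE)
```

For a file on disk, the same helper could be applied to readLines("sample.csv")
and the result written back with writeLines before calling fread. A plain
gsub(",,", ",NA,", x) would miss every second empty field in a run of
consecutive commas, which is why the lookahead form is used here.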

