[datatable-help] Unexpected Result Reading in Data File using fread

Arunkumar Srinivasan aragorn168b at gmail.com
Sat Sep 6 01:20:39 CEST 2014


Hi Martin,

I'd recommend first trying the current development version to see whether this has already been fixed. Matt has already fixed some recurring fread bugs.
You can get it from here: https://github.com/Rdatatable/data.table. Please scroll down to see the installation instructions.

If you still get the error, could you please file a bug report at https://github.com/Rdatatable/data.table/issues with a *reproducible example*? If necessary, you can also link to a *minimal* file that reproduces the issue; that would be very helpful.
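
A reproducible example of this kind could be sketched roughly as follows. It writes the two data lines quoted in the original message to a temporary file and reads them back with the same fread() arguments. The header row is an assumption (the original message does not show it), and the result may differ between data.table versions, so no expected output is given.

```r
library(data.table)

# The two data lines are taken from the report; the header row is ASSUMED,
# since the original message does not show it.
lines <- c(
  "Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3",
  "21/12/2006;11:22:00;0.244;0.000;242.290;1.000;0.000;0.000;0.000",
  "21/12/2006;11:23:00;?;?;?;?;?;?;"
)

# Write the sample to a temporary file so the example does not depend on
# the reporter's local ./data directory.
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

column.class <- c(rep("character", 2), rep("numeric", 7))

dt <- fread(tmp,
            sep = ";",
            na.strings = c("?", ""),
            colClasses = column.class,
            header = TRUE,
            verbose = TRUE)

# Inspect the resulting column types; these are what the bug report
# should state alongside the verbose output.
print(sapply(dt, class))
```

Pasting the output of sessionInfo() alongside the verbose output would also tell the maintainers exactly which version produced the result.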

Thanks,
Arun

From: Martin Watts <martin.dunelm at gmail.com>
Reply: Martin Watts <martin.dunelm at gmail.com>
Date: September 4, 2014 at 3:09:13 PM
To: datatable-help at lists.r-forge.r-project.org <datatable-help at lists.r-forge.r-project.org>
Subject: [datatable-help] Unexpected Result Reading in Data File using fread

All

I am trying to read in a data file using fread().

I am getting several warnings indicating that a non-numeric entry was found in a numeric field and that, as a result, the column is being converted to a character vector. However, the non-numeric entry is one of the declared na.strings, and the specific entry is indeed returned as NA.

I expected that the "?" entry would be recognised as NA and that the column would be read as a numeric vector. I have tried the same action with read.table() and it works as I expected.
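
For comparison, the read.table() behaviour described above can be sketched with base R alone. This is only an illustration under stated assumptions: the header names are assumed (the header row is not shown in this message), and the two data lines are the ones quoted further down.

```r
# The header row is ASSUMED; the two data lines are the ones from this report.
lines <- c(
  "Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3",
  "21/12/2006;11:22:00;0.244;0.000;242.290;1.000;0.000;0.000;0.000",
  "21/12/2006;11:23:00;?;?;?;?;?;?;"
)

column.class <- c(rep("character", 2), rep("numeric", 7))

# read.table() treats "?" and empty fields as NA before converting the
# column, so the numeric columns stay numeric.
df <- read.table(text = lines,
                 sep = ";",
                 na.strings = c("?", ""),
                 colClasses = column.class,
                 header = TRUE)

str(df)
```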

I am using:
R version 3.1.1 (pre-compiled)
RStudio Version 0.98.983
data.table package v1.92
locale is: en_GB.UTF-8
on:
 OS-X Version 10.9.4

the code I am using is:

library("data.table")

column.class <- c(rep("character",2), rep("numeric",7))
data2 <- fread("./data/household_power_consumption.txt",
               sep=";",
               na.strings=c("?",""),
               colClasses=column.class,
               header=TRUE,
               nrows=7000,
               verbose=TRUE
)

the line before the problem and the 1st line in the data file causing it are:
21/12/2006;11:22:00;0.244;0.000;242.290;1.000;0.000;0.000;0.000
21/12/2006;11:23:00;?;?;?;?;?;?;

The 1st warning is:
1: In fread("./data/household_power_consumption.txt", na.strings = "?") :
  Bumped column 3 to type character on data row 6840, field contains '?'. Coercing previously read values in this column from integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE.


Martin
