[datatable-help] Unexpected Result Reading in Data File using fread

Martin Watts martin.dunelm at gmail.com
Thu Sep 4 15:09:00 CEST 2014


All

I am trying to read in a data file using fread().

I am getting several warnings indicating that a non-numeric entry was found
in a numeric field and that, as a result, the column is being converted to a
character vector. However, the non-numeric entry is one of the declared
na.strings, and the specific entry is indeed returned as NA.

I expected the "?" entry to be recognised as NA and the column to be
read as a numeric vector.  I have tried the same action with read.table() and
it works as I was expecting.

I am using:
R version 3.1.1 (pre-compiled)
RStudio Version 0.98.983
data.table package v1.9.2
locale is: en_GB.UTF-8
on:
 OS-X Version 10.9.4

The code I am using is:

library("data.table")

column.class <- c(rep("character",2), rep("numeric",7))
data2 <- fread("./data/household_power_consumption.txt",
               sep=";",
               na.strings=c("?",""),
               colClasses=column.class,
               header=TRUE,
               nrows=7000,
               verbose=TRUE
)

The first line in the data file that causes the problem, preceded by the line before it:
21/12/2006;11:22:00;0.244;0.000;242.290;1.000;0.000;0.000;0.000
21/12/2006;11:23:00;?;?;?;?;?;?;
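
As a rough illustration of the read.table() comparison mentioned above, the two rows can be fed in as inline text. This is only a minimal sketch: the inline sample, header = FALSE, and the names sample_lines and data1 are assumptions made for the illustration, not part of the original script or data file.

```r
# Two sample rows in the same layout as the data file: semicolon-separated,
# with "?" or an empty field marking a missing value. Inline text is used
# instead of the real file so the snippet is self-contained (assumption).
sample_lines <- c(
  "21/12/2006;11:22:00;0.244;0.000;242.290;1.000;0.000;0.000;0.000",
  "21/12/2006;11:23:00;?;?;?;?;?;?;"
)

# Same column classes as in the fread() call: two character, seven numeric
column.class <- c(rep("character", 2), rep("numeric", 7))

# Base R reader with the same na.strings; header = FALSE because the
# sample has no header row (hypothetical setup for this sketch)
data1 <- read.table(text = sample_lines,
                    sep = ";",
                    na.strings = c("?", ""),
                    colClasses = column.class,
                    header = FALSE)

# Inspect the resulting column classes and the parsed rows
sapply(data1, class)
data1
```

With read.table() the "?" and empty entries are treated as missing values, so columns 3 to 9 should remain numeric with NA in the second row, which is the behaviour I expected from fread().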

The 1st warning is:
1: In fread("./data/household_power_consumption.txt", na.strings = "?") :
  Bumped column 3 to type character on data row 6840, field contains '?'.
Coercing previously read values in this column from integer or numeric back
to character which may not be lossless; e.g., if '00' and '000' occurred
before they will now be just '0', and there may be inconsistencies with
treatment of ',,' and ',NA,' too (if they occurred in this column before
the bump). If this matters please rerun and set 'colClasses' to 'character'
for this column. Please note that column type detection uses the first 5
rows, the middle 5 rows and the last 5 rows, so hopefully this message
should be very rare. If reporting to datatable-help, please rerun and
include the output from verbose=TRUE.

Martin