[datatable-help] fread(colClasses = "factor")

Gerhard Nachtmann kpm.nachtmann at gmail.com
Mon Oct 14 08:57:03 CEST 2013


Hi there!

First of all, thanks for the great data.table package!

I tried fread and got one of the rare warnings about column types being bumped after detection:

##########
R version 3.0.1 (2013-05-16)
Platform: powerpc64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.8.10

##> ab1 <- fread("./daten/out_abschluesse.csv", verbose = TRUE)

Detected eol as \r\n (CRLF) in that order, the Windows standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=';'
Found 30 columns
First row with 30 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 289491
Subtracted 1 for last eol and any trailing empty lines, leaving 289490 data rows
Type codes: 000300000000000303003303030000 (first 5 rows)
Type codes: 000300000000000303003303030000 (+middle 5 rows)
Type codes: 000300000000300303003303030000 (+last 5 rows)
Bumping column 28 from INT to INT64 on data row 12, field contains 'O'
Bumping column 28 from INT64 to REAL on data row 12, field contains 'O'
Bumping column 28 from REAL to STR on data row 12, field contains 'O'
Bumping column 29 from INT to INT64 on data row 12, field contains 'E'
Bumping column 29 from INT64 to REAL on data row 12, field contains 'E'
Bumping column 29 from REAL to STR on data row 12, field contains 'E'
Bumping column 30 from INT to INT64 on data row 12, field contains 'E'
Bumping column 30 from INT64 to REAL on data row 12, field contains 'E'
Bumping column 30 from REAL to STR on data row 12, field contains 'E'
Bumping column 1 from INT to INT64 on data row 132736, field contains '2.2e+07'
Bumping column 1 from INT64 to REAL on data row 132736, field contains '2.2e+07'
   0.000s (  0%) Memory map (rerun may be quicker)
   0.000s (  0%) sep and header detection
   0.030s ( 10%) Count rows (wc -l)
   0.000s (  0%) Column type detection (first, middle and last 5 rows)
   0.050s ( 17%) Allocation of 289490x30 result (xMB) in RAM
   0.190s ( 66%) Reading data
   0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
   0.010s (  3%) Coercing data already read in type bumps (if any)
   0.010s (  3%) Changing na.strings to NA
   0.290s        Total

Warning messages:
1: In fread("./daten/out_abschluesse.csv", verbose = TRUE) :
  Bumped column 28 to type character on data row 12, field contains
'O'. Coercing previously read values in this column from integer or
numeric back to character which may not be lossless; e.g., if '00' and
'000' occurred before they will now be just '0', and there may be
inconsistencies with treatment of ',,' and ',NA,' too (if they
occurred in this column before the bump). If this matters please rerun
and set 'colClasses' to 'character' for this column. Please note that
column type detection uses the first 5 rows, the middle 5 rows and the
last 5 rows, so hopefully this message should be very rare. If
reporting to datatable-help, please rerun and include the output from
verbose=TRUE.
2: In fread("./daten/out_abschluesse.csv", verbose = TRUE) :
  Bumped column 29 to type character on data row 12, field contains
'E'. Coercing previously read values in this column from integer or
numeric back to character which may not be lossless; e.g., if '00' and
'000' occurred before they will now be just '0', and there may be
inconsistencies with treatment of ',,' and ',NA,' too (if they
occurred in this column before the bump). If this matters please rerun
and set 'colClasses' to 'character' for this column. Please note that
column type detection uses the first 5 rows, the middle 5 rows and the
last 5 rows, so hopefully this message should be very rare. If
reporting to datatable-help, please rerun and include the output from
verbose=TRUE.
3: In fread("./daten/out_abschluesse.csv", verbose = TRUE) :
  Bumped column 30 to type character on data row 12, field contains
'E'. Coercing previously read values in this column from integer or
numeric back to character which may not be lossless; e.g., if '00' and
'000' occurred before they will now be just '0', and there may be
inconsistencies with treatment of ',,' and ',NA,' too (if they
occurred in this column before the bump). If this matters please rerun
and set 'colClasses' to 'character' for this column. Please note that
column type detection uses the first 5 rows, the middle 5 rows and the
last 5 rows, so hopefully this message should be very rare. If
reporting to datatable-help, please rerun and include the output from
verbose=TRUE.

##> ab1 <- fread("./daten/out_abschluesse.csv", verbose = TRUE,
colClasses = "character")

##### worked
##### fread(..., stringsAsFactors = TRUE) seems to be unused: I could
not find colClasses in fread.c
##### fread(..., colClasses = "factor") is not recognized; the columns come back as "character"

##### On Windows 7 with data.table 1.8.8 I got the same warning,
but there colClasses was not accepted as an argument at all:

R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252
[3] LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=German_Austria.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.8.8

##> ab1 <- fread("./daten/out_abschluesse.csv", verbose = TRUE,
colClasses = "character")
Error in fread("./daten/out_abschluesse.csv", verbose = TRUE, colClasses = "character") :
  unused argument (colClasses = "character")
##########

Is there a way to read all columns directly as factors?
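
The only workaround I can think of is to read everything as character (which worked above) and convert the columns to factor afterwards. A minimal sketch, assuming a data.table version whose fread accepts colClasses; the temporary file and its contents are made up for illustration and only stand in for my real CSV:

```r
library(data.table)

# Hypothetical small ';'-separated file standing in for the real CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c("a;b;c", "1;O;E", "2;P;F", "1;O;E"), tmp)

# Read every column as character so that no type bumps can occur
dt <- fread(tmp, sep = ";", colClasses = "character")

# Convert each column to factor by reference (no copy of the table)
for (col in names(dt)) {
  set(dt, j = col, value = factor(dt[[col]]))
}

sapply(dt, class)
```

This avoids the lossy coercion mentioned in the warning, but it needs a second pass over the data, so reading factors directly would still be nicer.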

Have a nice day,
Gerhard

