[datatable-help] Odd problem using fread to read in a csv file: no data, just headers

Kevin Ushey kevinushey at gmail.com
Thu Mar 6 06:19:36 CET 2014


I think Matt and Arun will have more information -- IIUC, fread is
only now gaining support for reading from URLs on Windows.

Something strange: the verbose output I get from fread describes the
file differently from yours. Posting it in case it's useful:

> statagecdc <- fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=T)
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000 GB
File is opened and mapped ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=','
Found 14 columns
First row with 14 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 437
Subtracted 1 for last eol and any trailing empty lines, leaving 436 data rows
Type codes: 13333333333333 (first 5 rows)
Type codes: 13333333333333 (+middle 5 rows)
Type codes: 13333333333333 (+last 5 rows)
Type codes: 13333333333333 (after applying colClasses and integer64)
Type codes: 13333333333333 (after applying drop or select (if supplied)
Allocating 14 column slots (14 - 0 NULL)
   0.000s ( 13%) Memory map (rerun may be quicker)
   0.000s (  4%) sep and header detection
   0.000s ( 13%) Count rows (wc -l)
   0.001s ( 49%) Column type detection (first, middle and last 5 rows)
   0.000s (  1%) Allocation of 436x14 result (xMB) in RAM
   0.000s ( 19%) Reading data
   0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
   0.000s (  0%) Coercing data already read in type bumps (if any)
   0.000s (  0%) Changing na.strings to NA
   0.002s        Total

Note that fread detects \r\n (CRLF) line endings for me and finds 436
data rows, whereas your log reports \r only and 0 data rows.
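
In case it helps with debugging, here is a rough sketch for checking which
line endings a file actually contains on disk, by counting the raw bytes
rather than relying on what fread reports. The helper names count_eol() and
fetch_binary() are made up for illustration. The binary-mode download in
fetch_binary() rests on my assumption that a text-mode download could alter
the line endings on Windows; nothing in this thread confirms that this is
the cause.

```r
# Count each end-of-line style in a file by inspecting its raw bytes.
# count_eol() is a hypothetical helper, not part of data.table.
count_eol <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  n <- length(bytes)
  is_cr <- bytes == as.raw(0x0d)          # carriage return, \r
  is_lf <- bytes == as.raw(0x0a)          # line feed, \n
  next_is_lf <- c(is_lf[-1], FALSE)       # is the following byte \n?
  prev_is_cr <- c(FALSE, is_cr[-n])       # is the preceding byte \r?
  c(crlf    = sum(is_cr & next_is_lf),    # \r\n pairs
    cr_only = sum(is_cr & !next_is_lf),   # \r not followed by \n
    lf_only = sum(is_lf & !prev_is_cr))   # \n not preceded by \r
}

# Hypothetical workaround (assumption, untested on Windows): download the
# file in binary mode so its bytes are not translated, then inspect it.
fetch_binary <- function(url) {
  tmp <- tempfile(fileext = ".csv")
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  tmp
}

# Offline demonstration on a small file written with CRLF line endings.
demo <- tempfile(fileext = ".csv")
writeBin(charToRaw("a,b\r\n1,2\r\n"), demo)
eol_counts <- count_eol(demo)
print(eol_counts)
```

If count_eol(fetch_binary(url)) reports mostly crlf on your machine, then
reading that temporary file with fread should be worth a try.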

> sessionInfo()
R Under development (unstable) (2014-02-12 r64976)
Platform: x86_64-apple-darwin13.0.0 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.9.1     knitr_1.5.15         devtools_1.4.1.99
BiocInstaller_1.13.3

loaded via a namespace (and not attached):
 [1] compiler_3.1.0    digest_0.6.4      evaluate_0.5.1
formatR_0.10      httr_0.2          memoise_0.1
 [7] parallel_3.1.0    plyr_1.8          Rcpp_0.11.0.3
RCurl_1.95-4.1    reshape2_1.3.0.99 stringr_0.6.2
[13] tools_3.1.0       whisker_0.3-2

Kevin

On Wed, Mar 5, 2014 at 9:04 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United
> States.1252
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods
> base
>
> other attached packages:
> [1] reshape2_1.2.2    data.table_1.9.2  gridExtra_0.9.1   ggplot2_0.9.3.1
> RGoogleDocs_0.7-0
>
> loaded via a namespace (and not attached):
>  [1] colorspace_1.2-4   dichromat_2.0-0    digest_0.6.4       gtable_0.1.2
> labeling_0.2       MASS_7.3-29        munsell_0.4.2
>  [8] plyr_1.8.1         proto_0.3-10       RColorBrewer_1.0-5 Rcpp_0.11.0
> RCurl_1.95-4.1     scales_0.2.3       stringr_0.6.2
> [15] tools_3.0.2        XML_3.98-1.1
>
> Farrel Buchinsky
> Google Voice Tel: (412) 567-7870
>
>
> On Wed, Mar 5, 2014 at 10:55 PM, Kevin Ushey <kevinushey at gmail.com> wrote:
>>
>> Works fine for me with data.table 1.9.1 on OS X. What is your
>> sessionInfo()?
>>
>> Kevin
>>
>> On Wed, Mar 5, 2014 at 7:53 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
>> > Any idea why I am getting a data.table with headers only and zero data?
>> > How can I get around the problem?
>> >
>> > fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
>> > verbose=T)
>> > fails
>> > read.csv("http://www.cdc.gov/growthcharts/data/zscore/statage.csv")
>> > succeeds
>> >
>> >> statagecdc <-
>> >> fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
>> >> verbose=T)
>> > trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
>> > Content type 'application/octet-stream' length 66087 bytes (64 Kb)
>> > opened URL
>> > downloaded 64 Kb
>> >
>> > Input contains no \n. Taking this to be a filename to open
>> > File opened, filesize is  6.2E-05B
>> > File is opened and mapped ok
>> > Detected eol as \r only (no \n afterwards). An old Mac 9 standard,
>> > discontinued in 2002 according to Wikipedia.
>> > Using line 1 to detect sep (the last non blank line in the first
>> > 'autostart') ... sep=','
>> > Found 14 columns
>> > First row with 14 fields occurs on line 1 (either column names or first
>> > row
>> > of data)
>> > All the fields on line 1 are character fields. Treating as the column
>> > names.
>> > Byte after header row is eof or eol, 0 data rows present.
>> > Type codes: 00000000000000 (first 5 rows)
>> > Type codes: 00000000000000 (after applying colClasses and integer64)
>> > Type codes: 00000000000000 (after applying drop or select (if supplied)
>> > Allocating 14 column slots (14 - 0 NULL)
>> >    0.000s (  0%) Memory map (rerun may be quicker)
>> >    0.000s (  0%) sep and header detection
>> >    0.001s (100%) Count rows (wc -l)
>> >    0.000s (  0%) Column type detection (first, middle and last 5 rows)
>> >    0.000s (  0%) Allocation of 0x14 result (xMB) in RAM
>> >    0.000s (  0%) Reading data
>> >    0.000s (  0%) Allocation for type bumps (if any), including gc time
>> > if
>> > triggered
>> >    0.000s (  0%) Coercing data already read in type bumps (if any)
>> >    0.000s (  0%) Changing na.strings to NA
>> >    0.001s        Total
>> >
>> >
>> > Thanks a lot.
>> >
>> > Farrel Buchinsky
>> > Google Voice Tel: (412) 567-7870

