[datatable-help] Odd problem using fread to read in a csv file: no data, just headers

Matt Dowle mdowle at mdowle.plus.com
Thu Mar 6 13:51:56 CET 2014


Yes, thanks. Are other files reading OK on Windows, or is it just this
particular file? For example, does this work:
fread("http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat")

[ I don't have Windows within easy reach. ]

On 06/03/14 12:43, carrieromichele wrote:
> I quickly read the last mail. Is this the test you needed, guys?
>
> > fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=FALSE)
> trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
> Content type 'application/octet-stream' length 66087 bytes (64 Kb)
> opened URL
> downloaded 64 Kb
>
> Empty data.table (0 rows) of 14 cols: Sex,Agemos,L,M,S,P3...
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] data.table_1.9.3
>
> loaded via a namespace (and not attached):
> [1] plyr_1.8.1     Rcpp_0.11.0    reshape2_1.2.2 Rook_1.0-9     stringr_0.6.2  tools_3.0.2
>
>
> On 6 March 2014 12:34, Matt Dowle <mdowle at mdowle.plus.com> wrote:
>
>
>     Works for me as well on Linux, same output as Kevin's.
>
>     I was perplexed as to why Farrel's output has:
>
>        File opened, filesize is 6.2E-05B
>
>     but we see:
>
>        File opened, filesize is 0.000 GB
>
>     That line differs depending on whether or not we are on Windows.
>     Comparing them:
>
>     // On Windows:
>     if (verbose) Rprintf("File opened, filesize is %.3 GB\n", 1.0*filesize/(1024*1024*1024));
>
>     // On non-Windows:
>     if (verbose) Rprintf("File opened, filesize is %.3f GB\n", 1.0*filesize/(1024*1024*1024));
>
>     So, a missing "f". I've just committed a fix for that (r1223). That
>     line is part of a block that is necessarily different on Windows
>     because its file and mmap commands are different. The missing 'f'
>     could feasibly have corrupted memory somehow (strange that the "G"
>     of "GB" got overwritten) and, if so, would explain why it thought it
>     had reached the end of the file before seeing the \n after the \r.
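A minimal C sketch of the corrected message, assuming a hypothetical helper `format_filesize` that formats into a caller-supplied buffer with `snprintf` (fread itself calls `Rprintf` directly, as quoted above). With the `f` present, `%.3f` is a complete conversion specification, so the size prints as a fixed-point number of gigabytes and the literal "GB" is left alone; without it, the specification is invalid and the C standard leaves the behaviour undefined.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: writes the verbose
   file-size message into buf (capacity len) and returns what
   snprintf returns. "%.3f" is precision .3 followed by the
   conversion character 'f'; "%.3 GB" (no 'f') is an invalid
   conversion specification, i.e. undefined behaviour. */
int format_filesize(char *buf, size_t len, long long filesize)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "File opened, filesize is %.3f GB\n",
                    1.0 * filesize / (1024.0 * 1024.0 * 1024.0));
}
```

Called with the 66087-byte size from the download log, it produces "File opened, filesize is 0.000 GB", the same text as Kevin's non-Windows output.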
>
>     Farrel, does v1.9.2 work for you on Windows with verbose=FALSE?
>     If yes, then verbose=TRUE will very likely now work with commit
>     1223. Best to start with a new R session to clear any possible
>     memory corruption and then try:
>
>         fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=FALSE)
>
>     If not, can anyone else reproduce on Windows? If so, I'll need to
>     debug it on Windows.
>
>     Thanks,
>     Matt
>
>
>
>     On 06/03/14 05:19, Kevin Ushey wrote:
>
>         I think Matt and Arun will have more information; IIUC, fread is
>         only now gaining support for reading from URLs on Windows.
>
>         Something strange: I get different output on the file structure
>         with fread. Posting in case it's useful:
>
>             statagecdc <- fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=T)
>
>         Input contains no \n. Taking this to be a filename to open
>         File opened, filesize is 0.000 GB
>         File is opened and mapped ok
>         Detected eol as \r\n (CRLF) in that order, the Windows standard.
>         Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=','
>         Found 14 columns
>         First row with 14 fields occurs on line 1 (either column names or first row of data)
>         All the fields on line 1 are character fields. Treating as the column names.
>         Count of eol after first data row: 437
>         Subtracted 1 for last eol and any trailing empty lines, leaving 436 data rows
>         Type codes: 13333333333333 (first 5 rows)
>         Type codes: 13333333333333 (+middle 5 rows)
>         Type codes: 13333333333333 (+last 5 rows)
>         Type codes: 13333333333333 (after applying colClasses and integer64)
>         Type codes: 13333333333333 (after applying drop or select (if supplied)
>         Allocating 14 column slots (14 - 0 NULL)
>             0.000s ( 13%) Memory map (rerun may be quicker)
>             0.000s (  4%) sep and header detection
>             0.000s ( 13%) Count rows (wc -l)
>             0.001s ( 49%) Column type detection (first, middle and last 5 rows)
>             0.000s (  1%) Allocation of 436x14 result (xMB) in RAM
>             0.000s ( 19%) Reading data
>             0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
>             0.000s (  0%) Coercing data already read in type bumps (if any)
>             0.000s (  0%) Changing na.strings to NA
>             0.002s        Total
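The "Count of eol" lines in Kevin's log come down to simple arithmetic. Reading the log as counting line endings from the header row (line 1) onward, 437 line endings minus one for the final line ending leaves 436 data rows. A minimal C sketch of that arithmetic, assuming a hypothetical `count_data_rows` function (not fread's actual code) that is given a buffer starting at the header row and the detected line ending, and that ignores trailing empty lines:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: number of data rows below the header line in
   buf[0..n), given the detected line ending eol (e.g. "\r\n" or "\r").
   Each eol ends one line, a final line without an eol still counts,
   trailing empty lines are ignored, and one line is the header. */
long count_data_rows(const char *buf, size_t n, const char *eol)
{
    size_t k;
    long lines = 0;
    long pending_empty = 0;   /* empty lines seen since the last non-empty one */
    size_t line_start = 0;

    assert(buf != NULL && eol != NULL);
    k = strlen(eol);
    assert(k > 0);
    for (size_t i = 0; i < n; ) {
        if (i + k <= n && memcmp(buf + i, eol, k) == 0) {
            if (i == line_start) {
                pending_empty++;      /* counts only if data follows later */
            } else {
                lines += pending_empty + 1;
                pending_empty = 0;
            }
            i += k;
            line_start = i;
        } else {
            i++;
        }
    }
    if (line_start < n)               /* last line has no trailing eol */
        lines += pending_empty + 1;
    return lines > 0 ? lines - 1 : 0; /* do not count the header */
}
```

For a header followed by 436 CRLF-terminated rows this returns 436; for a header followed only by its own line ending it returns 0.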
>
>         Note that fread sees \r\n as newlines for me.
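Kevin's log reports `\r\n` while Farrel's reports `\r` only, for the same file. A minimal C sketch of detection by looking at the first line-ending byte and the byte after it, assuming a hypothetical `detect_eol` function (the verbose messages suggest fread does something along these lines, but this is not its code). The `i + 1 < n` bounds check is the interesting part: if the reader believes the buffer ends immediately after a `\r`, a CRLF file is classified as CR-only, which fits Matt's suggestion that fread thought it had reached the end of the file before seeing the `\n` after the `\r`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: returns the line ending used by buf[0..n),
   judged from the first '\r' or '\n' found: "\r\n" (Windows),
   "\n\r", "\n" (Unix) or "\r" (classic Mac OS).
   Returns NULL if the buffer contains neither byte. */
const char *detect_eol(const char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\r')
            return (i + 1 < n && buf[i + 1] == '\n') ? "\r\n" : "\r";
        if (buf[i] == '\n')
            return (i + 1 < n && buf[i + 1] == '\r') ? "\n\r" : "\n";
    }
    return NULL;
}
```

With the full buffer `"a,b\r\nc"` it reports `\r\n`; given only the first four bytes of the same buffer it reports `\r`.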
>
>             sessionInfo()
>
>         R Under development (unstable) (2014-02-12 r64976)
>         Platform: x86_64-apple-darwin13.0.0 (64-bit)
>
>         locale:
>         [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>
>         attached base packages:
>         [1] stats     graphics  grDevices utils     datasets  methods   base
>
>         other attached packages:
>         [1] data.table_1.9.1     knitr_1.5.15         devtools_1.4.1.99    BiocInstaller_1.13.3
>
>         loaded via a namespace (and not attached):
>          [1] compiler_3.1.0    digest_0.6.4      evaluate_0.5.1    formatR_0.10      httr_0.2          memoise_0.1
>          [7] parallel_3.1.0    plyr_1.8          Rcpp_0.11.0.3     RCurl_1.95-4.1    reshape2_1.3.0.99 stringr_0.6.2
>         [13] tools_3.1.0       whisker_0.3-2
>
>         Kevin
>
>         On Wed, Mar 5, 2014 at 9:04 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
>
>                 sessionInfo()
>
>             R version 3.0.2 (2013-09-25)
>             Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>             locale:
>             [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
>             [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
>             attached base packages:
>             [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>
>             other attached packages:
>             [1] reshape2_1.2.2    data.table_1.9.2  gridExtra_0.9.1   ggplot2_0.9.3.1   RGoogleDocs_0.7-0
>
>             loaded via a namespace (and not attached):
>              [1] colorspace_1.2-4   dichromat_2.0-0    digest_0.6.4       gtable_0.1.2       labeling_0.2       MASS_7.3-29        munsell_0.4.2
>              [8] plyr_1.8.1         proto_0.3-10       RColorBrewer_1.0-5 Rcpp_0.11.0        RCurl_1.95-4.1     scales_0.2.3       stringr_0.6.2
>             [15] tools_3.0.2        XML_3.98-1.1
>
>             Farrel Buchinsky
>             Google Voice Tel: (412) 567-7870
>
>
>             On Wed, Mar 5, 2014 at 10:55 PM, Kevin Ushey <kevinushey at gmail.com> wrote:
>
>                 Works fine for me with data.table 1.9.1 on OS X. What is your
>                 sessionInfo()?
>
>                 Kevin
>
>                 On Wed, Mar 5, 2014 at 7:53 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
>
>                     Any idea why I am getting a data.table with headers only and
>                     zero data? How can I get around the problem?
>
>                     fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=T)
>                     fails, while
>                     read.csv("http://www.cdc.gov/growthcharts/data/zscore/statage.csv")
>                     succeeds.
>
>                         statagecdc <- fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=T)
>
>                     trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
>                     Content type 'application/octet-stream' length 66087 bytes (64 Kb)
>                     opened URL
>                     downloaded 64 Kb
>
>                     Input contains no \n. Taking this to be a filename to open
>                     File opened, filesize is  6.2E-05B
>                     File is opened and mapped ok
>                     Detected eol as \r only (no \n afterwards). An old Mac 9 standard, discontinued in 2002 according to Wikipedia.
>                     Using line 1 to detect sep (the last non blank line in the first 'autostart') ... sep=','
>                     Found 14 columns
>                     First row with 14 fields occurs on line 1 (either column names or first row of data)
>                     All the fields on line 1 are character fields. Treating as the column names.
>                     Byte after header row is eof or eol, 0 data rows present.
>                     Type codes: 00000000000000 (first 5 rows)
>                     Type codes: 00000000000000 (after applying colClasses and integer64)
>                     Type codes: 00000000000000 (after applying drop or select (if supplied)
>                     Allocating 14 column slots (14 - 0 NULL)
>                         0.000s (  0%) Memory map (rerun may be quicker)
>                         0.000s (  0%) sep and header detection
>                         0.001s (100%) Count rows (wc -l)
>                         0.000s (  0%) Column type detection (first, middle and last 5 rows)
>                         0.000s (  0%) Allocation of 0x14 result (xMB) in RAM
>                         0.000s (  0%) Reading data
>                         0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
>                         0.000s (  0%) Coercing data already read in type bumps (if any)
>                         0.000s (  0%) Changing na.strings to NA
>                         0.001s        Total
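Farrel's read stops at "Byte after header row is eof or eol, 0 data rows present." A minimal C sketch of such a check, with a hypothetical name and signature (not fread's actual code). Here `n` is the byte count the reader believes the file holds, so if that count is wrong and ends right after the header row, the check reports eof and the result is a table with column names only, even though the data are present in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: returns 1 if the byte right after the header
   row is end-of-file or the start of another line ending, i.e. no
   data rows follow. pos is the offset just past the header row's
   line ending; n is the number of bytes the reader believes it has. */
int no_data_after_header(const char *buf, size_t n, size_t pos, const char *eol)
{
    size_t k;

    assert(buf != NULL && eol != NULL);
    k = strlen(eol);
    if (pos >= n)
        return 1;                                       /* eof */
    return pos + k <= n && memcmp(buf + pos, eol, k) == 0;  /* eol */
}
```

With the true size, a header followed by a data row is not flagged; with a size that stops right after the header, the same bytes are reported as having no data.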
>
>
>                     Thanks a lot.
>
>                     Farrel Buchinsky
>
>                     _______________________________________________
>                     datatable-help mailing list
>                     datatable-help at lists.r-forge.r-project.org
>                     <mailto:datatable-help at lists.r-forge.r-project.org>
>
>                     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>
>
>
>
> -- 
>
> PRIVATE
> T: +44 (0)77 3248 1517 | E: carrieromichele at gmail.com
>
> OFFICE
> T: +44 (0)20 8236 8992 | E: michele.carriero at evolve-analytics.com
> www.evolve-analytics.com
>


