[datatable-help] Odd problem using fread to read in a csv file: no data, just headers
Matt Dowle
mdowle at mdowle.plus.com
Thu Mar 6 13:51:56 CET 2014
Yes, thanks. Are other files reading OK on Windows, or is it just this
particular file?
For example, does this work:
fread("http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat")
[ I don't have Windows within easy reach. ]
On 06/03/14 12:43, carrieromichele wrote:
> I quickly read the last mail. Is this the test you needed, guys?
>
> > fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=FALSE)
> trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
> Content type 'application/octet-stream' length 66087 bytes (64 Kb)
> opened URL
> downloaded 64 Kb
>
> Empty data.table (0 rows) of 14 cols: Sex,Agemos,L,M,S,P3...
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United
> Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] data.table_1.9.3
>
> loaded via a namespace (and not attached):
> [1] plyr_1.8.1 Rcpp_0.11.0 reshape2_1.2.2 Rook_1.0-9
> stringr_0.6.2 tools_3.0.2
> > fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=FALSE)
> trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
> Content type 'application/octet-stream' length 66087 bytes (64 Kb)
> opened URL
> downloaded 64 Kb
>
> Empty data.table (0 rows) of 14 cols: Sex,Agemos,L,M,S,P3...
>
>
> On 6 March 2014 12:34, Matt Dowle <mdowle at mdowle.plus.com> wrote:
>
>
> Works for me as well on linux, same output as Kevin's.
>
> I was perplexed as to why Farrel's output has :
>
> File opened, filesize is 6.2E-05B
> but we see :
>
> File opened, filesize is 0.000 GB
> That line is switched depending on Windows or not. Comparing them :
>
> // On Windows :
> if (verbose) Rprintf("File opened, filesize is %.3 GB\n",
> 1.0*filesize/(1024*1024*1024));
>
> // On non-Windows :
> if (verbose) Rprintf("File opened, filesize is %.3f GB\n",
> 1.0*filesize/(1024*1024*1024));
>
> So, a missing "f". Just committed a fix for that (r1223). That
> line is part of a block that is necessarily different on Windows
> because the file and mmap calls differ there. The missing 'f'
> could feasibly have corrupted memory somehow (strange that the "G"
> of "GB" got overwritten) and, if so, would explain why fread thought
> it had reached the end of the file before seeing the \n after the \r.
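
To make the missing "f" concrete, here is a minimal standalone sketch of the corrected format string. It is not data.table's code: format_filesize_gb is a hypothetical helper, and it uses snprintf instead of Rprintf so that it compiles outside R. In C, a conversion specification without a valid conversion character (such as "%.3 ") is undefined behaviour. One possible reading of the "6.2E-05B" output is that the printf implementation consumed the "G" of "GB" as the conversion character, but that is only a guess.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not data.table's source: formats a byte count as
   gigabytes with three decimals, using the complete "%.3f" conversion. */
int format_filesize_gb(char *out, size_t outlen, long long filesize)
{
    /* 1.0 * filesize promotes to double before the division. */
    return snprintf(out, outlen, "File opened, filesize is %.3f GB",
                    1.0 * filesize / (1024 * 1024 * 1024));
}
```

For the 66087-byte file in this thread, this produces "File opened, filesize is 0.000 GB", which matches the non-Windows output above.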
>
> Farrel - does v1.9.2 work for you on Windows with verbose=FALSE?
> If yes, then very likely verbose=TRUE will now work with commit
> 1223. Best to start with a new R session to clear any possible
> memory corruption and then try :
>
>
> fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=FALSE)
>
> If not, can anyone else reproduce on Windows? If so, I'll need to
> debug it on Windows.
>
> Thanks,
> Matt
>
>
>
> On 06/03/14 05:19, Kevin Ushey wrote:
>
> I think Matt and Arun will have more information -- IIUC, fread is
> only now gaining support for reading from URLs on Windows.
>
> Something strange: I get different output on the file
> structure with
> fread. Posting in case it's useful:
>
> statagecdc <-
> fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=T)
>
> Input contains no \n. Taking this to be a filename to open
> File opened, filesize is 0.000 GB
> File is opened and mapped ok
> Detected eol as \r\n (CRLF) in that order, the Windows standard.
> Using line 30 to detect sep (the last non blank line in the first
> 'autostart') ... sep=','
> Found 14 columns
> First row with 14 fields occurs on line 1 (either column names or
> first row of data)
> All the fields on line 1 are character fields. Treating as the
> column names.
> Count of eol after first data row: 437
> Subtracted 1 for last eol and any trailing empty lines,
> leaving 436 data rows
> Type codes: 13333333333333 (first 5 rows)
> Type codes: 13333333333333 (+middle 5 rows)
> Type codes: 13333333333333 (+last 5 rows)
> Type codes: 13333333333333 (after applying colClasses and
> integer64)
> Type codes: 13333333333333 (after applying drop or select (if
> supplied)
> Allocating 14 column slots (14 - 0 NULL)
> 0.000s ( 13%) Memory map (rerun may be quicker)
> 0.000s ( 4%) sep and header detection
> 0.000s ( 13%) Count rows (wc -l)
> 0.001s ( 49%) Column type detection (first, middle and
> last 5 rows)
> 0.000s ( 1%) Allocation of 436x14 result (xMB) in RAM
> 0.000s ( 19%) Reading data
> 0.000s ( 0%) Allocation for type bumps (if any),
> including gc time
> if triggered
> 0.000s ( 0%) Coercing data already read in type bumps (if
> any)
> 0.000s ( 0%) Changing na.strings to NA
> 0.002s Total
>
> Note that fread sees \r\n as newlines for me.
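
For anyone comparing the two "Detected eol as ..." lines in this thread, here is a rough C sketch of how the first line ending in a buffer could be classified as \n, \r\n or \r only. This is an illustration under simplifying assumptions, not fread's actual implementation: the names eol_kind and detect_eol are made up, and quoted fields containing line breaks are ignored.

```c
#include <stddef.h>

/* Hypothetical names for illustration; this is not fread's source. */
typedef enum { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR } eol_kind;

/* Classify the first line ending found in buf[0..len). */
eol_kind detect_eol(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            return EOL_LF;
        if (buf[i] == '\r') {
            /* A \r counts as CRLF only if the very next byte is \n. */
            if (i + 1 < len && buf[i + 1] == '\n')
                return EOL_CRLF;
            return EOL_CR;
        }
    }
    return EOL_NONE; /* no line ending anywhere in the buffer */
}
```

With this scheme, a reader that does not find a \n directly after the first \r falls into the "\r only" case, which is the classification reported in the failing Windows output.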
>
> sessionInfo()
>
> R Under development (unstable) (2014-02-12 r64976)
> Platform: x86_64-apple-darwin13.0.0 (64-bit)
>
> locale:
> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] data.table_1.9.1 knitr_1.5.15 devtools_1.4.1.99
> BiocInstaller_1.13.3
>
> loaded via a namespace (and not attached):
> [1] compiler_3.1.0 digest_0.6.4 evaluate_0.5.1
> formatR_0.10 httr_0.2 memoise_0.1
> [7] parallel_3.1.0 plyr_1.8 Rcpp_0.11.0.3
> RCurl_1.95-4.1 reshape2_1.3.0.99 stringr_0.6.2
> [13] tools_3.1.0 whisker_0.3-2
>
> Kevin
>
> On Wed, Mar 5, 2014 at 9:04 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
>
> sessionInfo()
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> LC_CTYPE=English_United
> States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United
> States.1252
>
> attached base packages:
> [1] grid stats graphics grDevices utils
> datasets methods
> base
>
> other attached packages:
> [1] reshape2_1.2.2 data.table_1.9.2 gridExtra_0.9.1
> ggplot2_0.9.3.1
> RGoogleDocs_0.7-0
>
> loaded via a namespace (and not attached):
> [1] colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4
> gtable_0.1.2
> labeling_0.2 MASS_7.3-29 munsell_0.4.2
> [8] plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5
> Rcpp_0.11.0
> RCurl_1.95-4.1 scales_0.2.3 stringr_0.6.2
> [15] tools_3.0.2 XML_3.98-1.1
>
> Farrel Buchinsky
> Google Voice Tel: (412) 567-7870
>
>
> On Wed, Mar 5, 2014 at 10:55 PM, Kevin Ushey <kevinushey at gmail.com> wrote:
>
> Works fine for me with data.table 1.9.1 on OS X. What
> is your
> sessionInfo()?
>
> Kevin
>
> On Wed, Mar 5, 2014 at 7:53 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
>
> Any idea why I am getting a data.table with headers only and zero data?
> How can I get around the problem?
>
> fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=T)
> fails
> read.csv("http://www.cdc.gov/growthcharts/data/zscore/statage.csv")
> succeeds
>
> statagecdc <-
> fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=T)
>
> trying URL
> 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
> Content type 'application/octet-stream' length
> 66087 bytes (64 Kb)
> opened URL
> downloaded 64 Kb
>
> Input contains no \n. Taking this to be a filename
> to open
> File opened, filesize is 6.2E-05B
> File is opened and mapped ok
> Detected eol as \r only (no \n afterwards). An old
> Mac 9 standard,
> discontinued in 2002 according to Wikipedia.
> Using line 1 to detect sep (the last non blank
> line in the first
> 'autostart') ... sep=','
> Found 14 columns
> First row with 14 fields occurs on line 1 (either
> column names or first
> row
> of data)
> All the fields on line 1 are character fields.
> Treating as the column
> names.
> Byte after header row is eof or eol, 0 data rows
> present.
> Type codes: 00000000000000 (first 5 rows)
> Type codes: 00000000000000 (after applying
> colClasses and integer64)
> Type codes: 00000000000000 (after applying drop or
> select (if supplied)
> Allocating 14 column slots (14 - 0 NULL)
> 0.000s ( 0%) Memory map (rerun may be quicker)
> 0.000s ( 0%) sep and header detection
> 0.001s (100%) Count rows (wc -l)
> 0.000s ( 0%) Column type detection (first,
> middle and last 5 rows)
> 0.000s ( 0%) Allocation of 0x14 result (xMB)
> in RAM
> 0.000s ( 0%) Reading data
> 0.000s ( 0%) Allocation for type bumps (if
> any), including gc time
> if
> triggered
> 0.000s ( 0%) Coercing data already read in
> type bumps (if any)
> 0.000s ( 0%) Changing na.strings to NA
> 0.001s Total
>
>
> Thanks a lot.
>
> Farrel Buchinsky
> Google Voice Tel: (412) 567-7870
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
>
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
> --
>
> PRIVATE
> T: +44 (0)77 3248 1517 | E: carrieromichele at gmail.com
>
> OFFICE
> T: +44 (0)20 8236 8992 | E: michele.carriero at evolve-analytics.com
> www.evolve-analytics.com
>