[datatable-help] Odd problem using fread to read in a csv file: no data, just headers
carrieromichele
carrieromichele at gmail.com
Thu Mar 6 13:54:12 CET 2014
That works I guess.
> fread("http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat")
trying URL 'http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat'
Content type 'application/x-ns-proxy-autoconfig' length 2102 bytes
opened URL
downloaded 2102 bytes
V1 V2 V3 V4 V5
1: 1 307 930 36.58 0
2: 2 307 940 36.73 0
3: 3 307 950 36.93 0
4: 4 307 1000 37.15 0
....
On 6 March 2014 12:51, Matt Dowle <mdowle at mdowle.plus.com> wrote:
>
> Yes, thanks. Are other files reading ok on Windows or is it just this
> particular file?
> e.g. does this work :
> fread("http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat")
>
> [ I don't have Windows within easy reach. ]
>
>
> On 06/03/14 12:43, carrieromichele wrote:
>
> I quickly read the last mail. Is this the test you needed, guys?
>
> > fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=FALSE)
> trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
> Content type 'application/octet-stream' length 66087 bytes (64 Kb)
> opened URL
> downloaded 64 Kb
>
> Empty data.table (0 rows) of 14 cols: Sex,Agemos,L,M,S,P3...
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United
> Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] data.table_1.9.3
>
> loaded via a namespace (and not attached):
> [1] plyr_1.8.1 Rcpp_0.11.0 reshape2_1.2.2 Rook_1.0-9
> stringr_0.6.2 tools_3.0.2
> > fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=FALSE)
> trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
> Content type 'application/octet-stream' length 66087 bytes (64 Kb)
> opened URL
> downloaded 64 Kb
>
> Empty data.table (0 rows) of 14 cols: Sex,Agemos,L,M,S,P3...
>
>
> On 6 March 2014 12:34, Matt Dowle <mdowle at mdowle.plus.com> wrote:
>
>>
>> Works for me as well on linux, same output as Kevin's.
>>
>> I was perplexed as to why Farrel's output has :
>>
>> File opened, filesize is 6.2E-05B
>> but we see :
>>
>> File opened, filesize is 0.000 GB
>> That line is switched depending on Windows or not. Comparing them :
>>
>> // On Windows :
>> if (verbose) Rprintf("File opened, filesize is %.3 GB\n",
>> 1.0*filesize/(1024*1024*1024));
>>
>> // On non-Windows :
>> if (verbose) Rprintf("File opened, filesize is %.3f GB\n",
>> 1.0*filesize/(1024*1024*1024));
>>
>> So, a missing "f". Just committed a fix for that (r1223). That line is
>> part of a block that is necessarily different on Windows because its file
>> and mmap commands are different. The missing 'f' could have feasibly
>> corrupted memory somehow (strange that the "G" of "GB" got overwritten) and
>> if so would explain why it thought it got to the end of the file before
>> seeing the \n after the \r.
>>
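For reference, the difference between the two format strings can be looked at outside data.table. The sketch below is a minimal, hypothetical helper (`format_filesize` is not a data.table function) that uses plain `snprintf` in place of `Rprintf`, on the assumption that both follow printf-style formatting. It shows only the well-formed "%.3f" variant, because "%.3 GB" is not a valid conversion specification and its result is undefined in standard C. One possible reading, not confirmed in the thread, is that the Windows runtime took the "G" as the conversion character, which would account for an exponent-style number followed directly by "B".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of data.table.
 * Writes the verbose "filesize" line into buf using a complete
 * conversion specification: "%.3f" = fixed notation, 3 decimals.
 * Returns the number of characters snprintf wanted to write. */
int format_filesize(char *buf, size_t buflen, double filesize_bytes)
{
    assert(buf != NULL && buflen > 0);
    /* Same arithmetic as the quoted source: bytes -> GB. */
    return snprintf(buf, buflen, "File opened, filesize is %.3f GB\n",
                    1.0 * filesize_bytes / (1024.0 * 1024.0 * 1024.0));
}
```

With the 66087-byte file from this thread, the helper writes "File opened, filesize is 0.000 GB", which matches the non-Windows output quoted above.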
>> Farrel - does v1.9.2 work for you on Windows with verbose=FALSE? If yes,
>> then very likely verbose=TRUE will now work with commit 1223. Best to
>> start with a new R session to clear any possible memory corruption and then
>> try :
>>
>> fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
>> verbose=FALSE)
>>
>> If not, can anyone else reproduce on Windows? If so, I'll need to debug
>> it on Windows.
>>
>> Thanks,
>> Matt
>>
>>
>>
>> On 06/03/14 05:19, Kevin Ushey wrote:
>>
>>> I think Matt and Arun will have more information -- IIUC, fread is
>>> only now gaining support for reading from URLs on Windows.
>>>
>>> Something strange: I get different output on the file structure with
>>> fread. Posting in case it's useful:
>>>
>>> > statagecdc <- fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=T)
>>> Input contains no \n. Taking this to be a filename to open
>>> File opened, filesize is 0.000 GB
>>> File is opened and mapped ok
>>> Detected eol as \r\n (CRLF) in that order, the Windows standard.
>>> Using line 30 to detect sep (the last non blank line in the first
>>> 'autostart') ... sep=','
>>> Found 14 columns
>>> First row with 14 fields occurs on line 1 (either column names or
>>> first row of data)
>>> All the fields on line 1 are character fields. Treating as the column
>>> names.
>>> Count of eol after first data row: 437
>>> Subtracted 1 for last eol and any trailing empty lines, leaving 436 data
>>> rows
>>> Type codes: 13333333333333 (first 5 rows)
>>> Type codes: 13333333333333 (+middle 5 rows)
>>> Type codes: 13333333333333 (+last 5 rows)
>>> Type codes: 13333333333333 (after applying colClasses and integer64)
>>> Type codes: 13333333333333 (after applying drop or select (if supplied)
>>> Allocating 14 column slots (14 - 0 NULL)
>>> 0.000s ( 13%) Memory map (rerun may be quicker)
>>> 0.000s ( 4%) sep and header detection
>>> 0.000s ( 13%) Count rows (wc -l)
>>> 0.001s ( 49%) Column type detection (first, middle and last 5 rows)
>>> 0.000s ( 1%) Allocation of 436x14 result (xMB) in RAM
>>> 0.000s ( 19%) Reading data
>>> 0.000s ( 0%) Allocation for type bumps (if any), including gc time
>>> if triggered
>>> 0.000s ( 0%) Coercing data already read in type bumps (if any)
>>> 0.000s ( 0%) Changing na.strings to NA
>>> 0.002s Total
>>>
>>> Note that fread sees \r\n as newlines for me.
>>>
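The two verbose logs in this thread first diverge at the eol detection step: "\r\n" in Kevin's log above, "\r only" in Farrel's log below. As a rough illustration of how such a classification can work, here is a minimal sketch. The names `eol_kind`, `detect_eol` and `detect_eol_str` are hypothetical, and this is not fread's actual code. It only inspects the first line terminator and checks whether a "\r" is directly followed by "\n".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the first line ending in a buffer. */
typedef enum { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR_ONLY } eol_kind;

/* Scan buf[0..len) for the first '\n' or '\r' and classify it.
 * A '\r' counts as CRLF only if the very next byte is '\n'. */
eol_kind detect_eol(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            return EOL_LF;
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n')
                return EOL_CRLF;
            return EOL_CR_ONLY;
        }
    }
    return EOL_NONE; /* no line terminator found at all */
}

/* Convenience wrapper for NUL-terminated strings. */
eol_kind detect_eol_str(const char *s)
{
    return detect_eol(s, strlen(s));
}
```

Under this sketch, any input whose first "\r" is not immediately followed by "\n" is reported as CR-only, so a single extra byte between the two is enough to change the result.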
>>> > sessionInfo()
>>> R Under development (unstable) (2014-02-12 r64976)
>>> Platform: x86_64-apple-darwin13.0.0 (64-bit)
>>>
>>> locale:
>>> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] data.table_1.9.1 knitr_1.5.15 devtools_1.4.1.99
>>> BiocInstaller_1.13.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] compiler_3.1.0 digest_0.6.4 evaluate_0.5.1
>>> formatR_0.10 httr_0.2 memoise_0.1
>>> [7] parallel_3.1.0 plyr_1.8 Rcpp_0.11.0.3
>>> RCurl_1.95-4.1 reshape2_1.3.0.99 stringr_0.6.2
>>> [13] tools_3.1.0 whisker_0.3-2
>>>
>>> Kevin
>>>
>>> On Wed, Mar 5, 2014 at 9:04 PM, Farrel Buchinsky <fjbuch at gmail.com>
>>> wrote:
>>>
>>>> > sessionInfo()
>>>> R version 3.0.2 (2013-09-25)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>> States.1252 LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C LC_TIME=English_United
>>>> States.1252
>>>>
>>>> attached base packages:
>>>> [1] grid stats graphics grDevices utils datasets methods
>>>> base
>>>>
>>>> other attached packages:
>>>> [1] reshape2_1.2.2 data.table_1.9.2 gridExtra_0.9.1
>>>> ggplot2_0.9.3.1
>>>> RGoogleDocs_0.7-0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4
>>>> gtable_0.1.2
>>>> labeling_0.2 MASS_7.3-29 munsell_0.4.2
>>>> [8] plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5
>>>> Rcpp_0.11.0
>>>> RCurl_1.95-4.1 scales_0.2.3 stringr_0.6.2
>>>> [15] tools_3.0.2 XML_3.98-1.1
>>>>
>>>> Farrel Buchinsky
>>>> Google Voice Tel: (412) 567-7870
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 10:55 PM, Kevin Ushey <kevinushey at gmail.com>
>>>> wrote:
>>>>
>>>>> Works fine for me with data.table 1.9.1 on OS X. What is your
>>>>> sessionInfo()?
>>>>>
>>>>> Kevin
>>>>>
>>>>> On Wed, Mar 5, 2014 at 7:53 PM, Farrel Buchinsky <fjbuch at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Any idea why I am getting a data.table with headers only and zero data?
>>>>>> How can I get around the problem?
>>>>>>
>>>>>> fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
>>>>>> verbose=T)
>>>>>> fails
>>>>>> read.csv("http://www.cdc.gov/growthcharts/data/zscore/statage.csv")
>>>>>> succeeds
>>>>>>
>>>>>> > statagecdc <- fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=T)
>>>>>> trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
>>>>>> Content type 'application/octet-stream' length 66087 bytes (64 Kb)
>>>>>> opened URL
>>>>>> downloaded 64 Kb
>>>>>>
>>>>>> Input contains no \n. Taking this to be a filename to open
>>>>>> File opened, filesize is 6.2E-05B
>>>>>> File is opened and mapped ok
>>>>>> Detected eol as \r only (no \n afterwards). An old Mac 9 standard,
>>>>>> discontinued in 2002 according to Wikipedia.
>>>>>> Using line 1 to detect sep (the last non blank line in the first
>>>>>> 'autostart') ... sep=','
>>>>>> Found 14 columns
>>>>>> First row with 14 fields occurs on line 1 (either column names or first row of data)
>>>>>> All the fields on line 1 are character fields. Treating as the column
>>>>>> names.
>>>>>> Byte after header row is eof or eol, 0 data rows present.
>>>>>> Type codes: 00000000000000 (first 5 rows)
>>>>>> Type codes: 00000000000000 (after applying colClasses and integer64)
>>>>>> Type codes: 00000000000000 (after applying drop or select (if supplied)
>>>>>> Allocating 14 column slots (14 - 0 NULL)
>>>>>> 0.000s ( 0%) Memory map (rerun may be quicker)
>>>>>> 0.000s ( 0%) sep and header detection
>>>>>> 0.001s (100%) Count rows (wc -l)
>>>>>> 0.000s ( 0%) Column type detection (first, middle and last 5 rows)
>>>>>> 0.000s ( 0%) Allocation of 0x14 result (xMB) in RAM
>>>>>> 0.000s ( 0%) Reading data
>>>>>> 0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
>>>>>> 0.000s ( 0%) Coercing data already read in type bumps (if any)
>>>>>> 0.000s ( 0%) Changing na.strings to NA
>>>>>> 0.001s Total
>>>>>>
>>>>>>
>>>>>> Thanks a lot.
>>>>>>
>>>>>> Farrel Buchinsky
>>>>>> Google Voice Tel: (412) 567-7870
>>>>>>
>>>>>> _______________________________________________
>>>>>> datatable-help mailing list
>>>>>> datatable-help at lists.r-forge.r-project.org
>>>>>>
>>>>>>
>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>>>>
>>>>>