[datatable-help] Odd problem using fread to read in a csv file: no data, just headers

carrieromichele carrieromichele at gmail.com
Thu Mar 6 13:54:12 CET 2014


That one works, I guess.

> fread("http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat")
trying URL 'http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat'
Content type 'application/x-ns-proxy-autoconfig' length 2102 bytes
opened URL
downloaded 2102 bytes

      V1  V2   V3    V4 V5
  1:   1 307  930 36.58  0
  2:   2 307  940 36.73  0
  3:   3 307  950 36.93  0
  4:   4 307 1000 37.15  0
....


On 6 March 2014 12:51, Matt Dowle <mdowle at mdowle.plus.com> wrote:

>
> Yes, thanks.  Are other files reading ok on Windows or is it just this
> particular file?
> e.g. does this work :
> fread("http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat")
>
> [ I don't have Windows within easy reach. ]
>
>
> On 06/03/14 12:43, carrieromichele wrote:
>
>  I quickly read the last mail. Is this the test you needed, guys?
>
>  > fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=FALSE)
> trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
> Content type 'application/octet-stream' length 66087 bytes (64 Kb)
> opened URL
> downloaded 64 Kb
>
>  Empty data.table (0 rows) of 14 cols: Sex,Agemos,L,M,S,P3...
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>  locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
> Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_United Kingdom.1252
>
>  attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>  other attached packages:
> [1] data.table_1.9.3
>
>  loaded via a namespace (and not attached):
> [1] plyr_1.8.1     Rcpp_0.11.0    reshape2_1.2.2 Rook_1.0-9
> stringr_0.6.2  tools_3.0.2
> > fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
> verbose=FALSE)
> trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
> Content type 'application/octet-stream' length 66087 bytes (64 Kb)
> opened URL
> downloaded 64 Kb
>
>  Empty data.table (0 rows) of 14 cols: Sex,Agemos,L,M,S,P3...
>
>
> On 6 March 2014 12:34, Matt Dowle <mdowle at mdowle.plus.com> wrote:
>
>>
>> Works for me as well on linux,  same output as Kevin's.
>>
>> I was perplexed as to why Farrel's output has :
>>
>>    File opened, filesize is 6.2E-05B
>>  but we see :
>>
>>    File opened, filesize is 0.000 GB
>>  That line is switched depending on Windows or not. Comparing them :
>>
>> // On Windows :
>> if (verbose) Rprintf("File opened, filesize is %.3 GB\n",
>> 1.0*filesize/(1024*1024*1024));
>>
>> // On non-Windows :
>> if (verbose) Rprintf("File opened, filesize is %.3f GB\n",
>> 1.0*filesize/(1024*1024*1024));
>>
>> So, a missing "f". Just committed a fix for that (r1223). That line is
>> part of a block that is necessarily different on Windows because its file
>> and mmap commands are different.  The missing 'f' could have feasibly
>> corrupted memory somehow (strange that the "G" of "GB" got overwritten) and
>> if so would explain why it thought it got to the end of the file before
>> seeing the \n after the \r.
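
To make the missing "f" concrete, here is a minimal standalone C sketch of the non-Windows format string quoted above. It is only an illustration under a few assumptions: it uses snprintf from the C standard library rather than Rprintf, and format_filesize is a hypothetical helper name, not anything in data.table. The 66087-byte size comes from the download log in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of data.table): format a byte count the
   way the non-Windows line quoted above does, i.e. converted to GB and
   printed with three decimals via the complete "%.3f" conversion. */
static int format_filesize(char *buf, size_t buflen, long long filesize)
{
    assert(buf != NULL);
    /* 1024*1024*1024 is 2^30 and still fits in a 32-bit int. */
    return snprintf(buf, buflen, "File opened, filesize is %.3f GB\n",
                    1.0 * filesize / (1024 * 1024 * 1024));
}
```

Called with a 64-byte buffer and a size of 66087, this produces "File opened, filesize is 0.000 GB". For what it is worth, 66087 / 1024^3 is about 6.2e-05, the same digits that appear in Farrel's output.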
>>
>> Farrel - does v1.9.2 work for you on Windows with verbose=FALSE? If yes,
>> then very likely verbose=TRUE will now work with commit 1223.  Best to
>> start with a new R session to clear any possible memory corruption and then
>> try :
>>
>>    fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
>> verbose=FALSE)
>>
>> If not, can anyone else reproduce on Windows? If so, I'll need to debug
>> it on Windows.
>>
>> Thanks,
>> Matt
>>
>>
>>
>> On 06/03/14 05:19, Kevin Ushey wrote:
>>
>>> I think Matt and Arun will have more information -- IIUC, fread is
>>> only now gaining support for reading from URLs on Windows.
>>>
>>> Something strange: I get different output on the file structure with
>>> fread. Posting in case it's useful:
>>>
>>> > statagecdc <- fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=T)
>>>
>>> Input contains no \n. Taking this to be a filename to open
>>> File opened, filesize is 0.000 GB
>>> File is opened and mapped ok
>>> Detected eol as \r\n (CRLF) in that order, the Windows standard.
>>> Using line 30 to detect sep (the last non blank line in the first
>>> 'autostart') ... sep=','
>>> Found 14 columns
>>> First row with 14 fields occurs on line 1 (either column names or
>>> first row of data)
>>> All the fields on line 1 are character fields. Treating as the column
>>> names.
>>> Count of eol after first data row: 437
>>> Subtracted 1 for last eol and any trailing empty lines, leaving 436 data
>>> rows
>>> Type codes: 13333333333333 (first 5 rows)
>>> Type codes: 13333333333333 (+middle 5 rows)
>>> Type codes: 13333333333333 (+last 5 rows)
>>> Type codes: 13333333333333 (after applying colClasses and integer64)
>>> Type codes: 13333333333333 (after applying drop or select (if supplied)
>>> Allocating 14 column slots (14 - 0 NULL)
>>>     0.000s ( 13%) Memory map (rerun may be quicker)
>>>     0.000s (  4%) sep and header detection
>>>     0.000s ( 13%) Count rows (wc -l)
>>>     0.001s ( 49%) Column type detection (first, middle and last 5 rows)
>>>     0.000s (  1%) Allocation of 436x14 result (xMB) in RAM
>>>     0.000s ( 19%) Reading data
>>>     0.000s (  0%) Allocation for type bumps (if any), including gc time
>>> if triggered
>>>     0.000s (  0%) Coercing data already read in type bumps (if any)
>>>     0.000s (  0%) Changing na.strings to NA
>>>     0.002s        Total
>>>
>>> Note that fread sees \r\n as newlines for me.
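
The difference between the two verbose logs ("\r\n (CRLF)" here versus "\r only" on Windows) can be illustrated with a small C sketch. This is not fread's actual code; detect_eol and the eol_kind enum are hypothetical names, and the only point is that a '\r' is classified as CRLF only when a '\n' follows it inside the buffer length the reader believes it has.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of line endings (illustration only). */
enum eol_kind { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR };

/* Classify the first line ending found in buf[0..len).
   A '\r' counts as CRLF only if a '\n' follows it within len bytes;
   otherwise it is reported as a bare '\r'. */
static enum eol_kind detect_eol(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            return EOL_LF;
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n')
                return EOL_CRLF;
            return EOL_CR;
        }
    }
    return EOL_NONE;
}
```

With the full buffer "Sex,Agemos\r\n1,24\r\n" (18 bytes) this returns EOL_CRLF. If the length passed in stops right after the first '\r' (11 bytes), it returns EOL_CR, which is the kind of result that would be expected if a reader believed the file ended before the '\n'.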
>>>
>>> > sessionInfo()
>>>
>>> R Under development (unstable) (2014-02-12 r64976)
>>> Platform: x86_64-apple-darwin13.0.0 (64-bit)
>>>
>>> locale:
>>> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] data.table_1.9.1     knitr_1.5.15         devtools_1.4.1.99
>>> BiocInstaller_1.13.3
>>>
>>> loaded via a namespace (and not attached):
>>>   [1] compiler_3.1.0    digest_0.6.4      evaluate_0.5.1
>>> formatR_0.10      httr_0.2          memoise_0.1
>>>   [7] parallel_3.1.0    plyr_1.8          Rcpp_0.11.0.3
>>> RCurl_1.95-4.1    reshape2_1.3.0.99 stringr_0.6.2
>>> [13] tools_3.1.0       whisker_0.3-2
>>>
>>> Kevin
>>>
>>> On Wed, Mar 5, 2014 at 9:04 PM, Farrel Buchinsky <fjbuch at gmail.com>
>>> wrote:
>>>
>>>> > sessionInfo()
>>>>
>>>> R version 3.0.2 (2013-09-25)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>>>> States.1252    LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C                           LC_TIME=English_United
>>>> States.1252
>>>>
>>>> attached base packages:
>>>> [1] grid      stats     graphics  grDevices utils     datasets  methods
>>>> base
>>>>
>>>> other attached packages:
>>>> [1] reshape2_1.2.2    data.table_1.9.2  gridExtra_0.9.1
>>>> ggplot2_0.9.3.1
>>>> RGoogleDocs_0.7-0
>>>>
>>>> loaded via a namespace (and not attached):
>>>>   [1] colorspace_1.2-4   dichromat_2.0-0    digest_0.6.4
>>>> gtable_0.1.2
>>>> labeling_0.2       MASS_7.3-29        munsell_0.4.2
>>>>   [8] plyr_1.8.1         proto_0.3-10       RColorBrewer_1.0-5
>>>> Rcpp_0.11.0
>>>> RCurl_1.95-4.1     scales_0.2.3       stringr_0.6.2
>>>> [15] tools_3.0.2        XML_3.98-1.1
>>>>
>>>> Farrel Buchinsky
>>>> Google Voice Tel: (412) 567-7870
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 10:55 PM, Kevin Ushey <kevinushey at gmail.com>
>>>> wrote:
>>>>
>>>>> Works fine for me with data.table 1.9.1 on OS X. What is your
>>>>> sessionInfo()?
>>>>>
>>>>> Kevin
>>>>>
>>>>> On Wed, Mar 5, 2014 at 7:53 PM, Farrel Buchinsky <fjbuch at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Any idea why I am getting a data.table with headers only and zero
>>>>>> data? How can I get around the problem?
>>>>>>
>>>>>> fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv",
>>>>>> verbose=T)
>>>>>> fails
>>>>>> read.csv("http://www.cdc.gov/growthcharts/data/zscore/statage.csv")
>>>>>> succeeds
>>>>>>
>>>>>> > statagecdc <- fread("http://www.cdc.gov/growthcharts/data/zscore/statage.csv", verbose=T)
>>>>>>
>>>>>> trying URL 'http://www.cdc.gov/growthcharts/data/zscore/statage.csv'
>>>>>> Content type 'application/octet-stream' length 66087 bytes (64 Kb)
>>>>>> opened URL
>>>>>> downloaded 64 Kb
>>>>>>
>>>>>> Input contains no \n. Taking this to be a filename to open
>>>>>> File opened, filesize is  6.2E-05B
>>>>>> File is opened and mapped ok
>>>>>> Detected eol as \r only (no \n afterwards). An old Mac 9 standard,
>>>>>> discontinued in 2002 according to Wikipedia.
>>>>>> Using line 1 to detect sep (the last non blank line in the first
>>>>>> 'autostart') ... sep=','
>>>>>> Found 14 columns
>>>>>> First row with 14 fields occurs on line 1 (either column names or
>>>>>> first row of data)
>>>>>> All the fields on line 1 are character fields. Treating as the column
>>>>>> names.
>>>>>> Byte after header row is eof or eol, 0 data rows present.
>>>>>> Type codes: 00000000000000 (first 5 rows)
>>>>>> Type codes: 00000000000000 (after applying colClasses and integer64)
>>>>>> Type codes: 00000000000000 (after applying drop or select (if
>>>>>> supplied)
>>>>>> Allocating 14 column slots (14 - 0 NULL)
>>>>>>     0.000s (  0%) Memory map (rerun may be quicker)
>>>>>>     0.000s (  0%) sep and header detection
>>>>>>     0.001s (100%) Count rows (wc -l)
>>>>>>     0.000s (  0%) Column type detection (first, middle and last 5
>>>>>> rows)
>>>>>>     0.000s (  0%) Allocation of 0x14 result (xMB) in RAM
>>>>>>     0.000s (  0%) Reading data
>>>>>>     0.000s (  0%) Allocation for type bumps (if any), including gc time
>>>>>> if triggered
>>>>>>     0.000s (  0%) Coercing data already read in type bumps (if any)
>>>>>>     0.000s (  0%) Changing na.strings to NA
>>>>>>     0.001s        Total
>>>>>>
>>>>>>
>>>>>> Thanks a lot.
>>>>>>
>>>>>> Farrel Buchinsky
>>>>>> Google Voice Tel: (412) 567-7870
>>>>>>
>>>>>> _______________________________________________
>>>>>> datatable-help mailing list
>>>>>> datatable-help at lists.r-forge.r-project.org
>>>>>>
>>>>>>
>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>>>>
>>>>>
>>>
>>>
>>
>