<div dir="ltr">Here is the verbose output:<div><br></div><div><div>> dt<-fread("data/spd_all_fixed.csv", sep=",",verbose=T)</div><div>Detected eol as \r\n (CRLF) in that order, the Windows standard.</div>
<div>Looking for supplied sep ',' on line 30 (the last non blank line in the first 30) ... found</div><div>Found 9 columns</div><div>First row with 9 fields occurs on line 1 (either column names or first row of data)</div>
<div>All the fields on line 1 are character fields. Treating as the column names.</div><div>Count of eol after first data row: 9186293</div><div>Subtracted 0 for last eol and any trailing empty lines, leaving 9186293 data rows</div>
<div>Type codes: 000002000 (first 5 rows)</div><div>Type codes: 000002200 (+middle 5 rows)</div><div>Error in fread("data/spd_all_fixed.csv", sep = ",", verbose = T) : </div><div> Expected sep (',') but '0' ends field 5 on line 6 when detecting types: 204038,2617097,20110803,0,0</div>
</div><div><br></div><div style>But here is the wc output (via cygwin; the columns are newlines, words and bytes, and since words are whitespace-delimited each line counts as one word here):</div><div style><div>$ wc spd_all_fixed.csv</div><div> 168997637 168997638 9078155125 spd_all_fixed.csv</div>
<div><br></div><div style>[So fread counts about 9M rows, whereas wc counts about 168M.]</div><div style><br></div><div style>Regards</div><div style>Paul</div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 30 April 2013 18:52, Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><u></u>
<div>
<p>Hi,</p>
<p>Thanks for reporting this. Please set verbose=TRUE and let us know the output.</p>
<p>Thanks, Matthew</p><div><div class="h5">
<p> </p>
<p>On 30.04.2013 18:01, Paul Harding wrote:</p>
<blockquote type="cite" style="padding-left:5px;border-left:#1010ff 2px solid;margin-left:5px;width:100%">
<div dir="ltr">
<div>Problem with fread on a large file</div>
The file is 8GB, just short of 200,000 lines, produced as SQL output and modified with cygwin/perl to remove the second line.<br>
<div class="gmail_quote">
<div dir="ltr">
<div>Using data.table 1.8.8 on R 3.0.0 I get an fread error:</div>
<div>
<div>fread("data/spd_all_fixed.csv",sep=",")</div>
<div>Error in fread("data/spd_all_fixed.csv", sep = ",") : </div>
<div> Expected sep (',') but '0' ends field 5 on line 6 when detecting types: 204038,2617097,20110803,0,0</div>
<div>I looked for the offending line with grep -n, so the output carries line numbers. I'm guessing the "line 6" in the error is line 6 of the mid-file chunk that fread examined:</div>
<div>
<div>$ grep -n '204038,2617097,201108' spd_all_fixed.csv</div>
<div>8316105:204038,2617097,20110801,0,0,0.64220529999999998,0,0,0</div>
<div>8751106:204038,2617097,20110802,1,0,0.65744469999999999,0,0,0</div>
<div>9186294:204038,2617097,20110803,0,0,0.49455500000000002,0,0,0</div>
<div>9621619:204038,2617097,20110804,0,0,0.3461342,0,0,0</div>
<div>10057189:204038,2617097,20110805,0,0,0.34128710000000001,0,0,0</div>
<div>and comparing it with the surrounding lines and the first ten lines:</div>
<div>
<div>$ head spd_all_fixed.csv</div>
<div>s_key,i_key,p_key,q,pq,d,l,epi,class</div>
<div>203974,1107181,20110713,0,0,0.13700080000000001,0,0,0</div>
<div>203975,1107181,20110713,0,0,5.8352899999999999E-2,0,0,0</div>
<div>203976,1107181,20110713,0,0,7.1298999999999998E-3,0,0,0</div>
<div>203978,1107181,20110713,0,0,0.78346819999999995,0,0,0</div>
<div>203979,1107181,20110713,0,0,0.61627779999999999,0,0,0</div>
<div>203981,1107181,20110713,1,0,0.38610509999999998,0,0,0</div>
<div>203982,1107181,20110713,0,0,4.0657899999999997E-2,0,0,0</div>
<div>203983,1107181,20110713,2,0,0.71278109999999995,0,0,0</div>
<div>203984,1107181,20110713,0,0,0.42634430000000001,0.42634430000000001,2,13</div>
<div>I can't see any difference. I wonder if this is a bug? I have no problems on a small test data set run through an identical process and using the same fread command.</div>
<div>Regards</div>
<div>Paul</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div></div></div>
</blockquote></div><br></div>
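<div>Since the rows look identical by eye, one way to rule out malformed rows or mixed line endings is to check the whole file mechanically. The following is only a sketch: the helper names check_fields and count_crlf are hypothetical (they do not come from data.table or cygwin), and it assumes a POSIX shell with awk, as in the cygwin session above. A full pass over an 8GB file will take a while.</div>

```shell
# Hypothetical helper: print "line_number:field_count" for every line whose
# comma-separated field count differs from the expected count.
# $1 = file, $2 = expected number of fields (9 for the header shown by head)
check_fields() {
  awk -F',' -v n="$2" 'NF != n { printf "%d:%d\n", NR, NF }' "$1"
}

# Hypothetical helper: count the lines that end in a carriage return,
# i.e. lines with CRLF endings. Prints 0 if there are none.
# $1 = file
count_crlf() {
  awk '/\r$/ { c++ } END { print c + 0 }' "$1"
}
```

<div>For example, "check_fields spd_all_fixed.csv 9 | head" would list the first lines that do not have 9 fields, and comparing "count_crlf spd_all_fixed.csv" with the wc line count would show whether every line ends in CRLF. No output from check_fields would suggest the rows themselves are well formed.</div>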