[datatable-help] fread on very large file

Matthew Dowle mdowle at mdowle.plus.com
Tue Apr 30 19:52:54 CEST 2013


 

Hi, 

Thanks for reporting this. Please rerun fread with verbose=TRUE and let
us know the output.
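
As a minimal sketch of what such a call might look like (the temporary
file here is a hypothetical stand-in so the snippet runs on its own; in
this thread the input would be data/spd_all_fixed.csv):

```r
library(data.table)

# Hypothetical stand-in file built from the first rows quoted in this
# thread; replace `tmp` with "data/spd_all_fixed.csv" for the real case.
tmp <- tempfile(fileext = ".csv")
writeLines(c("s_key,i_key,p_key,q,pq,d,l,epi,class",
             "203974,1107181,20110713,0,0,0.13700080000000001,0,0,0",
             "203975,1107181,20110713,0,0,5.8352899999999999E-2,0,0,0"),
           tmp)

# verbose = TRUE asks fread to print diagnostics about how it parsed
# the file to the console while it reads.
dt <- fread(tmp, sep = ",", verbose = TRUE)
```

The console output printed by that call is what is worth pasting into
a reply.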

Thanks, Matthew 

On 30.04.2013 18:01, Paul Harding wrote:

> Problem with fread on a large file. The file is 8GB, just short of
> 200,000 lines, produced as SQL output and modified by cygwin/perl to
> remove the second line.
> 
> Using data.table 1.8.8 on R 3.0.0 I get an fread error
> 
> fread("data/spd_all_fixed.csv",sep=",")
> Error in fread("data/spd_all_fixed.csv", sep = ",") :
>   Expected sep (',') but '0' ends field 5 on line 6 when detecting types: 204038,2617097,20110803,0,0
>
> Looking for the offending line, with line numbers in the output (I'm
> guessing this is line 6 of the mid-file chunk examined):
> 
> $ grep -n '204038,2617097,201108' spd_all_fixed.csv
> 8316105:204038,2617097,20110801,0,0,0.64220529999999998,0,0,0
> 8751106:204038,2617097,20110802,1,0,0.65744469999999999,0,0,0
> 9186294:204038,2617097,20110803,0,0,0.49455500000000002,0,0,0
> 9621619:204038,2617097,20110804,0,0,0.3461342,0,0,0
> 10057189:204038,2617097,20110805,0,0,0.34128710000000001,0,0,0
>
> and comparing to surrounding lines and the first ten lines
>
> $ head spd_all_fixed.csv
> s_key,i_key,p_key,q,pq,d,l,epi,class
> 203974,1107181,20110713,0,0,0.13700080000000001,0,0,0
> 203975,1107181,20110713,0,0,5.8352899999999999E-2,0,0,0
> 203976,1107181,20110713,0,0,7.1298999999999998E-3,0,0,0
> 203978,1107181,20110713,0,0,0.78346819999999995,0,0,0
> 203979,1107181,20110713,0,0,0.61627779999999999,0,0,0
> 203981,1107181,20110713,1,0,0.38610509999999998,0,0,0
> 203982,1107181,20110713,0,0,4.0657899999999997E-2,0,0,0
> 203983,1107181,20110713,2,0,0.71278109999999995,0,0,0
> 203984,1107181,20110713,0,0,0.42634430000000001,0.42634430000000001,2,13

> I can't see any difference. I wonder if this is a bug? I have no
> problems on a small test data set run through an identical process and
> using the same fread command.
>
> Regards
> Paul

 