[datatable-help] fread on gzipped files

Matthew Dowle mdowle at mdowle.plus.com
Tue Apr 2 21:12:03 CEST 2013


 

Hi, 

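fread memory-maps the entire uncompressed file. This is baked into the
way it works (e.g. jumping to the beginning, middle and end of the file
and reading 5 rows from each to detect column types before it starts
reading the rows in), and it is where the convenience and speed come
from. A gzip stream can't be memory-mapped and jumped around in like
that, so fread can't read the .gz directly.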

Uncompressing the .gz to a ramdisk first, and then fread-ing the
uncompressed file from that ramdisk, is probably the fastest way. That
should still be pretty quick, and I'd guess not much slower than
anything we could build into fread (provided you use a ramdisk).
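A minimal sketch of that workaround in R follows. It assumes a
Linux-style tmpfs ramdisk mounted at /dev/shm and falls back to
tempdir() (ordinary disk) if that directory isn't there. gunzip_to()
and fread_gz() are hypothetical helpers, not part of data.table. The
decompression uses only base R connections (gzfile, readBin, writeBin)
and streams in chunks, so the file never has to sit in R memory whole.

```r
# Hypothetical helper (not part of data.table): stream-decompress a gzip
# file to `dest` in fixed-size chunks, so only one chunk is held in R
# memory at a time.
gunzip_to <- function(src, dest, chunk_bytes = 1e7) {
  inp <- gzfile(src, open = "rb")   # gzfile() decompresses transparently
  on.exit(close(inp), add = TRUE)
  out <- file(dest, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    buf <- readBin(inp, what = "raw", n = chunk_bytes)
    if (length(buf) == 0L) break    # end of the compressed stream
    writeBin(buf, out)
  }
  invisible(dest)
}

# Hypothetical wrapper: decompress to a ramdisk (assumed to be /dev/shm),
# fread the plain file, then delete the temporary copy.
fread_gz <- function(src, ramdir = "/dev/shm", ...) {
  dir <- if (dir.exists(ramdir)) ramdir else tempdir()
  tmp <- tempfile(tmpdir = dir, fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # runs after fread() has returned
  gunzip_to(src, tmp)
  data.table::fread(tmp, ...)       # extra arguments go straight to fread
}
```

Usage: DT <- fread_gz("filename.csv.gz"). Deleting the temporary file
afterwards is safe because fread copies the data into the data.table
before returning.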

Matthew 

On 02.04.2013 19:30, Nathaniel Graham wrote:

> I have a moderately large csv file that's gzipped, but not in a tar
> archive, so it's "filename.csv.gz" that I want to read into a
> data.table. I'd like to use fread(), but I can't seem to make it work.
> I'm currently using the following:
>
> data.table(read.csv(gzfile("filename.csv.gz","r")))
>
> Various combinations of gzfile, gzcon, file, readLines, and
> textConnection all produce an error (invalid input). Is there a better
> way to read in large, compressed files?
> 
> -------
> Nathaniel Graham
> npgraham1 at gmail.com [1]
> npgraham1 at uky.edu [2]

 

Links:
------
[1] mailto:npgraham1 at gmail.com
[2] mailto:npgraham1 at uky.edu