[datatable-help] fread on gzipped files
Matthew Dowle
mdowle at mdowle.plus.com
Tue Apr 2 21:12:03 CEST 2013
Hi,
fread memory-maps the entire uncompressed file, and this is baked into
the way it works (e.g. skipping to the beginning, middle and last 5 rows
to detect column types before starting to read the rows in). It is also
where the convenience and speed come from.
You could uncompress the .gz to a ramdisk first, and then fread the
uncompressed file from that ramdisk; that is probably the fastest way.
It should still be pretty quick, and I guess it is unlikely to be much
slower than anything we could build into fread (provided you use a
ramdisk).
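A rough sketch of that workaround in R follows. It is only an illustration under stated assumptions, not something built into data.table: the helper names gunzip_to and fread_gz are made up for this example, and /dev/shm is assumed to be a RAM-backed directory (true on many Linux systems, not elsewhere), with a fallback to R's session temp directory when it is missing.

```r
library(data.table)

# Hypothetical helper: stream-decompress a .gz file to out_path in chunks,
# so the whole file never has to sit in R's memory at once.
gunzip_to <- function(gz_path, out_path, chunk_bytes = 1048576L) {
  in_con <- gzfile(gz_path, "rb")
  on.exit(close(in_con), add = TRUE)
  out_con <- file(out_path, "wb")
  on.exit(close(out_con), add = TRUE)
  repeat {
    chunk <- readBin(in_con, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break   # end of compressed stream
    writeBin(chunk, out_con)
  }
  invisible(out_path)
}

# Hypothetical helper: decompress to a (hopefully RAM-backed) directory,
# fread the plain file from there, then delete the temporary copy.
# ASSUMPTION: ram_dir = "/dev/shm" is a ramdisk on this machine.
fread_gz <- function(gz_path, ram_dir = "/dev/shm") {
  if (!dir.exists(ram_dir)) ram_dir <- tempdir()  # fallback, may be on disk
  out_path <- file.path(ram_dir, sub("\\.gz$", "", basename(gz_path)))
  on.exit(unlink(out_path), add = TRUE)
  gunzip_to(gz_path, out_path)
  fread(out_path)
}
```

With these defined, fread_gz("filename.csv.gz") would return a data.table. The uncompressed copy exists only for the duration of the call, so the ramdisk needs enough free space to hold the whole uncompressed file.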
Matthew
On 02.04.2013 19:30, Nathaniel Graham wrote:
> I have a moderately large csv file that's gzipped, but not in a tar
> archive, so it's "filename.csv.gz" that I want to read into a data.table.
> I'd like to use fread(), but I can't seem to make it work. I'm currently
> using the following:
>
> data.table(read.csv(gzfile("filename.csv.gz","r")))
>
> Various combinations of gzfile, gzcon, file, readLines, and
> textConnection all produce an error (invalid input). Is there a better
> way to read in large, compressed files?
>
> -------
> Nathaniel Graham
> npgraham1 at gmail.com [1]
> npgraham1 at uky.edu [2]
Links:
------
[1] mailto:npgraham1 at gmail.com
[2] mailto:npgraham1 at uky.edu