[datatable-help] fread on gzipped files
Matthew Dowle
mdowle at mdowle.plus.com
Wed Apr 3 10:58:24 CEST 2013
Interesting. How much do you find read.csv is sped up by reading
gzip'd files?
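One rough way to measure that is sketched below. It is not a careful benchmark: the synthetic data, column names and row count are made up for illustration, and the timings will depend on disk speed, CPU and the OS file cache.

```r
# Build a synthetic CSV; the size and column names are illustrative only.
set.seed(1)
n <- 2e5
df <- data.frame(id = seq_len(n), x = rnorm(n),
                 g = sample(letters, n, replace = TRUE))

plain_path <- tempfile(fileext = ".csv")
gz_path <- paste0(plain_path, ".gz")

write.csv(df, plain_path, row.names = FALSE)

# Write the same rows through a gzfile connection to get a compressed copy.
gz_con <- gzfile(gz_path, open = "w")
write.csv(df, gz_con, row.names = FALSE)
close(gz_con)

# Time read.csv on each copy and compare the elapsed seconds.
t_plain <- system.time(a <- read.csv(plain_path))
t_gz <- system.time(b <- read.csv(gzfile(gz_path)))

print(c(plain = t_plain[["elapsed"]], gzip = t_gz[["elapsed"]]))
```

Running it a few times, and on files closer to the real sizes, would give a fairer picture than a single pass.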
On 02.04.2013 20:36, Nathaniel Graham wrote:
> Thanks, but I suspect that it would take longer to set up and then remove
> a ramdisk than it would to use read.csv and data.table. My files are
> moderately large (between 200 MB and 3 GB when compressed), but not
> enormous; I gzip not so much to save space on disk but to speed up
> reads.
>
> -------
> Nathaniel Graham
> npgraham1 at gmail.com [3]
> npgraham1 at uky.edu [4]
>
> On Tue, Apr 2, 2013 at 3:12 PM, Matthew Dowle <mdowle at mdowle.plus.com [5]> wrote:
>
>> Hi,
>>
>> fread memory maps the entire uncompressed file and this is baked into
>> the way it works (e.g. skipping to the beginning, middle and last 5 rows
>> to detect column types before starting to read the rows in) and is where
>> the convenience and speed come from.
>>
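To illustrate why that sampling wants an uncompressed file: jumping to a byte offset is cheap on a plain file but not on a gzip stream. The sketch below uses base R connections with seek() rather than a memory map, so it is not fread's code, only the same idea. The helper lines_from() and the demo data are hypothetical.

```r
# Demo CSV with made-up contents, standing in for a real uncompressed file.
path <- tempfile(fileext = ".csv")
demo <- data.frame(id = 1:1000, x = (1:1000) / 4, g = rep(c("a", "b"), 500))
write.csv(demo, path, row.names = FALSE, quote = FALSE)

# Hypothetical helper: jump to byte offset `where` and return the complete
# lines that follow. n = -1L means "read to the end of the file".
lines_from <- function(path, where, n = 5L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = where)
  readLines(con, n = 1L)  # discard the (probably partial) line we landed in
  readLines(con, n = n)
}

size <- file.size(path)
top <- readLines(path, n = 6L)
header <- top[1L]
first <- top[-1L]                                    # first 5 data rows
middle <- lines_from(path, where = size %/% 2)       # 5 rows from the middle
last <- tail(lines_from(path, where = max(0, size - 500), n = -1L), 5L)

# Guess column types from the 15 sampled rows only.
sample_df <- read.csv(text = c(header, first, middle, last),
                      stringsAsFactors = FALSE)
col_types <- vapply(sample_df, function(col) class(col)[1L], character(1))
print(col_types)
```

Only a handful of lines are parsed here, wherever they sit in the file, which is what random access to the uncompressed bytes buys.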
>> Uncompressing the .gz to a ramdisk first, and then fread-ing the
>> uncompressed file from that ramdisk, is probably the fastest way. That
>> should still be pretty quick and, I guess, unlikely to be much slower
>> than anything we could build into fread (provided you use a ramdisk).
>>
>> Matthew
>>
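The uncompress-then-fread suggestion quoted above can be sketched as follows. Assumptions: /dev/shm is a RAM-backed tmpfs (typical on Linux, not guaranteed elsewhere), gunzip_to() is a hypothetical helper and not part of data.table, and a tiny generated file stands in for the real "filename.csv.gz".

```r
library(data.table)  # provides fread()

# Stand-in for the real "filename.csv.gz": a tiny gzipped CSV in tempdir().
gz_path <- file.path(tempdir(), "filename.csv.gz")
gz_con <- gzfile(gz_path, open = "w")
write.csv(data.frame(id = 1:3, x = c(0.5, 1.5, 2.5)), gz_con, row.names = FALSE)
close(gz_con)

# Hypothetical helper (not part of data.table): stream-decompress gz_path
# into dest_dir in fixed-size chunks and return the uncompressed file's path.
gunzip_to <- function(gz_path, dest_dir, chunk_bytes = 16 * 1024^2) {
  out_path <- file.path(dest_dir, sub("\\.gz$", "", basename(gz_path)))
  src <- gzfile(gz_path, open = "rb")
  on.exit(close(src), add = TRUE)
  dst <- file(out_path, open = "wb")
  on.exit(close(dst), add = TRUE)
  repeat {
    chunk <- readBin(src, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break
    writeBin(chunk, dst)
  }
  out_path
}

# Assumption: a writable /dev/shm is RAM-backed; otherwise fall back to the
# session temp directory, which is usually on disk.
ram_dir <- if (file.access("/dev/shm", mode = 2) == 0) "/dev/shm" else tempdir()

csv_path <- gunzip_to(gz_path, ram_dir)
DT <- fread(csv_path)
unlink(csv_path)  # free the space once the data.table is in memory
print(DT)
```

Reading in chunks keeps peak memory at roughly one chunk during decompression, which matters for files that are several GB once uncompressed.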
>> On 02.04.2013 19:30, Nathaniel Graham wrote:
>>
>>> I have a moderately large csv file that's gzipped, but not in a tar
>>> archive, so it's "filename.csv.gz" that I want to read into a data.table.
>>> I'd like to use fread(), but I can't seem to make it work. I'm currently
>>> using the following:
>>>
>>> data.table(read.csv(gzfile("filename.csv.gz","r")))
>>>
>>> Various combinations of gzfile, gzcon, file, readLines, and
>>> textConnection all produce an error (invalid input). Is there a better
>>> way to read in large, compressed files?
>>>
>>> -------
>>> Nathaniel Graham
>>> npgraham1 at gmail.com [1]
>>> npgraham1 at uky.edu [2]
Links:
------
[1] mailto:npgraham1 at gmail.com
[2] mailto:npgraham1 at uky.edu
[3] mailto:npgraham1 at gmail.com
[4] mailto:npgraham1 at uky.edu
[5] mailto:mdowle at mdowle.plus.com