[datatable-help] fread -- multiple header lines and multiple whitespace characters
Harish
harishv_99 at yahoo.com
Sun Jun 30 10:21:36 CEST 2013
Hi,
I am wondering whether it is possible to read a file using fread() with:
1) Multiple header lines, and
2) Multiple whitespace characters separating fields
A sample of the input file follows:
-------------
Garbage header information
that I need to skip when reading...
Number of lines here are variable.
Serial_Number PHIv Lu/W
(-) (lm) (lm/W)
ABCDEFG 27.0264 103.58
HIJKLMNO 33.9143 91.03
Some footer information
that spans multiple lines
-------------
To handle the multiple header lines, I would have to read the file with fread() first and then reprocess it with a similar algorithm to identify the actual header (i.e. the line one above what fread() identifies as the header). Then I would throw away the column names fread() created and rename the columns to the actual ones I found. However, this seems highly inefficient, since I would be replicating within R what fread() already did, not to mention that I do not quite know how to do that.
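To make the header-detection step concrete, here is a minimal Python sketch of the logic. It is not fread()'s actual autostart algorithm. It assumes the layout of the sample above: every data row has a fixed number of whitespace-separated fields, all numeric except the first, and exactly one units line sits between the real header and the data. The names `is_data_row` and `locate_table` are hypothetical.

```python
SAMPLE = """\
Garbage header information
that I need to skip when reading...
Number of lines here are variable.
Serial_Number PHIv Lu/W
(-) (lm) (lm/W)
ABCDEFG 27.0264 103.58
HIJKLMNO 33.9143 91.03
Some footer information
that spans multiple lines
"""


def is_data_row(fields, ncols):
    """Assumed rule: a data row has ncols fields, all numeric except the first."""
    if len(fields) != ncols:
        return False
    try:
        for field in fields[1:]:
            float(field)
    except ValueError:
        return False
    return True


def locate_table(lines, ncols):
    """Return (header_index, first_data_index, end_index) for the data block.

    end_index is one past the last data row, so the footer starts there.
    """
    split = [line.split() for line in lines]
    start = next((i for i, f in enumerate(split) if is_data_row(f, ncols)), None)
    if start is None:
        raise ValueError("no data rows found")
    end = start
    while end < len(split) and is_data_row(split[end], ncols):
        end += 1
    # Assumption from the sample: a units line separates header and data,
    # so the real header is two lines above the first data row.
    if start < 2:
        raise ValueError("not enough lines above the data for header + units")
    return start - 2, start, end


lines = SAMPLE.splitlines()
header, start, end = locate_table(lines, 3)
print(lines[header].split())  # ['Serial_Number', 'PHIv', 'Lu/W']
```

The lines between `start` and `end` are the data rows. Everything above `header` and from `end` onward is the garbage header and footer.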
As for the multiple (and variable) spaces used as the separator, I do not see a way for fread() to handle this either. read.table(), however, does handle it with its default sep="" value. Of course, read.table() does not handle the garbage headers and footers that fread() so beautifully skips with its autostart algorithm.
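For comparison, splitting on runs of whitespace (the behaviour read.table() gets from sep="") takes very little code once the header and data lines are known. The minimal Python sketch below relies on `str.split()` with no argument, which treats any run of spaces or tabs as a single separator. `split_whitespace_table` is a hypothetical helper, not part of data.table, and it keeps every value as a string.

```python
def split_whitespace_table(header_line, data_lines):
    """Split each line on runs of whitespace and return {column name: values}."""
    names = header_line.split()
    columns = {name: [] for name in names}
    for line in data_lines:
        fields = line.split()  # no argument: any run of whitespace is one separator
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(fields)}: {line!r}")
        for name, value in zip(names, fields):
            columns[name].append(value)
    return columns


# Mixed spaces and tabs of varying width, as in the files described above.
cols = split_whitespace_table(
    "Serial_Number   PHIv\tLu/W",
    ["ABCDEFG    27.0264  103.58", "HIJKLMNO\t33.9143 91.03"],
)
print(cols["PHIv"])  # ['27.0264', '33.9143']
```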
Any suggestions on how to do this easily? I have many of these files to read, so editing them by hand is not desirable. If there is a trick I can use with fread(), that would be ideal.
Thanks a lot for your help.
Regards,
Harish