[datatable-help] fread -- multiple header lines and multiple whitespace characters

Sun Jun 30 10:21:36 CEST 2013

Hi,

I am wondering whether it is possible to read a file using fread() with:
1) Multiple header lines, and
2) Multiple whitespace characters separating fields

The sample of the input file is as follows:
-------------
Garbage header information
that I need to skip when reading...
Number of lines here are variable.

             Serial_Number   PHIv     Lu/W     
                    (-)      (lm)     (lm/W)
           ABCDEFG  27.0264 103.58
           HIJKLMNO  33.9143  91.03

Some footer information
that spans multiple lines

-------------

To handle the multiple header lines, I would have to read the file with fread() first, then scan the file again to find the actual header (i.e. the line one above what fread() would identify as the header), and finally discard the column names fread() created and replace them with the ones I found.  However, this seems highly inefficient, since I would be replicating in R what fread() already does, and I do not quite know how to do that anyway.

As for the multiple (and variable) spaces used as the separator, I do not see a way for fread() to handle these either.  read.table() does, with its default sep="" value.  Of course, read.table() does not handle the garbage headers and footers that fread() so beautifully skips with its autostart algorithm.
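
For reference, here is a minimal base-R sketch of the kind of workaround I have in mind. It does not use fread() at all. It assumes that the real header line can be recognised by a known column name ("Serial_Number" in my sample), that exactly one units line follows it, and that the data block ends at the first blank line. The names read_report and header_pattern are made up for illustration.

```r
# Hypothetical helper: read one report file with garbage header/footer lines.
# Assumptions (not guaranteed for every file): the real header contains
# `header_pattern`, one units line follows it, and the data rows run until
# the first blank line.
read_report <- function(path, header_pattern = "Serial_Number") {
  lines <- readLines(path, warn = FALSE)

  # Locate the real header line by a literal (non-regex) match.
  hdr <- grep(header_pattern, lines, fixed = TRUE)[1]
  if (is.na(hdr)) stop("header line not found in ", path)

  # Column names: split the header on runs of whitespace.
  col_names <- strsplit(trimws(lines[hdr]), "[[:space:]]+")[[1]]

  # Drop everything up to and including the units line (hdr + 1).
  body <- lines[-seq_len(hdr + 1)]

  # Keep only the rows before the first blank line (this drops the footer).
  blank <- which(!nzchar(trimws(body)))
  if (length(blank) > 0) body <- body[seq_len(blank[1] - 1)]

  # sep = "" makes read.table() treat any run of whitespace as one separator.
  read.table(text = body, sep = "", col.names = col_names,
             check.names = FALSE, stringsAsFactors = FALSE)
}
```

Calling read_report("sample.txt") on the sample above should give a two-row data.frame with columns Serial_Number, PHIv and Lu/W, which could then be passed to as.data.table(). It works, but it reads every file through readLines() and read.table(), which is exactly the part I was hoping fread() could do for me.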

Any suggestions on how to do this easily?  I have lots of these files to read, so manual editing is not practical.  If there is a hack I can do with fread(), that would be ideal.

Thanks a lot for your help.

Regards,
Harish