[datatable-help] fread -- multiple header lines and multiple whitespace characters

Eduard Antonyan eduard.antonyan at gmail.com
Tue Jul 2 17:29:57 CEST 2013


I don't know how to do this with fread, but it sounds like a good feature
request.

If you want to do this in R (without fread), you could use readLines to
read until you get to the header, count the number of lines it took, and
pass that count as the 'skip' param of read.table to read the file in. I
think I remember seeing something like that done on SO at some point, but
you can always post there to get more advice, as there are generally more
people there who'll be able to help you.
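
A minimal sketch of that readLines approach, in case it helps. It rests on
several assumptions that are mine, not something fread or read.table does
for you: the real header line can be recognised by a known column name
("Serial_Number" here), it is followed by exactly one units line, and the
data block ends at the first blank line (or at end of file). The function
name read_block and its header_pattern argument are made up for the
example.

```r
# Hypothetical helper: locate the header with readLines(), then hand only
# the data rows to read.table().
read_block <- function(path, header_pattern = "Serial_Number") {
  lines <- readLines(path)

  # First line containing the known column name is taken as the header.
  hdr <- grep(header_pattern, lines, fixed = TRUE)[1]
  if (is.na(hdr)) stop("header line not found in ", path)

  # Split the header on runs of whitespace to get the column names.
  col_names <- strsplit(trimws(lines[hdr]), "[[:space:]]+")[[1]]

  # Assumption: one units line sits between the header and the data.
  first <- hdr + 2

  # Assumption: the data block ends at the first blank line after it starts.
  blank <- which(!nzchar(trimws(lines)))
  blank <- blank[blank >= first]
  last <- if (length(blank) > 0) blank[1] - 1 else length(lines)

  # sep = "" makes read.table treat any run of whitespace as one separator.
  read.table(text = lines[first:last], header = FALSE, sep = "",
             col.names = col_names, check.names = FALSE,
             stringsAsFactors = FALSE)
}
```

This passes the lines already in memory via text= rather than re-reading
the file; calling read.table on the file itself with skip = hdr + 1 and a
matching nrows should give the same result. For many files, something
like lapply(files, read_block) would do.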


On Sun, Jun 30, 2013 at 3:21 AM, Harish <harishv_99 at yahoo.com> wrote:

> Hi,
>
> I am wondering whether it is possible to read a file using fread() with:
> 1) Multiple header lines, and
> 2) Multiple whitespace characters separating fields
>
> The sample of the input file is as follows:
> -------------
> Garbage header information
> that I need to skip when reading...
> Number of lines here are variable.
>
>              Serial_Number   PHIv     Lu/W
>                     (-)      (lm)     (lm/W)
>            ABCDEFG  27.0264 103.58
>            HIJKLMNO  33.9143  91.03
>
> Some footer information
> that spans multiple lines
> -------------
>
> To handle the multiple header lines, I would have to read the file
> using fread() first, reprocess the file using a similar algorithm to
> identify the actual header (i.e. one line above what fread() would
> identify as the header), then throw away the column names fread()
> created and rename them to the actual ones I find.  However, this seems
> highly inefficient since I would replicate in R what fread() already did,
> not to mention that I do not quite know how to do that.
>
> As far as handling the multiple (and variable) spaces for separator, I do
> not see fread() being able to handle this either.  read.table() however
> does with the default sep="" value.  Of course, that does not handle the
> garbage headers and footers that fread() so beautifully avoids with its
> autostart algorithm.
>
> Any suggestions as to how I would do this easily?  I have lots of these
> files to read, and doing manual editing is not desirable.  If there is a
> hack I can do with fread(), that would be ideal.
>
> Thanks a lot for your help.
>
>
> Regards,
> Harish
>
>