<div dir="ltr">I don't know how to do this with fread, but it sounds like a good feature request.<div><br></div><div style>If you want to do this in R (without fread), you could use readLines to read until you reach the header, count the number of lines that took, and pass that count as the 'skip' parameter to read.table when reading the file in. I think I remember seeing something like that done on SO at some point, but you can always post there for more advice, as there are generally more people there who will be able to help you.</div>
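<div><br></div><div>A rough sketch of the readLines + 'skip' idea, in case it helps. It rests on several assumptions that are not confirmed anywhere in this thread: the real header line can be recognised by a fixed string (here "Serial_Number"), exactly one units row follows it, and the data block ends at the first blank line. The function name read_after_header and its arguments are made up for illustration.</div>

```r
# Hypothetical helper (not part of data.table or base R).
# Assumptions: `header_pattern` appears only on the real header line,
# `units_rows` lines of units follow it, and the data block is
# terminated by the first blank line (or by end of file).
read_after_header <- function(path, header_pattern = "Serial_Number",
                              units_rows = 1L) {
  lines <- readLines(path, warn = FALSE)

  # Locate the real header line.
  header_idx <- grep(header_pattern, lines, fixed = TRUE)[1]
  if (is.na(header_idx)) stop("header line not found")

  # Column names come from the header line, split on runs of whitespace.
  col_names <- strsplit(trimws(lines[header_idx]), "[[:space:]]+")[[1]]

  # Data starts after the header and the units row(s).
  first_data <- header_idx + units_rows + 1L

  # Data ends just before the first blank line after that point.
  blank <- which(seq_along(lines) >= first_data & !nzchar(trimws(lines)))
  last_data <- if (length(blank)) blank[1] - 1L else length(lines)

  # sep = "" makes read.table treat any run of whitespace as one separator;
  # nrows stops the read before the footer.
  read.table(path,
             skip = first_data - 1L,
             nrows = last_data - first_data + 1L,
             header = FALSE,
             sep = "",
             col.names = col_names,
             stringsAsFactors = FALSE)
}
```

<div>Limiting the read with nrows matters here because a footer line such as "Some footer information" happens to have three whitespace-separated fields and would otherwise be parsed as a data row. Note that read.table's default check.names = TRUE will turn a name like Lu/W into Lu.W.</div>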
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Jun 30, 2013 at 3:21 AM, Harish <span dir="ltr">&lt;<a href="mailto:harishv_99@yahoo.com" target="_blank">harishv_99@yahoo.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-size:12pt;font-family:times new roman,new york,times,serif"><div>Hi,</div><div><br></div><div>I am wondering whether it is possible to read a file using fread() with:</div>
<div>1) Multiple header lines, and<br></div><div>2) Multiple whitespace characters separating fields<br></div><div><br></div><div>The sample of the input file is as follows:</div><div>-------------</div><div>Garbage header information</div>
<div>that I need to skip when reading...</div><div>Number of lines here are variable.<br></div><div><br></div><div> Serial_Number PHIv Lu/W <br> (-) (lm)
(lm/W)<br> ABCDEFG 27.0264 103.58</div><div> HIJKLMNO 33.9143 91.03</div><div><br></div><div>Some footer information</div><div>that spans multiple lines<br></div><div>-------------</div><div><br></div>
<div>To handle the multiple header lines, I would have to read the file using fread() first, then reprocess the file with a similar algorithm to identify the actual header (i.e. one line above what fread() would identify as the header), and finally throw away the column names fread() created and rename them to the actual ones I find. However, this seems highly inefficient, since I would be replicating in R what fread() already did, not to mention that I do not quite know how to do that.<br>
</div><div><br></div><div>As for handling multiple (and variable) spaces as the separator, I do not see fread() being able to
handle this either. read.table(), however, does so with its default sep="" value. Of course, read.table() does not handle the garbage headers and footers that fread() so beautifully avoids with its autostart algorithm.</div>
<div><br></div><div>Any suggestions as to how I would do this easily? I have lots of these files to read, and doing manual editing is not desirable. If there is a hack I can do with fread(), that would be ideal.<br></div>
<div><br></div><div>Thanks a lot for your help.</div><div><br></div><div><br></div><div>Regards,</div><div>Harish</div><div><br></div></div></div><br>_______________________________________________<br>
datatable-help mailing list<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br></blockquote></div><br></div>