<div dir="ltr">For me, in a similar context, this would be particularly useful with SQL Server output, where if you need headers it's not possible to lose the second line of underlining:<div><br></div><div style>header1 header2 header3</div>
<div style>------- ------- -------</div><div style>tom dick harry</div><div style><br></div><div style>and possibly for other flavours of SQL too. For the huge files (20GB) that I use fread for, I use a Perl script; for smaller ones:</div>
<div style><div> df <- read.csv(con, header=FALSE, skip=2, na.strings="NULL")</div><div> names(df) <- strsplit(readLines(con, 1), ",")[[1]]</div><div style><br></div><div style>Such a pain. Since this is a SQL Server 'feature', it would be really useful if fread could discard unwanted header lines. Perhaps a regexp parameter?</div>
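<div style><br></div><div style>To illustrate what I mean by a regexp filter, here is a minimal base R sketch. It assumes comma-separated output, and the sample rows, the name <code>path</code> and the pattern are all made up for illustration; this is not an existing fread option.</div>
<pre>
```r
# Hypothetical sample standing in for comma-separated SQL Server output:
# a header row, a row of dashes underlining it, then the data.
path <- tempfile(fileext = ".csv")
writeLines(c("header1,header2,header3",
             "-------,-------,-------",
             "tom,dick,NULL"), path)

lines <- readLines(path)

# Drop every line made up only of dashes, commas and spaces
# (assumed pattern for the underline row).
keep <- !grepl("^[-, ]+$", lines)

# Parse what is left; the first kept line is used as the header.
result <- read.csv(text = lines[keep], na.strings = "NULL",
                   stringsAsFactors = FALSE)
print(names(result))
```
</pre>
<div style>This reads the whole file into memory first, so it only suits the smaller files, not the 20GB ones.</div>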
<div style><br></div><div style>Regards</div><div style>Paul</div><div><br></div></div><div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 3 July 2013 11:00, <span dir="ltr"><<a href="mailto:datatable-help-request@lists.r-forge.r-project.org" target="_blank">datatable-help-request@lists.r-forge.r-project.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send datatable-help mailing list submissions to<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
<br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:datatable-help-request@lists.r-forge.r-project.org">datatable-help-request@lists.r-forge.r-project.org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:datatable-help-owner@lists.r-forge.r-project.org">datatable-help-owner@lists.r-forge.r-project.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of datatable-help digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. Re: fread -- multiple header lines and multiple whitespace<br>
characters (Eduard Antonyan)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Tue, 2 Jul 2013 10:29:57 -0500<br>
From: Eduard Antonyan <<a href="mailto:eduard.antonyan@gmail.com">eduard.antonyan@gmail.com</a>><br>
To: Harish <<a href="mailto:harishv_99@yahoo.com">harishv_99@yahoo.com</a>><br>
Cc: "<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a>"<br>
<<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a>><br>
Subject: Re: [datatable-help] fread -- multiple header lines and<br>
multiple whitespace characters<br>
Message-ID:<br>
<CAHZcBOpkh+05wNLYD17YQxXx+JbOL3SmkwoP+Y=<a href="mailto:dWZ5hNEKzog@mail.gmail.com">dWZ5hNEKzog@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
I don't know how to do this with fread, but it sounds like a good feature<br>
request.<br>
<br>
If you want to do this in R (without fread), you could use readLines to<br>
read until you get to the header, count the number of lines it took and use<br>
'skip' param in read.table to read the file in. I think I remember seeing<br>
something like that done on SO at some point, but you can always post there to<br>
get more advice as there are generally more people who'll be able to help<br>
you there.<br>
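<br>
A rough sketch of the readLines idea, under some assumptions: the sample<br>
lines are modelled on the file quoted below, the header is taken to be the<br>
first line starting with "Serial_Number", and the data is taken to end at<br>
the first blank line after the units row. Instead of passing 'skip' back to<br>
read.table, this variant parses the slice of lines already read, which also<br>
drops the footer. The name 'path' is hypothetical.<br>
<pre>
```r
# Hypothetical sample modelled on the quoted input file.
path <- tempfile(fileext = ".txt")
writeLines(c("Garbage header information",
             "that I need to skip when reading...",
             "",
             "Serial_Number   PHIv     Lu/W",
             "(-)             (lm)     (lm/W)",
             "ABCDEFG         27.0264  103.58",
             "HIJKLMNO        33.9143  91.03",
             "",
             "Some footer information",
             "that spans multiple lines"), path)

lines <- readLines(path)

# Assumption: the real header is the first line starting with "Serial_Number".
header_row <- grep("^Serial_Number", lines)[1]
col_names <- strsplit(trimws(lines[header_row]), "\\s+")[[1]]

# Assumption: the data block ends at the first blank line after the units row.
blank_rows <- which(!nzchar(trimws(lines)))
first_blank <- blank_rows[blank_rows > header_row + 1][1]
end_row <- if (is.na(first_blank)) length(lines) else first_blank - 1

# Skip the header and the units row, then parse only the data lines.
# The default sep = "" splits on any run of whitespace.
data_rows <- lines[(header_row + 2):end_row]
dt <- read.table(text = data_rows, col.names = col_names,
                 check.names = FALSE, stringsAsFactors = FALSE)
print(dt)
```
</pre>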
<br>
<br>
On Sun, Jun 30, 2013 at 3:21 AM, Harish <<a href="mailto:harishv_99@yahoo.com">harishv_99@yahoo.com</a>> wrote:<br>
<br>
> Hi,<br>
><br>
> I am wondering whether it is possible to read a file using fread() with:<br>
> 1) Multiple header lines, and<br>
> 2) Multiple whitespace characters separating fields<br>
><br>
> The sample of the input file is as follows:<br>
> -------------<br>
> Garbage header information<br>
> that I need to skip when reading...<br>
> Number of lines here are variable.<br>
><br>
> Serial_Number PHIv Lu/W<br>
> (-) (lm) (lm/W)<br>
> ABCDEFG 27.0264 103.58<br>
> HIJKLMNO 33.9143 91.03<br>
><br>
> Some footer information<br>
> that spans multiple lines<br>
> -------------<br>
><br>
> To handle the multiple lines of headers, I would have to read the file<br>
> using fread() first, reprocess the file using a similar algorithm to<br>
> identify the actual header -- i.e. one line above what fread() would<br>
> identify as the header, then throw away the names of the columns fread()<br>
> created and rename them to the actual ones I find. However, this seems to be<br>
> highly inefficient since I would replicate what fread() did within R -- not<br>
> to mention I do not quite know how to do that.<br>
><br>
> As far as handling the multiple (and variable) spaces for separator, I do<br>
> not see fread() being able to handle this either. read.table() however<br>
> does with the default sep="" value. Of course, that does not handle the<br>
> garbage headers and footers that fread() so beautifully avoids with its<br>
> autostart algorithm.<br>
><br>
> Any suggestions as to how I would do this easily? I have lots of these<br>
> files to read, and doing manual editing is not desirable. If there is a<br>
> hack I can do with fread(), that would be ideal.<br>
><br>
> Thanks a lot for your help.<br>
><br>
><br>
> Regards,<br>
> Harish<br>
><br>
><br>
<br>
------------------------------<br>
<br>
End of datatable-help Digest, Vol 41, Issue 3<br>
*********************************************<br>
</blockquote></div><br></div>