[datatable-help] datatable-help Digest, Vol 41, Issue 3

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jul 4 15:59:38 CEST 2013


On Wed, Jul 3, 2013 at 6:56 AM, Paul Harding <p.harding at paniscus.com> wrote:
> For me, in a similar context, this would be particularly useful with SQL
> Server output, where if you need headers it's not possible to lose the
> second line of underlining:
>
> header1 header2 header3
> ------- ------- -------
> tom   dick   harry
>
> and possibly for other flavours of SQL too. For the huge files (20GB) I use
> fread for, I use a perl script; for smaller ones:
>   df <- read.csv(con, header = FALSE, skip = 2, na.strings = "NULL")
>   names(df) <- strsplit(readLines(con, 1), ",")[[1]]
>
> Such a pain. So as this is an SQL server 'feature' it would be really useful
> if fread could discard unwanted lines of header. Perhaps a regexp parameter?
>

1. If fread supported read.table's comment.char argument, and extended
it to allow regular expressions or strings longer than a single
character, that might do it; however, it might have a performance
impact.

2. In the development version of data.table on R-Forge there is a
skip= argument to fread which would let one do something analogous to
what you show in your post.

3. One possible extension to fread that would address this and other
variations would be to allow connections. For example, this works with
read.table:

    read.table(pipe("sed 2d myfile.txt"), header = TRUE)

(assuming UNIX, or Windows with Rtools installed so that sed is available).
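For small files, both the regular-expression idea in point 1 and the sed 2d trick in point 3 can be approximated in plain R by filtering the lines with readLines() before parsing them. This is only a sketch: read_filtered() is a hypothetical helper (not part of data.table or base R), and since it reads the whole file into memory it is no substitute for fread on 20GB files.

```r
# Hypothetical helper (not part of data.table or base R): read a small file,
# drop unwanted lines, then parse what is left with read.table().
#   drop_pattern: regular expression; matching lines are discarded
#                 (roughly what a regex-capable comment.char would do)
#   drop_lines:   line numbers to discard (like `sed 2d`)
# The whole file is held in memory, so this suits small files only.
read_filtered <- function(file, drop_pattern = NULL,
                          drop_lines = integer(0), ...) {
  lines <- readLines(file)
  keep <- rep(TRUE, length(lines))
  keep[drop_lines] <- FALSE
  if (!is.null(drop_pattern)) keep <- keep & !grepl(drop_pattern, lines)
  read.table(text = lines[keep], ...)
}

# Example: SQL Server-style output with a dashed underline under the header.
tmp <- tempfile(fileext = ".txt")
writeLines(c("header1 header2 header3",
             "------- ------- -------",
             "tom   dick   harry"), tmp)

# Drop any line made only of dashes and spaces ...
by_regex <- read_filtered(tmp, drop_pattern = "^[- ]+$", header = TRUE,
                          stringsAsFactors = FALSE)
# ... or drop line 2 by position, as sed 2d does.
by_position <- read_filtered(tmp, drop_lines = 2, header = TRUE,
                             stringsAsFactors = FALSE)
```

The regex variant is closer to the regexp parameter suggested in the original post, because it does not depend on knowing in advance how many underline lines the server emits.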

[I didn't see this show up the first time I posted so I am re-posting.
 Hopefully it does not show up twice.]

