[datatable-help] data.table on existing data.frame list

Matthew Dowle mdowle at mdowle.plus.com
Tue Aug 6 02:37:04 CEST 2013


The comments look like a banner at the start of the file, and fread 
already handles that case. But the banner in the example is 34 rows 
long, so the default of autostart=30 lands inside the banner rather than 
the data.  Try:

     fread("03217500.exsa.rsb", autostart=40)

That should do it in one shot, including detecting the column names. 
All I've done is increase autostart so that it falls within the data 
block.  See ?fread for a detailed description of autostart and the 
procedure.
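
To make the idea concrete, here is a rough base R sketch of the search as 
I understand the ?fread description: start on the autostart line, count 
its fields, then walk backwards while the field count stays the same. 
The first line of that run is taken as the column names. This is only an 
illustration, not fread's actual code; count_fields and find_header_line 
are made-up names, the tab separator is an assumption, and the sample 
lines are a cut-down stand-in for the real file.

```r
# Count fields per line by counting separators (assumed: tab separated).
count_fields <- function(x, sep = "\t") {
  lengths(regmatches(x, gregexpr(sep, x, fixed = TRUE))) + 1L
}

# Hypothetical helper: walk back from `autostart` while the field count
# matches, and return the first line of that run (the column names).
find_header_line <- function(lines, autostart, sep = "\t") {
  autostart <- min(autostart, length(lines))  # clamp to the last line
  nf <- count_fields(lines, sep)
  target <- nf[autostart]
  i <- autostart
  while (i > 1L && nf[i - 1L] == target) i <- i - 1L
  i
}

# Cut-down stand-in for the file: banner, column names, format row, data.
lines <- c(
  "# banner line 1",
  "# banner line 2",
  "# banner line 3",
  "agency_cd\tsite_no\tdatetime",
  "5s\t15s\t20d",
  "USGS\t02169570\t2012-08-04",
  "USGS\t02169570\t2012-08-05"
)

# autostart = 6 is inside the data block, so the search stops at line 4.
header_at <- find_header_line(lines, autostart = 6)
dt <- read.delim(text = lines[header_at:length(lines)], sep = "\t",
                 colClasses = "character")
```

If autostart points into the banner instead (say autostart = 2 here), the 
walk back ends on line 1 and the banner is taken for the table, which is 
the same kind of miss as autostart=30 against a 34 row banner.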

Btw, if there is more than one table in a single file, then setting 
autostart to a line within each table is how to read each one in.  And 
provided there is no footer, you can also set autostart to be very large 
(the downside being the time taken to seek back from the end to find the 
column names).
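
Here is a similar base R sketch for the several-tables case. It is a toy 
under stated assumptions, not how fread is implemented: each table is 
taken to be a run of lines with the same tab-separated field count, and 
the names count_fields, table_block and read_block are made up for the 
example. Pointing autostart at any line inside a table selects that 
table, and an oversized autostart is clamped to the last line.

```r
# Count fields per line by counting separators (assumed: tab separated).
count_fields <- function(x, sep = "\t") {
  lengths(regmatches(x, gregexpr(sep, x, fixed = TRUE))) + 1L
}

# Hypothetical helper: first and last line of the run of lines that share
# the field count of line `autostart`.
table_block <- function(lines, autostart, sep = "\t") {
  autostart <- min(autostart, length(lines))  # very large value -> last line
  nf <- count_fields(lines, sep)
  target <- nf[autostart]
  first <- autostart
  while (first > 1L && nf[first - 1L] == target) first <- first - 1L
  last <- autostart
  while (last < length(lines) && nf[last + 1L] == target) last <- last + 1L
  c(first = first, last = last)
}

# Hypothetical helper: read just the table that contains line `autostart`.
read_block <- function(lines, autostart, sep = "\t") {
  b <- table_block(lines, autostart, sep)
  read.delim(text = lines[b[["first"]]:b[["last"]]], sep = sep)
}

# Two small tables in one "file", separated by a blank line and a banner.
lines <- c(
  "# first table",
  "a\tb",
  "1\t2",
  "3\t4",
  "",
  "# second table",
  "x\ty\tz",
  "5\t6\t7"
)

t1 <- read_block(lines, autostart = 3)     # a line inside the first table
t2 <- read_block(lines, autostart = 8)     # a line inside the second table
t3 <- read_block(lines, autostart = 1000)  # no footer, so clamp to the end
```

With a footer present the clamped case would pick up the footer instead 
of the last table, which is why the very large setting only suits files 
without one.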

Matthew

On 05/08/13 20:52, jim holtman wrote:
> Here is what I would do.  Read in the file, delete the comments, write 
> it back out and then process it.
>
>
> > myFile <- tempfile()  # temp file
> > input <- readLines('/temp/dv.txt')  # this is a copy of the data you posted
> > # remove comments
> > input <- input[!grepl("^#", input)]
> > require(data.table)
> Loading required package: data.table
> data.table 1.8.8  For help type: help("data.table")
> > writeLines(input, myFile)
> > dv <- fread(myFile)
>
> >
> > str(dv)
> Classes 'data.table' and 'data.frame':  367 obs. of  21 variables:
>  $ agency_cd        : chr  "5s" "USGS" "USGS" "USGS" ...
>  $ site_no          : chr  "15s" "02169570" "02169570" "02169570" ...
>  $ datetime         : chr  "20d" "2012-08-04" "2012-08-05" "2012-08-06" ...
>  $ 04_00095_00001   : chr  "14n" "" "" "" ...
>  $ 04_00095_00001_cd: chr  "10s" "" "" "" ...
>  $ 04_00095_00002   : chr  "14n" "" "" "" ...
>  $ 04_00095_00002_cd: chr  "10s" "" "" "" ...
>  $ 04_00095_00003   : chr  "14n" "" "" "" ...
>  $ 04_00095_00003_cd: chr  "10s" "" "" "" ...
>  $ 05_00065_00001   : chr  "14n" "2.10" "1.71" "1.77" ...
>  $ 05_00065_00001_cd: chr  "10s" "A" "A" "A" ...
>  $ 05_00065_00002   : chr  "14n" "1.71" "1.56" "1.57" ...
>  $ 05_00065_00002_cd: chr  "10s" "A" "A" "A" ...
>  $ 05_00065_00003   : chr  "14n" "1.89" "1.62" "1.63" ...
>  $ 05_00065_00003_cd: chr  "10s" "A" "A" "A" ...
>  $ 15_00060_00001   : chr  "14n" "52" "33" "36" ...
>  $ 15_00060_00001_cd: chr  "10s" "A" "A" "A" ...
>  $ 15_00060_00002   : chr  "14n" "33" "27" "27" ...
>  $ 15_00060_00002_cd: chr  "10s" "A" "A" "A" ...
>  $ 15_00060_00003   : chr  "14n" "42" "29" "30" ...
>  $ 15_00060_00003_cd: chr  "10s" "A" "A" "A" ...
>  - attr(*, ".internal.selfref")=<externalptr>
>
>
>
> On Mon, Aug 5, 2013 at 3:38 PM, iembry <iruckaE at mail2world.com> wrote:
>
>     Hi Matthew, this link is in a similar format to the files that I'm
>     processing now:
>     http://waterdata.usgs.gov/nwis/dv?cb_00095=on&cb_00065=on&cb_00060=on&format=rdb&period=&begin_date=2012-08-04&end_date=2013-08-04&site_no=02169570&referred_module=sw
>
>     Both file formats begin with the comments followed by the column names
>     followed by agency code information and then the actual data.
>
>     The .rdb text files vary in length (some may range from a few
>     hundred lines long to over 20,000 lines). I am given the files
>     that I am processing.
>
>     Thank you.
>
>     Irucka
>
> -- 
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
