[datatable-help] data.table on existing data.frame list

Matthew Dowle mdowle at mdowle.plus.com
Tue Aug 6 10:49:44 CEST 2013


On 06/08/13 03:12, iembry wrote:
> Hi Matthew, thank you for your prompt and great assistance.
>
> Yes, setting autostart = 40 does work, and it did detect the column
> names.
Great.
>
> In order to read in the .exsa.rdb files I created the following function:
>
> getDataRatingDepotFiles <- function(file, hasHeader = TRUE, separator = "\t")
> {
>      # Lines starting with "#" are treated as comments and skipped
>      RDdatatmp <- as.matrix(read.table(file, sep = separator, fill = TRUE,
>           comment.char = "#", header = hasHeader, as.is = TRUE,
>           stringsAsFactors = FALSE, na.strings = "NA",
>           col.names = c("y", "shift", "x", "stor")))
>      # Drop the first row and the fourth ("stor") column
>      RDdatatmp <- as.matrix(RDdatatmp[c(-1), c(-4)])
>      RDdatatmp <- as.data.frame(RDdatatmp, stringsAsFactors = FALSE)
>      RDdatatmp$y <- as.numeric(as.character(RDdatatmp$y))
>      RDdatatmp$x <- as.numeric(as.character(RDdatatmp$x))
>      RDdatatmp$shift <- as.numeric(as.character(RDdatatmp$shift))
>      return(RDdatatmp)
> }
>
> I created an object called sitefiles that has the pattern of the file
> extension that I want. In the same folder there are files with two other
> file extensions that I do not want to use in this project.
>
> sitefiles <- list.files(path = "/tried", pattern = "\\.exsa\\.rdb$",
>      full.names = TRUE)
> getratings <- lapply(sitefiles, getDataRatingDepotFiles)
>
> Is there any way to replicate the above with fread?
I don't follow. fread reads the file. The 'select' argument can be used to
select columns, or you can use setnames() afterwards to rename them.
fread doesn't create factors anyway. The numeric columns should be
detected automatically, but in the latest version you can pass 'colClasses'
manually to fread if you need to read integer data as a numeric type.
Or are you asking whether fread can read multiple files?
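
If it is the multiple-files case, here is a minimal sketch of one way to combine the pieces above: lapply() over the file names, fread() with 'select' and 'colClasses', then setnames(). Everything specific in it is an assumption for illustration only: the demo files written to a temporary folder, their four-column tab-separated layout with a one-line banner, the helper name readRating, and the use of skip = 1 to step over that one banner line (your real files have a longer banner, so that value would differ).

```r
library(data.table)

# Hypothetical demo files standing in for the real .exsa.rdb files:
# one banner line, a header row, then tab-separated data.
demo_dir <- tempfile("ratings")
dir.create(demo_dir)
for (id in c("site1", "site2")) {
  writeLines(
    c("# banner line",
      "INDEP\tSHIFT\tDEP\tSTOR",
      "1.5\t0\t10\tA",
      "2.5\t0\t20\tA"),
    file.path(demo_dir, paste0(id, ".exsa.rdb"))
  )
}
# A file with another extension that should not be picked up.
writeLines("ignored", file.path(demo_dir, "site1.other"))

# Keep only the files ending in .exsa.rdb (dots escaped in the regex).
sitefiles <- list.files(demo_dir, pattern = "\\.exsa\\.rdb$",
                        full.names = TRUE)

# Hypothetical helper: read one file, keep the first three columns,
# force them to numeric, and rename them.
readRating <- function(file) {
  DT <- fread(file, sep = "\t", skip = 1, header = TRUE, select = 1:3,
              colClasses = c("numeric", "numeric", "numeric", "character"))
  setnames(DT, c("y", "shift", "x"))
  DT
}

# One data.table per file, named by file.
ratings <- lapply(sitefiles, readRating)
names(ratings) <- basename(sitefiles)
```

The result is a list of data.tables, like the list of data.frames from your lapply() call. If you would rather have one table, rbindlist(ratings) stacks them.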


>
> Irucka
>
> The comments are really a banner at the start of the file it seems. So this
> is all built in to fread already. But the banner in the example is 34 rows,
> so the default of autostart=30 isn't enough.  Try:
>
>      fread("03217500.exsa.rdb", autostart=40)
>
> That should do it in one shot, including detecting the column names. I've
> just increased autostart a bit to be within the data block.  See ?fread for
> a detailed description of autostart and the procedure.
>
> Btw, if there is more than one table in a single file,  then setting
> autostart to be within each one is how to read each one in.  And provided
> there is no footer, you can set autostart to be very large, too (with
> downside of time to seek back from the end to find the column names).
>
> Matthew


