[Rcpp-devel] Read csv and export object in R
ogami musashi
uragami at hotmail.com
Tue Apr 21 11:01:17 CEST 2015
Hello Dirk,
Got it sorted. The basic problem was that the output matrix's dimensions
have to be defined precisely.
I had some problems with the first line (column names) and the first
column (row names), but it works now.
Benchmarks against fread show that the code I use returns a lighter
object (a simple matrix) and thus processes faster.
Reading 400 files of 16.2 MB each on 6 cores took 177.949 seconds with
the cpp function and 228.231 seconds with fread.
Needless to say, both are considerably faster than read.table (which took
21669 seconds!) and read_csv from the readr package (which took about the
same).
I know it would be better to contribute this to the Rcpp Gallery, but for
now I only have time to post the code here:
#include <Rcpp.h>
#include <fstream>
#include <sstream>
#include <string>
using namespace Rcpp;
// Takes the path to a numeric CSV file and returns the same data in a
// NumericMatrix object.
// [[Rcpp::export]]
NumericMatrix readfilecpp(std::string path)
{
    // Output matrix. Specifying the size is critical, otherwise R crashes.
    NumericMatrix output(20, 46749);
    // Open the file. c_str() is needed so that a pre-C++11 ifstream
    // accepts the string path.
    std::ifstream myfile(path.c_str());
    std::string line;
    // Skip the first line (column names in our case). Remove this call
    // if it is not necessary.
    std::getline(myfile, line, '\n');
    // Basic idea: read lines row = 0..19 and, for each line, put the
    // values separated by ',' into 46749 columns.
    for (int row = 0; row < 20; ++row)
    {
        // Starts at the second line because the first one was skipped
        // above. Stop when there are no more rows; testing the result of
        // getline() keeps a last line that has no trailing newline.
        if (!std::getline(myfile, line, '\n'))
            break;
        std::stringstream iss(line); // put the line into a stringstream
        std::string rowname;
        std::getline(iss, rowname, ','); // skip the first column (row names)
        for (int col = 0; col < 46749; ++col)
        {
            // Read the next of the 46749 ','-delimited values in the line.
            std::string val;
            std::getline(iss, val, ',');
            // Convert it through another stringstream, 'convertor', and
            // store it in the output matrix at the current row and column.
            std::stringstream convertor(val);
            convertor >> output(row, col);
        }
    }
    return output;
}
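
Since the hard-coded 20 x 46749 size is the fragile part, here is a rough
sketch of how the dimensions could instead be inferred from the file. It is
plain standard C++ rather than Rcpp so it can be tried on its own, and it is
not what I benchmarked above. ParsedMatrix and parse_numeric_csv are
hypothetical names made up for this illustration, and it assumes the same
layout as my files (one header line, row names in the first column, ','
as separator).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container (not part of Rcpp). Values are stored column-major,
// which is the layout R uses for matrices.
struct ParsedMatrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> values;  // element (r, c) is values[r + c * nrow]

    double at(std::size_t r, std::size_t c) const {
        assert(r < nrow && c < ncol);  // catch out-of-range access early
        return values[r + c * nrow];
    }
};

// Hypothetical helper: skips the header line and the first (row name) field
// of every line, then infers the dimensions from the data itself.
inline ParsedMatrix parse_numeric_csv(std::istream& in) {
    std::string line;
    std::getline(in, line);  // drop the column-name line

    std::vector<std::vector<double> > rows;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // tolerate blank lines
        std::stringstream fields(line);
        std::string field;
        std::getline(fields, field, ',');  // drop the row name
        std::vector<double> row;
        while (std::getline(fields, field, ',')) {
            // strtod yields 0.0 for a field that is not numeric
            row.push_back(std::strtod(field.c_str(), NULL));
        }
        rows.push_back(row);
    }

    ParsedMatrix out;
    out.nrow = rows.size();
    // Assumption: the first data line defines the number of columns.
    out.ncol = rows.empty() ? 0 : rows.front().size();
    out.values.assign(out.nrow * out.ncol, 0.0);
    for (std::size_t r = 0; r < out.nrow; ++r) {
        // Copy at most ncol fields so a ragged line cannot write out of bounds.
        const std::size_t n =
            rows[r].size() < out.ncol ? rows[r].size() : out.ncol;
        for (std::size_t c = 0; c < n; ++c) {
            out.values[r + c * out.nrow] = rows[r][c];
        }
    }
    return out;
}
```

To use it, open the file with std::ifstream and pass the stream to
parse_numeric_csv. Inside an exported Rcpp function, the result could then
presumably be copied into a NumericMatrix allocated with the detected nrow
and ncol, since both use column-major storage. The price is that every value
is held twice for a moment, so I would expect this to be slower than the
fixed-size version.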
On 20/04/15 13:16, Dirk Eddelbuettel wrote:
> On 20 April 2015 at 12:01, ogami musashi wrote:
> | Problem is... I have 400 objects of 16.5 MB each, and it takes about
> | 6 hours to reimport them in R! I use the readr package as this is the
> | fastest base function in R.
>
> a) readr != base R
>
> b) fread in package data.table is considered the fastest reader function
>
> | I adapted a C++ code to use Rcpp, it compiles but when using it it
> | crashes R:
>
> I fear you may have to debug that yourself. As for speed, you won't be able
> to beat fread which has been optimised for this for years and uses mmap and
> other tricks.
>
> Dirk
>