[datatable-help] Is data.table efficient for sparse data?

Steve Lianoglou lianoglou.steve at gene.com
Mon Jul 8 19:02:52 CEST 2013


Hi,

On Mon, Jul 8, 2013 at 2:30 AM, Huashan Chen <chenhuashan at gmail.com> wrote:
> I have a huge (over 10 million rows, 5000 columns) sparse matrix stored as
> simple_triplet_matrix. I also want to utilize the fast index feature (among
> others) of data.table. So I am wondering if data.table is memory efficient
> for sparse data? And, is there any way to convert simple_triplet_matrix into
> data.table without the intermediate as.matrix() operation which is way too
> slow.

To help get a better idea of what you are after, can you explain some
sample queries you'd like to use that you think would leverage
data.table's fast indexing?

There is no "sparse data.table" type of support, so you'd have to have
a "full" data.table with elements for all rows and all columns. You
can always store your triplet data (row, col, val) as a data.table,
which may or may not be helpful depending on what you want to do with
this thing. That's why I'm asking about some example queries you'd want
to run.
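For illustration only, here is a minimal sketch of the triplet idea. It assumes the matrix is a slam simple_triplet_matrix, whose i, j and v components hold the row indices, column indices and values of the non-zero entries. The toy matrix and the column names row/col/val are made up. The data.table is built straight from those three vectors, so no dense as.matrix() copy is ever created, and keying on (row, col) lets you fetch single cells or whole rows with a keyed join.

```r
library(slam)
library(data.table)

# Hypothetical toy data: a 3 x 4 sparse matrix with three non-zero cells
m <- simple_triplet_matrix(i = c(1L, 3L, 2L), j = c(1L, 2L, 4L),
                           v = c(5, 7, 9), nrow = 3L, ncol = 4L)

# One data.table row per non-zero entry, taken directly from the triplets
dt <- data.table(row = m$i, col = m$j, val = m$v)

# Sort and key by (row, col) so lookups are keyed joins
setkey(dt, row, col)

dt[J(3L, 2L)]  # the single stored cell at row 3, column 2
dt[J(2L)]      # all stored cells in row 2 (join on the first key column)
dt[J(1L, 2L)]  # a cell that is not stored: val comes back NA (an implicit zero)
```

Some queries fit this layout directly. Per-column sums (dt[, sum(val), by = col]), for example, are unaffected by the missing zeros. Statistics such as means would have to account for the implicit zeros themselves.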

-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

