[Sprint-developer] Using SPRINT random forest when data are distributed on different computational nodes
Terry Sloan
tms at epcc.ed.ac.uk
Mon May 6 12:41:48 CEST 2013
A SPRINT user asked
===================
"I am using your R package SPRINT. I wonder if it can be used to train a
random forest when data are distributed on different
computational nodes. Thanks a lot."
A member of the SPRINT team replied
===================================
"First of all, thanks for the interest in SPRINT. Currently, the random
forest parallelisation is unable to train the forest when data are
already distributed.
The parallelisation approach is to parallelise across the requested
number of trees in the forest. The data should start out on the master
R process, and will be distributed to all computational nodes.
This implementation choice was made because existing parallel algorithms
for growing single decision trees are not well suited to the data we
were initially focussed on (microarrays). For more details, see
Mitchell et al. CCPE 2012 (DOI: 10.1002/cpe.2928)."
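
As a rough illustration of the tree-level parallelisation described in the
reply, here is a minimal Python sketch. It is not SPRINT code. The function
names (split_trees, grow_stub_trees, parallel_forest_sketch) are hypothetical,
the "trees" are only bootstrap samples of row indices, and the workers are
simulated sequentially rather than run under MPI. The sketch also assumes that
each worker receives a full copy of the data held by the master.

```python
import random


def split_trees(ntree, nworkers):
    """Divide ntree trees as evenly as possible among nworkers processes."""
    base, extra = divmod(ntree, nworkers)
    # The first `extra` workers each take one additional tree.
    return [base + (1 if rank < extra else 0) for rank in range(nworkers)]


def grow_stub_trees(data, count, seed):
    """Stand-in for tree growing: each 'tree' is a bootstrap sample of row indices."""
    rng = random.Random(seed)
    n = len(data)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(count)]


def parallel_forest_sketch(data, ntree, nworkers):
    """Simulate a master that hands the data to every worker and splits the trees."""
    shares = split_trees(ntree, nworkers)
    forest = []
    for rank, count in enumerate(shares):
        worker_data = list(data)  # assumption: each worker gets a full copy
        forest.extend(grow_stub_trees(worker_data, count, seed=rank))
    # The master collects every worker's trees into a single forest.
    return shares, forest


if __name__ == "__main__":
    shares, forest = parallel_forest_sketch(list(range(20)), ntree=10, nworkers=4)
    print(shares, len(forest))
```

In this sketch only the tree count is divided, never the rows of the data,
which is why the approach presumes the complete data set is available on the
master at the start.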
--
----------------------------------------------------------------------
Terry Sloan Email: t.sloan at epcc.ed.ac.uk
EPCC Phone: +44 131 650 5155
WWW : http://www.epcc.ed.ac.uk/
----------------------------------------------------------------------
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.