[Sprint-developer] Using SPRINT random forest when data are distributed on different computational nodes

Mon May 6 12:41:48 CEST 2013

A SPRINT user asked
===================

"I am using your R package SPRINT. I wonder if it can be used to train a 
random forest when data are distributed on different 
computational nodes. Thanks a lot."

A member of the SPRINT team replied
===================================

"First of all, thanks for the interest in SPRINT.  Currently, the random 
forest parallelisation is unable to train the forest when data are 
already distributed.

The parallelisation approach is to parallelise across the requested 
number of trees in the forest.  The data should start out on the master 
R process, and will be distributed to all computational nodes.

This implementation choice was made because existing parallel algorithms 
for growing single decision trees are not well suited to the data we 
were initially focussed on (microarrays).  For more details, see 
Mitchell et al. CCPE 2012 (DOI: 10.1002/cpe.2928)."
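
To illustrate the tree-level parallelisation described in the reply, here 
is a minimal Python sketch. It is not SPRINT code and does not use MPI: the 
workers are simulated by a plain loop, each "tree" is a one-split stump 
standing in for a full decision tree, and every function name 
(split_tree_counts, grow_stump, worker, train_forest, predict) is 
hypothetical. It also assumes that every worker sees the complete data set, 
which is why this scheme needs the data to start in one place.

```python
import random
from collections import Counter


def split_tree_counts(ntree, nworkers):
    """Divide ntree as evenly as possible across nworkers."""
    base, extra = divmod(ntree, nworkers)
    return [base + (1 if rank < extra else 0) for rank in range(nworkers)]


def grow_stump(X, y, rng):
    """Grow one depth-1 'tree' on a bootstrap sample (stand-in for a full tree)."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample of rows
    feature = rng.randrange(len(X[0]))              # random feature to split on
    threshold = X[rng.choice(idx)][feature]         # split point taken from the sample
    left = [y[i] for i in idx if X[i][feature] <= threshold]
    right = [y[i] for i in idx if X[i][feature] > threshold]
    fallback = Counter(y[i] for i in idx).most_common(1)[0][0]
    left_label = Counter(left).most_common(1)[0][0] if left else fallback
    right_label = Counter(right).most_common(1)[0][0] if right else fallback
    return feature, threshold, left_label, right_label


def worker(rank, X, y, count, seed):
    """One simulated worker: grows its share of trees on the full data set."""
    rng = random.Random(seed + rank)                # distinct random stream per worker
    return [grow_stump(X, y, rng) for _ in range(count)]


def train_forest(X, y, ntree, nworkers, seed=0):
    """Master: hand the full data to every worker, then merge the sub-forests."""
    forest = []
    for rank, count in enumerate(split_tree_counts(ntree, nworkers)):
        forest.extend(worker(rank, X, y, count, seed))
    return forest


def predict(forest, row):
    """Majority vote over all trees in the merged forest."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]
```

In this sketch only the number of trees is divided between workers; the 
rows of the data set are never partitioned, so each worker can bootstrap 
from all of them.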

-- 
----------------------------------------------------------------------
  Terry Sloan                        Email: t.sloan at epcc.ed.ac.uk
  EPCC                               Phone: +44 131 650 5155
                                     WWW  : http://www.epcc.ed.ac.uk/
----------------------------------------------------------------------

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.