[Rsiena-help] RSiena on a computer cluster

Ruth Ripley ruth at stats.ox.ac.uk
Wed Sep 18 18:56:27 CEST 2013


Dear Tobias,

Parallel processing always seems to have costs which can noticeably
reduce the expected time savings. In the case of RSiena (normal forward
processing) with the parallel package the major extra costs are 1) every
sub-process is sent a copy of the data - this will take a while to set
up and need a fair amount of memory if the data is large (but luckily
only needs to be done once) and 2) every sub-process needs to receive
the parameters at each iteration and return the statistics and scores -
this is not a great overhead.
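
To get a feel for how these two costs trade off against the time saved, a toy timing model can help. The Python sketch below is purely illustrative: the function `estimated_runtime`, its parameters and all the numbers are hypothetical, not measurements and not a description of RSiena's internals. It assumes the controller hands one simulation to each sub-process per round and talks to the sub-processes one after another.

```python
import math


def estimated_runtime(n_workers, n_simulations, sim_seconds,
                      setup_seconds_per_worker, comm_seconds_per_worker):
    """Toy wall-clock estimate in seconds (an assumption, not RSiena's scheduler)."""
    if n_workers == 1:
        # A single process needs no data copies and no message passing.
        return n_simulations * sim_seconds
    # Cost 1 (one-off): every sub-process is sent its own copy of the data.
    setup = setup_seconds_per_worker * n_workers
    # Each round runs one simulation on every sub-process in parallel.
    rounds = math.ceil(n_simulations / n_workers)
    # Cost 2 (every round): send parameters to, and collect statistics and
    # scores from, each sub-process in turn.
    comm_per_round = comm_seconds_per_worker * n_workers
    return setup + rounds * (sim_seconds + comm_per_round)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the shape of the curve.
    for n in (1, 4, 8, 16, 32):
        seconds = estimated_runtime(n, n_simulations=1000, sim_seconds=2.0,
                                    setup_seconds_per_worker=5.0,
                                    comm_seconds_per_worker=0.05)
        print(f"{n:2d} processes: {seconds:7.1f} s")
```

With these made-up inputs the estimate falls steeply at first and then flattens, and eventually rises again once the per-process costs outweigh the shrinking share of simulation work. Where that turning point lies depends entirely on the real setup and communication times.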

In fact, if one is using Linux or Mac, it is not necessary to send the
data across explicitly, but I did it because one must on Windows, and
because it is a one-off cost. Any data that is updated during the
iteration will need to be copied by the operating system, though.

In my experience there is a consistent benefit, provided a few 'serious'
effects are being fitted, unless the processes overwhelm the available
CPUs and compete with one another.

I would be interested to see if I can replicate the behaviour you
describe. If you would like to send me your data and commands (please
email them direct to me), I will investigate. It may be that the rather
random element, the length of time spent in phase 2 (where the
controller must wait for the slowest process at each iteration, and the
end conditions depend on the results of the simulations), is causing the
strange results.
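
The effect of waiting for the slowest process can be illustrated with a small simulation. This is a minimal Python sketch under stated assumptions: the function `mean_iteration_time` and the distribution of simulation times (a fixed 0.5 plus an exponential part, mean 1.0 in arbitrary units) are invented for illustration and say nothing about actual RSiena timings.

```python
import random


def mean_iteration_time(n_workers, n_iterations=2000, seed=1):
    """Average time per iteration when the controller waits for the slowest worker."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    total = 0.0
    for _ in range(n_iterations):
        # Hypothetical per-worker simulation times: 0.5 fixed + exponential(mean 0.5).
        times = [0.5 + rng.expovariate(2.0) for _ in range(n_workers)]
        # The iteration is finished only when the slowest worker has reported back.
        total += max(times)
    return total / n_iterations


if __name__ == "__main__":
    for n in (1, 4, 8, 16, 32):
        print(f"{n:2d} processes: mean iteration time {mean_iteration_time(n):.2f}")
```

Although every process has the same average simulation time, the average time per iteration grows with the number of processes, because the maximum of many random durations is larger than the maximum of a few. Together with end conditions that depend on the simulated results, this can make timings across different numbers of processes look irregular.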

The actual communication speeds will vary between architectures. What
operating system were you using? And which versions of R and RSiena?

Regards,

Ruth



On 18/09/2013 17:14, Tom Snijders wrote:
> Dear Tobias,
> 
> The multicluster option in RSiena is based on the R package parallel. It has the disadvantage of requiring quite a lot of communication between the processors. How this works out in practice depends strongly on the hardware configuration. In my experience, using multiple processes does have an advantage over using only one. I would guess that beyond a certain point adding more processes makes no difference, and 16 already seems quite a large number in this respect. The result that using 8 processes takes more time than 1, and 16 takes less time, seems to me totally hardware specific.
> But I do not know a lot about this, and if anybody else can correct me or say more specific things, that would be great.
> 
> We are still hoping that the settings model will be implemented some time in the future, which should be much more reasonable and less time-consuming for large networks. But this is not yet nearby.
> 
> Best wishes,
> Tom
> 
> 
> ================================================================
> Tom A.B. Snijders
> Professor of Statistics in the Social Sciences
> Department of Politics and Department of Statistics
> Nuffield College
> University of Oxford
> tel. +44-01865-278599
> 
> 
> -----Original Message-----
> From: rsiena-help-bounces at lists.r-forge.r-project.org [mailto:rsiena-help-bounces at lists.r-forge.r-project.org] On Behalf Of Tobias Stark
> Sent: 18 September 2013 06:55
> To: rsiena-help at lists.r-forge.r-project.org
> Subject: [Rsiena-help] RSiena on a computer cluster
> 
> Dear RSiena developers,
> 
> I hope to increase the speed of my analyses using a computer cluster. I ran the exact same test analysis with a large network (approx. 1,000 nodes) and varied the number of cores on which SIENA could run. I noticed that there was hardly any gain in speed using more cores. In fact, the analysis took longer when I ran it on 8 cores instead of 4 cores (no matter if the cores were on one machine or distributed across the cluster). The analyses were considerably faster on 16 cores, but using 26 or even 32 cores did not result in quicker results.
> 
> I wonder if there is a restriction within SIENA that prevents additional gains in speed with more cores or if the problem lies with the communication between machines in the computer cluster. Do you have a hint for me?
> 
> Thanks,
> Tobias
> _______________________________________________
> Rsiena-help mailing list
> Rsiena-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rsiena-help
> 
> Nuffield College is a Registered Charity No. 1137506. Registered Office: Nuffield College, New Road, Oxford, OX1 1NF
> 

-- 
Ruth M. Ripley,                         Email:ruth at stats.ox.ac.uk
Dept. of Statistics,                    http://www.stats.ox.ac.uk/~ruth/
University of Oxford,                   Tel:   01865 282857
1 South Parks Road, Oxford OX1 3TG, UK  Fax:   01865 272595


