[Rsiena-help] RSiena on a computer cluster
Ruth Ripley
ruth at stats.ox.ac.uk
Sat Sep 21 11:39:35 CEST 2013
Dear Tobias,
After a few hours of experiments, I have not yet finished one complete run,
but have some initial comments about the odd behaviour you report. For
the benefit of Tom and Mark I have included some more technical comments.
First, I had memory problems on a machine with 8 GB. Creating the data
and running siena07 straight afterwards resulted in a master process
using 5.5 GB and subprocesses using 1 GB each. I failed to get a single
run with nbrNodes=2 to function on a machine with 8 GB. I recommend you
run two R jobs: one to create the siena data object, model and effects
object and save just these three, and another which just loads these
three objects and RSiena and runs siena07. Each process then uses about 1
GB, startup is much quicker and I have 4 processes running happily on
the 8 GB machine. (1 with useCluster=FALSE, 2 from a run with
nbrNodes=2, and 1 with useCluster=FALSE and dolby=FALSE in the model.)
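The memory figures above can be checked with some quick arithmetic. The following is only a rough sketch in Python (not RSiena code); the function and variable names are made up for illustration, and the numbers are the approximate ones quoted in this message.

```python
# Rough memory budget for the 8 GB machine, using the approximate figures
# quoted above. All names here are illustrative assumptions, not RSiena API.
MACHINE_GB = 8.0


def total_memory_gb(master_gb, subprocess_gb, n_subprocesses):
    """Memory used by one master process plus its subprocesses."""
    return master_gb + subprocess_gb * n_subprocesses


# One R job that creates the data and then runs siena07 with nbrNodes=2:
# a 5.5 GB master plus two subprocesses of about 1 GB each.
single_job_gb = total_memory_gb(5.5, 1.0, 2)

# Estimation started from saved objects: four processes at about 1 GB each.
split_jobs_gb = 4 * 1.0

print("single job:", single_job_gb, "GB, headroom", MACHINE_GB - single_job_gb, "GB")
print("split jobs:", split_jobs_gb, "GB, headroom", MACHINE_GB - split_jobs_gb, "GB")
```

On these figures the single job leaves only about half a gigabyte for the operating system and everything else, which is consistent with the failed runs, while the split jobs leave roughly half the machine free.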
Secondly, the model is running into estimation problems in phase 2. The
subphases are being repeated different numbers of times, so the total
number of iterations in phase 2 ends up varying a lot.
Thirdly, with dolby=TRUE the phase 2 iterations are taking longer
because they need to accumulate the scores - hence my experiment with
turning it off. This estimation is winning at the moment (nearing the
end of phase 2.2) largely because it restarted (after 50 iterations)
twice in phase 2.1 and then terminated the phase after another 50, while
the other single process did 338 slower iterations in phase 2.1 with no
restarts and the job with nbrNodes=2 did 269 iterations in phase 2.1 and
has restarted phase 2.2 twice after 50 iterations.
It does not seem to be communication that is the problem with the extra
need for scores, but the calculation of them.
I think in phase 1 the expected behaviour does occur, but the memory
issues may reduce the benefit by causing more paging.
The networks are undirected and use model type 3 - this adds a little
extra processing to each iteration but does not seem to be the main
cause of the unpredictable timings.
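The point in the earlier message quoted below, that the controller must wait for the slowest process at each iteration, can be illustrated with a toy simulation. This is a sketch in Python, not RSiena code: the function name and the assumption that each process takes between 0.5 and 1.5 time units per iteration are purely hypothetical, chosen only to show the effect.

```python
import random


def mean_iteration_time(n_processes, n_iterations=20000, seed=1):
    """Average wall-clock time per iteration when the controller waits
    for the slowest of n_processes at every iteration.

    Assumption (hypothetical): each process independently takes between
    0.5 and 1.5 time units per iteration, uniformly distributed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iterations):
        # The iteration finishes only when the slowest process has finished.
        total += max(rng.uniform(0.5, 1.5) for _ in range(n_processes))
    return total / n_iterations


for n in (1, 2, 8, 16):
    print(n, "processes:", round(mean_iteration_time(n), 2), "time units per iteration")
```

Under these assumptions the time per iteration creeps up towards the worst case as processes are added, so part of the gain from extra processes is lost to waiting even when communication itself is cheap.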
Regards,
Ruth
On 18/09/2013 17:56, Ruth Ripley wrote:
> Dear Tobias,
>
> Parallel processing always seems to have costs which can noticeably
> reduce the expected time savings. In the case of RSiena (normal forward
> processing) with the parallel package the major extra costs are 1) every
> sub-process is sent a copy of the data - this will take a while to set
> up and need a fair amount of memory if the data is large (but luckily
> only needs to be done once) and 2) every sub-process needs to receive
> the parameters at each iteration and return the statistics and scores -
> this is not a great overhead.
>
> In fact, if one is using Linux or Mac, it is not necessary to send the
> data across explicitly, but I did it because one must on Windows, and
> because it is a one-off cost. Any data that is updated during the
> iteration will need to be copied by the operating system, though.
>
> In my experience there is a consistent benefit, provided a few 'serious'
> effects are being fitted, unless the processes outnumber the CPUs
> available and compete for them.
>
> I would be interested to see if I can replicate the behaviour you
> describe - if you would like to send me your data and commands (please
> email them direct to me), I will investigate. It may be that the rather
> random element, the length of time in phase 2 (where the controller must
> wait for the slowest process at each iteration, and the end conditions
> depend on the results of the simulations), is causing the strange results.
>
> The actual communication speeds will vary between architectures. What
> operating system were you using? And which versions of R and RSiena?
>
> Regards,
>
> Ruth
>
>
>
> On 18/09/2013 17:14, Tom Snijders wrote:
>> Dear Tobias,
>>
>> The multicluster option in RSiena is based on the R package parallel. It has the disadvantage of requiring a fair amount of communication between the processors. How this works out in practice depends strongly on the hardware configuration. In my experience, using multiple processes does have an advantage over the use of only one process. I would guess that beyond some point adding more processes makes no difference, and 16 already seems quite a large number in this respect. The result that using 8 processes takes more time than 1, and 16 takes less time, seems to me totally hardware specific.
>> But I do not know a lot about this, and if anybody else can correct me or say more specific things, that would be great.
>>
>> We are still hoping that the settings model will be implemented some time in the future, which should be much more reasonable and less time-consuming for large networks. But this is not yet close.
>>
>> Best wishes,
>> Tom
>>
>>
>> ================================================================
>> Tom A.B. Snijders
>> Professor of Statistics in the Social Sciences
>> Department of Politics and Department of Statistics
>> Nuffield College
>> University of Oxford
>> tel. +44-01865-278599
>>
>>
>> -----Original Message-----
>> From: rsiena-help-bounces at lists.r-forge.r-project.org [mailto:rsiena-help-bounces at lists.r-forge.r-project.org] On Behalf Of Tobias Stark
>> Sent: 18 September 2013 06:55
>> To: rsiena-help at lists.r-forge.r-project.org
>> Subject: [Rsiena-help] RSiena on a computer cluster
>>
>> Dear RSiena developers,
>>
>> I hope to increase the speed of my analyses using a computer cluster. I ran the exact same test analysis with a large network (approx. 1,000 nodes) and varied the number of cores on which SIENA could run. I noticed that there was hardly any gain in speed using more cores. In fact, the analysis took longer when I ran it on 8 cores instead of 4 cores (no matter if the cores were on one machine or distributed across the cluster). The analyses were considerably faster on 16 cores but using 26 or even 32 cores did not result in quicker results.
>>
>> I wonder if there is a restriction within SIENA that prevents additional gains in speed with more cores or if the problem lies with the communication between machines in the computer cluster. Do you have a hint for me?
>>
>> Thanks,
>> Tobias
>> _______________________________________________
>> Rsiena-help mailing list
>> Rsiena-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rsiena-help
>>
>> Nuffield College is a Registered Charity No. 1137506. Registered Office: Nuffield College, New Road, Oxford, OX1 1NF
>>
>
--
Ruth M. Ripley, Email:ruth at stats.ox.ac.uk
Dept. of Statistics, http://www.stats.ox.ac.uk/~ruth/
University of Oxford, Tel: 01865 282857
1 South Parks Road, Oxford OX1 3TG, UK Fax: 01865 272595