<div dir="ltr">Hi, <br><br>Glad to see you've been reading the tutorial in such detail! <br><br>These are great questions, and the way you have asked them actually hints at the answer: set.seed() is not inherently linked to multivariate techniques or datasets, but rather to random number generation (more specifically, to getting <i>reproducible</i> results from "random" processes). This is probably why you have seen set.seed() come up in the context of bootstrap and Monte Carlo procedures! <br>


<br>Essentially, when R is asked to generate a "random" number, it actually generates a pseudo-random number: it takes some input (the "seed") and deterministically produces output that seems random. If you do not supply a seed, R creates one for you from the current time (together with the ID of the running R process). You would not get the same numbers at a different time, so it is reasonable to call the process "random" number generation, BUT if two runs ever started from exactly the same seed, they would produce exactly the same "random" numbers. (Note: I have glossed over a lot of really interesting things about this process, so if you want to know more about random number generation, please read on here: <a href="http://cran.r-project.org/web/packages/randtoolbox/vignettes/fullpres.pdf" target="_blank">http://cran.r-project.org/web/packages/randtoolbox/vignettes/fullpres.pdf</a> ). <br>


<br>This potential problem with random number generation can occasionally be quite useful in cases where we want to run something that requires random number generation but where we would also like to get the same result each time. <br>


set.seed() is the way we control this. With set.seed(), the "seed" is used as the input to our random number generation (instead of the clock), which allows you to get <i>reproducible </i>"random" numbers. <div>


<br></div><div>Try this example: <br><br><div><div><font color="#073763">rnorm(3)</font></div><div><font color="#073763">rnorm(3)</font></div><div><font color="#073763"><br></font></div><div><font color="#073763">set.seed(1)</font></div>


<div><font color="#073763">rnorm(3)</font></div><div><font color="#073763"><br></font></div><div><font color="#073763">set.seed(1) </font><font color="#38761d"># note: set.seed() fixes the starting point of the random number stream, so call it again right before each command whose result you want to reproduce. </font></div>


<div><font color="#073763">rnorm(3)</font></div><div><br></div><div>Neat! The two calls that follow set.seed(1) return identical numbers, while the first two calls do not. Having established this, we can now answer your questions about why we use set.seed() where we do in the DAPC tutorial. <br><br></div><div>On page 20, we use it before creating a loading plot. This is just because we use the argument lab.jitter to move the labels around a bit. Jitter works by adding random noise, so we can control it with set.seed(). We chose set.seed(4) simply because it happened to put the labels in a nice enough place. Arguably, set.seed(6) would have done a better job (next time!), but it's a good thing we didn't use set.seed(2). </div>
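<div><br></div><div>As a rough illustration of why a seed controls label placement, here is a minimal sketch that uses base R's jitter() as a stand-in for the noise added by lab.jitter. The vector x is made up for the example, and I am only assuming that loadingplot behaves in a similar way internally: </div>

```r
# Hypothetical label positions; not taken from the tutorial data.
x <- c(1, 2, 3, 4, 5)

# Same seed before each call: the added noise is identical.
set.seed(4)
first <- jitter(x)

set.seed(4)
second <- jitter(x)

# A different seed gives a different set of offsets.
set.seed(6)
third <- jitter(x)

identical(first, second)  # same seed, same noise
identical(first, third)   # different seed, different noise
```

<div>There is no "correct" seed here: each one simply gives a different, repeatable arrangement, so picking one comes down to trying a few and keeping the plot you find easiest to read. </div>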


<div><br></div><div>If you would like, you can see for yourself:</div><div><br></div><div><div><font color="#073763">data(H3N2)</font></div><div><font color="#073763">pop(H3N2) <- factor(H3N2$other$epid)</font></div><div>


<font color="#073763">dapc.flu <- dapc(H3N2, n.pca=30,n.da=10)</font></div><div><font color="#073763"><br></font></div><div><div><font color="#073763">set.seed(</font><font color="#ff0000">4</font><font color="#073763">)</font></div>


<div><font color="#073763">contrib <- loadingplot(dapc.flu$var.contr, axis=2, thres=.07, lab.jitter=1)</font></div></div><div><div><font color="#073763"><br></font></div><div><font color="#073763">set.seed(</font><font color="#ff0000">6</font><font color="#073763">)</font></div>


<div><font color="#073763">contrib <- loadingplot(dapc.flu$var.contr, axis=2, thres=.07, lab.jitter=1)</font></div></div><div><font color="#073763"><br></font></div><div><font color="#073763">set.seed(</font><font color="#ff0000">2</font><font color="#073763">)</font></div>


<div><font color="#073763">contrib <- loadingplot(dapc.flu$var.contr, axis=2, thres=.07, lab.jitter=1)</font></div></div><div><br></div><div>Finally, we use set.seed(2) on page 39 to get a "random" sample of 20 individuals (you were right about that) to serve as our "supplementary individuals" for that exercise. Here, the use of set.seed(2) just ensures that no matter how many times we edit and re-build that tutorial, we will always get the same set of 20 individuals, which is useful for consistency's sake. </div>
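<div><br></div><div>The same idea can be sketched with sample(). This is only an illustration: ids is a hypothetical vector of individual indices rather than the real microbov data, and the tutorial's own code may differ: </div>

```r
# Hypothetical stand-in for individual indices; not the real microbov data.
ids <- 1:100

# Draw 20 individuals without replacement, starting from seed 2.
set.seed(2)
kept_first <- sample(ids, 20, replace = FALSE)

# Re-setting the same seed reproduces exactly the same draw.
set.seed(2)
kept_again <- sample(ids, 20, replace = FALSE)

identical(kept_first, kept_again)  # same seed, same 20 individuals
length(unique(kept_first))         # number of distinct individuals drawn
```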


<div><br></div><div>All in all, I apologise for the long response that was possibly less related to DAPC than you might have expected, but I hope that helped answer your question! </div><div><br></div><div>Best, <br>Caitlin. </div>


<br><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jun 18, 2014 at 6:51 PM, Manuela <span dir="ltr"><<a href="mailto:manuelacorreia2@gmail.com" target="_blank">manuelacorreia2@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi there,<br><br><br>I'd like to understand the role of set.seed() and the criteria used to choose the seed in the two examples presented in the latest version of the DAPC tutorial.<br>

<br>

I have seen set.seed(N) used in the context of significance tests as well as bootstrap Monte Carlo procedures, but not within multivariate techniques or with datasets.<br><br>On page 20 of the DAPC tutorial there is a set.seed(4) before the call to loadingplot. There is another example on page 39, before the microbov dataset is split into two parts. And by the way, what is the 20 in sample(e,20....)? 20 individuals picked at random from all microbov populations?<br>


<br><br>So, I have two questions. <br>The first is: why use them in these particular examples? <br>The second is: what criteria were behind the choice of the number 4 in the former case and the number 2 in the latter? <br>


<br>How do I know which seed will be the best one for my dataset if I need to produce a loadingplot? <br><br>Thanks in advance,<br>M.<br></div>

<br>_______________________________________________<br>

adegenet-forum mailing list<br>

<a href="mailto:adegenet-forum@lists.r-forge.r-project.org">adegenet-forum@lists.r-forge.r-project.org</a><br>

<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum</a><br></blockquote></div><br></div>