Dears
I converted my hj.str data into a genind object using the read.structure function:
hj <- read.structure(file="hj.str", n.ind=747, n.loc=168344, onerowperind=TRUE, col.lab=1, col.pop=2, NA.char=-9, ask=FALSE)
My data contain two populations (represented by 1 and 2 in the pop column of hj.str), and the data were not sorted by population. How can I define the population of each individual in the genind object hj, i.e. in hj@pop?
I tried this:
pop_1 <- hj@pop==1
pop_2 <- hj@pop==2
However, this has clumped the entire population just into two dots instead of representing each individual in the PCA plot.
Any help is appreciated
Takele
Hello,
I am encountering a problem with the fasta2genlight function:
Here is the error it gives me:
Error in `alleles<-`(`*tmp*`, value = list()) :
Miss-formed strings in replacement (must be e.g. 'c/g')
It seems that the error is due to the absence of SNPs in the data file.
Has anyone already encountered this error?
Thanks for your help,
Gabriel
Hello,
please have a look at the documentation, especially the vignette on basics (vignette("adegenet-basics")).
You want to use :
pop(hj) <- ...
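A minimal sketch of what this could look like, assuming a tiny made-up genind object built with df2genind (the toy genotypes and the group labels are illustrative only, they are not taken from hj.str):

```r
library(adegenet)

# Toy genotypes (made up for illustration): 4 individuals, 2 diploid loci
toy <- data.frame(
  loc1 = c("1/1", "1/2", "2/2", "1/2"),
  loc2 = c("3/3", "3/4", "4/4", "3/3"),
  row.names = paste0("ind", 1:4),
  stringsAsFactors = FALSE
)
obj <- df2genind(toy, sep = "/", ploidy = 2)

# Assign one population label per individual; the order need not be sorted
pop(obj) <- c(1, 2, 1, 2)

# pop() returns a factor with one entry per individual
print(table(pop(obj)))

# Subset the individuals of population 1; they remain separate individuals
obj_pop1 <- obj[pop(obj) == "1", ]
print(nInd(obj_pop1))
```

Since col.pop=2 was given to read.structure, pop(hj) should already return the labels read from the file. A comparison such as hj@pop==1 only produces a TRUE/FALSE vector, which is probably why the plot collapsed to two points.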
Cheers
Thibaut
Hello,
can you post a (small) toy dataset to reproduce the error?
Cheers
Thibaut
Here is a dataset (attached file):
(My whole dataset consists of numerous files, each a small dataset like this one.)
>ana
CAGGTGACG-CAATTTTACTGTAATTTGTTTGGCCGCACGTAC---TTGGAGGCCT-GACATGGGGCAATGTCAGCTCGTTTGTGCATGCTCAG-------
>ere
CAGGTGACG-CAATTTTACTGTAATTTGTTTGGCCGCACGTAC---TTGGAGGCCT-GACATGGGGCAATGTCAGCTCGTTTGTGCATGCTCAG-------
>sec
CAGGTGACG-CAATTTTACTGTAATTTGTTTGGCCGCACGTAC---TTGGAGGCCT-GACATGGGGCAATGTCAGCTCGTTTGTGCATGCTCAG-------
>vil
CAGGTGACG-CAATTTTACTGTAATTTGTTTGGCCGCACGTAC---TTGGAGGCCT-GACATGGGGCAATGTCAGCTCGTTTGTGCATGCTCAG-------
Thanks a lot
Gabriel Terraz -Doctorant-
Tel: +33(0)4 72 43 29 08
Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558
Batiment Mendel
Université Claude Bernard - Lyon 1
43, Bd du 11 novembre 1918
69622 Villeurbanne
Hello,
yes indeed, this is a bug: the function does not expect entirely non-typed loci.
If RAM is not a constraint (i.e. if your dataset is small), you don't have to use genlight. You can use the DNAbin format instead; to read the data in:
dna <- fasta2DNAbin("sequ.fa")
Cheers
Thibaut
My bad, this is not a bug as such.
genlight is meant to store SNPs. All your sequences are identical.
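A quick way to check this before calling fasta2genlight is to count the polymorphic columns of the alignment. The sketch below uses base R only; find_polymorphic_sites is a hypothetical helper (not part of adegenet), and the toy sequences are just the first 15 characters of the posted ones, with one base changed by hand in the second example:

```r
# Hypothetical helper: indices of alignment columns showing more than one base.
# Gaps and ambiguous characters are ignored here, which is an assumption.
find_polymorphic_sites <- function(seqs) {
  stopifnot(length(unique(nchar(seqs))) == 1)   # sequences must be aligned
  mat <- do.call(rbind, strsplit(tolower(seqs), ""))
  is_poly <- apply(mat, 2, function(column) {
    bases <- unique(column[column %in% c("a", "c", "g", "t")])
    length(bases) > 1
  })
  which(is_poly)
}

# Identical sequences: no SNP at all
identical_seqs <- c(ana = "CAGGTGACG-CAATT",
                    ere = "CAGGTGACG-CAATT",
                    sec = "CAGGTGACG-CAATT")

# Same sequences with one base changed by hand in 'sec'
variable_seqs <- c(ana = "CAGGTGACG-CAATT",
                   ere = "CAGGTGACG-CAATT",
                   sec = "CAGGTAACG-CAATT")

print(find_polymorphic_sites(identical_seqs))
print(find_polymorphic_sites(variable_seqs))
```

Files for which this returns an empty vector contain no SNP and could be skipped before conversion.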
Cheers
Thibaut
This issue is now fixed in the development version. The patch is available at the address below:
https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/R/import.R?root=adegenet
Cheers
Thibaut
Hi Kelvin,
sorry for a reaction about as prompt as that of a stone in a drunken
stupor.
I guess you have probably moved a bit further on the interpretation of your
results by now. Anyhow, I can try to tell you something
useful (? - who knows).
> Here are my questions:
> 1. For the sPCA based on spatial (not depth) coordinates, the barplot of
> eigenvalues shows the typical pattern of PC3 > PC4 > PC5, but if you look
> at the screeplot (the graph of PC score variance vs spatial
> autocorrelation), PC5 accounts for a larger amount of variance than PC3 and
> 4. This seems contradictory to me. Does anyone have an explanation?
>
The eigenvalues in sPCA have two components, the variance and the
spatial autocorrelation. If you type summary(yourspca)[[3]] you will see
the list of variance and Moran's I values for each of the eigenvalues. PC5 may
be correlated with genetic variables that have higher variance than the ones
contributing to PC3 and PC4 but that are not spatially ordered.
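To illustrate with a toy case (base R only, made-up scores on 10 sites along a line; var_and_moran is a hypothetical helper, not an adegenet function): an sPCA eigenvalue is the product of the variance and Moran's I of the scores, so a component can have a larger variance and still rank lower.

```r
# Hypothetical helper: variance, Moran's I and their product for one score vector,
# given a matrix of spatial weights W
var_and_moran <- function(scores, W) {
  z <- scores - mean(scores)
  n <- length(z)
  variance <- sum(z^2) / n
  moran <- (n / sum(W)) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
  c(var = variance, moran = moran, product = variance * moran)
}

# Toy connection network: 10 sites on a line, neighbours are adjacent sites
n <- 10
A <- matrix(0, n, n)
A[abs(row(A) - col(A)) == 1] <- 1
W <- A / rowSums(A)                     # row-standardised weights

# Made-up scores: a smooth cline versus a more variable, less ordered pattern
smooth <- seq(-1, 1, length.out = n)
patchy <- 0.8 * c(1, 1, 1, -1, -1, -1, 1, 1, -1, -1)

res <- rbind(smooth = var_and_moran(smooth, W),
             patchy = var_and_moran(patchy, W))
print(round(res, 3))
```

Here the 'patchy' scores have the larger variance but the smaller product, which is the kind of situation that would put a component further down the barplot while it sits further right on the screeplot.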
> 2. Next, to do more exploratory analyses, I wanted to see how robust these
> results were for different distance limits (d2) in constructing the
> connection network. I noticed that when I pick an arbitrary number, like
> d2=12 for the sPCA using spatial (not depth) coordinates, the spatial
> patchiness disappears and instead there now appears to be a cline. Because
> sPCA decomposes both genetic and spatial variance, is it possible for the
> spatial variance to swamp out the genetic variance, particularly if you
> define a connection network too arbitrarily? In other words, by defining
> d2=12, does the sPCA miss the finer scale spatial patchiness that was found
> when I defined my connection network with a more "sensible" d2?
>
In my personal experience with sPCA, if the spatial patterns are
strong they do not change substantially no matter what graph is used; in
the worst case just a few points look a bit different. In your specific
case, choosing neighbours on the basis of the
positive spatial autocorrelation sounds a bit circular: you
might be forcing the method to highlight a pattern of positive spatial
autocorrelation that may not be driving the genetic distribution of your
sample. I would rather go for inverse distances, which are usually more
accurate. Btw, did you run the global and local tests? If they change from
significant to non-significant when you change the neighbouring method, I would not
conclude that there is a significant spatial pattern.
> 3. Clearly depth and space are autocorrelated with each other. Based on
> the partial mantel tests, both are significantly, but only weakly
> correlated with genetic relatedness. Are there any general guidelines for
> interpreting low Mantel r values? As I understand it, Mantel r is not the
> same as a correlation r, because Mantel tests are based on distances and
> not raw data. I've seen other studies commenting on how small Mantel r's
> are often reported, but so far, I have not come across any studies that
> report values as small as mine.
>
I've never seen such small Mantel test values either... When I
first read about this issue of 'controlling' depth for space I had two
different thoughts about it:
1. If you think about spatial proximities, being at a greater or lesser depth
clearly does not mean being more or less close. Given your results of a
spatial gradient from deeper to shallower, this is likely highlighting an
adaptive pattern to depth, but maybe this is exactly the reason why you ran
the method on depth only.
2. If I wanted to see the effect of space and depth, I would probably use
depth in combination with a simplified linear distance scheme (like
points on a line or a circle reproducing the spatial shape of the coral
reef) and build the spatial connection with it. In this case you would
analyse together the role of spatial distances (in 3-D) and the potential
role of adaptation, which is already disentangled in the spatial analysis
based on depth only.
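On the Mantel r itself (question 3), here is a rough base R sketch of what the statistic is: the correlation between the entries of two distance matrices, with a p-value obtained by permuting the individuals. mantel_sketch is a hypothetical helper and the data are simulated; for real analyses ade4 provides mantel.rtest.

```r
set.seed(1)

# Hypothetical helper: simple (not partial) Mantel test by permutation
mantel_sketch <- function(d1, d2, nperm = 199) {
  m1 <- as.matrix(d1)
  m2 <- as.matrix(d2)
  lower <- lower.tri(m1)
  r_obs <- cor(m1[lower], m2[lower])
  # Permute rows and columns of the second matrix together
  r_perm <- replicate(nperm, {
    idx <- sample(nrow(m2))
    cor(m1[lower], m2[idx, idx][lower])
  })
  p_value <- (sum(r_perm >= r_obs) + 1) / (nperm + 1)
  list(r = r_obs, p = p_value)
}

# Simulated data: a trait with a weak dependence on the x coordinate
n <- 30
xy <- cbind(runif(n), runif(n))
trait <- xy[, 1] + rnorm(n, sd = 2)

res <- mantel_sketch(dist(xy), dist(trait))
print(res)
```

Because r is computed on pairwise distances rather than on the raw variables, it is generally not read on the same scale as an ordinary correlation between the variables themselves.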
End. Just to let you know I hate you a bit because you work in Hawaii.
Ciao
Valeria
On 30 January 2013 21:10, Kelvin Gorospe wrote:
> Hello all,
>
> I'd like to ask some input on interpreting some results. I have
> microsatellite genotypes, depth, and spatial coordinates for 2352 corals
> from a single coral reef. I ran partial mantel tests looking at the
> relationship between genetic relatedness and space (controlling for depth)
> as well as the relationship between genetic relatedness and depth
> (controlling for space) and found highly significant p values (p=0.001) but
> very small Mantel r values (0.008 for space and 0.01 for depth). So there
> is a small, but still significant relationship between genetics and space
> as well as genetics and depth on a very small scale (the reef covered an
> area of only about 1300m^2 with depths of between 1 and 4m).
>
> Next, I wanted to visualize these structures using sPCA. So first I
> constructed two connection networks: both neighbor by distance connections,
> but one based on depth measurements (0,z) and one based on spatial
> coordinates (x,y). The distance limit (d2) for each network was based on
> inspecting correlograms for genetics vs. depth and genetics vs. space and
> using the extent of positive autocorrelation as the upper limit (d2) for
> defining neighbors in each of the connection networks. After performing
> sPCA I then plot the PCs using the spatial (x,y) coordinates to visualize
> the spatial arrangement of genetic relatedness. The sPCA based on spatial
> coordinates shows a patchy reef, with groups of similar PC scores clumping
> together throughout the reef. The sPCA based on depth coordinates,
> however, shows a depth cline, with corals in the center of the reef (the
> shallow part) having distinct PC scores from corals on the outer slopes of
> the reef (the deeper part).
>
> Here are my questions:
> 1. For the sPCA based on spatial (not depth) coordinates, the barplot of
> eigenvalues shows the typical pattern of PC3 > PC4 > PC5, but if you look
> at the screeplot (the graph of PC score variance vs spatial
> autocorrelation), PC5 accounts for a larger amount of variance than PC3 and
> 4. This seems contradictory to me. Does anyone have an explanation?
>
> 2. Next, to do more exploratory analyses, I wanted to see how robust these
> results were for different distance limits (d2) in constructing the
> connection network. I noticed that when I pick an arbitrary number, like
> d2=12 for the sPCA using spatial (not depth) coordinates, the spatial
> patchiness disappears and instead there now appears to be a cline. Because
> sPCA decomposes both genetic and spatial variance, is it possible for the
> spatial variance to swamp out the genetic variance, particularly if you
> define a connection network too arbitrarily? In other words, by defining
> d2=12, does the sPCA miss the finer scale spatial patchiness that was found
> when I defined my connection network with a more "sensible" d2?
>
> 3. Clearly depth and space are autocorrelated with each other. Based on
> the partial mantel tests, both are significantly, but only weakly
> correlated with genetic relatedness. Are there any general guidelines for
> interpreting low Mantel r values? As I understand it, Mantel r is not the
> same as a correlation r, because Mantel tests are based on distances and
> not raw data. I've seen other studies commenting on how small Mantel r's
> are often reported, but so far, I have not come across any studies that
> report values as small as mine.
>
> I've also tried to attach some graphs to this email, but I'm not sure if
> the list serve allows attachments. But hopefully my descriptions of my
> results were still good enough to get some feedback. Any input would be
> greatly appreciated! Thanks everyone!
>
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
Dear Dr. Jombart,
I hope this email finds you well. We have exchanged thoughts before, and I
wish to thank you for having gotten back to me in the past.
I have been going through your latest vignette about dapc in adegenet (Nov
2012). I have used dapc on a butterflyfish hybrid zone in the past
(Montanari et al 2012, Ecology and Evolution), and now I am going through a
second dataset, and would like to compare the 2. Hence, I have a couple of
questions for you:
- am I correct in thinking that I want the same level of stability between
the 2 analyses if I am to compare the results? (eg, in both have retained
PCs = N/3)
- in your tutorial you mention that dapc$posterior used to construct
compoplot are not the same as structure admixture coefficients. Could you
point me in a direction that would allow me to understand how they are not?
I have run the results through structure and the hybrids show up nicely as
50/50 clustered with parent 1 and 2 (k=2). adegenet also reckons that k=2
should be the best, but the compoplot shows no membership misassignment
(even if the # of PCs is conservative). Do you have any suggestions as to
why?
Hoping to have been clear enough and not to have bored you senseless, I
look forward to hearing back from you.
Best regards,
Stef
--------------------------
Stefano R. Montanari
PhD Candidate
James Cook University
School of Marine and Tropical Biology
ATSIP (Building 145 James Cook Drive)
4811 Townsville QLD
stefanomontanari at gmail.com
Work: +61 7 4781 5441
Mob: +61 404 736 509
Hi Stefano,
thanks for reposting on the forum. It gives me the chance to clarify an important point.
For the first point, there is no linear relationship between the 'stability' of DAPC results and the number of PCs retained in the PCA step. 'xxx' PCs can represent 2% of the variance in one analysis and 60% in another. If the two data tables have fairly comparable dimensions, it would be best to retain roughly the same proportion of variance. If their dimensions are very different, then the same number of PCs makes sense.
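As a rough illustration in base R (simulated tables, not real genetic data; prop_var_in_first_k and n_pcs_for_variance are hypothetical helpers), the same number of PCs can carry very different proportions of variance:

```r
set.seed(123)
n <- 50

# Simulated table 1: pure noise, variance spread over many axes
noise_table <- matrix(rnorm(n * 40), n, 40)

# Simulated table 2: two strong latent factors plus a little noise
latent <- matrix(rnorm(n * 2), n, 2)
loadings <- matrix(rnorm(2 * 40), 2, 40)
structured_table <- latent %*% loadings + matrix(rnorm(n * 40, sd = 0.1), n, 40)

# Hypothetical helper: proportion of variance carried by the first k PCs
prop_var_in_first_k <- function(x, k) {
  eig <- prcomp(x, center = TRUE, scale. = FALSE)$sdev^2
  sum(eig[seq_len(k)]) / sum(eig)
}

# Hypothetical helper: smallest number of PCs reaching a target proportion
n_pcs_for_variance <- function(x, target) {
  eig <- prcomp(x, center = TRUE, scale. = FALSE)$sdev^2
  which(cumsum(eig) / sum(eig) >= target)[1]
}

print(prop_var_in_first_k(noise_table, 5))
print(prop_var_in_first_k(structured_table, 5))
print(n_pcs_for_variance(noise_table, 0.8))
print(n_pcs_for_variance(structured_table, 0.8))
```

In dapc itself, the proportion of variance conserved by the retained PCs is stored in the var element of the returned object, which makes this kind of comparison between two analyses straightforward.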
For the second point, STRUCTURE and similar approaches have a model which partitions genotypes into groups. It is basically a mixture distribution problem with a multinomial distribution for each locus and group. So the 'admixture' coefficient has a straightforward biological interpretation.
In DAPC, assignment of individuals to groups using the discriminant functions is based on a geometric criterion. In other words, "tell me where you are in the discriminant space, and I will tell you the probability that you belong to groups xxx, yyy and zzz". This is of course dependent on the discriminant space. The more dimensions retained in the PCA step, the easier it is to find a space providing perfect discrimination. The obtained group membership probabilities can reflect admixture, but they do not represent the proportion of the genome assigned to a given group. In your case, if you use a smaller space, you may start seeing less clear-cut group definition. optim.a.score may help in selecting the number of PCs.
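A sketch of how this could be explored, assuming the nancycats example dataset shipped with adegenet stands in for your data (the values of n.pca, n.da and n.sim below are arbitrary choices for illustration, not recommendations):

```r
library(adegenet)
data(nancycats)

# Same predefined groups, two different sizes of PCA space
dapc_large <- dapc(nancycats, n.pca = 60, n.da = 5)
dapc_small <- dapc(nancycats, n.pca = 10, n.da = 5)

# Membership probabilities: one row per individual, one column per group
print(head(round(dapc_small$posterior, 2)))

# Share of individuals assigned to one group with probability > 0.99
print(mean(apply(dapc_large$posterior, 1, max) > 0.99))
print(mean(apply(dapc_small$posterior, 1, max) > 0.99))

# a-score optimisation over the PCs retained in the larger analysis
opt <- optim.a.score(dapc_large, n.sim = 5, plot = FALSE)
print(opt$best)
```

compoplot(dapc_small) would then show the same membership probabilities as a barplot. They describe positions in the discriminant space, so they should not be read as genome proportions.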
Cheers
Thibaut.
Hi Thibaut,
thank you for your prompt reply, it was very clear. Just a quick question
about optim.a.score: I had used it before, and this morning I tried again
just to make sure I remembered the results correctly. For one dataset
(N=109, 12 loci) it finds that 17 PCs is the best; for the other (N=83, 20
loci), retaining only 1 PC (not possible since PC >= 2) gives the highest
a-score. This worries me. Do you think these data should not be used for
DAPC?
Cheers
Stef
Hi all,
I'm working with a domestic species and I have been trying to integrate
spatio-temporal data with haplotype frequencies. I've been working with
some ancient DNA data, associated with spatial location. I have a small
fragment of the mitochondrial DNA for several hundreds of individuals, each
associated with a geographical coordinate (x,y) and to a 14C dating. I can
see that there is a strong geographical correlation (geographically close
samples show related DNA haplotypes in similar frequencies), but I can also
see that this distribution is strongly correlated with sampling time.
Because I'm working with a domestic species, this strong influence of time
has to do with Neolithic migrations: the change in haplotype frequencies
is directly correlated with human migrations and their time of arrival in
different European locations.
Following the list, I saw it was possible to perform a 3D sPCA using depth
data. So, I was wondering if instead of depth data as a Z coordinate, it
would be possible to integrate temporal data in an sPCA, using something
like (x, y, time)? I have a radiocarbon date for each sample, but I was
thinking of using time frames (or categories) related to the different
human cultures (Neolithic, Mesolithic, etc) since the change in haplotype
frequencies is directly related to the changes in human culture....
Is it possible?
Sibelle
Hello,
Why 'not possible since PC >= 2'? You can choose to retain only one PC if you wish.
This suggests that the first PC of the second analysis already contains all the between-group discrimination.
Cheers
Thibaut
Dear Sibelle,
yes, I think it makes sense, although you'll probably have a harder time interpreting and plotting the results.
But basically, all you need to do is formulate the spatio-temporal proximities in a proximity matrix (terms >= 0, diagonal = 0). Another, more complicated approach would be to discretize the temporal data and have, say, 'T' time steps with one proximity matrix each. You would then have to coordinate 'T' sPCAs, which is doable using K-table approaches such as multiple co-inertia or STATIS (all in ade4), but substantially more of a pain.
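A minimal base R sketch of the first option, with made-up coordinates and dates. The way space and time are rescaled and the 1/(1+d) transformation are assumptions chosen for illustration; other choices are equally defensible.

```r
# Made-up samples: coordinates (x, y) and radiocarbon ages (years BP)
x <- c(0, 1, 2, 10, 11)
y <- c(0, 1, 0, 10, 11)
age_bp <- c(7000, 6900, 5000, 7000, 5100)

# Put space and time on comparable scales before combining them
# (this scaling is an arbitrary choice and weights the three axes equally)
space_time <- cbind(scale(cbind(x, y)), time = as.numeric(scale(age_bp)))

# Spatio-temporal distances, turned into proximities
d_st <- as.matrix(dist(space_time))
prox <- 1 / (1 + d_st)   # terms >= 0, larger when samples are closer
diag(prox) <- 0          # diagonal = 0

print(round(prox, 2))
```

As far as I know, such a matrix can then be supplied to spca through its matWeight argument, or converted to a listw object with spdep::mat2listw; please check ?spca for the version you have installed.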
Cheers
Thibaut
Dear Thibaut and the rest of the Adegenet users,
Well done on all those DAPC assignment and structure functions. Me and my
data are loving them!
I am currently using the assignment functions based on a set of
microsatellite data AND (following your suggestion at the top of the
vignette) on a set of environment driven quantitative phenotype data. It's
working quite well with both datasets, albeit showing rather different
structures at different spatial scales. And here's the thing, would it be
possible to analyse both datasets merged together? I believe this would
allow maximum assignment power as the structures complement each other
(rather than mimic each other).
Many thanks in advance for any help and congratulations on an amazing
package!
Regards,
Niklas
Dr Niklas Tysklind
Postdoctoral Research Officer
Celtic Sea Trout Project
Environment Centre for Wales
School of Biological Sciences
College of Natural Sciences
Bangor University,
Bangor, LL57 2UW
UK
Phone: +44 1248 382139
Email: ntysklind at bangor.ac.uk
Hello,
the quick and dirty way to do this would be to take the transformed data, normalize the tables to the same inertia (sum of squared values of the entries in each table), bind them into a single table, and run DAPC on this. It is not very elegant, but may do the trick if you want a quick and simple answer.
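A base R sketch of the normalisation and binding step, with simulated tables standing in for the microsatellite and phenotype data (to_unit_inertia is a hypothetical helper). Dividing by the square root of the sum of squares is what brings the sum of squared entries to 1.

```r
set.seed(42)
n <- 20

# Simulated, centred tables on very different scales (stand-ins for real data)
genet <- scale(matrix(rnorm(n * 6), n, 6), scale = FALSE)
pheno <- scale(matrix(rnorm(n * 3, sd = 50), n, 3), scale = FALSE)

# Hypothetical helper: rescale a table so that its sum of squared entries is 1
to_unit_inertia <- function(tab) {
  tab <- as.matrix(tab)
  tab / sqrt(sum(tab^2))
}

# Both tables now weigh the same; bind them column-wise (same individuals, same order)
merged <- cbind(to_unit_inertia(genet), to_unit_inertia(pheno))

print(dim(merged))
print(c(sum(to_unit_inertia(genet)^2), sum(to_unit_inertia(pheno)^2)))
```

The merged table, as a matrix or data frame, together with a factor of group memberships, could then be passed to dapc.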
There is quite a bit of literature on coupling data. I think I mention a few in a very quick overview in my review paper (http://www.ncbi.nlm.nih.gov/pubmed/19156164). Coinertia analysis would be an option (function "coinertia" in the package "ade4"), but it won't allow you to couple two DAPC (only say, two PCA). Such implementation would be possible, but would probably demand quite a bit of work (essentially, a new paper, and a slightly boring one to write too!).
One option in between a clean, elegant solution and something manageable without too much pain is: look for combinations of the Discriminant Factors which are most alike between the two datasets. The procedure would be:
1) Make a DAPC for each table; keep all axes
2) Get the DAPC coordinates of the two tables, and standardize them to the same (say, 1) inertia. That is, divide each table by the square root of the sum of all its squared entries.
3) Use these new matrices as inputs of coinertia
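A sketch of steps 2 and 3 using ade4, with simulated matrices standing in for the DAPC coordinates (in practice these would be the ind.coord elements of the two dapc objects; everything below is made up for illustration):

```r
library(ade4)
set.seed(7)

# Stand-ins for the DAPC coordinates of the same 30 individuals in two analyses
n <- 30
grp <- gl(3, 10)
coord1 <- matrix(rnorm(n * 2), n, 2) + as.integer(grp)
coord2 <- matrix(rnorm(n * 2), n, 2) + as.integer(grp)
colnames(coord1) <- c("genet.LD1", "genet.LD2")
colnames(coord2) <- c("pheno.LD1", "pheno.LD2")

# Step 2: centre, then standardise each table to a sum of squares of 1
unit_inertia <- function(m) {
  m <- scale(m, scale = FALSE)
  m / sqrt(sum(m^2))
}
tab1 <- as.data.frame(unit_inertia(coord1))
tab2 <- as.data.frame(unit_inertia(coord2))

# Step 3: unscaled PCA of each table, then co-inertia analysis of the pair
pca1 <- dudi.pca(tab1, scale = FALSE, scannf = FALSE, nf = 2)
pca2 <- dudi.pca(tab2, scale = FALSE, scannf = FALSE, nf = 2)
coi <- coinertia(pca1, pca2, scannf = FALSE, nf = 2)

# RV coefficient: overall similarity between the two structures
print(coi$RV)
```

The two tables must describe the same individuals in the same order, since coinertia pairs the rows of the two analyses.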
Does this make sense?
Cheers
Thibaut
