[adegenet-forum] interpretation results private_alleles
Zhian Kamvar
zkamvar at gmail.com
Sat May 23 18:18:43 CEST 2020
Hello,
Thank you for your patience. I will try to answer this as best I can.
Note that this is explicitly an adegenet forum, so the question is a bit out of place. I am copying the response to the poppr forum. Additionally, it would be good to know what versions of R, adegenet, and poppr you are using.
Please find my answers below.
> In the output I have 13072 rows, 6536 unique "alleles names" x 2 (M and
> P). Why don't I have all loci in my dataset ( I should have 24996 rows)??.
> Many of these alleles are not private in none of the two pops.. so it is
> not only the loci with private alleles.
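The core part of private_alleles reports only those alleles that are present in fewer than two populations in the current data set [1]. Alleles shared by both populations are filtered out, which is why you do not have all alleles represented. Note that this filter also lets through alleles that are present in no population at all, and these show up as zero counts. If you want a table that reports counts of every allele per population, you can convert your data to a genpop object with genind2genpop().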
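As a minimal sketch of the genind2genpop() route, using the nancycats example data that ships with adegenet rather than your own object (the variable names cats_pop and allele_counts are only illustrative):

```r
library("adegenet")
data(nancycats)

# Collapse individuals into populations: one row per population,
# one column per allele, each cell holding the allele count.
cats_pop <- genind2genpop(nancycats, quiet = TRUE)

# Extract the population x allele count matrix.
allele_counts <- tab(cats_pop)

dim(allele_counts)        # populations x alleles
allele_counts[1:3, 1:5]   # first few populations and alleles
```

Unlike the output of private_alleles, this table keeps every allele column, whether or not the allele is private.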
I suspect that you are getting zero counts because your data represent a subset of a larger data set, and the alleles that were present only in the broader data set were not dropped when subsetting. For example, I can reproduce the same pattern with the nancycats data set:
suppressPackageStartupMessages(library("poppr"))
data(nancycats)
all_cats <- private_alleles(nancycats, allele ~ .)
dim(all_cats)
#> [1] 17 13
all_cats[, 1:5] # shows private alleles in three populations
#>     fca8.117 fca8.119 fca8.127 fca43.157 fca77.132
#> P01        0        0        0         0         0
#> P02        0        0        0         0         0
#> P03        0        0        0         0         0
#> P04        0        0        0         0         0
#> P05        0        0        0         0         0
#> P06        0        0        0         0         0
#> P07        0        0        0         0         0
#> P08        0        0        0         0         0
#> P09        0        0        0         0         0
#> P10        0        0        0         1         0
#> P11        0        0        0         0         1
#> P12        0        0        0         0         0
#> P13        0        0        0         0         0
#> P14        1        1        1         0         0
#> P15        0        0        0         0         0
#> P16        0        0        0         0         0
#> P17        0        0        0         0         0
two_cats <- private_alleles(nancycats[pop = 1:2], allele ~ .)
dim(two_cats)
#> [1] 2 83
two_cats[, 1:5] # shows that these alleles do not exist in these two populations
#>     fca8.117 fca8.119 fca8.121 fca8.123 fca8.127
#> P01        0        0        0        0        0
#> P02        0        0        0        0        0
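If the all-zero columns are unwanted, here are two possible ways to get rid of them, sketched on the same nancycats subset. The first filters the result matrix in base R. The second relies on the drop argument of genind subsetting to discard alleles that are absent from the subset before calling private_alleles. The object names two_cats_seen and two_cats_dropped are only illustrative.

```r
suppressPackageStartupMessages(library("poppr"))
data(nancycats)

two_cats <- private_alleles(nancycats[pop = 1:2], alleles ~ .)

# Option 1: keep only the columns where at least one population
# actually carries the allele.
two_cats_seen <- two_cats[, colSums(two_cats) > 0, drop = FALSE]
dim(two_cats_seen)

# Option 2: drop alleles that no longer occur in the subset,
# then look for private alleles.
two_pops <- nancycats[pop = 1:2, drop = TRUE]
two_cats_dropped <- private_alleles(two_pops, alleles ~ .)
dim(two_cats_dropped)
```

Option 1 leaves the genind object untouched, so it is the safer choice if the full allele set is needed elsewhere in the analysis.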
> For example, for "1219:12:-" , population M has 4 private alleles. Am I
> correct that this means there are 4 individuals of population M that hold a
> private allele at this locus?
> population allele count
> 13 M 588:43:+.1 0
> 14 P 588:43:+.1 0
> 15 M 1086:34:-.1 0
> 16 P 1086:34:-.1 0
> 17 M 1219:12:-.1 4
Yes. This is correct.
> Then, I count the number of population -private alleles across loci with
> *n.pAlleles*<-pa.all%>%group_by(population)%>%summarise(sum(count))%>%as.data.frame
> #count total nb of private alleles, loci pooled
>> n.pAlleles
> population sum(count)
> 1 M 1648
> 2 P 1684
>
> And I count the number of loci with population -private alleles with
> pa.all1<-filter(pa.all, count !=0) #remove loci with of 0 p.a.
> *n.pLocus*<-pa.all1 %>% count(population) %>%as.data.frame #count per pop
> the nb of loci with at least 1 private allele
>> n.pLocus
> population n
> 1 M 864
> 2 P 742
>
> I get very different results for the two measures, which I didn't expect.
> Am I interpreting well the results?
These two numbers correspond to the difference between count.alleles = TRUE (your first result, which sums the copies of each private allele) and count.alleles = FALSE (your second result, which only records whether a private allele is present).
Suppose you had a clonal data set in which every individual was duplicated perfectly in each population. The first result would give 3296 for M and 3368 for P, because the number of alleles in the pool has doubled. The second result would still give 864 for M and 742 for P, because the number of allelic states has remained the same.
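To see the two kinds of totals side by side, here is a small sketch on the nancycats data (not your data, so the numbers will differ from yours; pa_counts and pa_states are illustrative names):

```r
suppressPackageStartupMessages(library("poppr"))
data(nancycats)

# Raw counts: how many copies of each private allele were observed.
pa_counts <- private_alleles(nancycats, alleles ~ ., count.alleles = TRUE)

# Presence/absence: 1 if the private allele occurs at all, 0 otherwise.
pa_states <- private_alleles(nancycats, alleles ~ ., count.alleles = FALSE)

# Per-population totals under each definition.
rowSums(pa_counts)
rowSums(pa_states)
```

The first total grows with the number of individuals carrying a private allele, while the second only grows when a new private allele appears.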
> Also, I realised I get exactly the same results using "form =loci~." in the
> private_alleles function, but the first is much much faster. This is weird.
> I think in the two cases it gives me the number of alleles per
> population.... So I don't get what is " loci~ " for.
I'm afraid I do not have enough information to tell you why you are getting the same result. What I suspect is that your data may have been mis-formatted on import. You have binary alleles, but only the 1s are being counted. It may be that zero alleles were treated as missing somewhere, so it would be good to make sure that alleles are not represented as zero. If you used vcfR2genind to import your data, you can use the return.alleles = TRUE argument to return the alleles as the actual allele calls (this was fixed in the latest version of vcfR).
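A minimal sketch of that import step, assuming the vcfR and adegenet packages are installed and using the small vcfR_test example object that ships with vcfR in place of your own VCF:

```r
library("vcfR")
library("adegenet")
data(vcfR_test)

# Convert to genind, keeping the actual allele calls
# instead of the numeric codes 0, 1, 2, ...
gid <- vcfR2genind(vcfR_test, return.alleles = TRUE)

# Inspect the allele names per locus to confirm that
# no allele is labelled "0".
gid@all.names
```

If the allele names in your own object are all 0 and 1, re-importing this way and re-running private_alleles would be a reasonable first check.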
I hope that helps.
Best,
Zhian
[1] For those interested here is the precise code that does that: https://github.com/grunwaldlab/poppr/blob/af877c388899298197cd6b76fbd8876f59815833/R/Index_calculations.r#L1096-L1097
> Date: Tue, 28 Apr 2020 13:16:36 +0200
> From: Amaranta Fontcuberta <amaranta.fontcuberta at unil.ch>
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] interpretation results private_alleles
> Message-ID:
> <CAG+96r8SdfFfhqaAu39sk=0-yp3HghaOWn=BwkHOYwq261xpfA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Dear Zhian,
>
> I calculate the number of alleles per population with poppr. I get
> unexpected results and some weird behavior of the private_alleles function,
> so I would like to confirm I am understanding well what the function is
> doing and I am interpreting the results correctly. Thanks in advance for
> your help!
>
> I have a genind object with 27 individuals and 12.498 loci (bi allelic
> data, 0/1). I have 2 populations called M and P.
> I calculate the nb of private alleles per population with
>> pa.all<-private_alleles(derbo.gd
> ,form=alleles~.,level="population",report="data.frame")
>
> In the output I have 13072 rows, 6536 unique "alleles names" x 2 (M and
> P). Why don't I have all loci in my dataset ( I should have 24996 rows)??.
> Many of these alleles are not private in none of the two pops.. so it is
> not only the loci with private alleles.
>
> For example, for "1219:12:-" , population M has 4 private alleles. Am I
> correct that this means there are 4 individuals of population M that hold a
> private allele at this locus?
> population allele count
> 13 M 588:43:+.1 0
> 14 P 588:43:+.1 0
> 15 M 1086:34:-.1 0
> 16 P 1086:34:-.1 0
> 17 M 1219:12:-.1 4
>
> Then, I count the number of population -private alleles across loci with
> *n.pAlleles*<-pa.all%>%group_by(population)%>%summarise(sum(count))%>%as.data.frame
> #count total nb of private alleles, loci pooled
>> n.pAlleles
> population sum(count)
> 1 M 1648
> 2 P 1684
>
> And I count the number of loci with population -private alleles with
> pa.all1<-filter(pa.all, count !=0) #remove loci with of 0 p.a.
> *n.pLocus*<-pa.all1 %>% count(population) %>%as.data.frame #count per pop
> the nb of loci with at least 1 private allele
>> n.pLocus
> population n
> 1 M 864
> 2 P 742
>
> I get very different results for the two measures, which I didn't expect.
> Am I interpreting well the results?
>
> Also, I realised I get exactly the same results using "form =loci~." in the
> private_alleles function, but the first is much much faster. This is weird.
> I think in the two cases it gives me the number of alleles per
> population.... So I don't get what is " loci~ " for.
>
> Thanks in advance for your advice,
>
> All the best,
>
>
>> derbo.gd
> /// GENIND OBJECT /////////
> // 27 individuals; 12,498 loci; 24,996 alleles; size: 8.4 Mb
> // Basic content
> @tab: 27 x 24996 matrix of allele counts
> @loc.n.all: number of alleles per locus (range: 2-2)
> @loc.fac: locus factor for the 24996 columns of @tab
> @all.names: list of allele names for each locus
> @ploidy: ploidy of each individual (range: 2-2)
> @type: codom
> @call: .local(x = x, i = i, j = j, drop = drop)
>
> // Optional content
> @pop: population of each individual (group size range: 13-14)
>>
>> derbo.gd at loc.fac
> [1] 47:42:- 47:42:- 69:32:+ 69:32:+ 170:70:+ 170:70:+
> 255:71:+ 255:71:+ 318:29:+
> [10] 318:29:+ 413:44:- 413:44:- 447:82:+ 447:82:+ 471:26:+
> 471:26:+ 541:74:- 541:74:-
> [19] 588:43:+ 588:43:+ 702:20:+ 702:20:+ 745:45:- 745:45:-
> 749:18:+ 749:18:+ 770:10:+
> [28] 770:10:+ 854:6:+ 854:6:+ 1086:34:- 1086:34:- 1142:62:+
> 1142:62:+ 1183:42:+ 1183:42:+
>
>> pa.all<-private_alleles(derbo.gd
> ,form=alleles~.,level="population",report="data.frame")
>> pa.all (allele ~pop)
> population allele count
> 1 M 69:32:+.1 0
> 2 P 69:32:+.1 0
> 3 M 170:70:+.1 0
> 4 P 170:70:+.1 0
> 5 M 255:71:+.1 0
> 6 P 255:71:+.1 0
> 7 M 318:29:+.1 0
> 8 P 318:29:+.1 0
> 9 M 447:82:+.1 0
> 10 P 447:82:+.1 0
> 11 M 541:74:-.1 0
> 12 P 541:74:-.1 0
> 13 M 588:43:+.1 0
> 14 P 588:43:+.1 0
> 15 M 1086:34:-.1 0
> 16 P 1086:34:-.1 0
> 17 M 1219:12:-.1 4
>
>
>
>> pa.all (loci~pop)
> population locus count
> 1 M 69:32:+ 0
> 2 P 69:32:+ 0
> 3 M 170:70:+ 0
> 4 P 170:70:+ 0
> 5 M 255:71:+ 0
> 6 P 255:71:+ 0
> 7 M 318:29:+ 0
> 8 P 318:29:+ 0
> 9 M 447:82:+ 0
> 10 P 447:82:+ 0
> 11 M 541:74:- 0
> 12 P 541:74:- 0
> 13 M 588:43:+ 0
> 14 P 588:43:+ 0
> 15 M 1086:34:- 0
> 16 P 1086:34:- 0
> 17 M 1219:12:- 4
>
>
> ----------------------------------------------------------
> Amaranta Fontcuberta, assistante diplômée
> Dept. Écologie et Évolution
> Université de Lausanne