[adegenet-forum] df2genind never stops
VARALDI JULIEN
Julien.Varaldi at univ-lyon1.fr
Tue Aug 2 19:44:27 CEST 2016
Dear adegenet users,
I have two datasets that I would like to combine into a single one, ideally a genlight object. The first dataset is a VCF file from the 1000 Genomes Project. I can read it using the vcfR package and then convert it to a genlight object. This takes a while (a few minutes) but works fine:
vcf=read.vcfR(vcf_file)
my_genlight <- vcfR2genlight(x=vcf, n.cores = 8)
The other dataset is a data frame containing genotypes obtained from a genome-wide SNP array. It contains the genotypes of 31 individuals at 868146 loci. The initial file is only 90 MB. I tried to use df2genind, without success: it ran without any apparent error, but I stopped it after about 20 minutes. Here is what I did:
> tab=read.table(my_data, head=T, sep=",")
> head(tab)
> loci=tab$rs_number
> tab=t(tab)
> tab=tab[-1,]
> colnames(tab)=loci
> tab[1:5, 1:4]
         rs10458597 rs9629043 rs11510103 rs12565286
Sample_4 "CC"       "CC"      "AA"       "CC"
Sample_5 "CC"       "NN"      "AA"       "CC"
Sample_6 "CC"       "CC"      "AA"       "CC"
Sample_7 "CC"       "CC"      "AA"       "CC"
Sample_8 "CC"       "CC"      "AA"       "CC"
> dim(tab)
[1] 31 868146
my_genind=df2genind(tab, ploidy=2, sep="", NA.char = "N")
This last command never seems to finish.
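One possible workaround, sketched here under assumptions rather than as a tested solution, is to skip genind altogether: count the copies of one allele per genotype and pass the resulting integer matrix to the genlight constructor. The toy matrix, the helper name count_allele, and the choice of the alphabetically first allele as the reference are all illustrative assumptions. In particular, that coding will not necessarily match the REF/ALT coding of the VCF-derived object.

```r
# Toy stand-in for the real 31 x 868146 character matrix (hypothetical values)
tab <- matrix(c("CC", "CT", "TT",
                "AA", "NN", "AG"),
              nrow = 3,
              dimnames = list(paste0("Sample_", 1:3), c("rs1", "rs2")))

# Hypothetical helper: for one locus, count the copies of the non-reference
# allele in each genotype. The reference is arbitrarily taken to be the
# alphabetically first allele seen at that locus; "N" marks a missing call.
count_allele <- function(g) {
  a1 <- substr(g, 1, 1)
  a2 <- substr(g, 2, 2)
  alleles <- setdiff(unique(c(a1, a2)), "N")
  ref <- sort(alleles)[1]
  n <- (a1 != ref) + (a2 != ref)
  n[a1 == "N" | a2 == "N"] <- NA   # any missing allele makes the call missing
  n
}

# Integer matrix: individuals in rows, loci in columns
counts <- apply(tab, 2, count_allele)
rownames(counts) <- rownames(tab)

# If adegenet is installed, the matrix can be handed to the genlight constructor
if (requireNamespace("adegenet", quietly = TRUE)) {
  library(adegenet)
  gl <- new("genlight", counts, ploidy = 2)
}
```

On the full dataset the apply() call still visits all 868146 columns, so it is not instantaneous, but it avoids building one column per allele as a genind object does.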
I would appreciate any suggestions. The next step is to combine the two datasets, with the difficulty that one will be a genlight object and the other a genind object, AND that the 1000 Genomes dataset contains many more loci than the SNP array dataset (does repool deal with this situation?). I would also appreciate any input on that.
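To make the locus mismatch concrete, here is a minimal base-R sketch of restricting two datasets to their shared loci before stacking the individuals. It assumes both datasets are available as allele-count matrices with loci named by rs number and with the same allele counted at each locus; the matrices and the sample names HG_a and HG_b are made up for illustration. With genlight objects, locNames() and column subsetting with `[` would presumably play the same role, and whether repool handles this case remains the open question.

```r
# Toy allele-count matrices (individuals in rows, loci in columns);
# all values and names are hypothetical
array_counts <- matrix(c(0L, 1L, 2L, 0L), nrow = 2,
                       dimnames = list(c("Sample_4", "Sample_5"),
                                       c("rs1", "rs2")))
kg_counts <- matrix(c(1L, 0L, 2L, 2L, 0L, 1L), nrow = 2,
                    dimnames = list(c("HG_a", "HG_b"),
                                    c("rs2", "rs9", "rs1")))

# Loci present in both datasets, in the order of the array dataset
shared <- intersect(colnames(array_counts), colnames(kg_counts))

# Keep only the shared loci, in the same column order, then stack individuals
combined <- rbind(array_counts[, shared, drop = FALSE],
                  kg_counts[, shared, drop = FALSE])
```

Indexing by name rather than by position matters here, since the two sources list the loci in different orders.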
I am running R 3.3.1 on Mac OS X 10.11.4.
thanks a lot,
cheers,
Julien