[adegenet-forum] df2genind never stops

VARALDI JULIEN Julien.Varaldi at univ-lyon1.fr
Tue Aug 2 19:44:27 CEST 2016


Dear adegenet users,

I have two datasets that I would like to combine into a single one, ideally a genlight object. The first dataset is a VCF file from the 1000 Genomes Project. I can read it using the package vcfR and then convert it to a genlight object. This takes a while (a few minutes) but works fine:

vcf=read.vcfR(vcf_file)
my_genlight <- vcfR2genlight(x=vcf, n.cores = 8)

The other dataset is a data frame containing genotypes obtained from a genome-wide SNP array. It contains the genotypes for 31 individuals at 868146 loci. The initial file is only 90 MB. I tried to use df2genind, but without success: I stopped it after about 20 minutes, during which it kept running without any apparent error. Here is what I did:

>tab=read.table(my_data, head=T, sep=",")
>head(tab)
>loci=tab$rs_number
>tab=t(tab)
>tab=tab[-1,]
>colnames(tab)=loci

> tab[1:5, 1:4]
         rs10458597 rs9629043 rs11510103 rs12565286
Sample_4 "CC"       "CC"      "AA"       "CC"      
Sample_5 "CC"       "NN"      "AA"       "CC"      
Sample_6 "CC"       "CC"      "AA"       "CC"      
Sample_7 "CC"       "CC"      "AA"       "CC"      
Sample_8 "CC"       "CC"      "AA"       "CC"      

> dim(tab)
[1]     31 868146
> my_genind=df2genind(tab, ploidy=2, sep="", NA.char = "N")

This last command never finishes.
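
One possible workaround, given here only as a minimal base-R sketch and not as a tested solution, would be to skip df2genind and recode the two-character genotypes directly into allele counts (0, 1, 2), which is the kind of integer matrix a genlight object stores. The toy matrix and the helper name count_alleles are invented for illustration. The sketch assumes biallelic loci, and the "reference" allele it picks (the alphabetically first one) will not necessarily match the REF allele of the VCF file:

```r
# Toy genotype matrix in the same layout as `tab` (samples x loci);
# the values are invented for illustration.
geno <- matrix(c("CC", "CT", "TT",
                 "AA", "NN", "AG"),
               nrow = 3,
               dimnames = list(c("Sample_1", "Sample_2", "Sample_3"),
                               c("snp_a", "snp_b")))

# Hypothetical helper: for each locus, count the copies of every allele
# other than the alphabetically first one. Genotypes containing "N"
# become NA. Assumes two-character genotypes and biallelic loci.
count_alleles <- function(g) {
  counts <- matrix(NA_integer_, nrow(g), ncol(g), dimnames = dimnames(g))
  for (j in seq_len(ncol(g))) {
    a1 <- substr(g[, j], 1, 1)
    a2 <- substr(g[, j], 2, 2)
    a1[a1 == "N"] <- NA
    a2[a2 == "N"] <- NA
    alleles <- sort(unique(na.omit(c(a1, a2))))
    if (length(alleles) == 0) next   # locus entirely missing: leave as NA
    ref <- alleles[1]
    counts[, j] <- (a1 != ref) + (a2 != ref)
  }
  counts
}

counts <- count_alleles(geno)
print(counts)

# With adegenet loaded, an integer matrix like this could then be given to
# new("genlight", counts); that step is not run here.
```

The loop touches each locus once and never builds the per-allele columns that a genind object needs, so memory use stays close to that of the input matrix.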

I would appreciate any suggestions. The next step is to combine the two datasets, with the difficulty that one will be a genlight and the other a genind, AND that the 1000 Genomes dataset contains many more loci than the SNP array dataset (does repool deal with this situation?). I would also appreciate any input on that.
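
To make the locus mismatch concrete, here is a small base-R sketch of restricting the array data to the loci shared with the larger dataset. All names and values are invented, and it assumes that both datasets label their loci with the same identifiers (for example rs numbers), which may not hold for names derived from a VCF file:

```r
# Invented locus names standing in for the locus names of each dataset.
loci_1000g <- c("rs1", "rs2", "rs3", "rs4", "rs5")
loci_array <- c("rs4", "rs2", "rs9")

# Loci present in both datasets, in the order of the first argument.
shared <- intersect(loci_1000g, loci_array)

# Toy allele-count matrix for the array data (samples x loci).
array_counts <- matrix(c(0L, 1L, 2L, 0L, 1L, 1L), nrow = 2,
                       dimnames = list(c("Sample_4", "Sample_5"), loci_array))

# Keep only the shared loci, in the same order as `shared`.
array_shared <- array_counts[, shared, drop = FALSE]
print(array_shared)
```

The same vector of shared names could then be used to subset the larger dataset, so that both end up with identical loci in identical order before being combined.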

I am running R 3.3.1 on Mac OS X 10.11.4.
Thanks a lot,
cheers,
Julien

