[genoPlotR-help] genoPlot R

Lionel Guy guy.lionel at gmail.com
Tue Aug 19 16:14:16 CEST 2014


Hi Shatavia,

You should get the fasta file (.fna) corresponding to the genome (not gene) of your organisms of interest. For example, for NC_018621, get 
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Chlamydia_psittaci_VS225_uid175574/NC_018621.fna

Then use blastn to compare the genomes to each other: you'll get a very long list of hits, but that's OK, they will be filtered by genoPlotR (you can also use -query_loc and -subject_loc in blastn to filter only the region you're after)

You'll need to perform only two blasts, i.e. A->B and B->C, since A->C won't be represented.

You can still use the partial .ptt files as input for read_dna_segs, that's fine.

Cheers,

Lionel
On 19 Aug 2014, at 15:08 , Morrison, Shatavia Sharday (CDC/OID/NCIRD) (CTR) <xxh5 at cdc.gov> wrote:

> Hi Lionel,
> 
> Thank you for your suggestions. I should of mentioned my goal in the previous email. I want to show 3 genes in 3 different highly similar organisms. So if I understand your solution I should do the following:
> 
> 1) Get the nucleotide sequence for genes of interest for each organism
> 2) Blastn these sequences to use in the read_comparison_from_blast function ( I will need to perform 3 separate blast for organisms A,B,and C - AB, BC, AC)
> 
> Here is where I get confused- Can I still use the parsed .ptt GenBank files similar to the ones I sent in the previous email or do I just use the entire ptt file?
> 
> Best,
> Shatavia
> 
> -----Original Message-----
> From: Lionel Guy [mailto:guy.lionel at gmail.com] 
> Sent: Monday, August 18, 2014 4:09 PM
> To: Morrison, Shatavia Sharday (CDC/OID/NCIRD) (CTR)
> Cc: Help for genoPlotR
> Subject: Re: genoPlot R
> 
> Hi Shatavia,
> 
> You are trying to compare contigs (or chromosomes) using blastp (i.e. protein against protein) results. That won't work, because blastp results have coordinates in protein space (i.e. where 1 is the first amino acid of the protein and goes to the end of the protein, irrespective of its position on the contig) and contigs (or chromosomes) are numbered from 1 (beginning of contig/chromorome) to the end of the contig/chromosome. 
> 
> You have two solutions:
> - use blastn (if your contigs/chromosomes are very similar) or tblastx (if they are not). Note that with tblastx, you may run into a problem that is that (last time I checked) the number of hits per sequence in tblastx comparisons is limited to 400. Thus, if you compare large contigs, you will get only the top hits which are not necessarily those you're after. It may be that the newer versions of tblastx allow to change this. You may also use the -query option of tblastx to limit the blast to a specific region of your query (and possibly -subject_loc if you blast one sequence against another.
> - recode protein coordinates into nucleotide positions along the contig. This is not trivial but can be done. I have an experimental Perl script that I'm willing to share, but this doesn't take .ptt files as input. Instead it uses an ad hoc format (four columns: id start end strand). 
> 
> I'd start with trying with blastn, then try tblastx. Finally, try recoding the positions if none of the above works.
> 
> Kind regards,
> 
> Lionel
> 
> On 18 Aug 2014, at 20:56 , Morrison, Shatavia Sharday (CDC/OID/NCIRD) (CTR) <xxh5 at cdc.gov> wrote:
> 
>> Hi Lionel,
>> 
>> Thank you for your reply.  Here are the items that you asked for. All the files are attached. R Code is in the email.
>> 
>> Best,
>> Shatavia
>> 
>> Here is the R Code:
>> 
>> org_16BC <- read_dna_seg_from_ptt("NC_015470_6BC.ptt_pttExtract.ptt")
>> org_2VS225 <- 
>> read_dna_seg_from_ptt("NC_018621_VS225.ptt.txt_pttExtract.ptt")
>> # Need pairwise comparison blast results one per comparison set 
>> blastResults <- read_comparison_from_blast("genoOUT.txt")
>> genomeList<- list(org_16BC,org_2VS225) comparisonList <- 
>> list(blastResults) xlims <- 
>> list(c(77800,79308,176734,179346,209685,210215),c(77763,79271,176672,1
>> 79293,209626,210156))
>> 
>> 
>> #compSegs <- 
>> list(c(77800,210215),c(176734,179346),c(209685,210215),c(77763,79271),
>> c(176672,179293),c(209626,210156),c(77792,79294),c(176718,179297),c(20
>> 9630,210160))
>> #compSegs1 <- list(c(77800,79308),c(77763,210156),c(77792,210160))
>> 
>> plot_gene_map(dna_segs = genomeList, comparisons= 
>> comparisonList,xlims=xlims,main="Test GenoPlotR", 
>> gene_type="arrows",dna_seg_scale= FALSE, scale=TRUE)
>> 
>> 
>> Best,
>> Shatavia
>> From: Lionel Guy [mailto:guy.lionel at gmail.com]
>> Sent: Monday, August 18, 2014 2:53 PM
>> To: Morrison, Shatavia Sharday (CDC/OID/NCIRD) (CTR)
>> Cc: Help for genoPlotR
>> Subject: Re: genoPlot R
>> 
>> Hi Shatavia,
>> 
>> Thanks for using genoPlotR. To be able to answer your question, I would need you to send me the whole code that you are using, including the part where you load the genes and the comparison, and the actual files containing the comparison and the genes. 
>> 
>> Best regards,
>> 
>> Lionel
>> 
>> On 18 août 2014, at 19:36, "Morrison, Shatavia Sharday (CDC/OID/NCIRD) (CTR)" <xxh5 at cdc.gov> wrote:
>> 
>> Hello Lionel,
>> 
>> I'm running into some issues with the genoPlotR package.  Currently I'm trying to generate a plot similar to the one in the Getting Started with genoplotR manual, quick start page 2.  I'm not interested in generating a phylogenetic tree with the graph image, I just want to show the genes and their corresponding blast results synteny.
>> 
>> As a test I'm using three genes from two different genomes.  The code listed below generates the attached image. I would like to see the comparisons of the blast results, which are missing.  I have removed the self-hits from the blast results as I thought this is why genoPlotR was not creating the comparisons since there were multiple hits in the file, now that I have removed them I'm still having the same issue.
>> 
>> What am I missing from my plot gene map function? 
>> 
>> plot_gene_map(dna_segs = genomeList, comparisons= 
>> comparisonList,xlims=xlims,main="Test GenoPlotR", 
>> gene_type="arrows",dna_seg_scale= FALSE, scale=TRUE)
>> 
>> Best,
>> Shatavia
>> <TestGenoR.tiff>
>> <NC_015470_6BC.ptt_pttExtract.ptt><NC_018621_VS225.ptt.txt_pttExtract.
>> ptt><genoOUT.txt>
> 
> --
> Lionel Guy
> Fålhagsleden 13, SE-75324 Uppsala | email: guy.lionel at gmail.com | mobile: +46 (0)73 9760618 | phone: +46 (0)18 410 7398
> 

--
Lionel Guy 
Fålhagsleden 13, SE-75324 Uppsala | email: guy.lionel at gmail.com | mobile: +46 (0)73 9760618 | phone: +46 (0)18 410 7398



More information about the genoPlotR-help mailing list