[genoPlotR-help] genoplotR help with graphic presentation or CDS and names

Lionel Guy lionel.guy at imbim.uu.se
Thu Jan 26 21:33:00 CET 2017


Hi Diane, 

See my comments inline:

> On 26 Jan 2017, at 15:07 , Artemis H <hatziiod at gmail.com> wrote:
> 
> Hi Lionel,
> 
> Thanks for the prompt and very useful help.
> 
> I have a few more questions though. I made a tree by giving RAxML a MEGA7 alignment and tried feeding that into my command list but I seem to have a name issue.
> 
> My sequences are called
> ##Sequences 
> setA.seq <- read_dna_seg_from_genbank("setAfile.gb", tagsToParse=c("CDS"))
> etc
> 
> My tree number 1 was
> tree <- newick2phylog("((setU.seq:0.10250953446912315636,setR.seq:0.09775744461960976517):0.01209692024574308099,((setA.seq:0.04104447134815911863,setQ.seq:0.06305211466588933611):0.01132124623294956944,setH.seq:0.11080167962495350575):0.01743359660415144674,otherset.seq:0.10269874269575925141):0.0;")
> 
> I also tried chopping the last :0.0; off the end (fruitless improvisation)
> 
> My plot line was;
> plot_gene_map(dna_segs=list(otherset.seq, setA.seq, setQ.seq, setH.seq, setR.seq, setU.seq ), comparisons=list(Geobacillin_setA.comparison, setA_setQ.comparison, setQ_setH.comparison, setH_setR.comparison, setR_setU.comparison),override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"),tree=tree)
> 
> That gave me:
> Error in plot_gene_map(dna_segs = list(otherset.seq, setA.seq, setQ.seq,  :
>   If tree is given, label names should be provided via named list dna_segs or dna_seg_labels
> Execution halted
> 
> As far as I can see though the label names I gave are exactly the same as the dna_segs names. I tried introducing labels instead:
> 
> plot_gene_map(dna_segs=list(otherset.seq, setA.seq, setQ.seq, setH.seq, setR.seq, setU.seq ), comparisons=list(otherset_set.comparison, setA_setQ.comparison, setQ_setH.comparison, setH_setR.comparison, setR_setU.comparison),override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"),dna_seg_labels=c("otherset I", "setin A cluster", "setin Q cluster", "setin H cluster","setin R cluster","setin U cluster"),tree=tree)
> 
> Again I got "Error in plot_gene_map(dna_segs = list(Geobacillin.seq, nisA.seq, nisQ.seq,  : 
>   If tree is given, label names should be provided via named list dna_segs or dna_seg_labels
> Execution halted"
> 

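Two ways:

a) Name your objects in the list:

plot_gene_map(dna_segs=list(otherset.seq=otherset.seq, setA.seq=setA.seq, setQ.seq=setQ.seq, ...

and so on, so that the names of the elements of the list (before the =) exactly match the leaves of the tree. A plain list(otherset.seq, setA.seq, ...) does not keep the variable names, so the list has no names at all even though the objects do.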

b) use dna_seg_labels that are identical to the names of the leaves of the tree:

plot_gene_map(…, …, dna_seg_labels=c("otherset.seq", "setA.seq", "setQ.seq", etc...
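To make the name matching concrete, here is a minimal base R sketch. It assumes small one-gene data frames as stand-ins for the real dna_seg objects (the make_seg helper is hypothetical), and it uses the tree from the message with shortened branch lengths. It shows that an unnamed list has no names, and checks a named list against the leaf labels pulled out of the Newick string.

```r
# Tree from the message, with branch lengths shortened for readability.
newick <- paste0(
  "((setU.seq:0.10,setR.seq:0.10):0.01,",
  "((setA.seq:0.04,setQ.seq:0.06):0.01,setH.seq:0.11):0.02,",
  "otherset.seq:0.10):0.0;"
)

# Leaf labels are the tokens that directly follow "(" or ",".
leaves <- regmatches(
  newick,
  gregexpr("(?<=[(,])[^(),:;]+", newick, perl = TRUE)
)[[1]]

# Hypothetical stand-ins for the dna_seg objects read from GenBank files.
make_seg <- function(gene) {
  data.frame(name = gene, start = 1, end = 100, strand = 1)
}
otherset.seq <- make_seg("g1")
setA.seq <- make_seg("g2")
setQ.seq <- make_seg("g3")
setH.seq <- make_seg("g4")
setR.seq <- make_seg("g5")
setU.seq <- make_seg("g6")

# list() drops the variable names, so there is nothing to match to the tree.
unnamed <- list(otherset.seq, setA.seq, setQ.seq, setH.seq, setR.seq, setU.seq)
print(names(unnamed))  # NULL

# Solution a): name every element after its tree leaf.
named <- list(
  otherset.seq = otherset.seq, setA.seq = setA.seq, setQ.seq = setQ.seq,
  setH.seq = setH.seq, setR.seq = setR.seq, setU.seq = setU.seq
)

# Both differences should be empty when names and leaves agree.
print(setdiff(names(named), leaves))
print(setdiff(leaves, names(named)))
```

With genoPlotR loaded, the named list would go in as dna_segs, or names(named) could be passed as dna_seg_labels for solution b).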

> I also tried giving the dna_seg names in the newick list the labels in the form of tree <- newick2phylog("(("setin U cluster":0.102509534.. etc but that terminated at the tree line with a "Error: unexpected symbol in "tree <- newick2phylog("(("setin"
> Execution halted"  message.

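You don’t want to mess with extra quotes there: the inner double quotes end the R string early, which is what triggers the "unexpected symbol" error. Keep away from spaces in tree labels.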
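One way to get labels without spaces or quotes is sketched below in base R. The label text comes from the message. The branch lengths are placeholders and the two-leaf tree is only an illustration, not the real phylogeny.

```r
# Display labels as typed in the message; spaces make them awkward tree labels.
display_labels <- c("setin U cluster", "setin R cluster")

# Replace each run of characters other than letters, digits, "." or "_"
# with a single underscore.
safe_labels <- gsub("[^A-Za-z0-9._]+", "_", display_labels)
print(safe_labels)

# Build the Newick text from the safe labels, so no quotes are needed inside
# the R string. The branch lengths (0.10) are placeholders.
newick <- sprintf("(%s:0.10,%s:0.10);", safe_labels[1], safe_labels[2])
print(newick)
```

The same safe_labels vector would then be used for dna_seg_labels, so that the labels and the tree leaves are identical.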

> I guess it must be something horribly simple with names and labels but if it's obvious to you please share the secret.
> 
> I got the arrows and gene names showing but couldn't figure out how to get the cluster names showing to the left, should I assume that if the tree is recognized it will show the names then?

Yes. I think that even without the tree, if you use solution a) or b) above it should work. 

> Also just for your amusement it took me about 2 hours to realize there isn't a problem with my genbank files but the gene_type I wanted was arrows not arrow.

The smallest errors are often the hardest to find :)
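A small guard can catch this kind of misspelling early. The sketch below is base R only: set_gene_type is a hypothetical helper, the data frame is a stand-in for a real dna_seg, and valid_types is an assumed subset of the accepted values. With genoPlotR loaded, the full list should come from gene_types() instead.

```r
# Set the same gene_type on every segment of a list, failing fast on
# values that are not in the accepted set.
set_gene_type <- function(segs, type, valid_types) {
  if (!(type %in% valid_types)) {
    stop(sprintf("unknown gene_type \"%s\"; expected one of: %s",
                 type, paste(valid_types, collapse = ", ")))
  }
  lapply(segs, function(seg) {
    seg$gene_type <- rep(type, nrow(seg))
    seg
  })
}

# Assumed subset of valid values; take the real list from gene_types().
valid_types <- c("arrows", "blocks", "bars")

# Hypothetical stand-in for a named list of dna_seg objects.
segs <- list(
  setA.seq = data.frame(
    name = c("geneA", "geneB"),
    start = c(828, 1109),
    end = c(1001, 4090),
    strand = c(1, 1),
    gene_type = "bars"
  )
)

segs <- set_gene_type(segs, "arrows", valid_types)
print(segs$setA.seq$gene_type)

# The misspelling from the message is reported immediately.
result <- tryCatch(
  set_gene_type(segs, "arrow", valid_types),
  error = function(e) conditionMessage(e)
)
print(result)
```

Because lapply keeps the list names, the result can still be passed as a named list when a tree is used.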

Good luck with your plot!

Lionel

> Lots of grateful thanks,
> Diane
> 
> 
> 
> 
> 
> On Thu, Jan 26, 2017 at 10:16 AM, Lionel Guy <lionel.guy at imbim.uu.se> wrote:
> Hi Diane,
> 
> Nice plot!
> The option you’re looking for is gene_type. You can change it for example by doing:
> 
> geneA.seq$gene_type <- "arrows"
> 
> To see all gene types, look into the examples of gene_types. What you are looking for is probably "arrows" or "blocks":
> 
> ?gene_types
> 
> To label clusters, you need to use the "annotations" argument of plot_gene_map. To generate an annotation object, use annotation() or auto_annotate():
> 
> annotA <- auto_annotate(geneA.seq)
> 
> and so on for the other dna_segs and then
> 
> plot_gene_map(dna_segs=list(*all your seg files*), comparisons=list(*all your comparison files*), annotations=list(*all your annotations*), override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"))
> 
> To generate a newick file, you need to obtain a tree, or you can write it yourself. This is a bit beyond the scope of the help list, but look into RAxML or other phylogeny programs. Definition of Newick format is here: http://evolution.genetics.washington.edu/phylip/newicktree.html.
> 
> Hope that helps.
> 
> Cheers,
> 
> Lionel
> 
> > On 25 Jan 2017, at 19:18 , Artemis H <hatziiod at gmail.com> wrote:
> >
> > Hello,
> >
> > I've just tried to use genoplotR for the first time. I used read_dna_seg_from_genbank to import CDS info of cluster genes and read_comparison_from_blast to read  blastn comparison files. I checked them all with is.dna_seg and is.comparison and they all came out TRUE. Finally I used plot_gene_map(dna_segs=*all my seg files* ), comparisons=*all my comparison files*),override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0")).
> >
> > With this process I end up with a nice image which seems to show the blastn conserved regions but instead of CDS boxes I get thin blue dispersed lines along my stick cluster and I can't seem to find how to give each cluster a name. I've attached the usual output.
> >
> > I would like to ask first, how can I label the clusters, secondly how can I make the CDS show for each cluster as a box or arrow, thirdly how does one go about generating a newick2phylog file with the raw data?
> > I'm pasting an example of a dna_seg object in case that helps:
> > > geneA.seq
> >    name start   end strand length pid gene synonym product  proteinid feature
> > 1  geneA   828  1001      1     57  NA geneA      NA    geneA CD352.1     CDS
> > 2  geneB  1109  4090      1    993  NA geneB      NA    geneB CD353.1     CDS
> > 3  geneT  4101  5903      1    600  NA geneT      NA    geneT CD354.1     CDS
> > 4  geneC  5896  7140      1    414  NA geneC      NA    geneC CD355.1     CDS
> > 5  geneI  7137  7874      1    245  NA geneI      NA    geneI CD356.1     CDS
> > 6  geneP  7876  9924      1    682  NA geneP      NA    geneP CD357.1     CDS
> > 7  geneR  9993 10679      1    228  NA geneR      NA    geneR CD358.1     CDS
> > 8  geneK 10672 12015      1    447  NA geneK      NA    geneK CD359.1     CDS
> > 9  geneF 12114 12791      1    225  NA geneF      NA    geneF CD360.1     CDS
> > 10 geneE 12793 13521      1    242  NA geneE      NA    geneE CD361.1     CDS
> > 11 geneG 13508 14152      1    214  NA geneG      NA    geneG CD362.1     CDS
> >    gene_type  col lty lwd pch cex
> > 1       bars blue   1   1   8   1
> > 2       bars blue   1   1   8   1
> > 3       bars blue   1   1   8   1
> > 4       bars blue   1   1   8   1
> > 5       bars blue   1   1   8   1
> > 6       bars blue   1   1   8   1
> > 7       bars blue   1   1   8   1
> > 8       bars blue   1   1   8   1
> > 9       bars blue   1   1   8   1
> > 10      bars blue   1   1   8   1
> > 11      bars blue   1   1   8   1
> >
> >
> > Thank you in advance, any help and tips would be appreciated.
> > Diane
> >
> > <Rplots_v1.pdf>
> 
> --
> Lionel Guy
> Department for Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
> phone: +46 18 471 4246; mobile +46 73 976 0618; postal address: Box 582, SE-751 23 Uppsala; visiting address: BMC D7:304c, Husargatan 3, SE-752 37 Uppsala
> lionel.guy at imbim.uu.se
> 
> 



