[genoPlotR-help] genoplotR help with graphic presentation or CDS and names
Lionel Guy
lionel.guy at imbim.uu.se
Thu Jan 26 21:33:00 CET 2017
Hi Diane,
See my comments inline:
> On 26 Jan 2017, at 15:07 , Artemis H <hatziiod at gmail.com> wrote:
>
> Hi Lionel,
>
> Thanks for the prompt very useful help.
>
> I have a few more questions though. I made a tree by giving RAxML a MEGA7 alignment and tried feeding that into my command list but I seem to have a name issue.
>
> My sequences are called
> ##Sequences
> setA.seq <- read_dna_seg_from_genbank("setAfile.gb", tagsToParse=c("CDS"))
> etc
>
> My tree number 1 was
> tree <- newick2phylog("((setU.seq:0.10250953446912315636,setR.seq:0.09775744461960976517):0.01209692024574308099,((setA.seq:0.04104447134815911863,setQ.seq:0.06305211466588933611):0.01132124623294956944,setH.seq:0.11080167962495350575):0.01743359660415144674,otherset.seq:0.10269874269575925141):0.0;")
>
> I also tried chopping the last :0.0; off the end (fruitless improvisation)
>
> My plot line was;
> plot_gene_map(dna_segs=list(otherset.seq, setA.seq, setQ.seq, setH.seq, setR.seq, setU.seq ), comparisons=list(Geobacillin_setA.comparison, setA_setQ.comparison, setQ_setH.comparison, setH_setR.comparison, setR_setU.comparison),override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"),tree=tree)
>
> That gave me:
> Error in plot_gene_map(dna_segs = list(otherset.seq, setA.seq, setQ.seq, :
> If tree is given, label names should be provided via named list dna_segs or dna_seg_labels
> Execution halted
>
> As far as I can see though the label names I gave are exactly the same as the dna_segs names. I tried introducing labels instead:
>
> plot_gene_map(dna_segs=list(otherset.seq, setA.seq, setQ.seq, setH.seq, setR.seq, setU.seq ), comparisons=list(otherset_set.comparison, setA_setQ.comparison, setQ_setH.comparison, setH_setR.comparison, setR_setU.comparison),override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"),dna_seg_labels=c("otherset I", "setin A cluster", "setin Q cluster", "setin H cluster","setin R cluster","setin U cluster"),tree=tree)
>
> Again I got "Error in plot_gene_map(dna_segs = list(Geobacillin.seq, nisA.seq, nisQ.seq, :
> If tree is given, label names should be provided via named list dna_segs or dna_seg_labels
> Execution halted"
>
There are two ways:
a) Name your objects in the list:
plot_gene_map(dna_segs=list(otherset.seq=otherset.seq, setA.seq=setA.seq, setQ.seq=setQ.seq, ...
and so on, so that the names of the elements of the list (the part before the =) exactly match the leaves of the tree.
b) Use dna_seg_labels that are identical to the names of the leaves of the tree:
plot_gene_map(…, …, dna_seg_labels=c("otherset.seq", "setA.seq", "setQ.seq", etc...
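A minimal sketch of both options, using the three_genes example data that ships with genoPlotR rather than your files. The leaf names Huey, Dewey and Louie are only illustrative, and this has not been run against your data, so treat it as a starting point:

```r
# Sketch only: toy data from the genoPlotR package, not the poster's files.
# The leaf names below are illustrative.
leaf_names <- c("Huey", "Dewey", "Louie")

if (requireNamespace("genoPlotR", quietly = TRUE)) {
  library(genoPlotR)
  data(three_genes)    # loads the lists 'dna_segs' and 'comparisons'

  # Tree whose leaves are spelled exactly like leaf_names
  tree <- ade4::newick2phylog("(((Huey:4.2,Dewey:3.9):3.1,Louie:7.3):1);")

  # a) named list: the element names must equal the tree leaf names
  named_segs <- dna_segs
  names(named_segs) <- leaf_names
  plot_gene_map(dna_segs = named_segs, comparisons = comparisons, tree = tree)

  # b) unnamed list plus dna_seg_labels equal to the tree leaf names
  plot_gene_map(dna_segs = unname(dna_segs), comparisons = comparisons,
                dna_seg_labels = leaf_names, tree = tree)
}
```

Either call should also print the labels to the left of each segment.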
> I also tried giving the dna_seg names in the newick list the labels in the form of tree <- newick2phylog("(("setin U cluster":0.102509534.. etc but that terminated at the tree line with a "Error: unexpected symbol in "tree <- newick2phylog("(("setin"
> Execution halted" message.
You don't want extra quotes there: the inner double quotes end the R string early, which is what triggers the "unexpected symbol" error. Also keep away from spaces in tree labels.
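To see quickly which labels disagree, the leaf names can be pulled out of the Newick string and compared with the names you pass to plot_gene_map. This is a rough base-R sketch: newick_leaf_labels is a hypothetical helper (not part of genoPlotR or ade4), the branch lengths are rounded from the tree in this thread, and it assumes unquoted labels:

```r
# Hypothetical helper, base R only: extract leaf labels from a Newick string.
# Assumes labels are unquoted; a leaf label is any run of characters that
# directly follows "(" or "," and contains no Newick punctuation.
newick_leaf_labels <- function(newick) {
  m <- gregexpr("(?<=[(,])[^(),:;]+", newick, perl = TRUE)
  trimws(regmatches(newick, m)[[1]])
}

# Shortened version of the tree from this thread (branch lengths rounded).
nwk <- paste0("((setU.seq:0.10,setR.seq:0.10):0.01,",
              "((setA.seq:0.04,setQ.seq:0.06):0.01,setH.seq:0.11):0.02,",
              "otherset.seq:0.10):0.0;")
leaves <- newick_leaf_labels(nwk)

# Labels intended for dna_seg_labels (or as names of the dna_segs list).
labels <- c("otherset.seq", "setA.seq", "setQ.seq",
            "setH.seq", "setR.seq", "setU.seq")

setdiff(labels, leaves)   # labels with no matching leaf
setdiff(leaves, labels)   # leaves with no matching label

# Replace spaces with underscores before using a display name in a tree.
tree_safe <- gsub(" ", "_", "setin U cluster", fixed = TRUE)
```

If both setdiff calls return character(0), the labels and the leaves agree.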
> I guess it must be something horribly simple with names and labels, but if it's obvious to you please share the secret.
>
> I got the arrows and gene names showing but couldn't figure out how to get the cluster names showing to the left. Should I assume that if the tree is recognized it will show the names then?
Yes. I think that even without the tree, if you use solution a) or b) above it should work.
> Also just for your amusement it took me about 2 hours to realize there isn't a problem with my genbank files but the gene_type I wanted was arrows not arrow.
The smallest errors are often the hardest to find :)
Good luck with your plot!
Lionel
> Lots of grateful thanks,
> Diane
>
>
>
>
>
> On Thu, Jan 26, 2017 at 10:16 AM, Lionel Guy <lionel.guy at imbim.uu.se> wrote:
> Hi Diane,
>
> Nice plot!
> The option you’re looking for is gene_type. You can change it for example by doing:
>
> geneA.seq$gene_type <- "arrows"
>
> To see all gene types, look into the examples of gene_types. What you are looking for is probably "arrows" or "blocks":
>
> ?gene_types
>
> To label clusters, you need to use the "annotations" argument of plot_gene_map. To generate an annotation object, use annotation() or auto_annotate():
>
> annotA <- auto_annotate(geneA.seq)
>
> and so on for the other dna_segs and then
>
> plot_gene_map(dna_segs=list(*all my seg files*), comparisons=list(*all my comparison files*), annotations=list(*all your annotations*), override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"))
>
> To generate a Newick file, you need to obtain a tree, or you can write it yourself. This is a bit beyond the scope of the help list, but look into RAxML or other phylogeny programs. The definition of the Newick format is here: http://evolution.genetics.washington.edu/phylip/newicktree.html.
>
> Hope that helps.
>
> Cheers,
>
> Lionel
>
> > On 25 Jan 2017, at 19:18 , Artemis H <hatziiod at gmail.com> wrote:
> >
> > Hello,
> >
> > I've just tried to use genoPlotR for the first time. I used read_dna_seg_from_genbank to import CDS info of cluster genes and read_comparison_from_blast to read blastn comparison files. I checked them all with is.dna_seg and is.comparison and they all came out TRUE. Finally I used plot_gene_map(dna_segs=list(*all my seg files*), comparisons=list(*all my comparison files*), override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0")).
> >
> > With this process I end up with a nice image which seems to show the blastn conserved regions but instead of CDS boxes I get thin blue dispersed lines along my stick cluster and I can't seem to find how to give each cluster a name. I've attached the usual output.
> >
> > I would like to ask first, how can I label the clusters, secondly how can I make the CDS show for each cluster as a box or arrow, thirdly how does one go about generating a newick2phylog file with the raw data?
> > I'm pasting an example of a dna_seg object in case that helps:
> > > geneA.seq
> >     name start   end strand length pid  gene synonym product proteinid feature
> > 1  geneA   828  1001      1     57  NA geneA      NA   geneA   CD352.1     CDS
> > 2  geneB  1109  4090      1    993  NA geneB      NA   geneB   CD353.1     CDS
> > 3  geneT  4101  5903      1    600  NA geneT      NA   geneT   CD354.1     CDS
> > 4  geneC  5896  7140      1    414  NA geneC      NA   geneC   CD355.1     CDS
> > 5  geneI  7137  7874      1    245  NA geneI      NA   geneI   CD356.1     CDS
> > 6  geneP  7876  9924      1    682  NA geneP      NA   geneP   CD357.1     CDS
> > 7  geneR  9993 10679      1    228  NA geneR      NA   geneR   CD358.1     CDS
> > 8  geneK 10672 12015      1    447  NA geneK      NA   geneK   CD359.1     CDS
> > 9  geneF 12114 12791      1    225  NA geneF      NA   geneF   CD360.1     CDS
> > 10 geneE 12793 13521      1    242  NA geneE      NA   geneE   CD361.1     CDS
> > 11 geneG 13508 14152      1    214  NA geneG      NA   geneG   CD362.1     CDS
> >    gene_type  col lty lwd pch cex
> > 1       bars blue   1   1   8   1
> > 2       bars blue   1   1   8   1
> > 3       bars blue   1   1   8   1
> > 4       bars blue   1   1   8   1
> > 5       bars blue   1   1   8   1
> > 6       bars blue   1   1   8   1
> > 7       bars blue   1   1   8   1
> > 8       bars blue   1   1   8   1
> > 9       bars blue   1   1   8   1
> > 10      bars blue   1   1   8   1
> > 11      bars blue   1   1   8   1
> >
> >
> > Thank you in advance, any help and tips would be appreciated.
> > Diane
> >
> > <Rplots_v1.pdf>
> > _______________________________________________
> > genoPlotR-help mailing list
> > genoPlotR-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genoplotr-help
>
--
Lionel Guy
Department for Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
phone: +46 18 471 4246; mobile +46 73 976 0618; postal address: Box 582, SE-751 23 Uppsala; visiting address: BMC D7:304c, Husargatan 3, SE-752 37 Uppsala
lionel.guy at imbim.uu.se