From hatziiod at gmail.com  Wed Jan 25 19:18:29 2017
From: hatziiod at gmail.com (Artemis H)
Date: Wed, 25 Jan 2017 20:18:29 +0200
Subject: [genoPlotR-help] genoplotR help with graphic presentation or CDS and names
Message-ID:

Hello,

I've just tried to use genoPlotR for the first time. I used read_dna_seg_from_genbank to import the CDS info of the cluster genes and read_comparison_from_blast to read the blastn comparison files. I checked them all with is.dna_seg and is.comparison, and they all came out TRUE. Finally I used

plot_gene_map(dna_segs=list(*all my seg files*), comparisons=list(*all my comparison files*), override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"))

With this process I end up with a nice image which seems to show the blastn conserved regions, but instead of CDS boxes I get thin blue dispersed lines along my stick cluster, and I can't seem to find how to give each cluster a name. I've attached the usual output.

I would like to ask: first, how can I label the clusters; secondly, how can I make the CDS show for each cluster as a box or arrow; thirdly, how does one go about generating a newick2phylog file with the raw data?
I'm pasting an example of a dna_seg object in case that helps:

> geneA.seq
    name start   end strand length pid  gene synonym product proteinid feature
1  geneA   828  1001      1     57  NA geneA      NA   geneA   CD352.1     CDS
2  geneB  1109  4090      1    993  NA geneB      NA   geneB   CD353.1     CDS
3  geneT  4101  5903      1    600  NA geneT      NA   geneT   CD354.1     CDS
4  geneC  5896  7140      1    414  NA geneC      NA   geneC   CD355.1     CDS
5  geneI  7137  7874      1    245  NA geneI      NA   geneI   CD356.1     CDS
6  geneP  7876  9924      1    682  NA geneP      NA   geneP   CD357.1     CDS
7  geneR  9993 10679      1    228  NA geneR      NA   geneR   CD358.1     CDS
8  geneK 10672 12015      1    447  NA geneK      NA   geneK   CD359.1     CDS
9  geneF 12114 12791      1    225  NA geneF      NA   geneF   CD360.1     CDS
10 geneE 12793 13521      1    242  NA geneE      NA   geneE   CD361.1     CDS
11 geneG 13508 14152      1    214  NA geneG      NA   geneG   CD362.1     CDS
   gene_type  col lty lwd pch cex
1       bars blue   1   1   8   1
2       bars blue   1   1   8   1
3       bars blue   1   1   8   1
4       bars blue   1   1   8   1
5       bars blue   1   1   8   1
6       bars blue   1   1   8   1
7       bars blue   1   1   8   1
8       bars blue   1   1   8   1
9       bars blue   1   1   8   1
10      bars blue   1   1   8   1
11      bars blue   1   1   8   1

Thank you in advance, any help and tips would be appreciated.

Diane
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplots_v1.pdf
Type: application/pdf
Size: 7909 bytes
Desc: not available

From lionel.guy at imbim.uu.se  Thu Jan 26 09:16:26 2017
From: lionel.guy at imbim.uu.se (Lionel Guy)
Date: Thu, 26 Jan 2017 08:16:26 +0000
Subject: [genoPlotR-help] genoplotR help with graphic presentation or CDS and names
In-Reply-To:
References:
Message-ID: <27257591-7F35-4BAE-B9DE-A8B17E380681@imbim.uu.se>

Hi Diane,

Nice plot!
The option you're looking for is gene_type. You can change it, for example, by doing:

geneA.seq$gene_type <- "arrows"

To see all gene types, look into the examples of gene_types. What you are looking for is probably "arrows" or "blocks":

?gene_types

To label clusters, you need to use the "annotations" argument of plot_gene_map.
To generate an annotation object, use annotation or auto_annotate:

annotA <- auto_annotate(geneA.seq)

and so on for the other dna_segs, and then

plot_gene_map(dna_segs=list(*all my seg files*), comparisons=list(*all my comparison files*), annotations=list(*all your annotations*), override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"))

To generate a newick file, you need to obtain a tree, or you can write it yourself. This is a bit beyond the scope of the help list, but look into RAxML or other phylogeny programs. The definition of the Newick format is here: http://evolution.genetics.washington.edu/phylip/newicktree.html

Hope that helps.

Cheers,

Lionel

> _______________________________________________
> genoPlotR-help mailing list
> genoPlotR-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genoplotr-help

--
Lionel Guy
Department for Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
phone: +46 18 471 4246; mobile +46 73 976 0618; postal address: Box 582, SE-751 23 Uppsala; visiting address: BMC D7:304c, Husargatan 3, SE-752 37 Uppsala
lionel.guy at imbim.uu.se

From lionel.guy at imbim.uu.se  Thu Jan 26 21:33:00 2017
From: lionel.guy at imbim.uu.se (Lionel Guy)
Date: Thu, 26 Jan 2017 20:33:00 +0000
Subject: [genoPlotR-help] genoplotR help with graphic presentation or CDS and names
In-Reply-To:
References: <27257591-7F35-4BAE-B9DE-A8B17E380681@imbim.uu.se>
Message-ID: <3F558525-3D0C-468F-AF14-F945A76ADCF2@imbim.uu.se>

Hi Diane,

See my comments inline:

> On 26 Jan 2017, at 15:07 , Artemis H wrote:
>
> Hi Lionel,
>
> Thanks for the prompt, very useful help.
>
> I have a few more questions though. I made a tree by giving RAxML a MEGA7 alignment and tried feeding that into my command list, but I seem to have a name issue.
>
> My sequences are called
> ##Sequences
> setA.seq <- read_dna_seg_from_genbank("setAfile.gb", tagsToParse=c("CDS"))
> etc
>
> My tree number 1 was
> tree <- newick2phylog("((setU.seq:0.10250953446912315636,setR.seq:0.09775744461960976517):0.01209692024574308099,((setA.seq:0.04104447134815911863,setQ.seq:0.06305211466588933611):0.01132124623294956944,setH.seq:0.11080167962495350575):0.01743359660415144674,otherset.seq:0.10269874269575925141):0.0;")
>
> I also tried chopping the last :0.0; off the end (fruitless improvisation)
>
> My plot line was:
> plot_gene_map(dna_segs=list(otherset.seq, setA.seq, setQ.seq, setH.seq, setR.seq, setU.seq), comparisons=list(Geobacillin_setA.comparison, setA_setQ.comparison, setQ_setH.comparison, setH_setR.comparison, setR_setU.comparison), override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"), tree=tree)
>
> That gave me:
> Error in plot_gene_map(dna_segs = list(otherset.seq, setA.seq, setQ.seq, :
>   If tree is given, label names should be provided via named list dna_segs or dna_seg_labels
> Execution halted
>
> As far as I can see though the label names I gave are exactly the same as the dna_segs names. I tried introducing labels instead:
>
> plot_gene_map(dna_segs=list(otherset.seq, setA.seq, setQ.seq, setH.seq, setR.seq, setU.seq), comparisons=list(otherset_set.comparison, setA_setQ.comparison, setQ_setH.comparison, setH_setR.comparison, setR_setU.comparison), override_color_schemes=TRUE, global_color_scheme=c("e_value", "auto", "grey", "0"), dna_seg_labels=c("otherset I", "setin A cluster", "setin Q cluster", "setin H cluster", "setin R cluster", "setin U cluster"), tree=tree)
>
> Again I got "Error in plot_gene_map(dna_segs = list(Geobacillin.seq, nisA.seq, nisQ.seq, :
>   If tree is given, label names should be provided via named list dna_segs or dna_seg_labels
> Execution halted"
>

Two ways:

a) name your objects in the list:

plot_gene_map(dna_segs=list(otherset.seq=otherset.seq, setA.seq=setA.seq, setQ.seq=setQ.seq, ...

and so on, so that the names of the elements of the list (before the =) exactly match the leaves of the tree

b) use dna_seg_labels that are identical to the names of the leaves of the tree:

plot_gene_map(..., ..., dna_seg_labels=c("otherset.seq", "setA.seq", "setQ.seq", etc...

> I also tried giving the dna_seg names in the newick list the labels in the form of tree <- newick2phylog("(("setin U cluster":0.102509534.. etc but that terminated at the tree line with a "Error: unexpected symbol in "tree <- newick2phylog("(("setin"
> Execution halted" message.

You don't want to mess with extra quotes there. Keep away from spaces in tree labels.

> I guess it must be something horribly simple with names and labels but if it's obvious to you please share the secret.
>
> I got the arrows and gene names showing but couldn't figure out how to get the cluster names showing to the left. Should I assume that if the tree is recognized it will show the names then?

Yes. I think that even without the tree, if you use solution a) or b) above it should work.

> Also just for your amusement it took me about 2 hours to realize there isn't a problem with my genbank files but the gene_type I wanted was arrows not arrow.

The smallest errors are often the hardest to find :)

Good luck with your plot!

Lionel

> Lots of grateful thanks,
> Diane
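
The name matching behind the "If tree is given, label names should be provided..." error can be checked before plotting. What follows is only a rough sketch in base R, not something from genoPlotR itself: the helper newick_leaves is hypothetical, its regular expression assumes plain unquoted leaf labels without spaces, and the branch lengths are rounded from the tree quoted in the thread for readability. The genoPlotR call at the end is left as a comment because it needs the dna_seg and comparison objects from the thread (the name comps is a placeholder).

```r
# Hypothetical helper (not part of genoPlotR): extract leaf labels from a
# Newick string. Assumes unquoted labels without spaces or parentheses.
newick_leaves <- function(newick) {
  # A leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
  hits <- gregexpr("(?<=[(,])[^(),:;]+", newick, perl = TRUE)
  regmatches(newick, hits)[[1]]
}

# Tree from the thread, with branch lengths rounded for readability.
tree_str <- paste0(
  "((setU.seq:0.1025,setR.seq:0.0978):0.0121,",
  "((setA.seq:0.0410,setQ.seq:0.0631):0.0113,setH.seq:0.1108):0.0174,",
  "otherset.seq:0.1027):0.0;"
)

# Names intended for the dna_segs list (or dna_seg_labels), in plotting order.
seg_names <- c("otherset.seq", "setA.seq", "setQ.seq",
               "setH.seq", "setR.seq", "setU.seq")

leaves <- newick_leaves(tree_str)

# Labels present on one side only; both should be empty before plotting.
missing_in_tree <- setdiff(seg_names, leaves)
missing_in_segs <- setdiff(leaves, seg_names)
labels_ok <- length(missing_in_tree) == 0 && length(missing_in_segs) == 0
print(labels_ok)

# With genoPlotR loaded and the objects from the thread in scope, option (a)
# would then be a named list whose names match the leaves
# (comps is a placeholder for the list of comparison objects):
#   segs <- list(otherset.seq = otherset.seq, setA.seq = setA.seq,
#                setQ.seq = setQ.seq, setH.seq = setH.seq,
#                setR.seq = setR.seq, setU.seq = setU.seq)
#   plot_gene_map(dna_segs = segs, comparisons = comps,
#                 tree = newick2phylog(tree_str))
```

If labels_ok comes out FALSE, missing_in_tree and missing_in_segs show which names differ, for example labels containing spaces such as "setin U cluster".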