[genoPlotR-help] reading comparisons from blasts
Lionel Guy
lionel.guy at ebc.uu.se
Thu May 19 09:01:05 CEST 2011
Hi Peter,
I think you should read a bit the help page for lapply, it seems that's what confusing you. Try
Mesostigma <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Mesostigma_viride.gb"))
Chlorokybus <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Chlorokybus_atmophyticus_chloroplast.gb"))
Chara <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Chara_vulgaris_Chloroplast.gb"))
Chaetosphaeridium <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Chaetosphaeridium_globosum.gb"))
Zygnema <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Zygnema_circumcarinatum_Chloroplast.gb"))
Staurastrum <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Staurastrum_punctulatumChloroplastGenome.gb"))
Mesostigma_vs_Chlorokybus <- try(read_comparison_from_blast("/Users/pgn6/Desktop/R/Mesostigma_vs_Chlorokybus.txt"))
Chlorokybus_vs_Chara <- try(read_comparison_from_blast("/Users/pgn6/Desktop/R/Chlorokybus_vs_Chara.txt"))
Chara_vs_Chaetosphaeridium <- try(read_comparison_from_blast ("/Users/pgn6/Desktop/R/Chara_vs_Chaetosphaeridium.txt"))
Chaetosphaeridium_vs_Zygnema <- try (read_comparison_from_blast ("/Users/pgn6/Desktop/R/Zygnema_vs_Staurastrum.txt"))
Zygnema_vs_Staurastrum <- try(read_comparison_from_blast ("/Users/pgn6/Desktop/R/Zygnema_vs_Staurastrum.txt"))
## telling it to just graph the inverted repeat regions
xlims<- list (c(112304, 118360), c(144615, 152254), c(174015, 184933), c(130683, 130937), c(165301, 165372), c(156101, 157089))
ds <- list (Mesostigma, Chlorokybus, Chara, Chaetosphaeridium, Zygnema, Staurastrum)
cp <- list (Mesostigma_vs_Chlorokybus, Chlorokybus_vs_Chara, Chara_vs_Chaetosphaeridium, Chaetosphaeridium_vs_Zygnema, Zygnema_vs_Staurastrum)
annots <-lapply(ds, auto_annotate)
plot_gene_map (dna_segs=ds, comparisons=cp, annotations=annots, xlims=xlims, main = "Charaphyte Algae Internal Repeat", gene_type = "auto", limit_to_longest_dna_seg = FALSE, dna_seg_scale = TRUE, scale = FALSE)
If you still have difficulties with lapply, I suggest something along these lines, to make it clearer (it basically does the same as
annots <-lapply(ds, auto_annotate)
but in a more readable version):
annots <- list()
for (i in 1:length(ds)){
annots[[i]] <- auto_annotate(ds[[i]])
}
Cheers,
Lionel
On 18 May 2011, at 18:24 , Peter Neofotis wrote:
> Dr. Guy,
>
> Thank you for your reply.
>
> I am still though, deeply confused - perhaps even more so. I still feel my confusion is that I can't write my own scripts from the examples in your exercise because you use the barto dataset (without giving me what a full script of my own would look like). The below (at the end of this email) just tells me qualitatively what you did, not how you actually made the comparison. Does that make sense? An example full would be helpful.
>
> I did make the modifications you suggested to my script - and it is not working at all now. I get the following error.
>
> > annots <-lapply(auto_annotate, function(x) {
> + mid <- middle(x)
> + annot <- annotation (x1 = mid, text = x$name, rot= 30)
> + idx <- grep ("^[^B]", annot$text, perl=TRUE)
> + annot[idx[idx%%1==0],]
> + })
> Error in middle(x) : argument should be a dna_seg object
> >
> > plot_gene_map (ds, cp, annotations=annots, comparison = list (Mesostigma_vs_Chlorokybus, Chlorokybus_vs_Chara, Chara_vs_Chaetosphaeridium, Chaetosphaeridium_vs_Zygnema,
> + Zygnema_vs_Staurastrum), xlims=xlims, main = "Charaphyte Algae Internal Repeat", gene_type = "auto", limit_to_longest_dna_seg = FALSE, dna_seg_scale = TRUE, scale = FALSE)
> Error in plot_gene_map(ds, cp, annotations = annots, comparison = list(Mesostigma_vs_Chlorokybus, :
> Argument tree should be of class phylog (ade4)
>
>
> Once again, I thank you for your time and help. Like I said, I think I could figure this out if I just had a complete script of what you did in Example 4 (if you didn't load the barto dataset but had to do it, like me, from scratch). As what I want to do is exactly the same. But maybe I'm wrong and my errors are elsewhere.
>
> Best,
>
> Peter
>
> Mesostigma <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Mesostigma_viride.gb"))
> Chlorokybus <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Chlorokybus_atmophyticus_chloroplast.gb"))
> Chara <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Chara_vulgaris_Chloroplast.gb"))
> Chaetosphaeridium <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Chaetosphaeridium_globosum.gb"))
> Zygnema <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Zygnema_circumcarinatum_Chloroplast.gb"))
> Staurastrum <- try(read_dna_seg_from_file("/Users/pgn6/Desktop/R/Staurastrum_punctulatumChloroplastGenome.gb"))
>
> Mesostigma_vs_Chlorokybus <- try(read_comparison_from_blast("/Users/pgn6/Desktop/R/Mesostigma_vs_Chlorokybus.txt"))
> Chlorokybus_vs_Chara <- try(read_comparison_from_blast("/Users/pgn6/Desktop/R/Chlorokybus_vs_Chara.txt"))
> Chara_vs_Chaetosphaeridium <- try(read_comparison_from_blast ("/Users/pgn6/Desktop/R/Chara_vs_Chaetosphaeridium.txt"))
> Chaetosphaeridium_vs_Zygnema <- try (read_comparison_from_blast ("/Users/pgn6/Desktop/R/Zygnema_vs_Staurastrum.txt"))
> Zygnema_vs_Staurastrum <- try(read_comparison_from_blast ("/Users/pgn6/Desktop/R/Zygnema_vs_Staurastrum.txt"))
>
> ## telling it to just graph the inverted repeat regions
> xlims<- list (c(112304, 118360), c(144615, 152254), c(174015, 184933), c(130683, 130937), c(165301, 165372), c(156101, 157089))
>
> ds <- list (Mesostigma, Chlorokybus, Chara, Chaetosphaeridium, Zygnema, Staurastrum)
> cp <- list (Mesostigma_vs_Chlorokybus, Chlorokybus_vs_Chara, Chara_vs_Chaetosphaeridium, Chaetosphaeridium_vs_Zygnema, Zygnema_vs_Staurastrum)
>
> annots <-lapply(auto_annotate, function(x) {
> mid <- middle(x)
> annot <- annotation (x1 = mid, text = x$name, rot= 30)
> idx <- grep ("^[^B]", annot$text, perl=TRUE)
> annot[idx[idx%%1==0],]
> })
>
> plot_gene_map (ds, cp, annotations=annots, comparison = list (Mesostigma_vs_Chlorokybus, Chlorokybus_vs_Chara, Chara_vs_Chaetosphaeridium, Chaetosphaeridium_vs_Zygnema,
> Zygnema_vs_Staurastrum), xlims=xlims, main = "Charaphyte Algae Internal Repeat", gene_type = "auto", limit_to_longest_dna_seg = FALSE, dna_seg_scale = TRUE, scale = FALSE)
>
>
>
> Comparison of 4 Bartonella genomes
>
> Description
>
> Comparison of 4 Bartonella genomes by BLAST.
>
> Usage
>
> data(barto)
>
> Format
>
> barto, a list of three dataframes, representing the four genomes and their pairwise comparisons:
>
> • dna_segswhich is a list of 4 dna_seg objects, containing all the protein genes for each genome. Obtained by reading ptt files downloaded from NCBI with read_dna_seg_from_ptt.
> • comparisonswhich is a list of 3 comparison objects, obtained by doing genome-to-genome (fasta files) BLASTS, and then reading the resulting tab files with read_comparison_from_blast.
> • rnt_segswhich is a list of 4 dna_seg objects, containing all the RNA genes of the four genomes. Obtained by reading rnt files downloaded from NCBI with read_dna_seg_from_ptt.
> A bash script to obtain the same file as in the dataset is available in the extdata folder of the package. Find its location by running system.file('extdata/barto.sh', package = 'genoPlotR').
>
> References
>
> BLAST: http://www.ncbi.nlm.nih.gov/blast/
>
> Examples
>
> data(barto)
> plot_gene_map(barto$rnt_segs, barto$comparisons, gene_type="blocks")
>
>
> On Wed, May 11, 2011 at 12:18 PM, Lionel Guy <lionel.guy at ebc.uu.se> wrote:
> Hi Peter,
>
> On 11 May 2011, at 18:02 , Peter Neofotis wrote:
>
> > I am stuck, though, on how to make the figure that you do in Example 4 of your vignette. I think my main confusion is that although you say you do the "comparison between four Bartonelle genomes using BLAST) it is not clear how this data is formatted when you load it for data(barto) - so I am confused when starting from scratch with my own data (from BLAST).
>
> You can find a complete description of the "barto" dataset by getting help on the barto object
> > library(genoPlotR)
> > ?barto
>
> In short, it loads a barto object which is a list of 3, including
> - a list of 4 dna_seg object
> - a list of 3 comparisons
> - a further list of dna_segs that is not needed here
>
> > Charophyte_Algae <- try(read_comparison_from_blast("/Users/pgn6/Desktop/R/CharaphyteAlgaeBlast.txt"))
>
> You are loading only one file, whereas you should have one file per comparison (i.e Mesostigma_vs_Chlorokybus, Chlorokybus_vs_Chara, etc..)
> > comp1 <- read_comparison_from_blast("Mesostigma_vs_Chlorokybus.blast")
> > comp2 <- ...
>
> > ## Plotting - this is the part that does not work.
> > plot_gene_map(dna_segs=list (Mesostigma, Chlorokybus, Chara, Chaetosphaeridium, Zygnema, Staurastrum), comparison = list(Charophyte_Algae), xlims=xlims, main = "Charaphyte Algae Internal Repeat", gene_type = "side_blocks", dna_seg_scale = TRUE, scale = FALSE)
>
> And here you need to have comparison=list(comp1, comp2, ...).
>
> In addition, looking at your files, it seems you are doing blastn, which is likely to yield very few results for organisms that are so far away. Try tblastx instead.
>
> HTH,
>
> Lionel
>
============================================
Lionel Guy
Molecular Evolution, EBC, Uppsala University
Norbyv. 18C, 752 36 Uppsala, Sweden
phone: +46 (0)18-471 6129
fax : +46 (0)18-471 6404
email: lionel.guy at ebc.uu.se
============================================
More information about the genoPlotR-help
mailing list