Hi, I wrote this small script to analyze reads from ChIP-seq data. For a single read sequence, it lets me get all the possible combinations by removing one base at a time from each side of the read until no bases are left.
My issue is that I can't find a way to pull all the output into one csv or xls file with the name of the sample (read name) and the full list of combinations.
Script:
j=1
nr<-nrow(seq2)
n=1
while (n<nr) {
  p<-str_length(seq2[n,2])
  i<-1:p
  while (j<=p) {
    seqpart<-(str_sub(seq2[n,2], start=i[i], end=-j))
    print(seq2[n,1])
    out.files <-seqpart
    write.csv(seqpart, out.files[n])
    j=j+1
  }
  j=1
  n=n+1
  print(n)
}
I am sure there are better ways to do this, but I am just a beginner in R.
My input is an Excel spreadsheet:
   name seq1.s                            Col3
1  AA1  CAAGGCGCGCGT                      AE
2  AB1  GGTACACAATATATGGTGTGCGTGCGTGCG    AF   mod
3  AC1  AGTGCGTGAAAACCGTC                 AI
4  AD1  AAAGTGTGTGCCACAC                  AG
5  AE1  TGACTGACT                         AB
6  AF1  TAGTTGCAACGTTGCAGCGTTGCA          AH   full
7  AG1  AACGCGCCGTTAACGTTGACACGTGTGT      AD   mod
8  AH1  ACGTGGAGTGCGTGTGTACACGTGTG        Ai   full
9  AI1  TGAACAGTGGTACGTACATGCGTACGTTAACG  AC   longer
10 AJ1  CATGCATGCATGCATGCAT               AA
The output I get for the first sequence, for one run, is:
"","x"
"1","CAAGGCGCGCGT"
"2","AAGGCGCGCGT"
"3","AGGCGCGCGT"
"4","GGCGCGCGT"
"5","GCGCGCGT"
"6","CGCGCGT"
"7","GCGCGT"
"8","CGCGT"
"9","GCGT"
"10","CGT"
"11","GT"
"12","T"
With this script I get a number of files that depends on p. I would like to first merge all the files from one sequence into one, remove duplicates such as "", and then merge everything into a single file with the gene or read name and the full list of possible combinations, so that I can extract repeat patterns from all the possible combinations.
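One possible way to get a single file is sketched below in base R. It rests on some assumptions: that "all possible combinations" means every contiguous fragment of a read (which is what the nested loop above produces once the empty strings are dropped), that `seq2` is already loaded as a data frame with the read name in column 1 and the sequence in column 2, and that one row per fragment is an acceptable layout. The helper `all_fragments()`, the column name `fragment`, and the output file name are made up for illustration.

```r
# Hypothetical stand-in for the spreadsheet: two rows copied from the sample input.
seq2 <- data.frame(
  name   = c("AA1", "AE1"),
  seq1.s = c("CAAGGCGCGCGT", "TGACTGACT"),
  stringsAsFactors = FALSE
)

# Return every distinct, non-empty contiguous fragment of one read.
all_fragments <- function(read) {
  p <- nchar(read)
  if (p == 0) return(character(0))
  # All (start, end) index pairs, keeping only those with start <= end,
  # so no empty string is ever produced.
  idx <- expand.grid(start = seq_len(p), end = seq_len(p))
  idx <- idx[idx$start <= idx$end, ]
  unique(substring(read, idx$start, idx$end))
}

# Build one small data frame per read, tagging each fragment with the read name.
per_read <- lapply(seq_len(nrow(seq2)), function(n) {
  frags <- all_fragments(seq2[n, 2])
  data.frame(
    name     = rep(seq2[n, 1], length(frags)),
    fragment = frags,
    stringsAsFactors = FALSE
  )
})

# Stack all reads into one table and write it once.
result <- do.call(rbind, per_read)

# Assumed output location; replace with any path you like.
out_file <- file.path(tempdir(), "all_fragments.csv")
write.csv(result, out_file, row.names = FALSE)
```

Because every fragment carries its read name, the resulting table can be filtered or counted per read afterwards, for example with `table(result$fragment)` to see which fragments occur in more than one read.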
Thanks a lot
PAPY
View this message in context: http://r.789695.n4.nabble.com/merging-output-to-one-data-file-tp4705396.html
Sent from the datatable-help mailing list archive (http://r.789695.n4.nabble.com/datatable-help-f2315188.html) at Nabble.com.