<div dir="ltr"><div>I was off by a factor of 10: I thought it said 200,000, but it was only 20,000, so it takes only about 11 seconds to solve.</div><div><br></div><div>> n <- 20000<br>> mi <- 4500<br>> start <- sample(n * 10, n) # start times<br>> int <- sample(1000, n, TRUE) # interval between start and end<br>> genes <- data.frame(gene = paste0('gene', 1:n)<br>+ , start = start<br>+ , end = start + int<br>+ , stringsAsFactors = FALSE<br>+ )<br>> miRNA <- data.frame(name = paste0('mi', 1:mi)<br>+ , pos = sample(n * 9, mi)<br>+ , stringsAsFactors = FALSE<br>+ )<br>> library(sqldf)<br>> <br>> system.time({<br>+ matches <- sqldf("<br>+ select m.*, g.*<br>+ from miRNA as m<br>+ join genes as g<br>+ on m.pos between g.start and g.end<br>+ ")<br>+ })<br> user system elapsed <br> 10.91 0.02 10.96 <br>> head(matches, 10)<br> name pos gene start end<br>1 mi1 3825 gene200 3634 4134<br>2 mi1 3825 gene385 3616 4241<br>3 mi1 3825 gene410 3492 4089<br>4 mi1 3825 gene1172 3707 3847<br>5 mi1 3825 gene1228 3825 3919<br>6 mi1 3825 gene1726 3586 4552<br>7 mi1 3825 gene1859 3633 4163<br>8 mi1 3825 gene1869 3269 4138<br>9 mi1 3825 gene2061 3812 4094<br>10 mi1 3825 gene2248 3225 3939<br></div><div class="gmail_extra">> str(matches)<br>'data.frame': 224028 obs. of 5 variables:<br> $ name : chr "mi1" "mi1" "mi1" "mi1" ...<br> $ pos : int 3825 3825 3825 3825 3825 3825 3825 3825 3825 3825 ...<br> $ gene : chr "gene200" "gene385" "gene410" "gene1172" ...<br> $ start: int 3634 3616 3492 3707 3825 3586 3633 3269 3812 3225 ...<br> $ end : int 4134 4241 4089 3847 3919 4552 4163 4138 4094 3939 ...<br><br clear="all"><div><div class="gmail_signature"><br>Jim Holtman<br>Data Munger Guru<br> <br>What is the problem that you are trying to solve?<br>Tell me what you want to do, not how you want to do it.</div></div>
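<div><br>Since this is the data.table list, here is a minimal sketch of the same join using data.table's foverlaps(), assuming data.table 1.9.4 or later. The two tables are small hypothetical examples, not the simulated data above. Because foverlaps() joins intervals to intervals, each miRNA position is first turned into a zero-length interval [pos, pos]; the helper columns mstart and mend are names made up for this sketch.</div>
<pre>
```r
library(data.table)

# Small hypothetical example tables (not the simulated data above)
genes <- data.table(gene  = c("geneA", "geneB", "geneC"),
                    start = c(100L, 150L, 400L),
                    end   = c(200L, 300L, 500L))
miRNA <- data.table(name = c("mi1", "mi2", "mi3"),
                    pos  = c(160L, 350L, 400L))

# foverlaps() joins intervals to intervals, so represent each miRNA
# position as a zero-length interval [pos, pos] (hypothetical column names)
miRNA[, c("mstart", "mend") := list(pos, pos)]

# The second table must be keyed; its last two key columns are used as
# the interval start and end
setkey(genes, start, end)

# type = "within" keeps pairs with start <= mstart and mend <= end,
# i.e. start <= pos <= end (inclusive, like SQL BETWEEN);
# nomatch = 0L drops miRNAs that fall inside no gene
matches <- foverlaps(miRNA, genes,
                     by.x = c("mstart", "mend"),
                     type = "within", nomatch = 0L)

# Keep only the two columns asked for: miRNA name and gene name
result <- matches[, list(name, gene)]
print(result)
```
</pre>
<div>With the real tables, the same call should apply after converting the data frames with setDT(); I have not timed it against the sqldf version above.</div>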
<br><div class="gmail_quote">On Sun, Mar 15, 2015 at 3:41 PM, Papysounours <span dir="ltr"><<a href="mailto:Cyrille.laurent.sage@gmail.com" target="_blank">Cyrille.laurent.sage@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">Hi<br>
<br>
I am just starting R programming because I need it to analyse new sequencing<br>
data. I have two lists of data (Excel tables): one is a gene list with chromosomal<br>
positions (like start:123456 end:124567), the other is a miRNA list with only<br>
one position (like 123789).<br>
The first list has around 20000 rows (20000 gene names to<br>
compare against) and the second around 4500 rows (4500 miRNAs).<br>
I want to compare each individual miRNA position to the entire gene list<br>
(genestart <= miRNA <= geneend) in order to get a<br>
new table with the name of the miRNA (first column of the miRNA list) and the name<br>
of each gene (first column of the gene list) whose start/end range contains that miRNA.<br>
Hope this is not too much to ask.<br>
Papy<br>
<br>
<br>
<br>
--<br>
View this message in context: <a href="http://r.789695.n4.nabble.com/R-beginner-tp4704684.html" target="_blank">http://r.789695.n4.nabble.com/R-beginner-tp4704684.html</a><br>
Sent from the datatable-help mailing list archive at Nabble.com.<br>
_______________________________________________<br>
datatable-help mailing list<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
</blockquote></div><br></div></div>
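<div><br>For anyone who would rather not use sqldf or data.table, here is a minimal base-R sketch of the rule in the question (gene start <= miRNA position <= gene end, both ends inclusive). The column names gene, start, end, name and pos are assumptions borrowed from the simulated data in the reply, and the values are a small made-up example. Each miRNA is compared with every gene, so the work grows with the product of the two table sizes: about 90 million comparisons for 4500 miRNAs and 20000 genes.</div>
<pre>
```r
# Hypothetical toy inputs with the same shape as the tables described
genes <- data.frame(gene  = c("geneA", "geneB", "geneC"),
                    start = c(100L, 150L, 400L),
                    end   = c(200L, 300L, 500L),
                    stringsAsFactors = FALSE)
miRNA <- data.frame(name = c("mi1", "mi2", "mi3"),
                    pos  = c(160L, 350L, 400L),
                    stringsAsFactors = FALSE)

# For each miRNA, find every gene with start <= pos <= end (inclusive)
hits <- lapply(seq_len(nrow(miRNA)), function(i) {
  idx <- which(genes$start <= miRNA$pos[i] & miRNA$pos[i] <= genes$end)
  # zero rows when the miRNA falls inside no gene
  data.frame(name = rep(miRNA$name[i], length(idx)),
             gene = genes$gene[idx],
             stringsAsFactors = FALSE)
})

# Stack the per-miRNA results into one two-column table
result <- do.call(rbind, hits)
print(result)
```
</pre>
<div>The resulting table could then be saved with write.csv(result, "mirna_gene_pairs.csv", row.names = FALSE), where the file name is only an example.</div>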