<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Cyrile,</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">See the `?foverlaps` function from the data.table package or `?findOverlaps` from the GenomicRanges package. Both implement algorithms designed specifically for operating efficiently on interval ranges.</div> <br> <div id="bloop_sign_1426528092358340096" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px">-- <br>Arun</div></div> <br><p style="color:#000;">On 15 Mar 2015 at 22:41:18, jim holtman (<a href="mailto:jholtman@gmail.com">jholtman@gmail.com</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div></div><div>
<div dir="ltr">
<div>I was off by a factor of 10; I thought it said 200,000, but it
was only 20,000, so it only takes about 10 seconds to solve.</div>
<div><br></div>
<div>> n <- 20000<br>
> mi <- 4500<br>
> start <- sample(n * 10, n) # start times<br>
> int <- sample(1000, n, TRUE) # interval between start
and end<br>
> genes <- data.frame(gene = paste0('gene', 1:n)<br>
+
, start = start<br>
+
, end = start + int<br>
+
, stringsAsFactors = FALSE<br>
+
)<br>
> miRNA <- data.frame(name = paste0('mi', 1:mi)<br>
+
, pos = sample(n * 9, mi)<br>
+
, stringsAsFactors = FALSE<br>
+
)<br>
> require(sqldf)<br>
><br>
> system.time({<br>
+ matches <- sqldf("<br>
+ select m.*, g.*<br>
+ from miRNA as m<br>
+ join genes as g<br>
+ on m.pos between
g.start and g.end<br>
+ ")<br>
+ }) <br>
user system elapsed<br>
10.91 0.02 10.96<br>
> head(matches, 10)<br>
name pos gene
start end<br>
1 mi1 3825 gene200 3634 4134<br>
2 mi1 3825 gene385 3616 4241<br>
3 mi1 3825 gene410 3492 4089<br>
4 mi1 3825 gene1172 3707 3847<br>
5 mi1 3825 gene1228 3825 3919<br>
6 mi1 3825 gene1726 3586 4552<br>
7 mi1 3825 gene1859 3633 4163<br>
8 mi1 3825 gene1869 3269 4138<br>
9 mi1 3825 gene2061 3812 4094<br>
10 mi1 3825 gene2248 3225 3939<br></div>
<div class="gmail_extra">> str(matches)<br>
'data.frame': 224028 obs. of 5 variables:<br>
$ name : chr "mi1" "mi1" "mi1" "mi1" ...<br>
$ pos : int 3825 3825 3825 3825 3825 3825 3825
3825 3825 3825 ...<br>
$ gene : chr "gene200" "gene385" "gene410" "gene1172"
...<br>
$ start: int 3634 3616 3492 3707 3825 3586 3633 3269
3812 3225 ...<br>
$ end : int 4134 4241 4089 3847 3919 4552 4163
4138 4094 3939 ...<br>
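<div>For comparison, the same range join can be sketched with `foverlaps` from data.table, which is mentioned elsewhere in this thread. This is a minimal sketch, not a benchmarked solution: the table and column names follow the example above, the three-row toy data is made up for illustration, and treating each miRNA position as a zero-width interval (start = end = pos) is an assumption about how to fit a single point into an interval join.</div>

```r
library(data.table)

# Toy data (illustrative values only, same column layout as the example above)
genes <- data.table(gene  = c("geneA", "geneB", "geneC"),
                    start = c(100L, 150L, 400L),
                    end   = c(200L, 300L, 500L))
miRNA <- data.table(name = c("mi1", "mi2", "mi3"),
                    pos  = c(160L, 450L, 900L))

# foverlaps() joins intervals, so represent each position as the
# zero-width interval [pos, pos]
miRNA[, `:=`(start = pos, end = pos)]

# The second table must be keyed; the last two key columns are taken
# as the interval start and end
setkey(genes, start, end)

# type = "within" keeps pairs where start <= pos <= end;
# nomatch = 0L drops miRNAs that fall inside no gene
matches <- foverlaps(miRNA, genes,
                     by.x = c("start", "end"),
                     type = "within", nomatch = 0L)

# Keep only the columns asked for in the original question
result <- matches[, .(name, pos, gene)]
print(result)
```

<div>With the toy data, mi1 falls inside two genes, mi2 inside one, and mi3 inside none, so it is dropped. Setting `nomatch = NA` instead would keep unmatched miRNAs with a missing gene name.</div>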
<br clear="all">
<div>
<div class="gmail_signature"><br>
Jim Holtman<br>
Data Munger Guru<br>
<br>
What is the problem that you are trying to solve?<br>
Tell me what you want to do, not how you want to do it.</div>
</div>
<br>
<div class="gmail_quote">On Sun, Mar 15, 2015 at 3:41 PM,
Papysounours <span dir="ltr"><<a href="mailto:Cyrille.laurent.sage@gmail.com" target="_blank">Cyrille.laurent.sage@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
Hi<br>
<br>
I am just starting R programming because I need it to analyse new
sequencing<br>
data. I have two lists of data (Excel tables): one is a gene list with
chromosomal<br>
positions (like start:123456 end:124567), the other is a miRNA list
with only<br>
one position (like 123789).<br>
In the first list I have around 20000 rows (meaning 20000
gene names to<br>
compare to) and in the second around 4500 rows (4500 miRNAs).<br>
I want to compare the position of each individual miRNA
(<br>
genestart<=miRNA<=geneend ) to the entire list of genes in
order to get, in a<br>
new table, the name of the miRNA (first column of the miRNA list) and
the name<br>
of the gene (first column of the gene list) related to the
miRNA.<br>
Hope this is not too much to ask.<br>
Papy<br>
<br>
<br>
<br>
--<br>
View this message in context: <a href="http://r.789695.n4.nabble.com/R-beginner-tp4704684.html" target="_blank">http://r.789695.n4.nabble.com/R-beginner-tp4704684.html</a><br>
Sent from the datatable-help mailing list archive at
Nabble.com.<br>
_______________________________________________<br>
datatable-help mailing list<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
</blockquote>
</div>
<br></div>
</div>
</div></div></span></blockquote></body></html>