<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Cyrille,</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">See the `?foverlaps` function from the data.table package or `?findOverlaps` from the GenomicRanges package. Both implement algorithms designed specifically for operating efficiently on interval ranges.</div> <br> <div id="bloop_sign_1426528092358340096" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px">-- <br>Arun</div></div> <br><p style="color:#000;">On 15 Mar 2015 at 22:41:18, jim holtman (<a href="mailto:jholtman@gmail.com">jholtman@gmail.com</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div></div><div>




<div dir="ltr">

<div>I was off by a factor of 10; I thought it said 200,000, but it was only 20,000, so it takes only about 10 seconds to solve:</div>

<div><br></div>

<div>> n <- 20000<br>
> mi <- 4500<br>
> start <- sample(n * 10, n)  # start times<br>
> int <- sample(1000, n, TRUE)  # interval between start and end<br>
> genes <- data.frame(gene = paste0('gene', 1:n)<br>
+                 , start = start<br>
+                 , end = start + int<br>
+                 , stringsAsFactors = FALSE<br>
+                 )<br>
> miRNA <- data.frame(name = paste0('mi', 1:mi)<br>
+                 , pos = sample(n * 9, mi)<br>
+                 , stringsAsFactors = FALSE<br>
+                 )<br>
> require(sqldf)<br>
><br>
> system.time({<br>
+ matches <- sqldf("<br>
+     select m.*, g.*<br>
+     from miRNA as m<br>
+     join genes as g<br>
+         on m.pos between g.start and g.end<br>
+ ")<br>
+ })<br>
   user  system elapsed<br>
  10.91    0.02   10.96<br>
> head(matches, 10)<br>
   name  pos     gene start  end<br>
1   mi1 3825  gene200  3634 4134<br>
2   mi1 3825  gene385  3616 4241<br>
3   mi1 3825  gene410  3492 4089<br>
4   mi1 3825 gene1172  3707 3847<br>
5   mi1 3825 gene1228  3825 3919<br>
6   mi1 3825 gene1726  3586 4552<br>
7   mi1 3825 gene1859  3633 4163<br>
8   mi1 3825 gene1869  3269 4138<br>
9   mi1 3825 gene2061  3812 4094<br>
10  mi1 3825 gene2248  3225 3939<br></div>

<div class="gmail_extra">> str(matches)<br>
'data.frame':   224028 obs. of  5 variables:<br>
 $ name : chr  "mi1" "mi1" "mi1" "mi1" ...<br>
 $ pos  : int  3825 3825 3825 3825 3825 3825 3825 3825 3825 3825 ...<br>
 $ gene : chr  "gene200" "gene385" "gene410" "gene1172" ...<br>
 $ start: int  3634 3616 3492 3707 3825 3586 3633 3269 3812 3225 ...<br>
 $ end  : int  4134 4241 4089 3847 3919 4552 4163 4138 4094 3939 ...<br>
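<div>A minimal sketch, not part of the original exchange, of how the same lookup might be written with `foverlaps` from data.table, the function suggested at the top of this thread. It assumes data.table 1.12.0 or later (for `nomatch = NULL`). The fixed seed and the zero-width `start`/`end` columns added to `miRNA` are assumptions made for illustration, and no timing is claimed here.</div>

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
n  <- 20000
mi <- 4500
start <- sample(n * 10, n)      # start positions
int   <- sample(1000, n, TRUE)  # interval between start and end

genes <- data.table(gene = paste0("gene", 1:n),
                    start = start,
                    end = start + int)
miRNA <- data.table(name = paste0("mi", 1:mi),
                    pos = sample(n * 9, mi))

# foverlaps() compares intervals with intervals, so each miRNA position
# is expressed as the zero-width interval [pos, pos]
miRNA[, `:=`(start = pos, end = pos)]

# the second table must be keyed on its interval columns
setkey(genes, start, end)

# type = "within" keeps pairs where the miRNA interval lies inside the
# gene interval, i.e. start <= pos <= end; nomatch = NULL drops miRNAs
# that fall in no gene (inner join)
matches <- foverlaps(miRNA, genes,
                     by.x = c("start", "end"),
                     type = "within",
                     nomatch = NULL)[, .(name, pos, gene, start, end)]

head(matches, 10)
```

<div>The columns selected at the end are the same five that the sqldf query returns; `start` and `end` in the result are the gene's coordinates, because the clashing miRNA columns come back prefixed with `i.`.</div>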

<br clear="all">

<div>

<div class="gmail_signature"><br>

Jim Holtman<br>

Data Munger Guru<br>

 <br>

What is the problem that you are trying to solve?<br>

Tell me what you want to do, not how you want to do it.</div>

</div>

<br>

<div class="gmail_quote">On Sun, Mar 15, 2015 at 3:41 PM,

Papysounours <span dir="ltr"><<a href="mailto:Cyrille.laurent.sage@gmail.com" target="_blank">Cyrille.laurent.sage@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

Hi,<br>
<br>
I am just starting R programming because I need it to analyse new sequencing data. I have two lists of data (Excel tables): one is a gene list with chromosomal positions (like start:123456 end:124567), the other is a miRNA list with only one position each (like 123789).<br>
The first list has around 20000 rows (meaning 20000 gene names to compare to) and the second around 4500 rows (4500 miRNAs).<br>
I want to compare the position of each individual miRNA (genestart<=miRNA<=geneend) against the entire gene list, in order to get a new table with the name of the miRNA (first column of the miRNA list) and the name of the gene (first column of the gene list) related to that miRNA.<br>
I hope this is not too much to ask.<br>

Papy<br>
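<div>To make the requested output concrete, here is a toy sketch in base R. The three genes and three miRNAs are made-up values for illustration, not data from the spreadsheets described above, and the column names `gene`, `start`, `end`, `name` and `pos` are assumptions.</div>

```r
# Toy data (hypothetical values, for illustration only)
genes <- data.frame(gene  = c("geneA", "geneB", "geneC"),
                    start = c(100, 150, 400),
                    end   = c(200, 300, 500),
                    stringsAsFactors = FALSE)
miRNA <- data.frame(name = c("mi1", "mi2", "mi3"),
                    pos  = c(160, 450, 350),
                    stringsAsFactors = FALSE)

# For each miRNA, keep the genes whose [start, end] range contains its position
hits <- lapply(seq_len(nrow(miRNA)), function(i) {
  inside <- genes$start <= miRNA$pos[i] & miRNA$pos[i] <= genes$end
  data.frame(name = rep(miRNA$name[i], sum(inside)),
             gene = genes$gene[inside],
             stringsAsFactors = FALSE)
})

# One row per (miRNA, gene) pair; miRNAs that fall in no gene contribute no rows
result <- do.call(rbind, hits)
print(result)
```

<div>This row-by-row loop only shows the shape of the result: a miRNA that sits inside several genes appears once per gene.</div>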

<br>

<br>

<br>

--<br>

View this message in context: <a href="http://r.789695.n4.nabble.com/R-beginner-tp4704684.html" target="_blank">http://r.789695.n4.nabble.com/R-beginner-tp4704684.html</a><br>


Sent from the datatable-help mailing list archive at

Nabble.com.<br>

_______________________________________________<br>

datatable-help mailing list<br>

<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>


<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>

</blockquote>

</div>

<br></div>

</div>


</div></div></span></blockquote></body></html>