<div dir="ltr"><div>You didn't provide any test data, so I made some up based on the sizes you gave (note that I generated 200000 genes rather than the 20000 you mentioned).  This uses the 'sqldf' package and took about 2 minutes to find the matches.</div><div><br></div><div>> n <- 200000<br>> mi <- 4500<br>> start <- sample(n * 10, n)  # gene start positions<br>> int <- sample(1000, n, TRUE)  # gene length (distance between start and end)<br>> genes <- data.frame(gene = paste0('gene', 1:n)<br>+                 , start = start<br>+                 , end = start + int<br>+                 , stringsAsFactors = FALSE<br>+                 )<br>> miRNA <- data.frame(name = paste0('mi', 1:mi)<br>+                 , pos = sample(n * 9, mi)<br>+                 , stringsAsFactors = FALSE<br>+                 )<br>> require(sqldf)<br>Loading required package: sqldf<br>Loading required package: gsubfn<br>Loading required package: proto<br>Loading required package: RSQLite<br>Loading required package: DBI<br>> matches <- sqldf("<br>+     select m.*, g.*<br>+     from miRNA as m<br>+     join genes as g<br>+         on m.pos between g.start and g.end<br>+ ")<br>Loading required package: tcltk<br>> str(matches)<br>'data.frame':   225045 obs. of  5 variables:<br> $ name : chr  "mi1" "mi1" "mi1" "mi1" ...<br> $ pos  : int  279341 279341 279341 279341 279341 279341 279341 279341 279341 279341 ...<br> $ gene : chr  "gene3133" "gene14326" "gene14997" "gene17652" ...<br> $ start: int  279000 278623 279157 279296 278379 279055 279180 279273 278938 278960 ...<br> $ end  : int  279924 279444 280150 279930 279347 279861 279782 280268 279791 279796 ...<br>> head(matches)<br>  name    pos      gene  start    end<br>1  mi1 279341  gene3133 279000 279924<br>2  mi1 279341 gene14326 278623 279444<br>3  mi1 279341 gene14997 279157 280150<br>4  mi1 279341 gene17652 279296 279930<br>5  mi1 279341 gene21208 278379 279347<br>6  mi1 279341 gene30889 279055 279861</div><div><br></div></div>
<div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><br>Jim Holtman<br>Data Munger Guru<br> <br>What is the problem that you are trying to solve?<br>Tell me what you want to do, not how you want to do it.</div></div>
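<div>For anyone who wants to see what the interval join is doing without installing sqldf, here is a minimal base R sketch of the same condition (start <= pos <= end). The three genes and three miRNAs are invented toy data for illustration only, not the simulated data from the reply, and the row-by-row approach is an assumption chosen for readability, not for speed on large tables.</div>

```r
# Toy data: made up for illustration, not taken from the thread.
genes <- data.frame(gene  = c("geneA", "geneB", "geneC"),
                    start = c(100L, 150L, 400L),
                    end   = c(200L, 300L, 500L),
                    stringsAsFactors = FALSE)
miRNA <- data.frame(name = c("mi1", "mi2", "mi3"),
                    pos  = c(160L, 450L, 350L),
                    stringsAsFactors = FALSE)

# For each miRNA, find the row indices of genes whose [start, end]
# range contains the miRNA position (both bounds inclusive).
hits <- lapply(seq_len(nrow(miRNA)), function(i) {
  which(genes$start <= miRNA$pos[i] & miRNA$pos[i] <= genes$end)
})
n_hits <- vapply(hits, length, integer(1))

# Expand to one row per (miRNA, gene) pair; miRNAs with no
# overlapping gene are dropped, as in an inner join.
matches <- data.frame(name = rep(miRNA$name, n_hits),
                      pos  = rep(miRNA$pos,  n_hits),
                      genes[unlist(hits), ],
                      row.names = NULL,
                      stringsAsFactors = FALSE)
print(matches)
```

<div>With the toy data, mi1 falls inside both geneA and geneB, mi2 falls inside geneC, and mi3 matches nothing, so it does not appear in the result.</div>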

<br><div class="gmail_quote">On Sun, Mar 15, 2015 at 3:41 PM, Papysounours <span dir="ltr"><<a href="mailto:Cyrille.laurent.sage@gmail.com" target="_blank">Cyrille.laurent.sage@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi<br>

<br>

I am just starting R programming because I need it to analyse new sequencing<br>

data. I have two lists of data (Excel tables): one is a gene list with chromosomal<br>

positions (like start:123456 end:124567), the other is a miRNA list with only<br>

one position (like 123789).<br>

The first list has around 20000 rows (meaning 20000 gene names to<br>

compare to) and the second around 4500 rows (4500 miRNAs).<br>

I want to compare the position of each individual miRNA (<br>

genestart<=miRNA<=geneend ) to the entire list of genes in order to get, in a<br>

new table, the name of the miRNA (first column of the miRNA list) and the name<br>

of the gene (first column of the gene list) related to the miRNA.<br>

Hope this is not too much to ask.<br>

Papy<br>

<br>

<br>

<br>

--<br>

View this message in context: <a href="http://r.789695.n4.nabble.com/R-beginner-tp4704684.html" target="_blank">http://r.789695.n4.nabble.com/R-beginner-tp4704684.html</a><br>

Sent from the datatable-help mailing list archive at Nabble.com.<br>


</blockquote></div><br></div>