<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><p>Okay here we go, once again. A much more detailed look:</p>

<p>A) Let’s start with <code>data.table</code>:</p>

<pre><code>require(data.table) ## 1.9.3 commit 1263
dt <- data.table(x=1:1e7, y=1:1e7)

## with optimisation - the names are removed and added at the end
system.time(dt[, list(z=y), by=x])
#   user  system elapsed  
#  7.481   0.253   8.017  
   
## without optimisation + no external function still.
system.time(dt[, {list(z=y)}, by=x])
#   user  system elapsed  
#  9.913   0.076  10.408  

## without optimisation + external function with unnamed list
foo <- function(x) list(x)
system.time(dt[, foo(y), by=x])
#   user  system elapsed  
# 13.742   0.139  14.320  
  
## without optimisation + external function with named list
foo <- function(x) list(z=x)
system.time(dt[, foo(y), by=x])
#   user  system elapsed  
# 15.333   0.181  15.911  
</code></pre>

<p>Summary: The difference between evaluating a named and an unnamed list seems to be around 2.4 seconds without a function and about 1.6 seconds with one.</p>

<p>Evaluating through a function is what seems to bring the slowdown to ~2x compared with the list with no names (15.9 vs 8.0 seconds elapsed).</p>

<hr>

<p>B) Let’s verify this by running the same comparisons in isolation, without any other factors, from a separate C file:</p>

<pre><code>// test.c
#include <R.h>
#define USE_RINTERNALS
#include <Rinternals.h>
#include <Rdefines.h>

// test function - no checks!
SEXP test(SEXP expr, SEXP env, SEXP n)
{
    R_len_t i;
    SEXP ans = R_NilValue;  // defined return value even if n is 0
    for (i=0; i<INTEGER(n)[0]; i++) {
        ans = eval(expr, env);
    }
    return(ans);
}
</code></pre>

<p>Save it as test.c and then from command line:</p>

<pre><code>## From command line:
R CMD SHLIB -o test.so test.c
</code></pre>

<p>Now, from an R session:</p>

<pre><code>## From R session
dyn.load("~/Downloads/test.so")
env <- new.env()
env$y = 1L

expr = quote(list(z=y))
system.time(.Call("test", expr, env, 1e7L))
#   user  system elapsed  
#  5.249   0.015   5.343  

expr = quote(list(y))
system.time(.Call("test", expr, env, 1e7L))
#   user  system elapsed  
#  4.030   0.010   4.054  

foo <- function(y) list(z=y)
expr = quote(foo(y))
system.time(.Call("test", expr, env, 1e7L))
#   user  system elapsed  
# 11.653   0.021  11.745  

foo <- function(y) list(y)
expr = quote(foo(y))
system.time(.Call("test", expr, env, 1e7L))
#   user  system elapsed  
# 10.064   0.022  10.224  
</code></pre>

<p>Summary: More or less the same as (A), but slightly better. The difference is around 1.3 to 1.5 seconds on 1e7 groups, function or no function. <em>But evaluating functions still takes longer.</em></p>

<p>@Matt, thoughts? Turning verbose on with <code>options(datatable.verbose=TRUE)</code> reports that using named lists is terribly inefficient, which doesn’t seem to be so much the case here.</p>

<p>C) Let’s now add a call to <code>match</code> and test with 1e7 groups:</p>

<pre><code>// test.c
#include <R.h>
#define USE_RINTERNALS
#include <Rinternals.h>
#include <Rdefines.h>

// test function - no checks!
SEXP test(SEXP expr, SEXP env, SEXP n)
{
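    R_len_t i;
    SEXP first, tmp, nm, ans, j = R_NilValue;
    // keep the first result protected: its names are the reference,
    // and later eval() calls may trigger garbage collection
    PROTECT(first = eval(expr, env));
    nm = getAttrib(first, R_NamesSymbol);
    for (i=0; i<INTEGER(n)[0]; i++) {
        ans = eval(expr, env);
        tmp = getAttrib(ans, R_NamesSymbol);
        j = match(tmp, nm, 0);
    }
    UNPROTECT(1);
    return(j);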
}
</code></pre>

<p>Running it only on expressions which return a named list:</p>

<pre><code>dyn.load("~/Downloads/test.so")
env <- new.env()
env$y = 1L

expr = quote(list(z=y))
system.time(.Call("test", expr, env, 1e7L))
#   user  system elapsed  
# 15.444   0.042  15.546  

foo <- function(y) list(z=y)
expr = quote(foo(y))
system.time(.Call("test", expr, env, 1e7L))
#   user  system elapsed  
# 26.969   0.062  27.199  
</code></pre>

<p>So, when we have to check the names (note that this only matches the names; it does not yet check whether they’re in the right order), it takes:</p>

<p>15.5 seconds instead of 5.3 seconds in the case of a named list<br>
27.2 seconds instead of 11.7 seconds in the case of a function that returns a named list.</p>

<p>If we decide to avoid calling <code>match</code> 1e7 times (here), then we’d have to first collect the names and the results for every group, match once, and then rearrange the results, which would be very memory inefficient, I’d think.</p>
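
<p>For reference, here is a minimal R-level sketch of what the per-group check in (C) amounts to, with the reordering step added. It is an illustration only, not what data.table does internally; <code>first</code> and <code>other</code> are made-up stand-ins for the results of two groups:</p>

<pre><code>## hypothetical results from two groups: same names, different order
first <- list(x = 4, y = 3)
other <- list(y = 3, x = 4)

## position of each reference name within the other result
idx <- match(names(first), names(other))

## an NA here would mean a name is missing, i.e. a genuine mismatch
stopifnot(!any(is.na(idx)))

## reorder the second result to follow the first
other <- other[idx]
identical(names(other), names(first))
# [1] TRUE
</code></pre>

<p>Doing this once per group is the cost measured above; the reordering itself is cheap, it is the 1e7 calls to <code>match</code> that add up.</p>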

<p>Perhaps Matt will have a better take on these results.</p>

<p><style>body{font-family:Helvetica,Arial;font-size:13px}</style><style>body {
        font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
        padding:1em;
        margin:auto;
        background:#fefefe;
}

h1, h2, h3, h4, h5, h6 {
        font-weight: bold;
}

h1 {
        color: #000000;
        font-size: 28pt;
}

h2 {
        border-bottom: 1px solid #CCCCCC;
        color: #000000;
        font-size: 24px;
}

h3 {
        font-size: 18px;
}

h4 {
        font-size: 16px;
}

h5 {
        font-size: 14px;
}

h6 {
        color: #777777;
        background-color: inherit;
        font-size: 14px;
}

hr {
        height: 0.2em;
        border: 0;
        color: #CCCCCC;
        background-color: #CCCCCC;
}

p, blockquote, ul, ol, dl, li, table, pre {
        margin: 15px 0;
}

a, a:visited {
        color: #4183C4;
        background-color: inherit;
        text-decoration: none;
}

#message {
        border-radius: 6px;
        border: 1px solid #ccc;
        display:block;
        width:100%;
        height:60px;
        margin:6px 0px;
}

button, #ws {
        font-size: 12pt;
        padding: 4px 6px;
        border-radius: 5px;
        border: 1px solid #bbb;
        background-color: #eee;
}

code, pre, #ws, #message {
        font-family: Monaco;
        font-size: 10pt;
        border-radius: 3px;
        background-color: #F8F8F8;
        color: inherit;
}

code {
        border: 1px solid #EAEAEA;
        margin: 0 2px;
        padding: 0 5px;
}

pre {
        border: 1px solid #CCCCCC;
        overflow: auto;
        padding: 4px 8px;
}

pre > code {
        border: 0;
        margin: 0;
        padding: 0;
}

#ws { background-color: #f8f8f8; }


table {
border-collapse: collapse;  
font-family: Helvetica, arial, freesans, clean, sans-serif;  
color: rgb(51, 51, 51);  
font-size: 15px; line-height: 25px;
padding: 0; }

table tr {
border-top: 1px solid #cccccc;
background-color: white;
margin: 0;
padding: 0; }
     
table tr:nth-child(2n) {
background-color: #f8f8f8; }

table tr th {
font-weight: bold;
border: 1px solid #cccccc;
margin: 0;
padding: 6px 13px; }

table tr td {
border: 1px solid #cccccc;
margin: 0;
padding: 6px 13px; }

table tr th :first-child, table tr td :first-child {
margin-top: 0; }

table tr th :last-child, table tr td :last-child {
margin-bottom: 0; }




.send { color:#77bb77; }
.server { color:#7799bb; }
.error { color:#AA0000; }</style></p><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div> <div id="bloop_sign_1397671402945370880" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px">Arun</div></div> <div style="color:black"><br>From: <span style="color:black">Arunkumar Srinivasan</span> <a href="mailto:aragorn168b@gmail.com">aragorn168b@gmail.com</a><br>Reply: <span style="color:black">Arunkumar Srinivasan</span> <a href="mailto:aragorn168b@gmail.com">aragorn168b@gmail.com</a><br>Date: <span style="color:black">April 16, 2014 at 6:41:50 PM</span><br>To: <span style="color:black">Clayton Stanley</span> <a href="mailto:cstanley@cstanley.no-ip.biz">cstanley@cstanley.no-ip.biz</a>, <span style="color:black">datatable-help@lists.r-forge.r-project.org</span> <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>Subject: <span style="color:black"> Re: [datatable-help] data.table and aggregating out-of-order columns in result from by <br></span></div><br> <blockquote type="cite" class="clean_bq"><span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div></div><div>









<p>Clayton,</p>
<p>Thanks for posting this here. As a first follow-up, here’s an example:</p>
<pre><code>require(data.table) ## 1.9.3 commit 1263
dt <- data.table(x=1:1e7, y=1:1e7)

## data.table optimisation removes names
system.time(ans1 <- dt[, list(z=y), by=x])

#   user  system elapsed   
#  7.193   0.275   7.859   
    
## data.table can't optimise to remove names
foo <- function(x) list(z=x)
system.time(ans2 <- dt[, foo(y), by=x])
#   user  system elapsed   
# 16.020   0.179  16.411   

> identical(ans1, ans2)
[1] TRUE
</code>
</pre>
<p>This is <em>without</em> checking for names, for each of the 1e7
groups.</p>
<div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">
<br></div>
<div id="bloop_sign_1397665611048812032" class="bloop_sign">
<div style="font-family:helvetica,arial;font-size:13px">Arun</div>
</div>
<div style="color:black"><br>
From: <span style="color:black">Clayton Stanley</span>
<a href="mailto:cstanley@cstanley.no-ip.biz">cstanley@cstanley.no-ip.biz</a><br>

Reply: <span style="color:black">Clayton Stanley</span>
<a href="mailto:cstanley@cstanley.no-ip.biz">cstanley@cstanley.no-ip.biz</a><br>

Date: <span style="color:black">April 16, 2014 at 6:23:50
PM</span><br>
To: <span style="color:black">datatable-help@lists.r-forge.r-project.org</span>
<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>

Subject:  <span style="color:black">[datatable-help]
data.table and aggregating out-of-order columns in result from
by<br></span></div>
<br>
<blockquote type="cite" class="clean_bq">
<div>
<div>
<div dir="ltr">
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:13.63636302947998px;vertical-align:baseline;clear:both;color:rgb(0,0,0);font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:17.804800033569336px">
<span>Copied from this SO post: <a href="http://stackoverflow.com/questions/23097461">http://stackoverflow.com/questions/23097461</a></span></p>
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:13.63636302947998px;vertical-align:baseline;clear:both;color:rgb(0,0,0);font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:17.804800033569336px">
<span>Here's some interesting behavior that I noticed with
data.table 1.9.2</span></p>
<pre><code>>   testFun <- function(val) {
        if (val == 'geteeee') return(data.table(x=4,y=3))
        if (val == 'get') return(data.table(y=3,x=4))
    }
>   tbl = data.table(val=c('geteeee', 'get'))
>   tbl[, testFun(val), by=val]
       val x y
1: geteeee 4 3
2:     get 3 4
</code></pre>
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:13.63636302947998px;vertical-align:baseline;clear:both;color:rgb(0,0,0);font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:17.804800033569336px">
When the column order of the data tables returned from each call to
testFun differs (but they have the same names and number of columns),
data.table silently binds the tables together without taking into
account that they are out of order. This was probably done for
speed, but I found the behavior quite unexpected, and would have
appreciated at least a warning.</p>
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:13.63636302947998px;vertical-align:baseline;clear:both;color:rgb(0,0,0);font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:17.804800033569336px">
Is there a way that I can get data.table to warn or error when this
situation happens?</p>
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:13.63636302947998px;vertical-align:baseline;clear:both;color:rgb(0,0,0);font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:17.804800033569336px">
This happened in my analysis code and caused values for two DVs to
be intermixed. The reason why it happened is that in the 'testFun'
there is a branch and the returned data table is created within
both sides of the branch. The branch is necessary to handle the
case where the data table used to create the final returned data
table is empty. So on one side of that branch I basically create an
empty data table with the correct columns, and on the other side
the data table is created from the first. The point is that the
column order for the data tables returned from each side of the
branch is different. Now this is certainly a bug on my part in
'testFun'. However I could have caught the issue much earlier if I
had received a warning from data.table when the by operation
completed and the resulting tables were bound together. </p>
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:13.63636302947998px;vertical-align:baseline;clear:both;color:rgb(0,0,0);font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:17.804800033569336px">
Also since there isn't a check for column order, it does make me
worry that there are other places in my analysis code where the
same thing could be happening. What would be ideal is if there was
some way for me to tell if that is the case. Perhaps a warning,
temporarily increasing a 'safety' level as an options call, etc.
Usually data.table is great at warning me when things are not quite
right, so I was surprised when I noticed the current behavior. I
understand that this was done for speed. So maybe temporarily
increasing a 'safety' level is a way to keep things fast by default
and have additional checks (for a speed cost) when the user wants
them? This sort of mimics how compiler optimization declarations
are done in common lisp.</p>
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:13.63636302947998px;vertical-align:baseline;clear:both;color:rgb(0,0,0);font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:17.804800033569336px">
-Clayton</p>
</div>
_______________________________________________<br>
datatable-help mailing list<br>
datatable-help@lists.r-forge.r-project.org<br>
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</div>
</div>
</blockquote>


</div></div></span></blockquote><p></p></body></html>