[datatable-help] v1.6.3 has been submitted to CRAN

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 4 04:27:33 CEST 2011


NEW FEATURES
        
    o   Ad hoc grouping now returns results in the order in which
        each group first appears in the table, rather than sorting
        the groups. Thanks to Steve Lianoglou for highlighting. The
        order of the rows within each group always has been and
        always will be preserved. For larger datasets a 'keyed by'
        is still faster; e.g., by=key(DT).
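
        A minimal sketch of the ordering difference, assuming a
        current data.table release; the table, its column names
        (grp, v) and the values are made up for illustration:

        ```r
        library(data.table)

        # Hypothetical data: groups first appear in the order b, a, c
        DT <- data.table(grp = c("b", "a", "b", "c", "a"), v = 1:5)

        # Ad hoc by: groups come back in first-appearance order
        adhoc <- DT[, list(total = sum(v)), by = grp]

        # Keyed by: setkey() sorts the copy, so groups come back sorted
        KDT <- copy(DT)
        setkey(KDT, grp)
        keyed <- KDT[, list(total = sum(v)), by = key(KDT)]
        ```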
        
    o   The 'key' argument of data.table() now accepts a vector of
        column names in addition to a single comma separated string
        of column names, for consistency. Thanks to Steve Lianoglou
        for highlighting.
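
        As a sketch of the two equivalent forms (the columns a, b
        and v are hypothetical, and a current data.table release is
        assumed):

        ```r
        library(data.table)

        # Single comma separated string of column names
        DT1 <- data.table(a = c(2L, 1L), b = c("y", "x"), v = 1:2,
                          key = "a,b")

        # Vector of column names
        DT2 <- data.table(a = c(2L, 1L), b = c("y", "x"), v = 1:2,
                          key = c("a", "b"))

        # Both tables should report the same key columns
        key(DT1)
        key(DT2)
        ```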
        
    o   A new argument '.SDcols' has been added to [.data.table. This
        may be character column names or numeric positions and
        specifies the columns of x included in .SD. This is useful
        for speed when applying a function across a subset of
        (possibly very many) columns; e.g.,
            DT[,lapply(.SD,sum),by="x,y",.SDcols=301:350]
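
        A smaller, self-contained variant of the same call, as a
        sketch only; the table and the v1..v3 column names are
        invented here, and a current data.table release is assumed:

        ```r
        library(data.table)

        # Hypothetical table: two grouping columns, three value columns
        DT <- data.table(x = c("a", "a", "b"), y = c(1L, 1L, 2L),
                         v1 = 1:3, v2 = 4:6, v3 = 7:9)

        # Restrict .SD to v1 and v2 by name ...
        by_name <- DT[, lapply(.SD, sum), by = "x,y",
                      .SDcols = c("v1", "v2")]

        # ... or by position in DT (columns 3 and 4 are v1 and v2)
        by_pos <- DT[, lapply(.SD, sum), by = "x,y", .SDcols = 3:4]
        ```

        v3 is never materialised in .SD, which is where the speed
        gain comes from when there are very many columns.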

    o   as(character, "IDate") and as(character, "ITime") coercion
        functions have been added. This enables the user to declare
        colClasses as "IDate" and "ITime" in the various read.table
        (and sister) functions. Thanks to Chris Neff for the suggestion.
        
    o   DT[i,j]<-value is now handled by data.table in C rather
        than falling through to data.frame methods, FR#200. Thanks to
        Ivo Welch for raising speed issues on r-devel, to Simon Urbanek
        for the suggestion, and Luke Tierney and Simon for information
        on R internals.

        [<- syntax still incurs one working copy of the whole
        table (as of R 2.13.1) due to R's [<- dispatch mechanism
        copying to `*tmp*`, so, for ultimate speed and brevity,
        the operator := may now be used in j as follows.
        
    o   := is now available to j and means assign to the column by
        reference; e.g.,

            DT[i,colname:=value]
        
        This syntax makes no copies of any part of memory at all.
        
        m = matrix(1,nrow=100000,ncol=100)
        DF = as.data.frame(m)
        DT = as.data.table(m)
        
        system.time(for (i in 1:1000) DF[i,1] <- i)
             user  system elapsed 
          287.062 302.627 591.984 
        
        system.time(for (i in 1:1000) DT[i,V1:=i])
             user  system elapsed 
            1.148   0.000   1.158     ( 511 times faster )

        := in j can be combined with all types of i, such as binary
        search. It can be used to add and remove columns efficiently,
        too. Fast assigning within groups will be implemented in
        future.
        
        *Please note*, := is new and experimental.
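
        A short sketch of the uses of := described above, written
        against a current data.table release (syntax details may
        differ from v1.6.3); the table and the id, v and w column
        names are hypothetical:

        ```r
        library(data.table)

        # Hypothetical keyed table
        DT <- data.table(id = c("a", "b", "c"), v = 1:3, key = "id")

        DT[2L, v := 20L]    # i as a row number
        DT["c", v := 30L]   # i as a binary search on the key
        DT[, w := v * 2L]   # add a new column w by reference
        DT[, w := NULL]     # remove column w again
        ```

        Each call modifies DT in place, so no assignment with <- is
        needed.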
        

BUG FIXES

    o   merge()ing two data.tables with user-defined `suffixes`
        was getting tripped up when column names in x ended in
        '.1'. This resulted in the `suffixes` parameter being
        ignored.
        
    o   Mistakenly wrapping a j expression inside quotes; e.g.,
            DT[,list("sum(a),sum(b)"),by=grp]
        was appearing to work, but with wrong column names. This
        now returns a character column (the quotes should not
        be used). Thanks to Joseph Voelkel for reporting.
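
        For comparison, a sketch of the quoted and unquoted forms,
        assuming a current data.table release; the table and the
        sum_a and sum_b names are made up for illustration:

        ```r
        library(data.table)

        # Hypothetical data
        DT <- data.table(grp = c("g1", "g1", "g2"), a = 1:3, b = 4:6)

        # Intended form: j is an unquoted expression
        right <- DT[, list(sum_a = sum(a), sum_b = sum(b)), by = grp]

        # Quoted by mistake: j is just a string, so each group gets
        # that string back in a character column
        wrong <- DT[, list("sum(a),sum(b)"), by = grp]
        ```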
        
    o   setkey has been made robust in several ways to fix issues
        introduced in 1.6.2: #1465 ('R crashes after setkey')
        reported by Eugene Tyurin and similar bug #1387 ('paste()
        by group to create long comma separated strings can crash')
        reported by Nicolas Servant and Jean-Francois Rami. This
        bug was not reproducible so we are especially grateful for
        the patience of these people in helping us find, fix and
        test it.
        
    o   Combining a join, j and by together in one query now works
        rather than giving an error, fixing bug #1468. Discovered
        indirectly thanks to a post from Jelmer Ypma.
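
        A minimal sketch of such a query, assuming a current
        data.table release; the table and the id, grp and v column
        names are hypothetical:

        ```r
        library(data.table)

        # Hypothetical table keyed on id
        DT <- data.table(id  = c("a", "a", "b", "b", "c"),
                         grp = c("g1", "g2", "g1", "g1", "g2"),
                         v   = 1:5,
                         key = "id")

        # Join to ids "a" and "b", then sum v within each grp of the
        # matching rows
        res <- DT[J(c("a", "b")), list(total = sum(v)), by = grp]
        ```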
        
    o   Invalid keys no longer arise when a non-data.table-aware
        package reorders the data; e.g.,
            setkey(DT,x,y)
            plyr::arrange(DT,y)       # same as DT[order(y)]
        This now drops the key to avoid incorrect results being
        returned the next time the invalid key is joined to. Thanks
        to Chris Neff for reporting.
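
        The effect can be checked with key(). This sketch uses
        DT[order(y)] rather than plyr, assumes a current data.table
        release, and uses made-up data:

        ```r
        library(data.table)

        # Hypothetical table keyed on x then y
        DT <- data.table(x = c(1L, 1L, 2L, 2L), y = c(2L, 1L, 2L, 1L))
        setkey(DT, x, y)

        # Reordering by y alone breaks the (x, y) sort order
        reordered <- DT[order(y)]

        key(DT)          # the original table keeps its key
        key(reordered)   # the reordered copy has had its key dropped
        ```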


USER-VISIBLE CHANGES

    o   The startup banner has been shortened to one line.
    
    o   data.table does not support POSIXlt. Almost unbelievably,
        POSIXlt uses 40 bytes to store a single datetime. If it worked
        before, that was unintentional. Please use IDateTime (see
        ?IDateTime), or any other date class that uses a single atomic
        vector. This applies regardless of whether the POSIXlt column
        is a key column. This resolves bug #1481 by documenting the
        lack of support in ?data.table.
        

DEPRECATED & DEFUNCT

    o   Use of the DT() alias in j is no longer caught for backwards
        compatibility and is now fully removed, as warned in NEWS
        for v1.5.3 and v1.4, and in FAQs 2.6 and 2.7.


http://datatable.r-forge.r-project.org/





