[datatable-help] 1.7.8 submitted to CRAN

Matthew Dowle mdowle at mdowle.plus.com
Wed Jan 25 07:22:20 CET 2012


BUG FIXES

o  unique(DT) now works when DT is keyed and a key
   column is called 'x' (an internal scoping conflict
   introduced in v1.6.1). Thanks to Steven Bagley for
   reporting.
        
o  Errors and seg faults could occur in grouping when
   j contained character or list columns. Many thanks
   to Jim Holtman for providing a reproducible example.
        
o  Setting a key on a table with over 268 million rows
   (2^31/8) now works (again), #1714. The bug was introduced
   in v1.7.2. setkey works up to the regular R vector limit
   of 2^31-1 rows (about 2.1 billion). Thanks to Leon Baum
   for reporting.
        
o  Checks in := are now made up front (before starting to
   modify the data.table) so that the data.table isn't
   left in an invalid state should an error occur, #1711.
   Thanks to Chris Neff for reporting.
        
o  The 'Chris crash' is fixed. The root cause was that key<-
   always copies the whole table. Apart from being slower, the
   problem with that copy is that R doesn't maintain the
   over-allocated truelength, yet the copy looks as though it
   has. key<- was used internally, in particular in merge(). So
   adding a column using := after merge() overwrote memory,
   since the over-allocated memory wasn't really there after
   key<-'s copy.
        
   data.tables now have a new attribute '.internal.selfref' to
   catch and warn about such copies in future. All internal
   use of key<- has been replaced with setkey() or the new
   function setkeyv() (which accepts a vector); neither copies.
        
   Many thanks to Chris Neff for extended dialogue, providing a
   reproducible example, and his patience. This problem was not
   confined to pre-2.14.0; it occurred post-2.14.0 as well.
   Thanks also to Christoph Jäckel, Timothée Carayol and DM for
   investigations and suggestions, which in combination led to
   the solution.

o  An example in ?":=" has been fixed, and the j and by
   descriptions in ?data.table have been improved. Thanks to
   Joseph Voelkel for reporting.
 
       
NEW FEATURES

o  Multiple new columns can be added by reference using
   := and with=FALSE; e.g.,
       DT[,c("foo","bar"):=1L,with=FALSE]
       DT[,c("foo","bar"):=list(1L,2L),with=FALSE]
       
o  := now recycles vectors of non-divisible length, with
   a warning (previously this was an error).
       
o  When setkey coerces a numeric or character column, it
   no longer makes a copy of the whole table, FR#1744. Thanks
   to an investigation by DM.
        
o  New function setkeyv(DT,v) (v stands for vector) replaces
   the key(DT)<-v syntax. Also added setattr(). See ?copy.
        
o  merge() now uses (manual) secondary keys, for speed.
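
A minimal sketch of setkeyv(), assuming a current data.table
installation; the table DT and the vector cols are hypothetical.
It shows key columns held in a character vector (e.g. chosen at
run time) rather than typed as bare names, with the table keyed
by reference instead of through key(DT)<-v.

```r
library(data.table)

# Hypothetical example table; rows are deliberately out of order.
DT <- data.table(grp = c("b", "a", "a"), id = c(3L, 2L, 1L), v = c(30, 20, 10))

# Key columns held in a character vector rather than written as bare names.
cols <- c("grp", "id")

# setkeyv() sorts DT by grp, then id, and sets the key in place (no copy).
setkeyv(DT, cols)

print(key(DT))   # [1] "grp" "id"
print(DT)        # rows now ordered (a,1), (a,2), (b,3)
```

Because setkeyv() takes a vector, the same call works whether the
key columns are fixed or computed.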


USER VISIBLE CHANGES

o  The loc argument of setkey has been removed. This wasn't very
   useful and didn't warrant a period of deprecation.
        
o  The option datatable.alloccol has been removed. The warning
   it controlled is now shown when datatable.verbose=TRUE.
   One option is easier than two.
        
o  If i is a keyed data.table, it is no longer an error if its
   key is longer than x's key; the first length(key(x)) columns
   of i's key are used to join.
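
A minimal sketch of the relaxed join rule, assuming a current
data.table installation; the tables X and Y are hypothetical. X
is keyed on one column and Y on two, so only the first
length(key(X)) = 1 column of Y's key ('a') is used to join.

```r
library(data.table)

# x: hypothetical lookup table keyed on a single column.
X <- data.table(a = c(1L, 2L, 3L), xv = c("p", "q", "r"), key = "a")

# i: keyed on two columns, i.e. a longer key than X's.
Y <- data.table(a = c(3L, 2L), b = c(8L, 9L), key = c("a", "b"))

# Only 'a' is used to match rows; 'b' is carried along as an ordinary column.
res <- X[Y]
print(res)
```

Y is sorted by its key, so its rows are (2, 9) then (3, 8), and
each row picks up the matching xv from X.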
        
