[datatable-help] 1.7.8 submitted to CRAN
Matthew Dowle
mdowle at mdowle.plus.com
Wed Jan 25 07:22:20 CET 2012
BUG FIXES
o unique(DT) now works when DT is keyed and a key
column is called 'x' (an internal scoping conflict
introduced in v1.6.1). Thanks to Steven Bagley for
reporting.
o Errors and seg faults could occur in grouping when
j contained character or list columns. Many thanks
to Jim Holtman for providing a reproducible example.
o Setting a key on a table with over 268 million rows
(2^31/8) now works (again), #1714. Bug introduced in
v1.7.2. setkey works up to the regular R vector limit
of 2^31-1 rows (about 2 billion). Thanks to Leon Baum
for reporting.
o Checks in := are now made up front (before starting to
modify the data.table) so that the data.table isn't
left in an invalid state should an error occur, #1711.
Thanks to Chris Neff for reporting.
o The 'Chris crash' is fixed. The root cause was that key<-
always copies the whole table. The problem with that copy
(other than being slower) is that R doesn't preserve the
over-allocated truelength in the copy, even though the copy
looks as though it has.
key<- was used internally, in particular in merge(). So,
adding a column using := after merge() was a memory overwrite,
since the over allocated memory wasn't really there after
key<-'s copy.
data.tables now have a new attribute '.internal.selfref' to
catch and warn about such copies in future. All internal
use of key<- has been replaced with setkey(), or new function
setkeyv() which accepts a vector; neither makes a copy.
Many thanks to Chris Neff for extended dialogue, providing a
reproducible example and his patience. This problem was not just
in pre 2.14.0, but post 2.14.0 as well. Thanks also to Christoph
Jäckel, Timothée Carayol and DM for investigations and suggestions,
which in combination led to the solution.
o An example in ?":=" fixed, and j and by descriptions
improved in ?data.table. Thanks to Joseph Voelkel for
reporting.
NEW FEATURES
o Multiple new columns can be added by reference using
:= and with=FALSE; e.g.,
DT[,c("foo","bar"):=1L,with=FALSE]
DT[,c("foo","bar"):=list(1L,2L),with=FALSE]
o := now recycles vectors of non divisible length, with
a warning (previously an error).
o When setkey coerces a numeric or character column, it
no longer makes a copy of the whole table, FR#1744. Thanks
to an investigation by DM.
o New function setkeyv(DT,v) (v stands for vector) replaces
key(DT)<-v syntax. Also added setattr(). See ?copy.
o merge() now uses (manual) secondary keys, for speed.
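
As a rough illustration of the setkey()/setkeyv() item above, here is a
minimal sketch. The table and its column names (id, grp, val, flag) are
made up for the example and do not come from the release notes; it assumes
the data.table package is installed.

```r
library(data.table)

# Hypothetical example table (not from the release notes).
DT <- data.table(id  = c("b", "a", "c"),
                 grp = c(2L, 1L, 1L),
                 val = c(10, 20, 30))

# setkey() takes unquoted column names and sorts DT by reference.
setkey(DT, id)
key(DT)

# setkeyv() takes a character vector, which helps when the key columns
# are only known at run time. It is used here in place of key(DT) <- cols.
cols <- c("grp", "id")
setkeyv(DT, cols)
key(DT)

# Add a column by reference after re-keying.
DT[, flag := TRUE]
```

The vector form is what makes setkeyv() usable inside functions, where the
key columns typically arrive as a character vector argument.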
USER VISIBLE CHANGES
o The loc argument of setkey has been removed. This wasn't very
useful and didn't warrant a period of deprecation.
o datatable.alloccol has been removed. That warning is now
controlled by datatable.verbose=TRUE. One option is easier.
o If i is a keyed data.table, it is no longer an error if its
key is longer than x's key; the first length(key(x)) columns
of i's key are used to join.