[datatable-help] data.table - sort key - columns with real numbers

Harish harishv_99 at yahoo.com
Wed Jun 30 20:04:48 CEST 2010


Desmond,

Since you used the term "sort key" in the subject, I wonder whether you
only want to sort the data by the decimal column rather than truly use it
as a key.

If you use the decimal data as an identifier to cross-reference other
data.tables, via either J() or merge(), then you have to do what Matthew
suggested: multiply to get integers, or use factor().
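
To make the multiply approach concrete, here is a minimal sketch.  The
table, the column names and the precision of 3 decimal places are all made
up for illustration, and it is written against the current data.table
syntax (:= in particular), so it may not run as-is on older releases.

```r
library(data.table)

# Hypothetical data: places identified by a latitude stored as double.
places <- data.table(lat  = c(51.507, 48.857, 40.713),
                     city = c("London", "Paris", "New York"))

# Fixed-precision integer copy of the identifier (3 decimals assumed).
# round() guards against values like 48856.999999 before truncation.
places[, lat_i := as.integer(round(lat * 1000))]
setkey(places, lat_i)

# Look up a row through the integer key with J().
hit <- places[J(48857L)]
print(hit$city)
```

Dividing lat_i by 1000 gives back the value at the chosen precision; any
digits beyond that precision are lost, so pick the multiplier to suit your
data.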

If you only want to sort the data, you don't need to make the column a
key.  You can call order() in the "i" argument of data.table instead.
   Example: DT[ order( a ), list( Col1, Col2 ) ]
Here the column a can hold decimal data; order() sorts it directly.
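
A small self-contained version of that example, in case it helps.  The
values in DT are invented; only the column names a, Col1 and Col2 come from
the line above.

```r
library(data.table)

# Hypothetical table: a is a double (decimal) column, no key is set.
DT <- data.table(a    = c(2.5, 0.1, 1.75),
                 Col1 = c("x", "y", "z"),
                 Col2 = 1:3)

# order() in "i" sorts the rows by a; "j" picks the columns to return.
sorted <- DT[order(a), list(Col1, Col2)]
print(sorted)
```

Note that DT itself is left unsorted and unkeyed; only the returned table
is in order of a.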


Regards,
Harish


--- On Wed, 6/30/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:

> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> Subject: Re: [datatable-help] data.table - sort key - columns with real numbers
> To: "Desmond Wee" <globalvestor at optusnet.com.au>
> Cc: datatable-help at lists.r-forge.r-project.org
> Date: Wednesday, June 30, 2010, 10:13 AM
> 
> Thanks Desmond for your comments.
> 
> One reason for integers is that radix sorting can be used on integers, and
> that's amazingly fast (Tom added radix to data.table using ?order.list).
> The nature of the radix algorithm itself means it _only_ works for
> integers, see Wikipedia.
> 
> Also, keys are usually used in an equi-join, and this requires a test of
> equality (==) internally. Integer equality doesn't have the machine
> tolerance issues of double.
> 
> Essentially, the idea of keys is that they represent unique, discrete
> things, whereas floating point is continuous.
> 
> If the distinct set of items happens to be described by floating point
> numbers, perhaps like the longitude and latitude of distinct places on the
> earth, then multiplying as you are doing (*1000) is what other people do,
> or they use factor() to store the floats as strings.
> 
> To make it easier, you could define your own small class for your
> datatype, say coord(). The print method would automatically divide by 1000
> for you, so you wouldn't have to remember each time. It's pretty quick and
> easy to do. That way you retain the speed and memory advantage of integer
> (it's half as big as double, and sorts and queries many times faster) but
> it _appears_ to be float. The implementation depends on your particular
> data, so it's something you would do rather than the data.table package.
> If it really is truly continuous, then how can it be in a key?
> 
> However, having said that, I may be easily persuaded to give it higher
> priority if someone can explain (e.g. provide an example) why float in
> keys is more valid than I currently think it is?
> 
> Maybe we should create a decimal() class?  A fixed precision float, stored
> as integer.  Maybe that could be in data.table.
> 
> When I made large changes internally earlier this year, I did it in such
> a way that we could switch on integer/double. Before that change, the
> switch would have slowed things down too much as it would have been too
> deep.  Now, maybe.  Or maybe a decimal() class.  Maybe that exists already
> somewhere?
> 
> Matthew
> 
> 
> > Dear Matthew & Tom,
> >
> > I would like to thank you very much for contributing such an excellent
> > and useful package. I have been trying to write some R code to
> > overcome some of the limitations of data.frame, and you have addressed
> > those issues.
> >
> > I have a data table which has columns containing decimal numbers. Your
> > current setkey() only allows integer mode and does not allow decimals. I
> > figured out that to overcome the problem I need to multiply the column
> > by 10^7 to convert it to integer and then divide by 10^7 to obtain the
> > actual value. It is a very messy and cumbersome process. Could you
> > please make changes to allow keys to have real numbers and maybe other
> > modes too? I would like to suggest that your code do the conversions,
> > which would make the package more elegant.
> >
> > Please let me know whether you will be modifying your package and how
> > soon you expect to incorporate the changes.
> >
> > Thanks again.
> >
> >
> > Regards,
> > Desmond Wee
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >
> 
> 


      

