[datatable-help] Possible FR - but just checking opinions

Matt Dowle mdowle at mdowle.plus.com
Fri Mar 14 11:55:21 CET 2014


Hi,

It sounds like you mean 'foreign' key.  This could be useful, yes.  In 
simple cases, I've seen that used in SQL to do what R does 
automatically.  A normalised database in SQL may have lookup tables 
with two columns mapping, say, country id to country name, to save 
storing long country names over and over in a CHAR() or VARCHAR() 
field.  We used to do that more simply in R using factors, and then R 
itself introduced the global string cache, so it does that for us now.  
If you have a country name in full repeated 10 million times in a 
data.table (or data.frame or any character vector) then all R stores 
there is 10 million pointers (4 or 8 bytes each) to the single copy of 
that string it has already cached.  That's similar to what foreign keys 
in SQL do, but much simpler.
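
A rough way to see the string cache at work, using base R only.  This is 
a sketch, not a benchmark: the country name and vector length are made 
up for illustration, and the exact sizes reported depend on platform and 
R version.

```r
# Illustrative values only: a long string repeated many times.
n <- 1e6
country <- rep("United Kingdom of Great Britain and Northern Ireland", n)

# The string itself is cached once; the vector holds n pointers to it.
# object.size() counts each distinct string once, so bytes per element
# comes out close to the pointer size rather than the string length.
bytes_per_element <- as.numeric(object.size(country)) / n
print(bytes_per_element)
print(nchar(country[1]))   # length of the string, for comparison

# The older approach: a factor holds one integer code per element
# plus a single copy of each level.
country_f <- factor(country)
print(as.numeric(object.size(country_f)) / n)
```

On a 64-bit build the character vector should come out at roughly one 
pointer per element, and the factor at roughly one integer per element.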

That said, we're settling on i. and x. prefixes in j (could you please 
check that the v1.9.3 changes for that are ok, as per my other email).  
So using a foreign key for more complicated cases could be an extension 
of this: use the table name as a prefix, provided that table was linked 
to x via a previous foreign key definition (similar to SQL).
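
A small sketch of the i. and x. prefixes in j, as they behave in released 
versions of data.table (the development version under discussion may 
have differed).  The tables X and Y and their columns are invented for 
illustration.  The table-name prefix idea is only a proposal, so it is 
not shown.

```r
library(data.table)

# Hypothetical tables: both have a column named v, so a prefix is
# needed in j to say which table's v is meant.
X <- data.table(id = c("a", "b", "c"), v = c(1, 2, 3))
Y <- data.table(id = c("b", "c"), v = c(20, 30))
setkey(X, id)

# X[Y] joins Y's first column to X's key.  Inside j, x.v is the
# column from X and i.v is the column from Y (the i argument).
res <- X[Y, list(id, x_v = x.v, i_v = i.v)]
print(res)
```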

'Secondary' keys, on the other hand, are different.  That's just like 
having several pre-saved indexes on a table so you can join to it in 
different ways.  Currently data.table's key is analogous to SQL's 
clustered index (the actual order of the rows: on disk in SQL, in RAM 
in data.table), and secondary keys in data.table would be analogous to 
a regular SQL index.
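
To make the distinction concrete, here is a sketch using setindex(), 
which was added to data.table in releases after this post; it is shown 
as one way the secondary key idea can look, not as what v1.9.3 offered.  
The table DT and its columns are made up for illustration.

```r
library(data.table)

# Hypothetical table, deliberately not sorted by id.
DT <- data.table(id  = c(3L, 1L, 2L),
                 grp = c("b", "a", "b"),
                 val = c(30, 10, 20))

# The key physically reorders the rows in RAM (like a clustered index).
setkey(DT, id)

# An index stores an ordering by grp without moving any rows
# (like a regular SQL index).
setindex(DT, grp)

print(key(DT))       # the key column
print(indices(DT))   # the secondary index
print(DT)            # rows are in id order, not grp order

by_key <- DT[.(2L)]               # join using the key
by_grp <- DT[.("b"), on = "grp"]  # join on another column
print(by_key)
print(by_grp)
```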

Interesting area.  Any real world examples anyone has would be useful to 
illustrate.

Matt


On 14/03/14 08:31, carrieromichele wrote:
> Hello list,
>
> I know this may sound weird and I understand that what follows might 
> be considered as out of scope but I'd like your opinions on this.
>
> I've just seen a new comment to FR  #1007 and it got me thinking about 
> the SQL concept of primary and secondary key (where the latter is 
> linked to the primary key of another table). Again, this is a pure 
> speculation post. I just wanted your opinions about having such 
> features in R (via data.table)
>
> Thanks,
>
> Michele.
