[datatable-help] Video of talk at H2O World

James Eales jeales at gmail.com
Tue Dec 9 18:58:48 CET 2014


fread(), and in particular that you can paste content directly into the
call at the terminal, e.g. fread("<ctrl-v>")
fread(), and that it can read directly from a massive gzipped text file via
a system command with no hassle, e.g.
fread("gunzip -c massive_file.txt")
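A minimal sketch of both fread() input styles. It assumes a recent data.table and a system with gunzip on the PATH. The column names and the temporary file are made up for illustration. The pasted-text form is shown both as a plain string and via the text= argument. For the command, the sketch uses cmd=, which recent versions of data.table prefer over passing a shell command as the first argument.

```r
library(data.table)

# A string containing a newline is parsed as data, not as a file name,
# which is why pasting a copied table between the quotes works.
pasted <- fread("id,value\n1,0.1\n2,0.45\n3,0.9")

# Recent versions also accept the same input via the explicit text= argument.
pasted2 <- fread(text = "id,value\n1,0.1\n2,0.45\n3,0.9")

# Write a small gzipped file to stand in for a massive one (hypothetical data).
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, "w")
writeLines(c("id,value", "1,0.1", "2,0.45", "3,0.9"), con)
close(con)

# Stream the decompressed text from gunzip straight into fread().
from_cmd <- fread(cmd = paste("gunzip -c", shQuote(gz_path)))
```

Because the data arrives through a pipe, no uncompressed copy of the file has to be written to disk first.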

foverlaps(): just that it exists, and how quick it is for region overlaps (I
do a lot of genomics)
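A small sketch of a genomics-style overlap join with foverlaps(). The genes and reads tables are hypothetical. The lookup table must be keyed, with the interval start and end as the last two key columns.

```r
library(data.table)

# Hypothetical gene annotation: one row per region.
genes <- data.table(chr   = "chr1",
                    start = c(100L, 500L),
                    end   = c(200L, 600L),
                    gene  = c("g1", "g2"))

# Hypothetical sequencing reads to be assigned to genes.
reads <- data.table(chr   = "chr1",
                    start = c(150L, 300L, 550L),
                    end   = c(160L, 310L, 700L),
                    read  = c("r1", "r2", "r3"))

# Key the lookup table: grouping column(s) first, then start and end.
setkey(genes, chr, start, end)

# type = "any" keeps every read that touches a gene at all;
# nomatch = 0L drops reads (like r2) that overlap no gene.
hits <- foverlaps(reads, genes, type = "any", nomatch = 0L)
```

In the result, the read's own interval columns come back as i.start and i.end alongside the matching gene's start and end.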

subset.data.table() allows negated column selection, e.g.
subset(DT, select = -unwanted_column)
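A quick sketch of the negated select=, using a made-up DT. The second line shows what I believe is the equivalent inside [...] in recent data.table versions: ! applied to a character vector of column names.

```r
library(data.table)

# Hypothetical table with one column we want to drop.
DT <- data.table(id = 1:3,
                 value = c(0.2, 0.45, 0.9),
                 unwanted_column = c("x", "y", "z"))

# Filter rows and drop a column by (unquoted) name in one call.
kept <- subset(DT, value > 0.3, select = -unwanted_column)

# Same rows and columns using data.table's own bracket syntax.
kept2 <- DT[value > 0.3, !"unwanted_column"]
```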

data.table allows chaining of several selection statements, e.g.
DT[value < 0.5][value > 0.4][id %in% my_interesting_id_list]
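A minimal sketch of that chain on a toy table. DT's contents and my_interesting_id_list are hypothetical. Each [...] filters the result of the one before it, so the chain gives the same rows as a single combined condition.

```r
library(data.table)

DT <- data.table(id = c("a", "b", "c", "d"),
                 value = c(0.10, 0.45, 0.42, 0.48))

# Hypothetical ids of interest.
my_interesting_id_list <- c("b", "d")

# Filters are applied left to right, each on the previous result.
chained <- DT[value < 0.5][value > 0.4][id %in% my_interesting_id_list]

# The same selection written as one i expression.
single <- DT[value < 0.5 & value > 0.4 & id %in% my_interesting_id_list]
```

Chaining is handy interactively, since you can tack on one more [...] at a time and inspect each step.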

I discover more every time I use it. I just thought some more complex
examples (like the every-roof-in-the-UK machine learning example from your
talk) would be helpful to illustrate the range of expressions you can
supply to a data.table.
The docs are very good and hugely comprehensive; it's just that sometimes
it's best to start with a complex example and then take it apart.

On 8 December 2014 at 22:49, Matt Dowle <mdowle at mdowle.plus.com> wrote:

>
> James,
>
> Thanks. Just to avoid crossed-wires, which features do you mean exactly?
>
> Thanks, Matt
>
>
> On 08/12/14 15:25, James Eales wrote:
>
> Matt,
> Very impressive show of what data.table can do.
> It would be helpful to have a wider set of these more 'advanced'
> data.table function calls in the FAQ.
> I keep discovering more features, even after reading the FAQ, R-help and
> intro vignette multiple times (this is not a criticism of the docs, but
> praise for DT's flexibility).
> Learning by example, even if you don't understand it fully the first
> time, can be very powerful.
> James
>
> On 8 December 2014 at 15:03, Matt Dowle <mdowle at mdowle.plus.com> wrote:
>
>>
>> As a few have asked already, will upload slides later.  It was a
>> collection of different files and part was just an R script. I'll need to
>> merge together ...
>>
>>
>> On 08/12/14 14:44, Matt Dowle wrote:
>>
>>> Hi,
>>>
>>> A video of my talk at H2O World in San Francisco recently:
>>>
>>>    https://www.youtube.com/watch?v=MvH1eTdsekA
>>>
>>>   0:00   Examples from two insurance companies using data.table
>>> 12:00   What is data.table, benchmarks dplyr and pandas
>>> 16:55   Overlap joins
>>> 20:00   Rolling joins
>>> 22:30   data.table radix sorting is better than hashing (dplyr and
>>> pandas)
>>> 23:00   H2O (just parallel file reading and grouping as quick test)
>>> 30:00   Quick rerun of talk at Bay Area R User Group (sorting benchmark,
>>> automatic indexes flows through to dplyr, numeric rounding)
>>> 33:10   My status
>>> 36:45   Questions
>>> 49:26   End
>>>
>>> Comments/suggestions very welcome.
>>>
>>> Matt
>>>
>>>
>>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>
>

