[datatable-help] fread: skip

Gabor Grothendieck ggrothendieck at gmail.com
Sun May 12 01:47:01 CEST 2013


Not with the CSV I tried.  The header is damaged (most of the header
fields are missing), and fread misconstrues the header line as data.

The automation is great but some way to force its behavior when you
know what it should do seems essential since heuristics can't be
expected to work in all cases.
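
To make the request concrete, here is a minimal sketch of the two
behaviours in base R (read.table, readLines, grep), not in fread. The file
contents and the helper name skip_to are made up for illustration only.

```r
# Sketch only: base R, not fread. The sample file below is hypothetical.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "garbage banner line",
  "another junk line",
  "id,val,grp",
  "1,10,a",
  "2,20,b"
), path)

# Case 1: skip a known number of lines (banner plus header) and
# supply the column names by hand with header = FALSE.
dat1 <- read.table(path, sep = ",", skip = 3, header = FALSE,
                   col.names = c("id", "val", "grp"),
                   stringsAsFactors = FALSE)

# Case 2: skip every line before the first one containing a marker
# string, then read from the matched line onward.
# skip_to is a hypothetical helper, not part of any package.
skip_to <- function(path, marker) {
  all_lines <- readLines(path)
  hit <- grep(marker, all_lines, fixed = TRUE)
  if (length(hit) == 0L) stop("marker not found: ", marker)
  hit[1L] - 1L  # number of lines that precede the matched line
}

dat2 <- read.table(path, sep = ",", skip = skip_to(path, "id,val"),
                   header = TRUE, stringsAsFactors = FALSE)
```

In case 2 the marker is chosen to be text from the header that is unlikely
to appear in the banner, so the matched line becomes the header row.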

On Sat, May 11, 2013 at 6:35 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Hi,
>
> Does the auto skip feature of fread cover both of those?  From ?fread :
>
>   " Once the separator is found on line autostart, the number of columns is
> determined. Then the file is searched backwards from autostart until a row
> is found that doesn't have that number of columns, or the start of file is
> reached. Thus, the first data row is found and any human readable banners
> are automatically skipped. This feature can be particularly useful for
> loading a set of files which may not all have consistently sized banners. "
>
> There were also some issues with header=FALSE in the first release (1.8.8),
> which have since been fixed in 1.8.9.
>
> Matthew
>
>
>
> On 11.05.2013 23:16, Gabor Grothendieck wrote:
>>
>> I would find it useful if fread had a skip= argument as in read.table
>> since I have files from time to time that have garbage at the top.
>> Another situation I find from time to time is that the header is
>> messed up but one can still read the file if one can skip over the
>> header and specify header = FALSE.
>>
>> An extra feature that would be nice but less important would be if one
>> could specify skip = "string" and have it skip all lines until it
>> found one with "string" in it and then start reading from the matched
>> row onward.  Normally the string would be chosen to be a string found
>> in the header and not likely found prior to the header. read.xls in
>> gdata has a similar feature and I find it quite handy at times.
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

