[datatable-help] fread for flat files

Michael Smith my.r.help at gmail.com
Sun Apr 20 16:36:01 CEST 2014


I'm not sure exactly what you mean by "flat file". I previously assumed
you meant fixed-width formatted data, but now you say the fields are
concatenated with no spaces between them. So what is the column
separator? Tab, comma, ...?
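
If the fields really are fixed width with nothing between them, a rough
sketch of the command-line route might look like the following. It uses
awk rather than sed, since sed back-references only go up to \9 and so
would cap out at nine columns. The widths (3, 5, 2), the sample records,
and the file names flat.txt and flat.csv are all made up for
illustration, and it assumes no field contains a comma.

```shell
# Made-up sample standing in for the real flat file: two records,
# three fields of widths 3, 5 and 2 with nothing between them.
printf '%s\n' 'ABC12345XY' 'DEF67890ZW' > flat.txt

# Hypothetical vector of column widths, space separated.
widths="3 5 2"

# Cut each line at the given widths and join the pieces with commas.
awk -v widths="$widths" '
BEGIN { n = split(widths, w, " ") }
{
    pos = 1
    out = ""
    for (i = 1; i <= n; i++) {
        sep = (i > 1) ? "," : ""
        out = out sep substr($0, pos, w[i])
        pos += w[i]
    }
    print out
}' flat.txt > flat.csv

cat flat.csv
```

For the sample above this writes ABC,12345,XY and DEF,67890,ZW to
flat.csv, which should then be readable with something like
fread("flat.csv", header = FALSE).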

If everything else fails, try Stat/Transfer.

M

On 04/20/2014 03:04 PM, Mark Danese wrote:
> Thanks Michael.  The flat file format doesn’t have spaces between fields.
> They are all concatenated.  It may be possible to use sed with a vector of
> widths, but I am not a command-line person (yet).
> 
> It just may be one of those things that isn’t easy to implement in fread.
> In healthcare in the US there are still a lot of flat files out there.  We
> usually use SAS but I am trying to get away from that.  And R can read
> flat files (read.fwf), but it is pretty slow.  From what I understand,
> read.fwf actually does insert commas and then reads the file.  So, it
> might be possible to hack read.fwf and fread together somehow.
> 
> My first experience with fread was to read in a 1.6 GB file in 30 seconds.
> That was pretty impressive.
> 
> 
> On 4/19/14, 5:07 AM, "Michael Smith" <my.r.help at gmail.com> wrote:
> 
>> You could probably do this from the Linux command line using `sed`,
>> e.g. by replacing each run of spaces with a comma.
>>
>> https://www.google.com/search?q=sed+replace+space+with+comma
>>
>> If you're on Windows, you probably can do the same using Cygwin.
>>
>> M
>>
>>
>> On 04/19/2014 12:37 AM, Mark Danese wrote:
>>> Is it possible to pass a vector of column widths to have fread read in a
>>> flat file?  I saw that someone suggested using csvkit to add commas and
>>> then use data table, but that is beyond my skill set.
>>>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
> 