[datatable-help] fread(character string) limited to strings less than 4096 long?

Timothée Carayol timothee.carayol at gmail.com
Thu Mar 28 15:38:37 CET 2013


Curiouser and curiouser..

I can reproduce on two computers with different versions of R and of
data.table.



Computer 1 (it says unknown-linux but is actually Ubuntu):

R version 2.15.3 (2013-03-01)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bit64_0.9-2      bit_1.1-10       data.table_1.8.9 colorout_1.0-0



Computer 2:

R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.8.8

loaded via a namespace (and not attached):
[1] tools_2.15.2
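
A quick arithmetic check on the example quoted below, offered as a sketch rather than a diagnosis. Each 'a\tb\n' line is 4 characters, so 1024 lines come to exactly 4096 characters. Assuming (this is a hypothesis, not something confirmed in this thread) that the input string is cut off at 4096 characters, and that the first line is consumed as the header (inferred from the 9999 rows reported for n = 10000 on the working machine), base R alone predicts the row counts reported on the affected machines. The names make_input and limit are made up for this illustration.

```r
# Build the same tab-separated input string as in the quoted example.
# make_input is a hypothetical helper name, not part of data.table.
make_input <- function(n) paste(rep("a\tb\n", n), collapse = "")

ns <- c(1023:1025, 10000)

# Length in characters of each input string (4 characters per line).
chars <- vapply(ns, function(n) nchar(make_input(n)), integer(1))

# ASSUMPTION: the string is truncated at 4096 characters before parsing.
limit <- 4096L
lines_kept <- pmin(chars, limit) %/% 4L

# ASSUMPTION: the first line is taken as the header, so it is not a data row.
rows_if_truncated <- lines_kept - 1L

print(data.frame(n = ns, chars = chars, rows_if_truncated = rows_if_truncated))
```

Under those two assumptions, the last column comes out as 1022, 1023, 1023, 1023, which is what the affected machines print.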




On Thu, Mar 28, 2013 at 2:31 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> Interesting, what's your sessionInfo() please?
>
> For me it seems to work ok :
>
> [1] 1022
> [1] 1023
> [1] 1024
> [1] 9999
>
> > sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> On 27.03.2013 22:49, Timothée Carayol wrote:
>
> Agree with Muhammad, longer character strings are definitely permitted
> in R.
> A minimal example that shows something strange happening with fread:
>
> library(data.table)
> for (n in c(1023:1025, 10000)) {
>   A <- fread(
>     paste(
>       rep('a\tb\n', n),
>       collapse=''
>     ),
>     sep='\t'
>   )
>   print(nrow(A))
> }
>
> On my computer, I obtain:
>
> [1] 1022
> [1] 1023
> [1] 1023
> [1] 1023
>
> Hope this helps
> Timothée
>
>
> On Wed, Mar 27, 2013 at 9:23 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
>> Hi,
>> Nice to hear from you. Nope not known to me. Obviously 4096 is 4k, is that
>> the R limit for a character string length? What happens at 4097?
>> Matthew
>>
>> > Hi,
>> >
>> > I have an example of a string of 4097 characters which can't be parsed
>> > by fread; however, if I remove any character, it can be parsed just
>> > fine. Is that a known limitation?
>> >
>> > (If I write the string to a file and then fread the file name, it works
>> > too.)
>> >
>> > Let me know if you need the string and/or a bug report.
>> >
>> > Thanks
>> > Timothée