[datatable-help] fread(character string) limited to strings less than 4096 long?

Matthew Dowle mdowle at mdowle.plus.com
Thu Mar 28 15:55:17 CET 2013


 

Hm this is odd. 

Could you run the following and paste back the (verbose) results please.

for (n in c(1023:1025, 10000)) {
  input = paste(rep('a\tb\n', n), collapse='')
  A = fread(input, verbose=TRUE)
  cat(nchar(input), nrow(A), "\n")
}

On 28.03.2013 14:38, Timothée Carayol wrote:

> Curiouser and curiouser.. 
> 
> I can reproduce on two computers with different versions of R and of data.table.
>
> Computer 1 (it says unknown-linux but is actually ubuntu):
>
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
>  [6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] bit64_0.9-2 bit_1.1-10 data.table_1.8.9 colorout_1.0-0
>
> Computer 2: 
> 
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
>  [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
>  [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
>  [7] LC_PAPER=C LC_NAME=C
>  [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] data.table_1.8.8
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.2
>
> On Thu, Mar 28, 2013 at 2:31 PM, Matthew Dowle <mdowle at mdowle.plus.com [4]> wrote:
> 
>> Interesting, what's your sessionInfo() please?
>>
>> For me it seems to work ok:
>>
>> [1] 1022
>> [1] 1023
>> [1] 1024
>> [1] 9999
>>
>> > sessionInfo()
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> On 27.03.2013 22:49, Timothée Carayol wrote:
>> 
>>> Agree with Muhammad, longer character strings are definitely permitted in R.
>>> A minimal example that shows something strange happening with fread:
>>>
>>> for (n in c(1023:1025, 10000)) {
>>>   A <- fread(
>>>     paste(
>>>       rep('a\tb\n', n),
>>>       collapse=''
>>>     ),
>>>     sep='\t'
>>>   )
>>>   print(nrow(A))
>>> }
>>>
>>> On my computer, I obtain:
>>>
>>> [1] 1022
>>> [1] 1023
>>> [1] 1023
>>> [1] 1023
>>>
>>> Hope this helps
>>> Timothée
>>> 
>>> On Wed, Mar 27, 2013 at 9:23 PM, Matthew Dowle <mdowle at mdowle.plus.com [3]> wrote:
>>>
>>>> Hi,
>>>> Nice to hear from you. Nope not known to me. Obviously 4096 is 4k, is that
>>>> the R limit for a character string length? What happens at 4097?
>>>> Matthew
>>>>
>>>> > Hi,
>>>> >
>>>> > I have an example of a string of 4097 characters which can't be parsed by
>>>> > fread; however, if I remove any character, it can be parsed just fine. Is
>>>> > that a known limitation?
>>>> >
>>>> > (If I write the string to a file and then fread the file name, it works
>>>> > too.)
>>>> >
>>>> > Let me know if you need the string and/or a bug report.
>>>> >
>>>> > Thanks
>>>> > Timothée
>>>> > _______________________________________________
>>>> > datatable-help mailing list
>>>> > datatable-help at lists.r-forge.r-project.org [1]
>>>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [2]

 

Links:
------
[1] mailto:datatable-help at lists.r-forge.r-project.org
[2] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[3] mailto:mdowle at mdowle.plus.com
[4] mailto:mdowle at mdowle.plus.com