[datatable-help] fread(character string) limited to strings less than 4096 long?
Matthew Dowle
mdowle at mdowle.plus.com
Thu Mar 28 15:55:17 CET 2013
Hm, this is odd.
Could you run the following and paste back the (verbose) results, please?

for (n in c(1023:1025, 10000)) {
    input = paste(rep('a\tb\n', n), collapse='')
    A = fread(input, verbose=TRUE)
    cat(nchar(input), nrow(A), "\n")
}
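
For reference, a rough sketch of the arithmetic behind the test sizes, together with the write-to-file workaround mentioned further down the thread. Each record 'a\tb\n' is 4 characters, so n = 1024 gives exactly 4096 characters. The reported row counts look consistent with the input being cut off at around that length, but that is only a guess and the thread does not establish the cause. The names record, sizes and tmp are illustrative, and the fread part assumes data.table is installed.

```r
# Each record 'a\tb\n' is 4 characters long, so n records make a
# string of 4 * n characters.
record <- 'a\tb\n'
sizes <- sapply(c(1023:1025, 10000), function(n) {
  nchar(paste(rep(record, n), collapse = ''))
})
print(sizes)  # 4092 4096 4100 40000

# Workaround reported in the thread: write the string to a file and
# fread the file name instead of the string itself.
# (Assumes data.table is installed; skipped otherwise.)
if (requireNamespace("data.table", quietly = TRUE)) {
  input <- paste(rep(record, 1025), collapse = '')
  tmp <- tempfile(fileext = ".txt")
  writeLines(input, tmp, sep = "")  # sep = "" avoids appending an extra newline
  A <- data.table::fread(tmp)
  print(nrow(A))
  unlink(tmp)
}
```

Comparing nchar(input) with nrow(A) at each size shows where, if anywhere, the row count stops growing with the input.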
On 28.03.2013 14:38, Timothée Carayol wrote:
> Curiouser and curiouser..
>
> I can reproduce on two computers with different versions of R and of data.table.
>
> Computer 1 (it says unknown-linux but is actually Ubuntu):
>
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
>  [6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] bit64_0.9-2 bit_1.1-10 data.table_1.8.9 colorout_1.0-0
>
> Computer 2:
>
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
>  [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
>  [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
>  [7] LC_PAPER=C LC_NAME=C
>  [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] data.table_1.8.8
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.2
>
> On Thu, Mar 28, 2013 at 2:31 PM, Matthew Dowle <mdowle at mdowle.plus.com [4]> wrote:
>
>> Interesting, what's your sessionInfo() please?
>>
>> For me it seems to work ok:
>>
>> [1] 1022
>> [1] 1023
>> [1] 1024
>> [1] 9999
>>
>>> sessionInfo()
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> On 27.03.2013 22:49, Timothée Carayol wrote:
>>
>>> Agree with Muhammad, longer character strings are definitely permitted in R.
>>> A minimal example that shows something strange happening with fread:
>>>
>>> for (n in c(1023:1025, 10000)) {
>>>   A <- fread(
>>>     paste(
>>>       rep('a\tb\n', n),
>>>       collapse=''
>>>     ),
>>>     sep='\t'
>>>   )
>>>   print(nrow(A))
>>> }
>>>
>>> On my computer, I obtain:
>>>
>>> [1] 1022
>>> [1] 1023
>>> [1] 1023
>>> [1] 1023
>>>
>>> Hope this helps
>>> Timothée
>>>
>>> On Wed, Mar 27, 2013 at 9:23 PM, Matthew Dowle <mdowle at mdowle.plus.com [3]> wrote:
>>>
>>>> Hi,
>>>> Nice to hear from you. Nope, not known to me. Obviously 4096 is 4k, is that
>>>> the R limit for a character string length? What happens at 4097?
>>>> Matthew
>>>>
>>>> > Hi,
>>>> >
>>>> > I have an example of a string of 4097 characters which can't be parsed by
>>>> > fread; however, if I remove any character, it can be parsed just fine. Is
>>>> > that a known limitation?
>>>> >
>>>> > (If I write the string to a file and then fread the file name, it works
>>>> > too.)
>>>> >
>>>> > Let me know if you need the string and/or a bug report.
>>>> >
>>>> > Thanks
>>>> > Timothée
>>>> > _______________________________________________
>>>> > datatable-help mailing list
>>>> > datatable-help at lists.r-forge.r-project.org [1]
>>>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [2]

Links:
------
[1] mailto:datatable-help at lists.r-forge.r-project.org
[2] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[3] mailto:mdowle at mdowle.plus.com
[4] mailto:mdowle at mdowle.plus.com