[datatable-help] Memory issue
Matthew Dowle
mdowle at mdowle.plus.com
Tue Oct 23 18:50:01 CEST 2012
Hi Gene,
Thanks for all this, and sorry for the delay. I have looked through it,
and the problem does seem likely to be related to those very long
character strings. Could you please save head() of the data, taken before
setting the key, and either email it or put it online somewhere?
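In case it helps, here is a rough sketch of what I mean. `dat` is a made-up stand-in for your unkeyed table and the file name is arbitrary; the nchar() check is just one possible way to see how long the strings in each character column get.

```r
library(data.table)

# Hypothetical stand-in for the real, still-unkeyed table.
dat <- data.table(
  index = 1:10,
  char3 = c(strrep("x", 2000), rep("", 9)),
  int1  = 101:110
)

# Longest string in each character column.
chr_cols <- names(dat)[vapply(dat, is.character, logical(1))]
max_len  <- vapply(chr_cols, function(col) max(nchar(dat[[col]])), integer(1))
print(max_len)

# Save only the first few rows, before any setkey()/setkeyv() call.
dat_head  <- head(dat)
head_file <- file.path(tempdir(), "dat_head.RData")  # arbitrary file name
save(dat_head, file = head_file)
```

The saved file holds six rows only, so it should be small enough to email.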
Matthew
> Ok, here is my very lengthy reply with lots of diagnostics.
>
>
>>
>> ## Clear the workspace
>> rm(list=ls())
>>
>> ## I use a function called "loader" to load single data objects
>> if(!require('geneorama')){
> + source('https://raw.github.com/geneorama/geneorama/master/R/loader.R')
> + cat('loading function \"loader\"')
> + }
>>
>> ## Load the data
>> Small = loader('test0')
>> Large = loader('test1')
>>
>> ## The two files will be different because their order is different
>> str(Small)
> Classes data.table and 'data.frame': 3103314 obs. of 42 variables:
> $ index : int 1 2 3 4 5 6 7 8 9 10 ...
> $ char1 : chr "http://conradhotels3.hilton.com"
> "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com"
> "http://conradhotels3.hilton.com" ...
> $ char2 : chr "/en/index.html" "/en/index.html" "/en/index.html"
> "/en/index.html" ...
> $ char3 : chr "" "" "" "" ...
> $ int1 : int 44903 44903 44903 44903 44903 44903 44903 44903 44903
> 44903
> ...
> $ int2 : int 411 411 254 254 336 336 118 118 386 386 ...
> $ char4 : chr "2012-05-09 20:17:40.587" "2012-05-09 21:17:54.427"
> "2012-05-09 20:10:49.560" "2012-05-09 21:11:05.107" ...
> $ int3 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int4 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int5 : int 69 69 69 69 69 69 69 68 68 68 ...
> $ int6 : int 68 68 68 68 68 68 68 67 67 67 ...
> $ int7 : int 35 35 37 35 35 35 33 38 38 40 ...
> $ int8 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int9 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int10 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int11 : int 1 1 1 1 1 1 1 1 1 1 ...
> $ int12 : int 334830 334847 335102 334838 334836 342687 334521 318626
> 318578 326800 ...
> $ int13 : int 36 36 37 36 36 36 35 38 37 39 ...
> $ int14 : int 44 44 49 47 45 45 45 46 45 48 ...
> $ char5 : chr "" "" "" "" ...
> $ int15 : int NA NA NA NA NA NA NA NA NA NA ...
> $ int16 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int17 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int18 : int 2 2 2 2 2 2 2 2 2 2 ...
> $ int19 : int 1381 1152 424 3728 1772 921 385 725 401 314 ...
> $ int20 : int 36 36 37 36 36 36 35 38 37 39 ...
> $ int21 : int 2199 2201 1492 1448 2559 2529 1084 1432 1876 1984 ...
> $ int22 : int 44 44 49 47 45 45 45 46 45 48 ...
> $ int23 : int 2203 2188 1199 1162 2324 2346 821 897 1386 1189 ...
> $ int24 : int 13 13 14 13 13 13 12 13 13 14 ...
> $ int25 : int 5166 5761 3755 3794 5614 7779 2830 3971 4637 5871 ...
> $ int26 : int 103 103 105 103 103 103 101 105 105 107 ...
> $ int27 : int 70 183 159 197 217 165 153 232 92 102 ...
> $ int28 : int 103 103 105 103 103 103 101 105 105 107 ...
> $ int29 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int30 : int 161 146 200 158 150 160 190 161 163 169 ...
> $ char6 : chr "Limelight" "Limelight" "Fusepoint/Savvis"
> "Fusepoint/Savvis" ...
> $ char7 : chr "Paris" "Paris" "Toronto" "Toronto" ...
> $ char8 : chr "-1" "-1" "-1" "-1" ...
> $ char9 : chr "FRANCE" "FRANCE" "CANADA" "CANADA" ...
> $ char10: chr "FR" "FR" "CA" "CA" ...
> $ char11: chr "FRANCE" "FRANCE" "CANADA" "CANADA" ...
> - attr(*, ".internal.selfref")=<externalptr>
>> str(Large)
> Classes data.table and 'data.frame': 3103314 obs. of 42 variables:
> $ index : int 716234 716235 1007651 2679944 1550732 1932010 2879445
> 1007670 1736006 666363 ...
> $ char1 : chr "http://go.compuware.com" "http://go.compuware.com"
> "http://www.achmeacollectief.nl" "https://db3.notify.windows.com" ...
> $ char2 : chr "/default.aspx" "/dynaTraceMonitor" "/unilever/" "/ping"
> ...
> $ char3 : chr "?rurl=http://frontline.compuware.com//products/BU/default.aspx"
> "?url=http%3A%2F%2Fgo.compuware.com%2Fdefault.aspx%3Frurl%3Dhttp%3A%2F%2Ffrontline.compuware.com%2F%2Fproducts%2FBU%2Fdefault.as"| __truncated__
> "" "" ...
> $ int1 : int 2812881 2812881 3149757 4286896 3618836 3861870 4315803
> 3149760 3779387 2754629 ...
> $ int2 : int 133 133 133 133 340 340 326 133 133 340 ...
> $ char4 : chr "2012-05-09 20:00:00.000" "2012-05-09 20:00:00.000"
> "2012-05-09 20:00:00.000" "2012-05-09 20:00:00.000" ...
> $ int3 : int 0 1 0 0 0 0 0 0 0 0 ...
> $ int4 : int 2264 2496 1782 461 1953 1418 641 1207 167 278 ...
> $ int5 : int 26 20 6 1 71 64 1 6 1 15 ...
> $ int6 : int 26 20 6 1 69 64 1 6 1 15 ...
> $ int7 : int 2 2 4 0 2 12 0 2 0 0 ...
> $ int8 : int 0 0 0 0 2 0 0 0 0 0 ...
> $ int9 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int10 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int11 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int12 : int 392752 417195 43107 0 1419015 1031349 187344 62969 43
> 428189 ...
> $ int13 : int 4 4 5 1 8 22 1 3 1 1 ...
> $ int14 : int 9 11 8 1 17 38 1 6 1 15 ...
> $ char5 : chr "" "" "" "" ...
> $ int15 : int NA NA NA NA 0 NA NA NA NA 0 ...
> $ int16 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int17 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int18 : int 2 28 3 0 0 1 0 1 0 0 ...
> $ int19 : int 137 0 136 298 277 255 147 141 137 209 ...
> $ int20 : int 4 0 5 1 8 22 1 3 1 1 ...
> $ int21 : int 945 612 59 22 689 1153 54 29 13 59 ...
> $ int22 : int 9 5 8 1 17 38 1 6 1 15 ...
> $ int23 : int 0 0 0 118 0 0 0 0 0 0 ...
> $ int24 : int 0 0 0 1 0 0 0 0 0 0 ...
> $ int25 : int 3243 2653 1585 22 3292 3076 64 1043 13 81 ...
> $ int26 : int 28 22 10 1 73 76 1 8 1 15 ...
> $ int27 : int 2060 3365 257 1 3304 1038 376 258 4 80 ...
> $ int28 : int 28 22 10 1 73 76 1 8 1 15 ...
> $ int29 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int30 : int 921 750 203 578 609 1078 234 187 31 140 ...
> $ char6 : chr "Interoute" "Interoute" "Interoute" "Interoute" ...
> $ char7 : chr "Amsterdam" "Amsterdam" "Amsterdam" "Amsterdam" ...
> $ char8 : chr "-1" "-1" "-1" "-1" ...
> $ char9 : chr "NETHERLANDS" "NETHERLANDS" "NETHERLANDS" "NETHERLANDS"
> ...
> $ char10: chr "NL" "NL" "NL" "NL" ...
> $ char11: chr "NETHERLANDS" "NETHERLANDS" "NETHERLANDS" "NETHERLANDS"
> ...
> - attr(*, ".internal.selfref")=<externalptr>
> - attr(*, "sorted")= chr "char4"
>>
>> ## The difference is shown here
>> mapply(identical, Small, Large)
> index char1 char2 char3 int1 int2 char4 int3 int4 int5
> int6 int7 int8 int9
> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> FALSE FALSE FALSE FALSE
> int10 int11 int12 int13 int14 char5 int15 int16 int17 int18
> int19 int20 int21 int22
> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> FALSE FALSE FALSE FALSE
> int23 int24 int25 int26 int27 int28 int29 int30 char6 char7
> char8 char9 char10 char11
> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> FALSE FALSE FALSE FALSE
>> mapply(all.equal, Small, Large)
> index
> "Mean relative difference: 0.6660698"
> char1
> "3100674 string mismatches"
> char2
> "2961621 string mismatches"
> char3
> "1753352 string mismatches"
> int1
> "Mean relative difference: 0.2945024"
> int2
> "Mean relative difference: 0.4866954"
> char4
> "3103308 string mismatches"
> int3
> "Mean relative difference: 1.759713"
> int4
> "Mean relative difference: 1.408616"
> int5
> "Mean relative difference: 1.411817"
> int6
> "Mean relative difference: 1.415648"
> int7
> "Mean relative difference: 1.705137"
> int8
> "Mean relative difference: 1.954795"
> int9
> "Mean relative difference: 1.99701"
> int10
> "Mean relative difference: 1.995529"
> int11
> "Mean relative difference: 2"
> int12
> "Mean relative difference: 1.479043"
> int13
> "Mean relative difference: 1.323619"
> int14
> "Mean relative difference: 1.360022"
> char5
> "1454309 string mismatches"
> int15
> "'is.NA' value mismatch: 2260789 in current 2260789 in target"
> int16
> "Mean relative difference: 1.997195"
> int17
> "Mean relative difference: 2"
> int18
> "Mean relative difference: 1.799441"
> int19
> "Mean relative difference: 1.571321"
> int20
> "Mean relative difference: 1.474492"
> int21
> "Mean relative difference: 1.669488"
> int22
> "Mean relative difference: 1.465307"
> int23
> "Mean relative difference: 1.842191"
> int24
> "Mean relative difference: 1.76578"
> int25
> "Mean relative difference: 1.481612"
> int26
> "Mean relative difference: 1.403655"
> int27
> "Mean relative difference: 1.722723"
> int28
> "Mean relative difference: 1.403655"
> int29
> "Mean relative difference: 2"
> int30
> "Mean relative difference: 1.535987"
> char6
> "2899128 string mismatches"
> char7
> "3008489 string mismatches"
> char8
> "2503189 string mismatches"
> char9
> "2957002 string mismatches"
> char10
> "1933196 string mismatches"
> char11
> "1933196 string mismatches"
>>
>> ## I re-ran the steps to create the files (almost the same as in the last
>> ## email), but added an "index" equal to 1:nrow(datMod)
>> ## This index is used to reorder the files to be consistent
>> LargeOrd = Large[order(Large$index), ]
>> str(LargeOrd)
> Classes data.table and 'data.frame': 3103314 obs. of 42 variables:
> $ index : int 1 2 3 4 5 6 7 8 9 10 ...
> $ char1 : chr "http://conradhotels3.hilton.com"
> "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com"
> "http://conradhotels3.hilton.com" ...
> $ char2 : chr "/en/index.html" "/en/index.html" "/en/index.html"
> "/en/index.html" ...
> $ char3 : chr "" "" "" "" ...
> $ int1 : int 44903 44903 44903 44903 44903 44903 44903 44903 44903
> 44903
> ...
> $ int2 : int 411 411 254 254 336 336 118 118 386 386 ...
> $ char4 : chr "2012-05-09 20:17:40.587" "2012-05-09 21:17:54.427"
> "2012-05-09 20:10:49.560" "2012-05-09 21:11:05.107" ...
> $ int3 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int4 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int5 : int 69 69 69 69 69 69 69 68 68 68 ...
> $ int6 : int 68 68 68 68 68 68 68 67 67 67 ...
> $ int7 : int 35 35 37 35 35 35 33 38 38 40 ...
> $ int8 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int9 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int10 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int11 : int 1 1 1 1 1 1 1 1 1 1 ...
> $ int12 : int 334830 334847 335102 334838 334836 342687 334521 318626
> 318578 326800 ...
> $ int13 : int 36 36 37 36 36 36 35 38 37 39 ...
> $ int14 : int 44 44 49 47 45 45 45 46 45 48 ...
> $ char5 : chr "" "" "" "" ...
> $ int15 : int NA NA NA NA NA NA NA NA NA NA ...
> $ int16 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int17 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int18 : int 2 2 2 2 2 2 2 2 2 2 ...
> $ int19 : int 1381 1152 424 3728 1772 921 385 725 401 314 ...
> $ int20 : int 36 36 37 36 36 36 35 38 37 39 ...
> $ int21 : int 2199 2201 1492 1448 2559 2529 1084 1432 1876 1984 ...
> $ int22 : int 44 44 49 47 45 45 45 46 45 48 ...
> $ int23 : int 2203 2188 1199 1162 2324 2346 821 897 1386 1189 ...
> $ int24 : int 13 13 14 13 13 13 12 13 13 14 ...
> $ int25 : int 5166 5761 3755 3794 5614 7779 2830 3971 4637 5871 ...
> $ int26 : int 103 103 105 103 103 103 101 105 105 107 ...
> $ int27 : int 70 183 159 197 217 165 153 232 92 102 ...
> $ int28 : int 103 103 105 103 103 103 101 105 105 107 ...
> $ int29 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int30 : int 161 146 200 158 150 160 190 161 163 169 ...
> $ char6 : chr "Limelight" "Limelight" "Fusepoint/Savvis"
> "Fusepoint/Savvis" ...
> $ char7 : chr "Paris" "Paris" "Toronto" "Toronto" ...
> $ char8 : chr "-1" "-1" "-1" "-1" ...
> $ char9 : chr "FRANCE" "FRANCE" "CANADA" "CANADA" ...
> $ char10: chr "FR" "FR" "CA" "CA" ...
> $ char11: chr "FRANCE" "FRANCE" "CANADA" "CANADA" ...
> - attr(*, ".internal.selfref")=<externalptr>
>>
>> ## Here the ordered files come out to be equivalent
>> mapply(identical, Small, LargeOrd)
> index char1 char2 char3 int1 int2 char4 int3 int4 int5
> int6 int7 int8 int9
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE TRUE TRUE TRUE
> int10 int11 int12 int13 int14 char5 int15 int16 int17 int18
> int19 int20 int21 int22
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE TRUE TRUE TRUE
> int23 int24 int25 int26 int27 int28 int29 int30 char6 char7
> char8 char9 char10 char11
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE TRUE TRUE TRUE
>> mapply(all.equal, Small, LargeOrd)
> index char1 char2 char3 int1 int2 char4 int3 int4 int5
> int6 int7 int8 int9
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE TRUE TRUE TRUE
> int10 int11 int12 int13 int14 char5 int15 int16 int17 int18
> int19 int20 int21 int22
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE TRUE TRUE TRUE
> int23 int24 int25 int26 int27 int28 int29 int30 char6 char7
> char8 char9 char10 char11
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE TRUE TRUE TRUE
>>
>> ## The inspection results
>> .Internal(inspect(Small))
> @0x00000000128068e8 19 VECSXP g1c7 [OBJ,MARK,NAM(2),ATT] (len=42, tl=0)
> @0x000007ff8a3e0010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> 1,2,3,4,5,...
> @0x000007ff4fb30010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> ...
> @0x000007ff4e380010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> ...
> @0x000007ff4cbd0010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> ...
> @0x000007ff88c20010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> 44903,44903,44903,44903,44903,...
> ...
> ATTRIB:
> @0x0000000012cab8e0 02 LISTSXP g1c0 [MARK]
> TAG: @0x0000000000120088 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000]
> "names" (has value)
> @0x0000000016868d68 16 STRSXP g1c7 [MARK,NAM(2)] (len=42, tl=0)
> @0x0000000010112b98 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "index"
> @0x0000000016b28fd0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "char1"
> @0x0000000016b291e0 09 CHARSXP g1c1 [MARK,gp=0x61,ATT] [ASCII]
> [cached] "char2"
> @0x0000000016b293c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "char3"
> @0x0000000016b29600 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "int1"
> ...
> TAG: @0x0000000000120558 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000]
> "class" (has value)
> @0x00000000138b5318 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
> @0x000000000b42c760 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached]
> "data.table"
> @0x000000000027d230 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached]
> "data.frame"
> TAG: @0x0000000000121d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000]
> "row.names" (has value)
> @0x0000000012c38050 13 INTSXP g1c1 [MARK,NAM(2)] (len=2, tl=0)
> -2147483648,-3103314
> TAG: @0x000000001497ac10 01 SYMSXP g1c0 [MARK] ".internal.selfref"
> @0x0000000012caaa60 22 EXTPTRSXP g1c0 [MARK,NAM(2)]
>> .Internal(inspect(Large))
> @0x0000000012c24c68 19 VECSXP g1c7 [OBJ,MARK,NAM(2),ATT] (len=42, tl=0)
> @0x000007ff314d0010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> 716234,716235,1007651,2679944,1550732,...
> @0x000007ff2fd20010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> @0x000000001253d8e0 09 CHARSXP g1c3 [MARK,gp=0x60,ATT] [ASCII]
> [cached]
> "http://go.compuware.com"
> @0x000000001253d8e0 09 CHARSXP g1c3 [MARK,gp=0x60,ATT] [ASCII]
> [cached]
> "http://go.compuware.com"
> @0x000000001e6d7ab0 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://www.achmeacollectief.nl"
> @0x000000001e4a59b8 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "https://db3.notify.windows.com"
> @0x000000001e63ee70 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://www.christushealth.org"
> ...
> @0x000007ff2e570010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> @0x00000000200aa218 09 CHARSXP g1c2 [MARK,gp=0x60,ATT] [ASCII]
> [cached]
> "/default.aspx"
> @0x000000001e444d78 09 CHARSXP g1c3 [MARK,gp=0x60,ATT] [ASCII]
> [cached]
> "/dynaTraceMonitor"
> @0x000000001eb64790 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/unilever/"
> @0x000000000feb4e98 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached]
> "/ping"
> @0x0000000000124950 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "/"
> ...
> @0x000007ff2cdc0010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> @0x0000000017a39430 09 CHARSXP g1c5 [MARK,gp=0x60] [ASCII] [cached]
> "?rurl=http://frontline.compuware.com//products/BU/default.aspx"
> @0x000000001b721a50 09 CHARSXP g1c7 [MARK,gp=0x60] [ASCII] [cached]
> "?url=http%3A%2F%2Fgo.compuware.com%2Fdefault.aspx%3Frurl%3Dhttp%3A%2F%2Ffrontline.compuware.com%2F%2Fproducts%2FBU%2Fdefault.aspx$title=$frames=0$pId=G_1336593601673$fId=G_1336593601673$pFId=$rId=RID_73295254$rpId=1059475658$actions=1%7C_load_%7C-%7C_load_%7C1336593601673%7C1336593602736%7C375%2C2%7C_onload_%7C-%7C_load_%7C1336593602626%7C1336593602704%7C375$domR=1336593602642$dtV=410$3p=www.google-analytics.com%7C0%7C0%7C0%7C%7C0%7C0%7C0%7C1%7C828_859%7C31%7C31%7C31%7C0%7C%7C0%7C0%7C0%2Cs%7C828%7C859%7C_load_%7Chttp%253A%252F%252Fwww.google-analytics.com%252Fga.js%3B2264ff.r.axf8.net%7C0%7C0%7C0%7C%7C0%7C0%7C0%7C1%7C953_1078%7C125%7C125%7C125%7C0%7C%7C0%7C0%7C0%2Cs%7C953%7C1078%7C_load_%7Chttp%253A%252F%252F2264FF.r.axf8.net%252Fmr%252Fe.gif%253Finfo%253D%25257Bn%25253Ac%25257Cc%25253A38695455749817%25257Cd%25253A1%25257Ca%25253A2264FF%25257Ch%25253A1%25257Ce%25253A%25257Cb%25253A%25257Cl%25253Ahttp%252524%252A%252524%25252F%25252Fgo.compuware.com%25252Fdefault.aspx%25257Cm%25253A1024%25257Co%25253A768%25257Cp%25253AWin32%25257Cq%25253Ax86%25257Ck%25253Alan%25257Cg%25253AMSIE%25257Cf%25253A8.0%25257D%25257Bn%25253Au%25257Ce%25253A1%25257D%2526a%253D2264FF%2526r%253D1%2526s%253D1$time=1336593603689$"
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> ...
> @0x000007ff2c1e0010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> 2812881,2812881,3149757,4286896,3618836,...
> ...
> ATTRIB:
> @0x000000001f163298 02 LISTSXP g1c0 [MARK]
> TAG: @0x0000000000120088 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000]
> "names" (has value)
> @0x000000001283e0e0 16 STRSXP g1c7 [MARK,NAM(2)] (len=42, tl=0)
> @0x0000000010112b98 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "index"
> @0x0000000016b28fd0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "char1"
> @0x0000000016b291e0 09 CHARSXP g1c1 [MARK,gp=0x61,ATT] [ASCII]
> [cached] "char2"
> @0x0000000016b293c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "char3"
> @0x0000000016b29600 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "int1"
> ...
> TAG: @0x0000000000120558 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000]
> "class" (has value)
> @0x000000001368f078 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
> @0x000000000b42c760 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached]
> "data.table"
> @0x000000000027d230 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached]
> "data.frame"
> TAG: @0x0000000000121d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000]
> "row.names" (has value)
> @0x000000000fb9a988 13 INTSXP g1c1 [MARK,NAM(2)] (len=2, tl=0)
> -2147483648,-3103314
> TAG: @0x000000001497ac10 01 SYMSXP g1c0 [MARK] ".internal.selfref"
> @0x000000001f163110 22 EXTPTRSXP g1c0 [MARK,NAM(2)]
> TAG: @0x0000000016c8d648 01 SYMSXP g1c0 [MARK] "sorted"
> @0x000000001ece5f88 16 STRSXP g1c1 [MARK,NAM(2)] (len=1, tl=0)
> @0x0000000016b27b78 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "char4"
>> .Internal(inspect(LargeOrd))
> @0x0000000012b69468 19 VECSXP g1c7 [OBJ,MARK,NAM(2),ATT] (len=42, tl=100)
> @0x000007ffc4fb0010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> 1,2,3,4,5,...
> @0x000007ffc2c20010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached]
> "http://conradhotels3.hilton.com"
> ...
> @0x000007ffc0890010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached]
> "/en/index.html"
> ...
> @0x000007ffbe500010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
> ...
> @0x000007ffbcd40010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
> 44903,44903,44903,44903,44903,...
> ...
> ATTRIB:
> @0x0000000012cec058 02 LISTSXP g1c0 [MARK]
> TAG: @0x0000000000120088 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000]
> "names" (has value)
> @0x0000000012b60620 16 STRSXP g1c7 [MARK,NAM(2)] (len=42, tl=100)
> @0x0000000010112b98 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "index"
> @0x0000000016b28fd0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "char1"
> @0x0000000016b291e0 09 CHARSXP g1c1 [MARK,gp=0x61,ATT] [ASCII]
> [cached] "char2"
> @0x0000000016b293c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "char3"
> @0x0000000016b29600 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "int1"
> ...
> TAG: @0x0000000000120558 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000]
> "class" (has value)
> @0x0000000013a16be0 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
> @0x000000000b42c760 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached]
> "data.table"
> @0x000000000027d230 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached]
> "data.frame"
> TAG: @0x0000000000121d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000]
> "row.names" (has value)
> @0x0000000012c2f2f0 13 INTSXP g1c1 [MARK,NAM(2)] (len=2, tl=0)
> -2147483648,-3103314
> TAG: @0x000000001497ac10 01 SYMSXP g1c0 [MARK] ".internal.selfref"
> @0x0000000012cec170 22 EXTPTRSXP g1c0 [MARK,NAM(2)]
>>
>>
>> ## A little size tester function
>> ## This will set a key, save the result, print the result's size
>> keytest = function(dt, key){
> + setkeyv(dt, key)
> + save(dt, file='dt_temp.Rdata')
> + tempfilesize = file.info('dt_temp.Rdata')$size
> + tempfilesize = formatC(tempfilesize, big.mark=',', format='f',
> digits=0)
> + cat(key, tempfilesize, '\n\n')
> + unlink('dt_temp.Rdata')
> + invisible(NULL)
> + }
>>
>> str(Small)
> Classes data.table and 'data.frame': 3103314 obs. of 42 variables:
> $ index : int 1 2 3 4 5 6 7 8 9 10 ...
> $ char1 : chr "http://conradhotels3.hilton.com"
> "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com"
> "http://conradhotels3.hilton.com" ...
> $ char2 : chr "/en/index.html" "/en/index.html" "/en/index.html"
> "/en/index.html" ...
> $ char3 : chr "" "" "" "" ...
> $ int1 : int 44903 44903 44903 44903 44903 44903 44903 44903 44903
> 44903
> ...
> $ int2 : int 411 411 254 254 336 336 118 118 386 386 ...
> $ char4 : chr "2012-05-09 20:17:40.587" "2012-05-09 21:17:54.427"
> "2012-05-09 20:10:49.560" "2012-05-09 21:11:05.107" ...
> $ int3 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int4 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int5 : int 69 69 69 69 69 69 69 68 68 68 ...
> $ int6 : int 68 68 68 68 68 68 68 67 67 67 ...
> $ int7 : int 35 35 37 35 35 35 33 38 38 40 ...
> $ int8 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int9 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int10 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int11 : int 1 1 1 1 1 1 1 1 1 1 ...
> $ int12 : int 334830 334847 335102 334838 334836 342687 334521 318626
> 318578 326800 ...
> $ int13 : int 36 36 37 36 36 36 35 38 37 39 ...
> $ int14 : int 44 44 49 47 45 45 45 46 45 48 ...
> $ char5 : chr "" "" "" "" ...
> $ int15 : int NA NA NA NA NA NA NA NA NA NA ...
> $ int16 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int17 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int18 : int 2 2 2 2 2 2 2 2 2 2 ...
> $ int19 : int 1381 1152 424 3728 1772 921 385 725 401 314 ...
> $ int20 : int 36 36 37 36 36 36 35 38 37 39 ...
> $ int21 : int 2199 2201 1492 1448 2559 2529 1084 1432 1876 1984 ...
> $ int22 : int 44 44 49 47 45 45 45 46 45 48 ...
> $ int23 : int 2203 2188 1199 1162 2324 2346 821 897 1386 1189 ...
> $ int24 : int 13 13 14 13 13 13 12 13 13 14 ...
> $ int25 : int 5166 5761 3755 3794 5614 7779 2830 3971 4637 5871 ...
> $ int26 : int 103 103 105 103 103 103 101 105 105 107 ...
> $ int27 : int 70 183 159 197 217 165 153 232 92 102 ...
> $ int28 : int 103 103 105 103 103 103 101 105 105 107 ...
> $ int29 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ int30 : int 161 146 200 158 150 160 190 161 163 169 ...
> $ char6 : chr "Limelight" "Limelight" "Fusepoint/Savvis"
> "Fusepoint/Savvis" ...
> $ char7 : chr "Paris" "Paris" "Toronto" "Toronto" ...
> $ char8 : chr "-1" "-1" "-1" "-1" ...
> $ char9 : chr "FRANCE" "FRANCE" "CANADA" "CANADA" ...
> $ char10: chr "FR" "FR" "CA" "CA" ...
> $ char11: chr "FRANCE" "FRANCE" "CANADA" "CANADA" ...
> - attr(*, ".internal.selfref")=<externalptr>
>> keytest(Small, colnames(Small)[1])
> index 77,694,801
>
>> keytest(Small, colnames(Small)[2])
> char1 75,876,250
>
>> keytest(Small, colnames(Small)[3])
> char2 77,218,972
>
>> keytest(Small, colnames(Small)[4])
> char3 80,585,449
>
>> keytest(Small, colnames(Small)[5])
> int1 77,558,982
>
>> keytest(Small, colnames(Small)[6])
> int2 95,185,248
>
>> keytest(Small, colnames(Small)[7])
> char4 204,037,056
>
>> keytest(Small, colnames(Small)[8])
> int3 206,450,705
>
>> keytest(Small, colnames(Small)[9])
> int4 211,520,888
>
>> keytest(Small, colnames(Small)[10])
> int5 156,095,150
>
>>
>>
>> keytest(Small, colnames(Small)[11])
> int6 150,431,716
>
>> keytest(Small, colnames(Small)[12])
> int7 136,077,306
>
>> keytest(Small, colnames(Small)[13])
> int8 134,981,911
>
>> keytest(Small, colnames(Small)[14])
> int9 134,871,952
>
>> keytest(Small, colnames(Small)[15])
> int10 134,678,104
>
>> keytest(Small, colnames(Small)[16])
> int11 134,682,904
>
>> keytest(Small, colnames(Small)[17])
> int12 112,097,493
>
>> keytest(Small, colnames(Small)[18])
> int13 101,734,541
>
>> keytest(Small, colnames(Small)[19])
> int14 101,160,920
>
>>
>
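
One caveat about the keytest() helper in the quoted session: setkeyv() reorders its argument by reference, so each call leaves Small sorted by the previous key, and the reported sizes may depend on the order in which the keys were tried. A minimal sketch of a variant that works on a copy follows; keytest_copy and the toy table are made-up names for illustration, not part of the original code.

```r
library(data.table)

# Variant of keytest() that keys a copy, so the caller's table keeps its
# original row order and stays unkeyed between measurements.
keytest_copy <- function(dt, key) {
  dt_keyed <- copy(dt)                  # deep copy; setkeyv() then sorts only the copy
  setkeyv(dt_keyed, key)
  tmp <- tempfile(fileext = ".RData")
  on.exit(unlink(tmp))                  # remove the temporary file on exit
  save(dt_keyed, file = tmp)
  size <- file.info(tmp)$size
  cat(key, formatC(size, big.mark = ",", format = "f", digits = 0), "\n")
  invisible(size)
}

# Toy data standing in for the real table.
toy <- data.table(
  index = 1:1000,
  grp   = sample(letters, 1000, replace = TRUE)
)

# Saved size in bytes for each candidate key column.
sizes <- vapply(names(toy), function(k) keytest_copy(toy, k), numeric(1))
```

Since every measurement starts from the same unsorted table, the sizes can be compared with each other directly, at the cost of one extra copy of the data per call.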