[datatable-help] Memory issue

Gene Leynes gleynes+r at gmail.com
Fri Oct 19 06:11:13 CEST 2012


Ok, here is my very lengthy reply with lots of diagnostics.


>
> ## Clear the workspace
> rm(list=ls())
>
> ## I use a function called "loader" to load single data objects
> if(!require('geneorama')){
+   source('https://raw.github.com/geneorama/geneorama/master/R/loader.R')
+   cat('loading function \"loader\"')
+ }
>
> ## Load the data
> Small = loader('test0')
> Large = loader('test1')
>
> ## The two files will be different because their order is different
> str(Small)
Classes ‘data.table’ and 'data.frame': 3103314 obs. of  42 variables:
 $ index : int  1 2 3 4 5 6 7 8 9 10 ...
 $ char1 : chr  "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com" ...
 $ char2 : chr  "/en/index.html" "/en/index.html" "/en/index.html" "/en/index.html" ...
 $ char3 : chr  "" "" "" "" ...
 $ int1  : int  44903 44903 44903 44903 44903 44903 44903 44903 44903 44903 ...
 $ int2  : int  411 411 254 254 336 336 118 118 386 386 ...
 $ char4 : chr  "2012-05-09 20:17:40.587" "2012-05-09 21:17:54.427" "2012-05-09 20:10:49.560" "2012-05-09 21:11:05.107" ...
 $ int3  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int4  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int5  : int  69 69 69 69 69 69 69 68 68 68 ...
 $ int6  : int  68 68 68 68 68 68 68 67 67 67 ...
 $ int7  : int  35 35 37 35 35 35 33 38 38 40 ...
 $ int8  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int9  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int10 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int11 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ int12 : int  334830 334847 335102 334838 334836 342687 334521 318626 318578 326800 ...
 $ int13 : int  36 36 37 36 36 36 35 38 37 39 ...
 $ int14 : int  44 44 49 47 45 45 45 46 45 48 ...
 $ char5 : chr  "" "" "" "" ...
 $ int15 : int  NA NA NA NA NA NA NA NA NA NA ...
 $ int16 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int17 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int18 : int  2 2 2 2 2 2 2 2 2 2 ...
 $ int19 : int  1381 1152 424 3728 1772 921 385 725 401 314 ...
 $ int20 : int  36 36 37 36 36 36 35 38 37 39 ...
 $ int21 : int  2199 2201 1492 1448 2559 2529 1084 1432 1876 1984 ...
 $ int22 : int  44 44 49 47 45 45 45 46 45 48 ...
 $ int23 : int  2203 2188 1199 1162 2324 2346 821 897 1386 1189 ...
 $ int24 : int  13 13 14 13 13 13 12 13 13 14 ...
 $ int25 : int  5166 5761 3755 3794 5614 7779 2830 3971 4637 5871 ...
 $ int26 : int  103 103 105 103 103 103 101 105 105 107 ...
 $ int27 : int  70 183 159 197 217 165 153 232 92 102 ...
 $ int28 : int  103 103 105 103 103 103 101 105 105 107 ...
 $ int29 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int30 : int  161 146 200 158 150 160 190 161 163 169 ...
 $ char6 : chr  "Limelight" "Limelight" "Fusepoint/Savvis" "Fusepoint/Savvis" ...
 $ char7 : chr  "Paris" "Paris" "Toronto" "Toronto" ...
 $ char8 : chr  "-1" "-1" "-1" "-1" ...
 $ char9 : chr  "FRANCE" "FRANCE" "CANADA" "CANADA" ...
 $ char10: chr  "FR" "FR" "CA" "CA" ...
 $ char11: chr  "FRANCE" "FRANCE" "CANADA" "CANADA" ...
 - attr(*, ".internal.selfref")=<externalptr>
> str(Large)
Classes ‘data.table’ and 'data.frame': 3103314 obs. of  42 variables:
 $ index : int  716234 716235 1007651 2679944 1550732 1932010 2879445 1007670 1736006 666363 ...
 $ char1 : chr  "http://go.compuware.com" "http://go.compuware.com" "http://www.achmeacollectief.nl" "https://db3.notify.windows.com" ...
 $ char2 : chr  "/default.aspx" "/dynaTraceMonitor" "/unilever/" "/ping" ...
 $ char3 : chr  "?rurl=http://frontline.compuware.com//products/BU/default.aspx" "?url=http%3A%2F%2Fgo.compuware.com%2Fdefault.aspx%3Frurl%3Dhttp%3A%2F%2Ffrontline.compuware.com%2F%2Fproducts%2FBU%2Fdefault.as"| __truncated__ "" "" ...
 $ int1  : int  2812881 2812881 3149757 4286896 3618836 3861870 4315803 3149760 3779387 2754629 ...
 $ int2  : int  133 133 133 133 340 340 326 133 133 340 ...
 $ char4 : chr  "2012-05-09 20:00:00.000" "2012-05-09 20:00:00.000" "2012-05-09 20:00:00.000" "2012-05-09 20:00:00.000" ...
 $ int3  : int  0 1 0 0 0 0 0 0 0 0 ...
 $ int4  : int  2264 2496 1782 461 1953 1418 641 1207 167 278 ...
 $ int5  : int  26 20 6 1 71 64 1 6 1 15 ...
 $ int6  : int  26 20 6 1 69 64 1 6 1 15 ...
 $ int7  : int  2 2 4 0 2 12 0 2 0 0 ...
 $ int8  : int  0 0 0 0 2 0 0 0 0 0 ...
 $ int9  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int10 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int11 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int12 : int  392752 417195 43107 0 1419015 1031349 187344 62969 43 428189 ...
 $ int13 : int  4 4 5 1 8 22 1 3 1 1 ...
 $ int14 : int  9 11 8 1 17 38 1 6 1 15 ...
 $ char5 : chr  "" "" "" "" ...
 $ int15 : int  NA NA NA NA 0 NA NA NA NA 0 ...
 $ int16 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int17 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int18 : int  2 28 3 0 0 1 0 1 0 0 ...
 $ int19 : int  137 0 136 298 277 255 147 141 137 209 ...
 $ int20 : int  4 0 5 1 8 22 1 3 1 1 ...
 $ int21 : int  945 612 59 22 689 1153 54 29 13 59 ...
 $ int22 : int  9 5 8 1 17 38 1 6 1 15 ...
 $ int23 : int  0 0 0 118 0 0 0 0 0 0 ...
 $ int24 : int  0 0 0 1 0 0 0 0 0 0 ...
 $ int25 : int  3243 2653 1585 22 3292 3076 64 1043 13 81 ...
 $ int26 : int  28 22 10 1 73 76 1 8 1 15 ...
 $ int27 : int  2060 3365 257 1 3304 1038 376 258 4 80 ...
 $ int28 : int  28 22 10 1 73 76 1 8 1 15 ...
 $ int29 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int30 : int  921 750 203 578 609 1078 234 187 31 140 ...
 $ char6 : chr  "Interoute" "Interoute" "Interoute" "Interoute" ...
 $ char7 : chr  "Amsterdam" "Amsterdam" "Amsterdam" "Amsterdam" ...
 $ char8 : chr  "-1" "-1" "-1" "-1" ...
 $ char9 : chr  "NETHERLANDS" "NETHERLANDS" "NETHERLANDS" "NETHERLANDS" ...
 $ char10: chr  "NL" "NL" "NL" "NL" ...
 $ char11: chr  "NETHERLANDS" "NETHERLANDS" "NETHERLANDS" "NETHERLANDS" ...
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "sorted")= chr "char4"
>
> ## The difference is shown here
> mapply(identical, Small, Large)
 index  char1  char2  char3   int1   int2  char4   int3   int4   int5   int6   int7   int8   int9
 FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
 int10  int11  int12  int13  int14  char5  int15  int16  int17  int18  int19  int20  int21  int22
 FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
 int23  int24  int25  int26  int27  int28  int29  int30  char6  char7  char8  char9 char10 char11
 FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
> mapply(all.equal, Small, Large)
                                                         index
                         "Mean relative difference: 0.6660698"
                                                         char1
                                   "3100674 string mismatches"
                                                         char2
                                   "2961621 string mismatches"
                                                         char3
                                   "1753352 string mismatches"
                                                          int1
                         "Mean relative difference: 0.2945024"
                                                          int2
                         "Mean relative difference: 0.4866954"
                                                         char4
                                   "3103308 string mismatches"
                                                          int3
                          "Mean relative difference: 1.759713"
                                                          int4
                          "Mean relative difference: 1.408616"
                                                          int5
                          "Mean relative difference: 1.411817"
                                                          int6
                          "Mean relative difference: 1.415648"
                                                          int7
                          "Mean relative difference: 1.705137"
                                                          int8
                          "Mean relative difference: 1.954795"
                                                          int9
                           "Mean relative difference: 1.99701"
                                                         int10
                          "Mean relative difference: 1.995529"
                                                         int11
                                 "Mean relative difference: 2"
                                                         int12
                          "Mean relative difference: 1.479043"
                                                         int13
                          "Mean relative difference: 1.323619"
                                                         int14
                          "Mean relative difference: 1.360022"
                                                         char5
                                   "1454309 string mismatches"
                                                         int15
"'is.NA' value mismatch: 2260789 in current 2260789 in target"
                                                         int16
                          "Mean relative difference: 1.997195"
                                                         int17
                                 "Mean relative difference: 2"
                                                         int18
                          "Mean relative difference: 1.799441"
                                                         int19
                          "Mean relative difference: 1.571321"
                                                         int20
                          "Mean relative difference: 1.474492"
                                                         int21
                          "Mean relative difference: 1.669488"
                                                         int22
                          "Mean relative difference: 1.465307"
                                                         int23
                          "Mean relative difference: 1.842191"
                                                         int24
                           "Mean relative difference: 1.76578"
                                                         int25
                          "Mean relative difference: 1.481612"
                                                         int26
                          "Mean relative difference: 1.403655"
                                                         int27
                          "Mean relative difference: 1.722723"
                                                         int28
                          "Mean relative difference: 1.403655"
                                                         int29
                                 "Mean relative difference: 2"
                                                         int30
                          "Mean relative difference: 1.535987"
                                                         char6
                                   "2899128 string mismatches"
                                                         char7
                                   "3008489 string mismatches"
                                                         char8
                                   "2503189 string mismatches"
                                                         char9
                                   "2957002 string mismatches"
                                                        char10
                                   "1933196 string mismatches"
                                                        char11
                                   "1933196 string mismatches"
>
> ## I re-ran the steps to create the files (almost the same as in the last email),
> ## but added an "index" column equal to 1:nrow(datMod).
> ## This index is used to reorder the files so that they are consistent.
> LargeOrd = Large[order(Large$index), ]
> str(LargeOrd)
Classes ‘data.table’ and 'data.frame': 3103314 obs. of  42 variables:
 $ index : int  1 2 3 4 5 6 7 8 9 10 ...
 $ char1 : chr  "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com" ...
 $ char2 : chr  "/en/index.html" "/en/index.html" "/en/index.html" "/en/index.html" ...
 $ char3 : chr  "" "" "" "" ...
 $ int1  : int  44903 44903 44903 44903 44903 44903 44903 44903 44903 44903 ...
 $ int2  : int  411 411 254 254 336 336 118 118 386 386 ...
 $ char4 : chr  "2012-05-09 20:17:40.587" "2012-05-09 21:17:54.427" "2012-05-09 20:10:49.560" "2012-05-09 21:11:05.107" ...
 $ int3  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int4  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int5  : int  69 69 69 69 69 69 69 68 68 68 ...
 $ int6  : int  68 68 68 68 68 68 68 67 67 67 ...
 $ int7  : int  35 35 37 35 35 35 33 38 38 40 ...
 $ int8  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int9  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int10 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int11 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ int12 : int  334830 334847 335102 334838 334836 342687 334521 318626 318578 326800 ...
 $ int13 : int  36 36 37 36 36 36 35 38 37 39 ...
 $ int14 : int  44 44 49 47 45 45 45 46 45 48 ...
 $ char5 : chr  "" "" "" "" ...
 $ int15 : int  NA NA NA NA NA NA NA NA NA NA ...
 $ int16 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int17 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int18 : int  2 2 2 2 2 2 2 2 2 2 ...
 $ int19 : int  1381 1152 424 3728 1772 921 385 725 401 314 ...
 $ int20 : int  36 36 37 36 36 36 35 38 37 39 ...
 $ int21 : int  2199 2201 1492 1448 2559 2529 1084 1432 1876 1984 ...
 $ int22 : int  44 44 49 47 45 45 45 46 45 48 ...
 $ int23 : int  2203 2188 1199 1162 2324 2346 821 897 1386 1189 ...
 $ int24 : int  13 13 14 13 13 13 12 13 13 14 ...
 $ int25 : int  5166 5761 3755 3794 5614 7779 2830 3971 4637 5871 ...
 $ int26 : int  103 103 105 103 103 103 101 105 105 107 ...
 $ int27 : int  70 183 159 197 217 165 153 232 92 102 ...
 $ int28 : int  103 103 105 103 103 103 101 105 105 107 ...
 $ int29 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int30 : int  161 146 200 158 150 160 190 161 163 169 ...
 $ char6 : chr  "Limelight" "Limelight" "Fusepoint/Savvis" "Fusepoint/Savvis" ...
 $ char7 : chr  "Paris" "Paris" "Toronto" "Toronto" ...
 $ char8 : chr  "-1" "-1" "-1" "-1" ...
 $ char9 : chr  "FRANCE" "FRANCE" "CANADA" "CANADA" ...
 $ char10: chr  "FR" "FR" "CA" "CA" ...
 $ char11: chr  "FRANCE" "FRANCE" "CANADA" "CANADA" ...
 - attr(*, ".internal.selfref")=<externalptr>
>
> ## Here the ordered files come out to be equivalent
> mapply(identical, Small, LargeOrd)
 index  char1  char2  char3   int1   int2  char4   int3   int4   int5   int6   int7   int8   int9
  TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
 int10  int11  int12  int13  int14  char5  int15  int16  int17  int18  int19  int20  int21  int22
  TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
 int23  int24  int25  int26  int27  int28  int29  int30  char6  char7  char8  char9 char10 char11
  TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
> mapply(all.equal, Small, LargeOrd)
 index  char1  char2  char3   int1   int2  char4   int3   int4   int5   int6   int7   int8   int9
  TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
 int10  int11  int12  int13  int14  char5  int15  int16  int17  int18  int19  int20  int21  int22
  TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
 int23  int24  int25  int26  int27  int28  int29  int30  char6  char7  char8  char9 char10 char11
  TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
>
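The reorder-and-compare check above can be reproduced on a toy example. This is only a minimal sketch in base R: the data frames `small`, `large` and `large_ord` are made-up stand-ins for `Small`, `Large` and `LargeOrd`, and a plain data.frame is assumed in place of a data.table.

```r
# Toy stand-ins (hypothetical data): same rows, different row order.
set.seed(1)
small <- data.frame(
  index = 1:10,
  int1  = sample(100L, 10),       # distinct values, so reversing changes the column
  char1 = sample(letters, 10),    # distinct values as well
  stringsAsFactors = FALSE
)

# "large" holds the same rows in reverse order
large <- small[rev(seq_len(nrow(small))), ]

# Column-wise comparison before restoring the order
same_before <- mapply(identical, small, large)

# Restore the original order through the index column, then compare again
large_ord  <- large[order(large$index), ]
same_after <- mapply(identical, small, large_ord)

print(same_before)
print(same_after)
```

Comparing column by column with `mapply()` ignores row names and table-level attributes such as a key, so only the column contents decide the result.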
> ## The inspection results
> .Internal(inspect(Small))
@0x00000000128068e8 19 VECSXP g1c7 [OBJ,MARK,NAM(2),ATT] (len=42, tl=0)
  @0x000007ff8a3e0010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0) 1,2,3,4,5,...
  @0x000007ff4fb30010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    ...
  @0x000007ff4e380010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    ...
  @0x000007ff4cbd0010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    ...
  @0x000007ff88c20010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0) 44903,44903,44903,44903,44903,...
  ...
ATTRIB:
  @0x0000000012cab8e0 02 LISTSXP g1c0 [MARK]
    TAG: @0x0000000000120088 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000] "names" (has value)
    @0x0000000016868d68 16 STRSXP g1c7 [MARK,NAM(2)] (len=42, tl=0)
      @0x0000000010112b98 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "index"
      @0x0000000016b28fd0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "char1"
      @0x0000000016b291e0 09 CHARSXP g1c1 [MARK,gp=0x61,ATT] [ASCII] [cached] "char2"
      @0x0000000016b293c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "char3"
      @0x0000000016b29600 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "int1"
      ...
    TAG: @0x0000000000120558 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000] "class" (has value)
    @0x00000000138b5318 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
      @0x000000000b42c760 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.table"
      @0x000000000027d230 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.frame"
    TAG: @0x0000000000121d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
    @0x0000000012c38050 13 INTSXP g1c1 [MARK,NAM(2)] (len=2, tl=0) -2147483648,-3103314
    TAG: @0x000000001497ac10 01 SYMSXP g1c0 [MARK] ".internal.selfref"
    @0x0000000012caaa60 22 EXTPTRSXP g1c0 [MARK,NAM(2)]
> .Internal(inspect(Large))
@0x0000000012c24c68 19 VECSXP g1c7 [OBJ,MARK,NAM(2),ATT] (len=42, tl=0)
  @0x000007ff314d0010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0) 716234,716235,1007651,2679944,1550732,...
  @0x000007ff2fd20010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
    @0x000000001253d8e0 09 CHARSXP g1c3 [MARK,gp=0x60,ATT] [ASCII] [cached] "http://go.compuware.com"
    @0x000000001253d8e0 09 CHARSXP g1c3 [MARK,gp=0x60,ATT] [ASCII] [cached] "http://go.compuware.com"
    @0x000000001e6d7ab0 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://www.achmeacollectief.nl"
    @0x000000001e4a59b8 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "https://db3.notify.windows.com"
    @0x000000001e63ee70 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://www.christushealth.org"
    ...
  @0x000007ff2e570010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
    @0x00000000200aa218 09 CHARSXP g1c2 [MARK,gp=0x60,ATT] [ASCII] [cached] "/default.aspx"
    @0x000000001e444d78 09 CHARSXP g1c3 [MARK,gp=0x60,ATT] [ASCII] [cached] "/dynaTraceMonitor"
    @0x000000001eb64790 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/unilever/"
    @0x000000000feb4e98 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "/ping"
    @0x0000000000124950 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "/"
    ...
  @0x000007ff2cdc0010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
    @0x0000000017a39430 09 CHARSXP g1c5 [MARK,gp=0x60] [ASCII] [cached] "?rurl=http://frontline.compuware.com//products/BU/default.aspx"
    @0x000000001b721a50 09 CHARSXP g1c7 [MARK,gp=0x60] [ASCII] [cached] "?url=http%3A%2F%2Fgo.compuware.com%2Fdefault.aspx%3Frurl%3Dhttp%3A%2F%2Ffrontline.compuware.com%2F%2Fproducts%2FBU%2Fdefault.aspx$title=$frames=0$pId=G_1336593601673$fId=G_1336593601673$pFId=$rId=RID_73295254$rpId=1059475658$actions=1%7C_load_%7C-%7C_load_%7C1336593601673%7C1336593602736%7C375%2C2%7C_onload_%7C-%7C_load_%7C1336593602626%7C1336593602704%7C375$domR=1336593602642$dtV=410$3p=www.google-analytics.com%7C0%7C0%7C0%7C%7C0%7C0%7C0%7C1%7C828_859%7C31%7C31%7C31%7C0%7C%7C0%7C0%7C0%2Cs%7C828%7C859%7C_load_%7Chttp%253A%252F%252Fwww.google-analytics.com%252Fga.js%3B2264ff.r.axf8.net%7C0%7C0%7C0%7C%7C0%7C0%7C0%7C1%7C953_1078%7C125%7C125%7C125%7C0%7C%7C0%7C0%7C0%2Cs%7C953%7C1078%7C_load_%7Chttp%253A%252F%252F2264FF.r.axf8.net%252Fmr%252Fe.gif%253Finfo%253D%25257Bn%25253Ac%25257Cc%25253A38695455749817%25257Cd%25253A1%25257Ca%25253A2264FF%25257Ch%25253A1%25257Ce%25253A%25257Cb%25253A%25257Cl%25253Ahttp%252524%252A%252524%25252F%25252Fgo.compuware.com%25252Fdefault.aspx%25257Cm%25253A1024%25257Co%25253A768%25257Cp%25253AWin32%25257Cq%25253Ax86%25257Ck%25253Alan%25257Cg%25253AMSIE%25257Cf%25253A8.0%25257D%25257Bn%25253Au%25257Ce%25253A1%25257D%2526a%253D2264FF%2526r%253D1%2526s%253D1$time=1336593603689$"
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    ...
  @0x000007ff2c1e0010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0) 2812881,2812881,3149757,4286896,3618836,...
  ...
ATTRIB:
  @0x000000001f163298 02 LISTSXP g1c0 [MARK]
    TAG: @0x0000000000120088 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000] "names" (has value)
    @0x000000001283e0e0 16 STRSXP g1c7 [MARK,NAM(2)] (len=42, tl=0)
      @0x0000000010112b98 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "index"
      @0x0000000016b28fd0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "char1"
      @0x0000000016b291e0 09 CHARSXP g1c1 [MARK,gp=0x61,ATT] [ASCII] [cached] "char2"
      @0x0000000016b293c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "char3"
      @0x0000000016b29600 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "int1"
      ...
    TAG: @0x0000000000120558 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000] "class" (has value)
    @0x000000001368f078 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
      @0x000000000b42c760 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.table"
      @0x000000000027d230 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.frame"
    TAG: @0x0000000000121d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
    @0x000000000fb9a988 13 INTSXP g1c1 [MARK,NAM(2)] (len=2, tl=0) -2147483648,-3103314
    TAG: @0x000000001497ac10 01 SYMSXP g1c0 [MARK] ".internal.selfref"
    @0x000000001f163110 22 EXTPTRSXP g1c0 [MARK,NAM(2)]
    TAG: @0x0000000016c8d648 01 SYMSXP g1c0 [MARK] "sorted"
    @0x000000001ece5f88 16 STRSXP g1c1 [MARK,NAM(2)] (len=1, tl=0)
      @0x0000000016b27b78 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "char4"
> .Internal(inspect(LargeOrd))
@0x0000000012b69468 19 VECSXP g1c7 [OBJ,MARK,NAM(2),ATT] (len=42, tl=100)
  @0x000007ffc4fb0010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0) 1,2,3,4,5,...
  @0x000007ffc2c20010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    @0x0000000012114550 09 CHARSXP g1c3 [MARK,gp=0x60] [ASCII] [cached] "http://conradhotels3.hilton.com"
    ...
  @0x000007ffc0890010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    @0x00000000205bf0d8 09 CHARSXP g1c2 [MARK,gp=0x60] [ASCII] [cached] "/en/index.html"
    ...
  @0x000007ffbe500010 16 STRSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0)
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    @0x0000000000120f20 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] ""
    ...
  @0x000007ffbcd40010 13 INTSXP g1c7 [MARK,NAM(2)] (len=3103314, tl=0) 44903,44903,44903,44903,44903,...
  ...
ATTRIB:
  @0x0000000012cec058 02 LISTSXP g1c0 [MARK]
    TAG: @0x0000000000120088 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000] "names" (has value)
    @0x0000000012b60620 16 STRSXP g1c7 [MARK,NAM(2)] (len=42, tl=100)
      @0x0000000010112b98 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "index"
      @0x0000000016b28fd0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "char1"
      @0x0000000016b291e0 09 CHARSXP g1c1 [MARK,gp=0x61,ATT] [ASCII] [cached] "char2"
      @0x0000000016b293c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "char3"
      @0x0000000016b29600 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "int1"
      ...
    TAG: @0x0000000000120558 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000] "class" (has value)
    @0x0000000013a16be0 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
      @0x000000000b42c760 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.table"
      @0x000000000027d230 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.frame"
    TAG: @0x0000000000121d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
    @0x0000000012c2f2f0 13 INTSXP g1c1 [MARK,NAM(2)] (len=2, tl=0) -2147483648,-3103314
    TAG: @0x000000001497ac10 01 SYMSXP g1c0 [MARK] ".internal.selfref"
    @0x0000000012cec170 22 EXTPTRSXP g1c0 [MARK,NAM(2)]
>
>
> ## A little size tester function
> ## This will set a key, save the result, print the result's size
> keytest = function(dt, key){
+   setkeyv(dt, key)
+   save(dt, file='dt_temp.Rdata')
+   tempfilesize = file.info('dt_temp.Rdata')$size
+   tempfilesize = formatC(tempfilesize, big.mark=',', format='f', digits=0)
+   cat(key, tempfilesize, '\n\n')
+   unlink('dt_temp.Rdata')
+   invisible(NULL)
+ }
>
> str(Small)
Classes ‘data.table’ and 'data.frame': 3103314 obs. of  42 variables:
 $ index : int  1 2 3 4 5 6 7 8 9 10 ...
 $ char1 : chr  "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com" "http://conradhotels3.hilton.com" ...
 $ char2 : chr  "/en/index.html" "/en/index.html" "/en/index.html" "/en/index.html" ...
 $ char3 : chr  "" "" "" "" ...
 $ int1  : int  44903 44903 44903 44903 44903 44903 44903 44903 44903 44903 ...
 $ int2  : int  411 411 254 254 336 336 118 118 386 386 ...
 $ char4 : chr  "2012-05-09 20:17:40.587" "2012-05-09 21:17:54.427" "2012-05-09 20:10:49.560" "2012-05-09 21:11:05.107" ...
 $ int3  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int4  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int5  : int  69 69 69 69 69 69 69 68 68 68 ...
 $ int6  : int  68 68 68 68 68 68 68 67 67 67 ...
 $ int7  : int  35 35 37 35 35 35 33 38 38 40 ...
 $ int8  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int9  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int10 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int11 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ int12 : int  334830 334847 335102 334838 334836 342687 334521 318626 318578 326800 ...
 $ int13 : int  36 36 37 36 36 36 35 38 37 39 ...
 $ int14 : int  44 44 49 47 45 45 45 46 45 48 ...
 $ char5 : chr  "" "" "" "" ...
 $ int15 : int  NA NA NA NA NA NA NA NA NA NA ...
 $ int16 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int17 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int18 : int  2 2 2 2 2 2 2 2 2 2 ...
 $ int19 : int  1381 1152 424 3728 1772 921 385 725 401 314 ...
 $ int20 : int  36 36 37 36 36 36 35 38 37 39 ...
 $ int21 : int  2199 2201 1492 1448 2559 2529 1084 1432 1876 1984 ...
 $ int22 : int  44 44 49 47 45 45 45 46 45 48 ...
 $ int23 : int  2203 2188 1199 1162 2324 2346 821 897 1386 1189 ...
 $ int24 : int  13 13 14 13 13 13 12 13 13 14 ...
 $ int25 : int  5166 5761 3755 3794 5614 7779 2830 3971 4637 5871 ...
 $ int26 : int  103 103 105 103 103 103 101 105 105 107 ...
 $ int27 : int  70 183 159 197 217 165 153 232 92 102 ...
 $ int28 : int  103 103 105 103 103 103 101 105 105 107 ...
 $ int29 : int  0 0 0 0 0 0 0 0 0 0 ...
 $ int30 : int  161 146 200 158 150 160 190 161 163 169 ...
 $ char6 : chr  "Limelight" "Limelight" "Fusepoint/Savvis" "Fusepoint/Savvis" ...
 $ char7 : chr  "Paris" "Paris" "Toronto" "Toronto" ...
 $ char8 : chr  "-1" "-1" "-1" "-1" ...
 $ char9 : chr  "FRANCE" "FRANCE" "CANADA" "CANADA" ...
 $ char10: chr  "FR" "FR" "CA" "CA" ...
 $ char11: chr  "FRANCE" "FRANCE" "CANADA" "CANADA" ...
 - attr(*, ".internal.selfref")=<externalptr>
> keytest(Small, colnames(Small)[1])
index 77,694,801

> keytest(Small, colnames(Small)[2])
char1 75,876,250

> keytest(Small, colnames(Small)[3])
char2 77,218,972

> keytest(Small, colnames(Small)[4])
char3 80,585,449

> keytest(Small, colnames(Small)[5])
int1 77,558,982

> keytest(Small, colnames(Small)[6])
int2 95,185,248

> keytest(Small, colnames(Small)[7])
char4 204,037,056

> keytest(Small, colnames(Small)[8])
int3 206,450,705

> keytest(Small, colnames(Small)[9])
int4 211,520,888

> keytest(Small, colnames(Small)[10])
int5 156,095,150

>
>
> keytest(Small, colnames(Small)[11])
int6 150,431,716

> keytest(Small, colnames(Small)[12])
int7 136,077,306

> keytest(Small, colnames(Small)[13])
int8 134,981,911

> keytest(Small, colnames(Small)[14])
int9 134,871,952

> keytest(Small, colnames(Small)[15])
int10 134,678,104

> keytest(Small, colnames(Small)[16])
int11 134,682,904

> keytest(Small, colnames(Small)[17])
int12 112,097,493

> keytest(Small, colnames(Small)[18])
int13 101,734,541

> keytest(Small, colnames(Small)[19])
int14 101,160,920

>
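For anyone who wants to try the size test without the original data, here is a minimal sketch of the same idea in base R. It is not the `keytest` function above: `saved_size` and the `toy` data frame are hypothetical, `order()` stands in for `setkeyv()`, and the file goes to a `tempfile()` rather than the working directory. It relies only on `save()` compressing its output by default, so the row order can influence the size on disk.

```r
# Hypothetical base-R analogue of keytest(): sort a copy of the data by one
# column, save it with save()'s default compression, and return the size of
# the resulting file in bytes.
saved_size <- function(df, col) {
  sorted <- df[order(df[[col]]), , drop = FALSE]
  path <- tempfile(fileext = ".RData")
  on.exit(unlink(path))            # remove the temporary file afterwards
  save(sorted, file = path)
  file.info(path)$size
}

# Made-up data with one unique column and two heavily repeated ones
set.seed(42)
n <- 20000L
toy <- data.frame(
  index = seq_len(n),
  grp   = sample(c("FRANCE", "CANADA", "NETHERLANDS"), n, replace = TRUE),
  val   = sample.int(1000L, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# One saved size per sort column, named after the column
sizes <- vapply(names(toy), function(col) saved_size(toy, col), numeric(1))
print(format(sizes, big.mark = ","))
```

Unlike `setkeyv()`, which reorders the data.table by reference, this version sorts a copy, so each measurement starts from the same unsorted input rather than from the order left by the previous key.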