[datatable-help] Help with biglm() package

aniluw anilkp at uw.edu
Wed Apr 4 22:59:58 CEST 2018


Out of interest in data analytics, I was working with a large volume of data
(air traffic performance data from the Bureau of Transportation Statistics)
over the break.

As part of this exercise, I was trying to predict the minutes delayed on
departure (one of the outcome variables) from a selected set of variables
(chosen to keep the data chunk small and to limit correlation between
predictors). I found the biglm() function useful for doing this with ffdf
objects.

I was able to build a linear model for the training dataset (3,972,233
records) with the biglm package. However, when I try to generate fitted
values, the model returns all NAs. I am wondering whether the model interprets
the outcome as “categorical” (given the “response” attribute set in the $terms
component of the biglm object) rather than as discrete integer values.

Also, the predictors CRS_DEP_TIME and CRS_ARR_TIME are integers (all the
other predictors are factors). Are they being interpreted as “factors”? I
would really appreciate your response.
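
For reference, here is a minimal base-R sketch of how the “response” and “factors” attributes of a terms object can be read. It uses a small made-up data frame (the names toy, delay, carrier and dep_time are hypothetical, not the flight data) and plain terms() and model.matrix() rather than biglm, so it only illustrates the attributes and does not diagnose the NA fitted values. As documented for terms objects, “response” is the position of the response in the variable list, and “factors” is a variables-by-terms matrix; neither describes a column's type.

```r
# Toy data (hypothetical, not the flight data): an integer response,
# one factor predictor and one integer predictor.
toy <- data.frame(
  delay    = c(5L, -2L, 30L, 0L, 12L, 7L),
  carrier  = factor(c("AA", "DL", "AA", "DL", "AA", "DL")),
  dep_time = c(600L, 915L, 1230L, 1545L, 1800L, 2115L)
)

tt <- terms(delay ~ carrier + dep_time)

# "response" is the position of the response in attr(tt, "variables"),
# not a flag describing its type.
attr(tt, "response")

# "factors" is a variables-by-terms incidence matrix; it is produced for
# numeric predictors as well as for factors.
attr(tt, "factors")

# How each column is treated is decided by its class in the data.
sapply(toy, class)

# An integer predictor gets a single slope column in the design matrix,
# while a factor is expanded into dummy columns.
colnames(model.matrix(tt, data = toy))
```

On the real data, the corresponding check would be the class (or ramclass) of each ffdf column, as in the str() output further down.
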

> biglm.Train.Air.2017.Dep.Delay.mins$terms
DEP_DELAY ~ YEAR + MONTH + DAY_OF_MONTH + DAY_OF_WEEK + UNIQUE_CARRIER + 
    ORIGIN + DEST + CRS_DEP_TIME + CRS_ARR_TIME
attr(,"variables")
list(DEP_DELAY, YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, UNIQUE_CARRIER, 
    ORIGIN, DEST, CRS_DEP_TIME, CRS_ARR_TIME)
attr(,"factors")
               YEAR MONTH DAY_OF_MONTH DAY_OF_WEEK UNIQUE_CARRIER ORIGIN DEST CRS_DEP_TIME CRS_ARR_TIME
DEP_DELAY         0     0            0           0              0      0    0            0            0
YEAR              1     0            0           0              0      0    0            0            0
MONTH             0     1            0           0              0      0    0            0            0
DAY_OF_MONTH      0     0            1           0              0      0    0            0            0
DAY_OF_WEEK       0     0            0           1              0      0    0            0            0
UNIQUE_CARRIER    0     0            0           0              1      0    0            0            0
ORIGIN            0     0            0           0              0      1    0            0            0
DEST              0     0            0           0              0      0    1            0            0
CRS_DEP_TIME      0     0            0           0              0      0    0            1            0
CRS_ARR_TIME      0     0            0           0              0      0    0            0            1
attr(,"term.labels")
[1] "YEAR"           "MONTH"          "DAY_OF_MONTH"   "DAY_OF_WEEK"    "UNIQUE_CARRIER" "ORIGIN"         "DEST"           "CRS_DEP_TIME"   "CRS_ARR_TIME"
attr(,"order")
[1] 1 1 1 1 1 1 1 1 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: R_GlobalEnv>

> 

Structure of the training ffdf dataset:
> str(Train.Air.OnTime2017.SelectCols.ffdf)
List of 3
 $ virtual: 'data.frame':     10 obs. of  7 variables:
 .. $ VirtualVmode     : chr  "integer" "integer" "integer" "integer" ...
 .. $ AsIs             : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 .. $ VirtualIsMatrix  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 .. $ PhysicalIsMatrix : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 .. $ PhysicalElementNo: int  1 2 3 4 5 6 7 8 9 10
 .. $ PhysicalFirstCol : int  1 1 1 1 1 1 1 1 1 1
 .. $ PhysicalLastCol  : int  1 1 1 1 1 1 1 1 1 1
 .. - attr(*, "Dim")= int  3972233 10
 .. - attr(*, "Dimorder")= int  1 2
 $ physical: List of 10
 .. $ YEAR          : list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d3467395e5d.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 .. $ MONTH         : list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d34628a688c.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 ..  .. ..- attr(*, "Levels")= chr [1:12] "January" "February" "March" "April" ...
 ..  .. ..- attr(*, "ramclass")= chr "factor"
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 .. $ DAY_OF_MONTH  : list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d34868459e.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 ..  .. ..- attr(*, "Levels")= chr [1:31] "1" "2" "3" "4" ...
 ..  .. ..- attr(*, "ramclass")= chr "factor"
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 .. $ DAY_OF_WEEK   : list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d3419f71e6a.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 ..  .. ..- attr(*, "Levels")= chr [1:7] "Mon" "Tue" "Wed" "Thu" ...
 ..  .. ..- attr(*, "ramclass")= chr "factor"
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 .. $ UNIQUE_CARRIER: list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d342b7d2482.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 ..  .. ..- attr(*, "Levels")= chr [1:12] "AA" "AS" "B6" "DL" ...
 ..  .. ..- attr(*, "ramclass")= chr "factor"
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 .. $ ORIGIN        : list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d342cf382a.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 ..  .. ..- attr(*, "Levels")= chr [1:320] "ABE" "ABI" "ABQ" "ABR" ...
 ..  .. ..- attr(*, "ramclass")= chr "factor"
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 .. $ DEST          : list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d34344c1f1b.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 ..  .. ..- attr(*, "Levels")= chr [1:320] "ABE" "ABI" "ABQ" "ABR" ...
 ..  .. ..- attr(*, "ramclass")= chr "factor"
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 .. $ CRS_DEP_TIME  : list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d347efb4a28.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 .. $ CRS_ARR_TIME  : list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d34c6e593e.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 .. $ DEP_DELAY     : list()
 ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
 ..  .. ..- attr(*, "vmode")= chr "integer"
 ..  .. ..- attr(*, "maxlength")= int 3972233
 ..  .. ..- attr(*, "pattern")= chr "clone"
 ..  .. ..- attr(*, "filename")= chr "C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d3419b6864.ff"
 ..  .. ..- attr(*, "pagesize")= int 65536
 ..  .. ..- attr(*, "finalizer")= chr "delete"
 ..  .. ..- attr(*, "finonexit")= logi TRUE
 ..  .. ..- attr(*, "readonly")= logi FALSE
 ..  .. ..- attr(*, "caching")= chr "mmnoflush"
 ..  ..- attr(*, "virtual")= list()
 ..  .. ..- attr(*, "Length")= int 3972233
 ..  .. ..- attr(*, "Symmetric")= logi FALSE
 .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
 $ row.names:  NULL
- attributes: List of 2
 .. $ names: chr [1:3] "virtual" "physical" "row.names"
 .. $ class: chr "ffdf"





