[datatable-help] Help with biglm() package
aniluw
anilkp at uw.edu
Wed Apr 4 22:59:58 CEST 2018
Out of interest in data analytics, I have been working with a large volume of
data (Air traffic performance data, Bureau of Transportation Statistics) over
the break.
As part of this exercise, I am trying to predict the minutes delayed on
departure (one of the outcome variables) from a selected set of variables
(chosen to reduce the size of the data and the correlation between
predictors). I found the biglm() function useful for this with ffdf objects.
I was able to build a linear model on the training dataset (3,972,233 records)
with the biglm package. However, when I try to generate fitted values, the
model returns all NAs. I wonder whether the model interprets the outcome as
"categorical" (given the "response" attribute set in the $terms component of
the biglm object) rather than as discrete integer values.
Also, the predictors CRS_DEP_TIME and CRS_ARR_TIME are integers (all the other
predictors are factors). Are they interpreted as factors? I would really
appreciate your response.
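One way to check how R treats each kind of column is sketched below on a small in-memory data frame rather than on the ffdf itself. The column names come from this post, but the values are invented for illustration, and only base R functions are used (no biglm or ff calls). In base R, an integer column enters a model formula as a single numeric predictor unless it is converted with factor(), while a factor is expanded into dummy columns.

```r
# Toy stand-in for the flight data: names mirror the post, values are made up.
toy <- data.frame(
  DEP_DELAY    = c(5L, -2L, 30L, 0L, 12L, 7L),
  DAY_OF_WEEK  = factor(c("Mon", "Tue", "Wed", "Mon", "Tue", "Wed")),
  CRS_DEP_TIME = c(600L, 915L, 1230L, 1545L, 1800L, 2100L)
)

# Storage class of each column.
col_classes <- sapply(toy, class)
print(col_classes)

# The design matrix shows how the formula treats each predictor:
# the factor becomes dummy columns, the integer stays one numeric column.
X <- model.matrix(DEP_DELAY ~ DAY_OF_WEEK + CRS_DEP_TIME, data = toy)
print(colnames(X))
```

If the same check on a chunk of the real data shows CRS_DEP_TIME as a single column, it is being treated as numeric, not as a factor.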
> biglm.Train.Air.2017.Dep.Delay.mins$terms
DEP_DELAY ~ YEAR + MONTH + DAY_OF_MONTH + DAY_OF_WEEK + UNIQUE_CARRIER +
ORIGIN + DEST + CRS_DEP_TIME + CRS_ARR_TIME
attr(,"variables")
list(DEP_DELAY, YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, UNIQUE_CARRIER,
ORIGIN, DEST, CRS_DEP_TIME, CRS_ARR_TIME)
attr(,"factors")
               YEAR MONTH DAY_OF_MONTH DAY_OF_WEEK UNIQUE_CARRIER ORIGIN DEST CRS_DEP_TIME CRS_ARR_TIME
DEP_DELAY         0     0            0           0              0      0    0            0            0
YEAR              1     0            0           0              0      0    0            0            0
MONTH             0     1            0           0              0      0    0            0            0
DAY_OF_MONTH      0     0            1           0              0      0    0            0            0
DAY_OF_WEEK       0     0            0           1              0      0    0            0            0
UNIQUE_CARRIER    0     0            0           0              1      0    0            0            0
ORIGIN            0     0            0           0              0      1    0            0            0
DEST              0     0            0           0              0      0    1            0            0
CRS_DEP_TIME      0     0            0           0              0      0    0            1            0
CRS_ARR_TIME      0     0            0           0              0      0    0            0            1
attr(,"term.labels")
[1] "YEAR" "MONTH" "DAY_OF_MONTH" "DAY_OF_WEEK"
"UNIQUE_CARRIER" "ORIGIN" "DEST" "CRS_DEP_TIME"
"CRS_ARR_TIME"
attr(,"order")
[1] 1 1 1 1 1 1 1 1 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: R_GlobalEnv>
>
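For reading the output above: in a base R terms object, the "response" attribute is the position of the response in the variables list (0 when the formula has no response), and "factors" is a variables-by-terms matrix showing which variable appears in which term. Neither attribute records whether a column is stored as a factor. A minimal sketch with a shortened, illustrative formula (not the full model from this post):

```r
# Build a terms object from a formula alone; no data is needed for this.
tt <- terms(DEP_DELAY ~ MONTH + CRS_DEP_TIME)

# Position of the response among the variables (DEP_DELAY is first).
resp_index <- attr(tt, "response")
print(resp_index)

# Variables-by-terms matrix: one row per variable, one column per term.
fac <- attr(tt, "factors")
print(fac)

# A one-sided formula has no response, so the attribute is 0.
no_resp <- attr(terms(~ MONTH), "response")
print(no_resp)
```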
Structure of the Training ffdf dataset
> str(Train.Air.OnTime2017.SelectCols.ffdf)
List of 3
$ virtual: 'data.frame': 10 obs. of 7 variables:
.. $ VirtualVmode : chr "integer" "integer" "integer" "integer" ...
.. $ AsIs : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
.. $ VirtualIsMatrix : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
.. $ PhysicalIsMatrix : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
.. $ PhysicalElementNo: int 1 2 3 4 5 6 7 8 9 10
.. $ PhysicalFirstCol : int 1 1 1 1 1 1 1 1 1 1
.. $ PhysicalLastCol : int 1 1 1 1 1 1 1 1 1 1
.. - attr(*, "Dim")= int 3972233 10
.. - attr(*, "Dimorder")= int 1 2
$ physical: List of 10
.. $ YEAR : list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d3467395e5d.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
.. $ MONTH : list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d34628a688c.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. ..- attr(*, "Levels")= chr [1:12] "January" "February" "March"
"April" ...
.. .. ..- attr(*, "ramclass")= chr "factor"
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
.. $ DAY_OF_MONTH : list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d34868459e.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. ..- attr(*, "Levels")= chr [1:31] "1" "2" "3" "4" ...
.. .. ..- attr(*, "ramclass")= chr "factor"
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
.. $ DAY_OF_WEEK : list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d3419f71e6a.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. ..- attr(*, "Levels")= chr [1:7] "Mon" "Tue" "Wed" "Thu" ...
.. .. ..- attr(*, "ramclass")= chr "factor"
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
.. $ UNIQUE_CARRIER: list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d342b7d2482.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. ..- attr(*, "Levels")= chr [1:12] "AA" "AS" "B6" "DL" ...
.. .. ..- attr(*, "ramclass")= chr "factor"
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
.. $ ORIGIN : list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d342cf382a.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. ..- attr(*, "Levels")= chr [1:320] "ABE" "ABI" "ABQ" "ABR" ...
.. .. ..- attr(*, "ramclass")= chr "factor"
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
.. $ DEST : list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d34344c1f1b.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. ..- attr(*, "Levels")= chr [1:320] "ABE" "ABI" "ABQ" "ABR" ...
.. .. ..- attr(*, "ramclass")= chr "factor"
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
.. $ CRS_DEP_TIME : list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d347efb4a28.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
.. $ CRS_ARR_TIME : list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d34c6e593e.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
.. $ DEP_DELAY : list()
.. ..- attr(*, "physical")=Class 'ff_pointer' <externalptr>
.. .. ..- attr(*, "vmode")= chr "integer"
.. .. ..- attr(*, "maxlength")= int 3972233
.. .. ..- attr(*, "pattern")= chr "clone"
.. .. ..- attr(*, "filename")= chr
"C:/Users/ANILKU~1/AppData/Local/Temp/RtmpSKCATB/clone2d3419b6864.ff"
.. .. ..- attr(*, "pagesize")= int 65536
.. .. ..- attr(*, "finalizer")= chr "delete"
.. .. ..- attr(*, "finonexit")= logi TRUE
.. .. ..- attr(*, "readonly")= logi FALSE
.. .. ..- attr(*, "caching")= chr "mmnoflush"
.. ..- attr(*, "virtual")= list()
.. .. ..- attr(*, "Length")= int 3972233
.. .. ..- attr(*, "Symmetric")= logi FALSE
.. .. - attr(*, "class") = chr [1:2] "ff_vector" "ff"
$ row.names: NULL
- attributes: List of 2
.. $ names: chr [1:3] "virtual" "physical" "row.names"
.. $ class: chr "ffdf"