So, the next topic I want to take on is serial correlation.
For this example, I wanted to use some real data, so I pulled out my OLD copy of Pindyck and Rubinfeld (copyright 1976) and I borrowed one of their examples. They have an example that looks at coal consumption as a function of four variables: (1) iron & steel production, (2) electrical utility production, (3) the price of coal, and (4) the price of natural gas. The question here is to see how sensitive coal consumption is the relative price of coal versus natural gas. For example, it is interesting to know how easily electric utilities can swap coal for gas depending on their relative prices.
(2) Write the F# Code to read it from the database. For this purpose, I will use Fluent NHibernate.
So, here's my class object. I like vb, so it's in vb.
Public Overridable Property COAL_CONS_ELEC() As Double
Get
COAL_CONS_ELEC = _COAL_CONS_ELEC
End Get
Set(ByVal value As Double)
_COAL_CONS_ELEC = value
End Set
End Property
Public Overridable Property COAL_CONS() As Double
Get
COAL_CONS = _COAL_CONS
End Get
Set(ByVal value As Double)
_COAL_CONS = value
End Set
End Property
Public Overridable Property COAL_PROD() As Double
Get
COAL_PROD = _COAL_PROD
End Get
Set(ByVal value As Double)
_COAL_PROD = value
End Set
End Property
Public Overridable Property IRON_STEEL() As Double
Get
IRON_STEEL = _IRON_STEEL
End Get
Set(ByVal value As Double)
_IRON_STEEL = value
End Set
End Property
Public Overridable Property ELECT_PROD() As Double
Get
ELECT_PROD = _ELECT_PROD
End Get
Set(ByVal value As Double)
_ELECT_PROD = value
End Set
End Property
Public Overridable Property GAS_PRICE() As Double
Get
GAS_PRICE = _GAS_PRICE
End Get
Set(ByVal value As Double)
_GAS_PRICE = value
End Set
End Property
Public Overridable Property COAL_PRICE() As Double
Get
COAL_PRICE = _COAL_PRICE
End Get
Set(ByVal value As Double)
_COAL_PRICE = value
End Set
End Property
End Class
End Namespace
Fluent NHibernate Code
Here is the Fluent NHibernate code to read the data from the database and store the coal consumption data in a matrix called y and our dependent variable data in a matrix called X
//-------------------------------------------------------------------
#light
open System
open System.Collections.Generic
open System.IO
open Xunit
open MathMod
open Microsoft.FSharp.Math
open CoalData
open FluentNHibernate.AutoMap
open FluentNHibernate
let properties = new Dictionary<string, string>()
let connString = "server='BIG_ROCK\LOGGERSEDGE';Initial Catalog=COAL;User ID=sa;Password=XXXXXX"
properties.Add("connection.provider", "NHibernate.Connection.DriverConnectionProvider")
properties.Add("dialect", "NHibernate.Dialect.MsSql2000Dialect")
properties.Add("connection.driver_class", "NHibernate.Driver.SqlClientDriver")
properties.Add("show_sql", "true")
properties.Add("connection.connection_string", connString)
let autoMappings = (AutoPersistenceModel.MapEntitiesFromAssemblyOf<CoalData.COAL_DATA>()).Where(fun t -> if t.Namespace = "CoalData.CoalData" then true else false)
let aConfig = (new NHibernate.Cfg.Configuration()).AddProperties(properties).AddAutoMappings(autoMappings)
let sessionFactory = aConfig.BuildSessionFactory()
let aSession = sessionFactory.OpenSession()
aSession.BeginTransaction()
let DataSet = aSession.CreateCriteria(typeof<CoalData.COAL_DATA>).List()
let n = 436
let k = 5
let X = Array2D.zeroCreate<float> n k
let y = Array2D.zeroCreate<float> n 1
let mutable i = -1
for someObj in DataSet do
let xCo = someObj :?> CoalData.COAL_DATA
//
// Everything is Good
//
//printfn "CoalData: %s %10.2f %10.2f %10.2f %10.2f %10.2f %10.2f %10.2f " xCo.THE_DATE xCo.COAL_PROD xCo.COAL_CONS xCo.COAL_CONS_ELEC xCo.IRON_STEEL xCo.ELECT_PROD xCo.GAS_PRICE xCo.GAS_PRICE
i <- i+1
y.[i, 0] <- xCo.COAL_CONS
X.[i, 0] <- 1.0
X.[i, 1] <- xCo.IRON_STEEL
X.[i, 2] <- xCo.ELECT_PROD
X.[i, 3] <- xCo.COAL_PRICE
X.[i, 4] <- xCo.GAS_PRICE
Nothing very interesting here, except perhaps this statement:
let xCo = someObj :?> CoalData.COAL_DATA
that is needed to cast some generic object type into a CoalData.COAL_DATA type using the funky :?> operator.
I then call my regress function:
let ret = regress X y
The function takes the matrices X and y as inputs and runs the regression
//---------------------------------------------------------
// Regression function
//---------------------------------------------------------
let regress (x0 : float [,] ) (y : float [,] ) =
let xp = transpose2 x0
let xpx = matMult2 xp x0
let X = xpx
let XI, ret = matInverse X
let k = X.GetUpperBound 0
let ni = 1 + x0.GetUpperBound 0
//-----------------------------------------
//Now Test
//Multiply XI and X to calculate Identity Matrix
//-----------------------------------------
let XTest = Array2D.zeroCreate<float> (k+1) (k+1)
let XTest = matMult XI X XTest
for i = 0 to k do
printfn "XTest Matrix: %10.5f %10.5f %10.5f %10.5f %10.5f " XTest.[i,0] XTest.[i,1] XTest.[i,2] XTest.[i,3] XTest.[i,4]
let b = matMult2 (matMult2 XI xp) y
for i = 0 to k do
printfn "Beta Hat: %i %10.5f" i b.[i,0]
//---------------------------------------
// Calc Std Errors
//---------------------------------------
// Calc Residual Vector = Y - Yhat
// First Calc Y Hat = X BetaHat
let YHat = matMult2 x0 b
let e = Array2D.zeroCreate<float> ni 1
for i = 0 to (ni-1) do
e.[i,0] <- YHat.[i,0] - y.[i,0]
for i = 0 to (ni-1) do
printfn "Y, Hat, e: %i %10.5f %10.5f %10.5f " i y.[i,0] YHat.[i,0] e.[i,0]
//--------------------------------------------------
//Calc SSE -> Sum of Squared Errors e'e
//--------------------------------------------------
let ssV = matMult2 (transpose2 e) e
let sse = ssV.[0,0]
printfn "SSE: %10.2f" sse
let sigma2 = sse / float (ni - (k+1))
printfn "SE: %10.2f" sigma2
//--------------------------------------------------
//Calc Variance/Covariance Matrix of B Hat
//--------------------------------------------------
//---------------------------------------
// Calc T-Stats
//---------------------------------------
let bVar = Array2D.zeroCreate<float> (k+1) 1
let bSE = Array2D.zeroCreate<float> (k+1) 1
let TStat = Array2D.zeroCreate<float> (k+1) 1
for i = 0 to k do
bVar.[i,0] <- sigma2 * XI.[i,i]
bSE.[i,0] <- Math.Sqrt bVar.[i,0]
TStat.[i,0] <- b.[i,0] / bSE.[i,0]
printfn "B, Var, SE, TStat: %10.5f %10.5f %10.5f %10.5f" b.[i,0] bVar.[i,0] bSE.[i,0] TStat.[i,0]
//---------------------------------------
// Calc R-Squared
//---------------------------------------
let N = Array2D.zeroCreate<float> ni ni
let I = Array2D.init<float> ni ni (fun i j -> if i=j then 1.0 else 0.0)
let ix = Array2D.init<float> ni ni (fun i j -> 1.0/float ni)
for i = 0 to (ni-1) do
for j = 0 to (ni-1) do
N.[i,j] <- I.[i,j] - ix.[i,j]
let sstV = matMult2 (matMult2 (transpose2 y) N) y
let sst = sstV.[0,0]
printfn "SST: %10.2f" sst
let R2 = 1.0 - (sse / sst)
printfn "R-Square: %10.6f" R2
let RBar2 = 1.0 - (((float ni - 1.0) / (float ni - (float k + 1.0))) * (1.0 - R2))
printfn "RBar-Square: %10.6f" RBar2
//-------------------------------------------------
// Now Do F Statistic
//-------------------------------------------------
let FStat = (R2 / (1.0 - R2)) * ((float ni - (float k + 1.0)) / ((float k + 1.0) - 1.0))
printfn "F Stat: %10.6f" FStat
//--------------------------------------------------
//Calc DW Statistic
//--------------------------------------------------
let mutable e2 = 0.0
let mutable eLag2 = 0.0
let mutable e_eLag = 0.0
for i = 1 to (ni-1) do
e2 <- e2 + (e.[i,0]*e.[i,0])
eLag2 <- eLag2 + (e.[i-1,0]*e.[i-1,0])
e_eLag <- e_eLag + (e.[i,0]*e.[i-1,0])
let DW = (e2 - 2.0*e_eLag + eLag2) / sse
printfn "DW: %10.6f" DW ()
//------------------------------------------------------------------------------------
I have covered everything here in prior posts, with exception of the calculation of the Durbin-Watson (DW) statistic. This statistic equals the sum of squared deviations between a period's residual and its lagged (one period) value divided by the total sum or squared residuals for the regression. This is calculated in the code just above.
So what do we get?
Here it is:
NHibernate: SELECT this_.Id as Id0_0_, this_.THE_DATE as THE2_0_0_, this_.COAL_CONS_ELEC as COAL3_0_0_, this_.COAL_CONS as COAL4_0_0_, this_.COAL_PROD as COAL5_0_0_, this_.IRON_STEEL as IRON6_0_0_, this_.ELECT_PROD as ELECT7_0_0_, this_.GAS_PRICE as GAS8_0_0_, this_.COAL_PRICE as COAL9_0_0_ FROM [COAL_DATA] this_
XI Matrix: 0.2991808728 -0.0011711136 -0.0004791834 -0.0017702552 0.0003130596
XI Matrix: -0.0011711136 0.0000063639 0.0000005656 0.0000056077 -0.0000008231
XI Matrix: -0.0004791834 0.0000005656 0.0000107535 -0.0000025800 -0.0000013383
XI Matrix: -0.0017702552 0.0000056077 -0.0000025800 0.0000169030 -0.0000023115
XI Matrix: 0.0003130596 -0.0000008231 -0.0000013383 -0.0000023115 0.0000008803
XTest Matrix: 1.00000 0.00000 0.00000 0.00000 0.00000
XTest Matrix: 0.00000 1.00000 0.00000 0.00000 0.00000
XTest Matrix: 0.00000 0.00000 1.00000 0.00000 0.00000
XTest Matrix: 0.00000 0.00000 0.00000 1.00000 0.00000
XTest Matrix: 0.00000 0.00000 0.00000 0.00000 1.00000
Beta Hat: 0 29843.97538
Beta Hat: 1 -61.85082
Beta Hat: 2 732.56913
Beta Hat: 3 -68.85197
Beta Hat: 4 10.10098
SSE: 5071785864.31
SE: 11767484.60
B, Var, SE, TStat: 29843.97538 3520606.31417 1876.32788 15.90552
B, Var, SE, TStat: -61.85082 74.88767 8.65377 -7.14727
B, Var, SE, TStat: 732.56913 126.54212 11.24909 65.12250
B, Var, SE, TStat: -68.85197 198.90621 14.10341 -4.88194
B, Var, SE, TStat: 10.10098 10.35856 3.21847 3.13844
SST: 115005501258.94
R-Square: 0.955900
RBar-Square: 0.955490
F Stat: 2335.539818
DW: 1.092759
I have formatted the results so that they can be interpreted a little more easily below:
Coefficient T-Statistic
Intercept : 29843.97538 15.90552
IRON_STEEL: -61.85082 -7.14727
ELEC_PROD : 732.56913 65.12250
COAL_PRICE: -68.85197 -4.88194
GAS_PRICE : 10.10098 3.13844
Pindyck & Rubinfeld:
Coefficient T-Statistic
Intercept : 12.262 3.51
IRON_STEEL: 92.34 6.46
ELEC_PROD : 118.57 7.14
COAL_PRICE: -48.90 -3.82
GAS_PRICE : 118.91 3.18
I also show the Pindyck & Rubinfeld (PR) results. The two of results are broadly consistent with one another, with the exception of the sign on the iron & steel production variable. In the PR regression, coal consumption has a strong positive relationship to steel production; in my regression, the opposite is true. My suspicion here that that iron & steel production in the US has become so small and coal consumption by electric utilities so large over the past 50 years, that the iron & steel variable is swamped by other other factors.
One interesting experiment that I tried was to run the regression where each of the variables was transformed by taking their natural logs. In this special case the coefficients can be interpreted as indicating the elasticity of responsiveness.
Coefficient T-Statistic
IRON_STEEL: -.0518 -4.312
ELEC_PROD : .7123 54.232
COAL_PRICE: -.1340 -6.154
GAS_PRICE : .0463 6.942
So, the elasticity of the coal price is interpreted as such: For a 1.00% increase in the price of coal, consumption goes down by 0.1340%. A 1.00% increase in the price of gas, increases coal consumption by 0.0463%.
Interesting.
But, the real motivation for this regression was to deal with serial correlation. How do we know it it is a problem? Well that is what the Durbin-Watson statistic is supposed to help us out with. It turns out that a number near 2.0 means there is no serial correlation. In our case, the value is 1.09. In our regression with over 400 observations, any value below 1.50 would indicate the presence of serial correlation.
What to do. This is another case where we can apply the GLS estimator to fix up our results.