Preliminaries

Note that all code is available in my GitHub repository.

In this short project, we apply and discuss some properties of the local level model. First, we use simulated time series to apply the Kalman Filter. Second, we discuss the properties of the Maximum Likelihood Estimator (MLE) in a small Monte Carlo simulation exercise. Within this framework, we also analyze the pile-up problem of the MLE.

Local Level Model

The Local Level Model is based on two equations, namely the measurement equation and the transition (or state) equation. The measurement equation represents a noisy signal that is observed by the agent. The true underlying data generating process, in the form of the transition equation, is not directly observed by the agent. An example of this framework could be as follows. When professional analysts have to forecast GDP, they might not observe a perfect signal of GDP - presumably, the data they observe contains some measurement inaccuracies. Often, GDP data is subject to continuous revisions in the first few quarters after its first publication. Using this noisy signal of GDP, the forecasters then try to “filter out” the noise in an optimal way to forecast GDP according to their assumption of the underlying data generating process.

The framework of an exemplary Local Level Model is presented below. \[\begin{align*} & \text{Measurement Equation: } y_t = \mu_t + \varepsilon_t, \quad \varepsilon_t \sim N\left(0, \sigma_\varepsilon^2 \right)\\ & \text{Transition Equation: } \mu_{t+1} = \mu_t + \eta_t, \quad \eta_t \sim N\left(0, \sigma_\eta^2 \right)\\ \end{align*}\] Note that this framework could easily be extended to a case where the agents receive multiple signals, observe different exogenous variables, allow for covariance between the noise terms of the measurement and transition equations, etc. Also, we consider here the special case where the transition equation follows a random walk process.

Coming back to the example from above, the agent is interested in guessing the distribution of the underlying data generating process, given his observations of the time series \(Y_t = \{y_1, y_2, \dots, y_t\}\), and in using this signal to predict the future outcome. Therefore, we are interested in the predicted signal density \(f(\mu_{t+1}|Y_t) \equiv N \left( a_{t+1} , p_{t+1}\right)\) (as all the noise is normal, the predicted signal will also be normally distributed). \(a_{t+1}\) is simply the conditional expectation of the future outcome \(\mu_{t+1}\) (the GDP forecast for the next quarter): \[ a_{t+1} = \mathbb{E}(\mu_{t+1} | Y_t) = \mathbb{E}( \mu_t + \eta_t| Y_t) = a_{t|t} \] where \(a_{t|t}\) is simply the result of the Kalman filter. Note that for a transition equation with \(\mu_{t+1} = \rho \mu_t + \eta_t\), \(\rho < 1\), we would instead have \(a_{t+1}= \rho a_{t|t}\). Further, we have that \[ p_{t+1} = \mathbb{V}(\mu_t + \eta_t| Y_t) = p_{t|t} + \sigma_\eta^2 \]

Now, one can show that the result of the signal extraction (Kalman Filter) can be obtained by recursively solving the following system of equations (given starting values for \(a_1\) and \(p_1\)). \[\begin{align*} & v_t = y_t - a_t \\ & k_t = \frac{p_t}{p_t + \sigma_{\varepsilon}^2} \\ & a_{t|t} = a_t + k_t v_t \\ & p_{t|t} = k_t \sigma_{\varepsilon}^2 \\ & a_{t+1} = a_{t|t} \\ & p_{t+1} = p_{t|t} + \sigma_{\eta}^2 \end{align*}\] Given some starting values, we are now ready to calculate the Kalman Filter and the predicted signal. Note that we also define \(f_t = p_t + \sigma_{\varepsilon}^2\).

Kalman Filter

In this section, we simply simulate several time series with different parameter values. In detail, we use the model set-up described above together with initial conditions of \(\mu_1 \sim N (a_1 = 0, p_1 = 10^7)\) and variances \(\sigma_{\varepsilon}^2 = 1\) and \(\sigma_{\eta}^2 = q\) with \(q\in \{10,1,0.1,0.001\}\).

Note that we simulated 100 time periods.
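For completeness, a minimal sketch of how such series could be simulated is shown below. The simulation code itself is not part of the original snippets, so the construction (starting the random walk at zero, the seed) is an assumption; the object names df.ts, y.t, mu.t, q, t and var.vareps are chosen to match the filtering code further down.

library(dplyr)

set.seed(123)                       # assumed seed for reproducibility

t          <- 100                   # number of time periods
q          <- c(10, 1, 0.1, 0.001)  # values of sigma_eta^2 (signal-to-noise ratios, as sigma_eps^2 = 1)
var.vareps <- 1                     # variance of the measurement noise

df.ts <- data.frame()
for (i in q){
  eta   <- rnorm(t, mean = 0, sd = sqrt(i))           # transition noise
  eps   <- rnorm(t, mean = 0, sd = sqrt(var.vareps))  # measurement noise
  mu.t  <- cumsum(eta)                                # random walk state, started at zero
  y.t   <- mu.t + eps                                 # observed noisy signal
  df.ts <- rbind(df.ts, data.frame(t = 1:t, mu.t = mu.t, y.t = y.t, q = i))
}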

For the Kalman Filter that we will calculate below, a parameter of interest is the signal-to-noise ratio, defined as \(\frac{\sigma_\eta^2}{\sigma_\varepsilon^2}\). In our example above, the variance of the measurement noise, \(\sigma_\varepsilon^2\), is fixed to 1, while the variance of the data generating process, \(\sigma_\eta^2\), varies. As \(\sigma_\varepsilon^2\) is fixed to 1, \(q\) directly indicates the signal-to-noise ratio. The higher this ratio, the more of the variation the agent observes in the time series \(y_t\) actually stems from variation in the true data generating process. In that sense, the observed signal is very informative about the true data generating process; in other words, the signal \(y_t\) is a very good indicator of the true \(\mu_t\). This is reflected in the graph: the higher the signal-to-noise ratio, the closer the two time series are, and consequently the more informative the signal observed by the agent is about \(\mu_t\).

Using the generated time series of the signal, we can now calculate the Kalman filter with the recursions described above. The code snippet is shown below.

KF <- data.frame(matrix(NA,nrow = t*(NROW(q)), ncol = 9))
colnames(KF) <- c("v.t","k.t","a.t.t","p.t.t","a.t","p.t","y.t","q","t")

x <- 1
for (i in q){
  
  Y.t <- df.ts %>%
    dplyr::filter(q == i) %>%
    dplyr::select(y.t) %>%
    unlist(.)
  
  Mu.t <- df.ts %>%
    dplyr::filter(q == i) %>%
    dplyr::select(mu.t) %>%
    unlist(.)
  
  # variance of the transition noise for this value of q
  var.eta <- i
  
  # Allocation
  A.t <- rep(0, t)
  P.t <- rep(0, t)
  V.t  <- rep(0, t)
  A.t.t <- rep(0, t)
  P.t.t <- rep(0, t)
  K.t     <- rep(0, t)
  
  # Initialization:
  # A.t is already initialized with 0 (a.1 = 0)
  P.t[1] <- var.vareps*10^7
  
  for (j in 2:(t)){

    V.t[j] <- Y.t[j] - A.t[j-1]
    K.t[j] <- P.t[j-1]/(P.t[j-1] + var.vareps)
    A.t.t[j] <- A.t[j-1] + K.t[j] * V.t[j]
    P.t.t[j] <- K.t[j] * var.vareps
    
    A.t[j] <- A.t.t[j]
    P.t[j] <- P.t.t[j] + var.eta
    
  }
  
  # indexes for dataframe
  i1 <- 1 + (t*(x-1))
  i2 <- t*x
    
  temp <- cbind(V.t,K.t,A.t.t,P.t.t,A.t,P.t,Y.t,i,c(1:t))
  rownames(temp) <- NULL
  
  x <- x + 1
  
  KF[i1:i2,] <- temp
  
}

We plot below the filtered series \(a_{t|t}\) versus the observed time series \(y_t\).
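The plotting code is not reproduced above; a possible sketch with ggplot2 (an assumption, the original figure may have been produced differently) is:

library(ggplot2)

ggplot(KF, aes(x = t)) +
  geom_line(aes(y = y.t,   colour = "Observed signal y_t")) +
  geom_line(aes(y = a.t.t, colour = "Filtered series a_t|t")) +
  facet_wrap(~ q, scales = "free_y") +        # one panel per signal-to-noise ratio
  labs(x = "Time", y = "", colour = "") +
  theme_minimal()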

Moreover, we can plot the Kalman gain. The higher the Kalman gain, the faster the expectation adjusts to new observations and therefore the more abruptly the filtered series adjusts. The Kalman gain is also higher for higher signal-to-noise ratios. Hence, the higher the Kalman gain, the faster the adaptation and the more weight is given to recent observations. Also, the higher the share of the noise, the longer it takes for the Kalman gain to converge, as can be seen from the plot below.
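A corresponding sketch for the Kalman gain plot (again assuming ggplot2, and dropping the uninitialized first period) could look as follows:

KF %>%
  dplyr::filter(t > 1) %>%                            # the recursion starts at t = 2
  ggplot(aes(x = t, y = k.t, colour = factor(q))) +
  geom_line() +
  labs(x = "Time", y = "Kalman gain k_t", colour = "q") +
  theme_minimal()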

Monte Carlo Simulation

In a next exercise, we simulate 100 time series (for the data generating process and the observed signal) for each value of \(q\) and for different lengths, namely \(T=50,100,200,1000\). Then, we estimate the two variance parameters using maximum likelihood (ML). Finally, we analyze the properties of the ML estimators and discuss the pile-up problem. Note that, with the optimization algorithm used in this exercise, the estimated variances sometimes turned out to be negative. To avoid such results, we express everything in terms of standard deviations in the code below.
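As a sketch, the simulated signals can be drawn with a small helper function; the function name simulate.llm is hypothetical and simply wraps the local level model from above.

# Hypothetical helper: simulate one observed signal of length T.sim from the local level model
simulate.llm <- function(T.sim, q, var.vareps = 1){
  eta  <- rnorm(T.sim, sd = sqrt(q))           # transition noise
  eps  <- rnorm(T.sim, sd = sqrt(var.vareps))  # measurement noise
  mu.t <- cumsum(eta)                          # random walk state
  mu.t + eps                                   # observed signal y.t
}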

Below, we plot the 100 different series of length \(T=100\) of the true data generating process \(\mu_t\), once with variance \(q=0.1\) and once with variance \(q=0.001\). The signals therefore correspond to these series plus the noise.

Next, we create a function to compute the Kalman Filter that takes as input the signals generated for each variance \(q\) and the data frames of different lengths in the time dimension. Note that we use the standard deviations of \(\varepsilon\) and \(\eta\) instead of the variances as inputs. This is due to the optimization procedure: it produced more stable results, and squaring the standard deviations guarantees positive variances.

Kalman.Filter.sd <- function(y.t,sd.vareps,sd.eta){
  # As input, the signal y.t and the standard deviations of the noise in the measurement and transition equation are needed
  t <- length(y.t)
  
  # Allocation
  df.KF <- c()
  a.t <- rep(0, t)
  p.t <- rep(0, t)
  v.t  <- rep(0, t)
  a.t.t <- rep(0, t)
  p.t.t <- rep(0, t)
  k.t     <- rep(0, t)
  f.t     <- rep(0, t)
  
  # Initialization:
  # a.t is already initialized with 0 (a.1 = 0)
  p.t[1] <- (sd.vareps)^2*10^7
  
  for (j in c(2:t)){

    v.t[j] <- y.t[j] - a.t[j-1]
    k.t[j] <- p.t[j-1]/(p.t[j-1] + (sd.vareps)^2)
    a.t.t[j] <- a.t[j-1] + k.t[j] * v.t[j]
    p.t.t[j] <- k.t[j] * (sd.vareps)^2
    
    # For log-likelihood:
    f.t[j] <- (p.t[j-1] + (sd.vareps)^2)
    
    # update
    a.t[j] <- a.t.t[j]
    p.t[j] <- p.t.t[j] + sd.eta^2
    
  }
  
  var.eta <- sd.eta^2
  
  temp <- cbind(v.t,k.t,a.t.t,p.t.t,a.t,p.t,f.t,y.t,var.eta,c(0:(t-1))) # we discard the initial observation 
  rownames(temp) <- NULL
  
  df.KF<- as.data.frame(temp[2:t,] )
  
  return(df.KF)
  
} 

We also have to define the log-likelihood function which we use to estimate the variances (or here, standard deviations, \(\theta\)) in our simulated data. The log-likelihood function of the Kalman Filter is given as \[ \log L^{KF}(\theta) = -\frac{n}{2}\ln(2\pi) -\frac{1}{2}\sum_{t=2}^n \ln(f_t) - \frac{1}{2}\sum_{t=2}^n \frac{v_t^2}{f_t} \]

The function can be coded as shown below in R.

Log.Lik.sd <- function(thetas){
  # The function takes as input the guesses for the standard deviations of varepsilon and eta (in that order).
  # With these guesses, it calculates the negative log-likelihood. Note that the signal y.t is taken from the global environment.
  
  df.KF <- Kalman.Filter.sd(y.t, thetas[1], thetas[2]) 
  
  log.lik <- -0.5*(NROW(df.KF))*log(2*pi) - 0.5 * sum(log(df.KF$f.t),na.rm=T) - 0.5 * sum((df.KF$v.t^2)/(df.KF$f.t), na.rm = T)
  
  return(-log.lik) # return the negative log-likelihood for minimization
}

We then estimate the standard deviations for the four different sample sizes and plot the distribution of the 100 estimates for every sample size and signal-to-noise ratio \(q\) below.
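The estimation step itself is not shown above; a minimal sketch using optim() on the negative log-likelihood is given below. Since Log.Lik.sd reads y.t from the global environment, the simulated series is assigned there in each replication. The results object df.mle, the starting values c(1, 1) and the use of the Nelder-Mead method are assumptions.

T.vec <- c(50, 100, 200, 1000)   # sample sizes
q.vec <- c(10, 1, 0.1, 0.001)    # signal-to-noise ratios
n.rep <- 100                     # Monte Carlo replications

df.mle <- data.frame()

for (T.obs in T.vec){
  for (q.sim in q.vec){
    for (r in 1:n.rep){
      y.t <- simulate.llm(T.obs, q.sim)        # assigned in the global environment for Log.Lik.sd
      fit <- optim(par    = c(1, 1),           # starting values for (sd.vareps, sd.eta)
                   fn     = Log.Lik.sd,
                   method = "Nelder-Mead")
      df.mle <- rbind(df.mle,
                      data.frame(T.obs     = T.obs,
                                 q         = q.sim,
                                 sd.vareps = abs(fit$par[1]),   # the likelihood depends only on the squared
                                 sd.eta    = abs(fit$par[2])))  # parameters, so signs are normalized here
    }
  }
}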

Pile-up Problem

Looking at the figures of the signal-to-noise ratio distribution, we observe a strongly skewed distribution of the estimates, with a mass of estimates at zero, especially when the signal-to-noise ratio is small (and the time dimension \(T\) is very limited). This pile-up of estimates at zero when using maximum likelihood to estimate a small parameter is well documented in the literature (Kim and Kim (2020), Stock (1994), Stock and Watson (1998)).
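Given the (hypothetical) results data frame df.mle from the sketch above, the extent of the pile-up can be summarized by the share of \(\hat{\sigma}_\eta\) estimates that are numerically zero, for example:

df.mle %>%
  dplyr::group_by(T.obs, q) %>%
  dplyr::summarise(share.at.zero = mean(sd.eta < 1e-4), .groups = "drop")  # 1e-4 is an arbitrary tolerance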

Kim, Chang-Jin, and Jaeho Kim. 2020. “Trend-Cycle Decompositions of Real GDP Revisited: Classical and Bayesian Perspectives on an Unsolved Puzzle.” Macroeconomic Dynamics, 1–25.
Stock, James H. 1994. “Unit Roots, Structural Breaks and Trends.” Handbook of Econometrics 4: 2739–2841.
Stock, James H., and Mark W. Watson. 1998. “Median Unbiased Estimation of Coefficient Variance in a Time-Varying Parameter Model.” Journal of the American Statistical Association 93 (441): 349–58.