---
title: "Feasible Multivariate GARCH Models"
author: "Alexios Galanos"
date: "`r Sys.Date()`"
output:
  pdf_document:
    toc: yes
    extra_dependencies: ["amsmath"]
    number_sections: yes
link-citations: yes
colorlinks: true
linkcolor: red
urlcolor: teal
citecolor: teal
toccolor: "teal"
bibliography-style: apalike
natbiboptions: round
bibliography: tsmarch.bib
vignette: >
  %\VignetteIndexEntry{Feasible Multivariate GARCH Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  comment = "##",
  R.options = list(width = 60)
)
```

# Introduction {#sec-introduction}

The `tsmarch` package represents a re-write and re-think of the models in [rmgarch](https://CRAN.R-project.org/package=rmgarch). It is written using simpler S3 methods and classes, has a cleaner code base, extensive documentation and unit tests, provides speed gains by making use of parallelization both in R (via the `future` package) and in the C++ code (via the [RcppParallel](https://CRAN.R-project.org/package=RcppParallel) package), and works with the new univariate GARCH package [tsgarch](https://CRAN.R-project.org/package=tsgarch).

To simplify usage, and similar to the approach adopted in `tsgarch`, conditional mean dynamics are not included. Since the distributions used are location/scale invariant, the series must be pre-filtered for conditional mean dynamics before submitting the data. However, the conditional mean (`cond_mean`) can be passed as an argument to the specification, filtering, prediction and simulation methods so that it is incorporated into the output of the various methods at each step. Alternatively, the user can location shift (re-center) the simulated predictive distribution matrix by the pre-calculated conditional mean forecast, or pass the zero mean correlated predictive distribution innovations as an input to the conditional mean prediction simulation dynamics.

The nature of the models implemented allows for separable dynamics in estimation, which means that individual series have univariate dynamics which can be estimated in parallel. The drawback is that standard errors require a partitioned Hessian, which may be very time consuming to compute. The package therefore defaults to OPG (outer product of gradients) standard errors, which are fast, with the option of calculating the full partitioned Hessian for other types of standard error estimators.

The next sections provide a detailed overview of each of the 3 models available, namely the Dynamic Conditional Correlation GARCH model of @Engle2002 (**DCC**), the Copula-DCC model (see @Patton2009 and @Ausin2010) (**CGARCH**) and the Generalized Orthogonal GARCH model with multivariate affine Generalized Hyperbolic distribution (see @Chen2007 and @Broda2009) (**GOGARCH**). Code snippets are provided in each section to highlight the code implementation, whilst Section \ref{sec-methods} provides diagrams for the methods in the package.
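To make the mean-handling workflow described above concrete, the following minimal sketch (base R only; the series names, AR(1) order and the `sweep()` re-centering step are illustrative assumptions rather than package requirements) pre-filters two series for conditional mean dynamics and shows how a zero mean simulated predictive distribution could be re-centered by a pre-computed mean forecast:

```{r,eval=FALSE}
# hypothetical T x 2 matrix of returns
y <- matrix(rnorm(1000 * 2), ncol = 2, dimnames = list(NULL, c("A", "B")))
h <- 10
# step 1: filter each series for conditional mean dynamics (here an AR(1))
mean_fit <- lapply(1:ncol(y), function(i) arima(y[, i], order = c(1, 0, 0)))
residuals_y <- sapply(mean_fit, residuals)
# residuals_y would then be passed to the multivariate specification, optionally
# together with the h-step mean forecast via the `cond_mean` argument
mean_forecast <- sapply(mean_fit, function(f) predict(f, n.ahead = h)$pred)
# step 2 (alternative): re-center a zero-mean simulated predictive distribution
# (a sim x h matrix per series) by the mean forecast of that series
sim_A <- matrix(rnorm(5000 * h), ncol = h)   # placeholder for a simulated distribution
sim_A_centered <- sweep(sim_A, 2, mean_forecast[, 1], "+")
```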
# Covariance Decomposition Models (`dcc` and `cgarch`) {#sec-cd1}

The covariance decomposition type models decompose the dynamics of the conditional covariance into conditional standard deviations and correlations, so that the univariate and multivariate dynamics may be separated, thus simplifying the estimation process. This decomposition allows a great deal of flexibility, with individual univariate dynamics for each marginal series and separate dynamics for the conditional correlation, estimated in a 2-stage process.

## Constant Correlation {#sec-cd1-ccc}

In the Constant Conditional Correlation (**CCC**) model of @Bollerslev1990 the conditional covariance can be represented as:

\begin{equation}
H_t = D_t R D_t, \quad \left[H_t\right]_{ij} = \rho_{ij}\sqrt{h_{ii,t}h_{jj,t}},
\label{eq:1}
\end{equation}

where $D_t = \text{diag}(\sqrt {h_{11,t}} ,...,\sqrt{h_{nn,t}})$, and $R$ is the positive definite constant correlation matrix. The conditional variances, $h_{ii,t}$, which can be estimated separately, usually admit some type of GARCH(p,q) model, but can be generalized to any admissible model for the variance dynamics. The conditions for the positivity of the covariance matrix $H_t$ are that $R$ is positive definite and the diagonal elements of $D_t$, $h_{ii,t}$, are positive.

Formally, the model may be represented as follows:

\begin{equation}
\begin{aligned}
y_{t}&=\mu_{t}+\varepsilon_{t},\quad\varepsilon_{t}\sim N\left({0,\Sigma_{t}}\right)\\
\Sigma_{t}&=D_{t}RD_{t}\\
z_{t}&=D_{t}^{-1}\varepsilon_{t}\\
LL_{t}\left(\theta\right)&=-\frac{1}{2}\left(n \log\left(2\pi\right)+2\log\left|D_t\right|+\log\left|R\right|+z'_tR^{-1}z_t\right)
\end{aligned}
\label{eq:2}
\end{equation}

where $y_t$ is a vector representing the series values at time $t$, $\mu_t$ a vector of the conditional mean at time $t$ for each series, $\varepsilon_t$ a vector of zero mean residuals which have a multivariate Normal distribution with conditional covariance $\Sigma_t$, and $LL_t\left(\theta\right)$ is the log-likelihood at time $t$ given a parameter vector $\theta$ of univariate GARCH (or any other model) parameters. The constant correlation matrix $R$ is usually approximated by its sample counterpart:

\begin{equation}
R = \frac{1}{T}\sum\limits_{t=1}^{T}{z_t z_t'}
\end{equation}

where $z_t$ is the vector of GARCH standardized residuals at time $t$.

**Code Snippet**

```{r,eval=FALSE}
dcc_modelspec(..., dynamics = 'constant')
```

## Dynamic Conditional Correlation {#sec-cd1-dcc}

Allowing the correlation to be time varying, whilst retaining the 2-stage separation of dynamics, creates an additional challenge: the conditional correlation must remain positive definite at each point in time. @Engle2002 proposed the following proxy process:

\begin{equation}
\begin{aligned}
Q_t &= \bar Q + \sum\limits_{j=1}^{q}{a_j\left(z_{t-j}z'_{t-j}-\bar Q\right)} + \sum\limits_{j=1}^{p}{b_j\left(Q_{t-j}-\bar Q\right)}\\
&= \left(1 - \sum\limits_{j=1}^{q}{a_j} - \sum\limits_{j=1}^{p}{b_j}\right)\bar Q + \sum\limits_{j=1}^{q}{a_j\left(z_{t-j}z'_{t-j}\right)} + \sum\limits_{j=1}^{p}{b_j Q_{t-j}}
\end{aligned}
\label{eq:3}
\end{equation}

where $a_j$ and $b_j$ are non-negative coefficients with the constraint that they sum to less than 1 for stationarity and positive definiteness, and $\bar Q$ is the unconditional matrix of the standardized residuals $z_t$.
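As a purely illustrative sketch of the recursion in equation \eqref{eq:3} (a scalar DCC(1,1) with made-up parameter values and simulated standardized residuals; not the package's internal implementation), the correlation-targeted intercept and the $Q_t$ updates can be written as:

```{r,eval=FALSE}
set.seed(1)
n <- 3; tt <- 500
z <- matrix(rnorm(tt * n), ncol = n)    # standardized residuals (illustrative)
a <- 0.03; b <- 0.95                    # assumed scalar DCC(1,1) parameters
Q_bar <- crossprod(z) / tt              # unconditional covariance of z
Q <- Q_bar
R <- array(0, dim = c(n, n, tt))
for (t in 1:tt) {
  if (t > 1) Q <- (1 - a - b) * Q_bar + a * tcrossprod(z[t - 1, ]) + b * Q
  # re-scale to a proper correlation matrix (the operation described next)
  R[, , t] <- diag(1 / sqrt(diag(Q))) %*% Q %*% diag(1 / sqrt(diag(Q)))
}
```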
To recover $R_t$, the following re-scaling operation is performed:

\begin{equation}
R_t = \text{diag}\left(Q_t\right)^{-1/2}Q_t\text{diag}\left(Q_t\right)^{-1/2}
\label{eq:4}
\end{equation}

Under the assumption that $\varepsilon_t\sim N\left(0,\Sigma_t\right)$, the log-likelihood is:

\begin{equation}
\begin{aligned}
LL_t\left(\theta\right) &= -\frac{1}{2}\left(n \log\left(2\pi\right) + 2\log\left|D_t\right| + z'_t z_t\right) - \frac{1}{2}\left(\log\left|R_t\right| + z'_t R^{-1}_t z_t - z'_t z_t\right)\\
&= LL_{V,t}(\phi) + LL_{R,t}\left(\psi|\phi\right)
\end{aligned}
\label{eq:5}
\end{equation}

where $LL_{V,t}(\phi)$ is the log-likelihood of the conditional variances given the GARCH parameters $\phi$, and $LL_{R,t}\left(\psi|\phi\right)$ is the log-likelihood of the conditional correlation given the DCC parameters $\psi$, conditioned on the GARCH parameters $\phi$.

**Code Snippet**

```{r,eval=FALSE}
dcc_modelspec(..., dynamics = 'dcc')
```

## Estimation {#sec-cd1-estimation}

Estimation of the CCC and DCC models is done in 2 steps. The first step is to estimate the conditional variances for each series using a univariate GARCH model. The univariate models do not need to conform to any single type of dynamics, but can be any admissible model. The conditional variances are then used to standardize the residuals, which in turn are used to estimate the conditional correlation matrix. Since the first step is separable, it can be done in parallel for each series. After considerable experimentation, it was decided to avoid the use of TMB for automatic differentiation and instead use numerical derivatives for the second stage estimation, as the latter was faster and there was little gain in accuracy from automatic differentiation.

**Code Snippet**

```{r,eval=FALSE}
dcc_modelspec(...) |> estimate(...)
```

## Standard Errors {#sec-cd1-se}

Since estimation is performed in 2 stages, the log-likelihood is the sum of these 2 stages, with the second stage log-likelihood estimated conditional on the parameters of the first stage. Details of the assumptions behind this and the arguments for the asymptotic distribution of the estimates are put forth in @Engle2001, who posit that the asymptotic distribution of the 2-stage DCC model estimates with parameters $\theta = \left(\phi, \psi\right)$ is:

\begin{equation}
\sqrt{T}\left(\hat\theta_T - \hat\theta_0\right) \overset{\text{A}}{\sim} {N\left(0, B^{-1}_0 M_0 {B'}^{-1}_0\right)}
\label{eq:6}
\end{equation}

where:

\begin{equation}
B_0 = \begin{bmatrix} \nabla_{\phi\phi} \ln f_1(\phi_0) & 0 \\ \nabla_{\phi\psi} \ln f_2(\phi_0) & \nabla_{\psi\psi} \ln f_2(\theta_0) \end{bmatrix} = \begin{bmatrix} B_{11} & 0 \\ B_{12} & B_{22} \end{bmatrix}
\label{eq:7}
\end{equation}

and

\begin{equation}
M_0 = \text{var} \left[ \sum_{t=1}^{T} \left\{ T^{-1/2} \nabla_{\phi}' \ln f_1(r_t, \phi_0), T^{-1/2} \nabla_{\psi}' \ln f_2(r_t, \phi_0, \psi_0) \right\} \right] = \begin{bmatrix} M_{11} & M_{12} \\ M'_{12} & M_{22} \end{bmatrix}
\label{eq:8}
\end{equation}

where $B_0$ and $M_0$ are the Hessian and the covariance matrix of the score vector respectively, forming the bread and meat of the sandwich estimator. In this 2-stage estimation procedure, the Hessian and score covariance are partitioned into the blocks $B_{11}$, $B_{12}$, $B_{22}$ and $M_{11}$, $M_{12}$, $M_{22}$, where $B_{11}$ and $M_{11}$ are the Hessian and covariance matrix of the first stage log-likelihood, and $B_{22}$ and $M_{22}$ are the Hessian and covariance matrix of the second stage log-likelihood. $B_{12}$ and $M_{12}$ are the cross derivatives of the first and second stage log-likelihoods.
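As an illustration of how these blocks combine into the sandwich of equation \eqref{eq:6} (a minimal sketch with made-up block values and dimensions, loosely mirroring what a partitioned-Hessian `vcov()`-type calculation involves):

```{r,eval=FALSE}
# made-up first/second stage dimensions: 6 GARCH parameters, 2 DCC parameters
k1 <- 6; k2 <- 2; T_obs <- 1000
B11 <- diag(k1); B22 <- diag(k2)
B12 <- matrix(0.1, k2, k1)            # cross derivatives (illustrative values)
M11 <- diag(k1); M22 <- diag(k2); M12 <- matrix(0.05, k1, k2)
B <- rbind(cbind(B11, matrix(0, k1, k2)), cbind(B12, B22))
M <- rbind(cbind(M11, M12), cbind(t(M12), M22))
B_inv <- solve(B)
# sandwich covariance of the parameter estimates (asymptotic variance scaled by T)
vcov_sandwich <- (B_inv %*% M %*% t(B_inv)) / T_obs
```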
Because the Hessian is partitioned, the calculation of the standard errors is more computationally intensive than the standard OPG (outer product of gradients) method. The package defaults to the OPG method, but the user can choose to calculate the full partitioned Hessian, for which parallel functionality has been implemented to provide some speed gains. The package uses numerical derivatives for the second stage estimation, as well as for the calculation of the standard errors. When the partitioned Hessian is available, a number of additional estimators become available, including those of the QML and HAC (using the bandwidth of @Newey1994) type.

**Code Snippet**

```{r,eval=FALSE}
dcc_modelspec(...) |> estimate(...) |> vcov(...)
```

## Distributions

The original DCC model was based on the assumption that the residuals are Normally distributed, as shown in equations \eqref{eq:2} and \eqref{eq:5}. However, this is not always the case in practice. There are 2 possible approaches to this problem: the first is to use a different multivariate distribution for the DCC model, and the second is to use a copula approach.

### Multivariate Student Distribution (`mvt`)

Introducing a multivariate distribution which requires estimation of higher moment parameters implies that the univariate margins must also have the same distribution and usually the same higher moment parameters. This in turn implies the need for joint estimation of all dynamics in the model with specific constraints on the joint parameters, which is computationally intensive and may not be feasible for anything but a few series. However, it is possible to estimate the first stage by QML (quasi-maximum likelihood), essentially using a Normal distribution for the residuals, and then use these estimates in the second stage, where the correlation dynamics and the degrees of freedom parameter are estimated. This is the approach taken in the `tsmarch` package when using the multivariate Student distribution, with log-likelihood equal to:

\begin{equation}
\begin{aligned}
LL_t\left(\theta, \nu| \phi\right) & = \log\Gamma\left(\frac{\nu + n}{2}\right) - \log\Gamma\left(\frac{\nu}{2}\right) -\frac{n}{2}\log\left(\pi\left(\nu - 2\right)\right)\\
&- \frac{1}{2}\log\left|R_t\right| - \log\left|D_t\right| - \frac{\nu + n}{2}\log\left(1 + \frac{1}{\nu-2}z_t' R_t^{-1} z_t\right)
\end{aligned}
\label{eq:9}
\end{equation}

where $\nu$ is the degrees of freedom parameter of the Student distribution, and $n$ is the number of series. The assumption post-estimation is that the marginal distribution of each series is Student with the same degrees of freedom parameter $\nu$.

**Code Snippet**

```{r,eval=FALSE}
dcc_modelspec(distribution = 'mvt')
```

### Copula Distribution (`cgarch`)

The copula approach is more general, allowing for different marginal distributions for each series. In simple terms, the copula is a function which links the marginal distributions to the joint distribution: a transformation is applied to the marginal residuals to make them uniformly distributed, and the joint distribution is then obtained by applying the inverse distribution of the copula to the uniform margins.
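The following toy sketch (base R, with standard Normal margins and a Student copula quantile step; all values illustrative and not the package's internal routine) shows the two transformations involved, a parametric or empirical PIT followed by the copula quantile transform:

```{r,eval=FALSE}
set.seed(2)
z <- rnorm(1000)                  # standardized residuals from a first stage GARCH fit
# parametric PIT: use the assumed conditional distribution of the margin
u_parametric <- pnorm(z)
# empirical PIT (pseudo-likelihood): use the rescaled empirical CDF
u_empirical <- rank(z) / (length(z) + 1)
# quantile transform to the margins of a Student copula with shape eta
eta <- 5
eps <- qt(u_parametric, df = eta)
```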
More formally:

\begin{equation}
\begin{aligned}
u_t & = \left(F_1\left(z_{1,t}\right),\ldots,F_n\left(z_{n,t}\right)\right)'\\
\epsilon_t & = F^{-1}\left(u_t\right)
\end{aligned}
\label{eq:10}
\end{equation}

where $F_i$ is the marginal distribution function of series $i$, $F^{-1}$ is the (element wise) quantile function of the copula distribution, $C$ is the copula function linking the uniform margins to the joint distribution, and $u_t$ is the vector of uniform margins. The transformation of the margins is called the Probability Integral Transform (PIT). There are a number of ways to transform the margins to uniform distributions, including a parametric transform (also called Inference-Functions-For-Margins by @Joe1997), the empirical CDF (also called pseudo-likelihood, investigated by @Genest1995) and a semi-parametric approach (see @Davison1990). All three approaches are implemented in the `tsmarch` package.

The package implements both a Normal and a Student copula, with either constant or dynamic correlation. For the Student copula, the log-likelihood is given by:

\begin{equation}
\begin{aligned}
LL_t\left(\theta\right) & = \sum\limits_{i=1}^{n} \left( -\frac{1}{2} \log h_{i,t} + \log f_{i,t}(z_{i,t}) \right) + \log c_t\left(u_{1,t},\ldots, u_{n,t}\right)\\
& = LL_{V,t}\left(\phi\right) + LL_{R,t}\left(\psi,\eta|\phi\right)
\end{aligned}
\label{eq:11}
\end{equation}

where

\begin{equation}
\log c_t\left(u_{1,t}, \ldots, u_{n,t}\right) = \log f_t\left(F^{-1}\left(u_{1,t}|\eta\right), \ldots, F^{-1}\left(u_{n,t}|\eta\right) | R_t,\eta \right) - \sum\limits_{i=1}^{n} \log f\left( F^{-1}\left(u_{i,t}|\eta\right)|\eta \right)
\label{eq:12}
\end{equation}

where $u_{i,t} = F_{i,t}\left(\varepsilon_{i,t}|\phi_i \right)$ is the PIT transformation of each series by its conditional distribution, estimated in the first stage GARCH process with any choice of dynamics and distribution, $F^{-1}\left(u_{i,t}|\eta\right)$ represents the quantile transformation of the uniform margins subject to the common shape parameter of the multivariate Student copula, and $f_t\left( F^{-1}\left(u_{1,t}|\eta\right), \ldots, F^{-1}\left(u_{n,t}|\eta\right) | R_t,\eta \right)$ is the joint density of the Student copula with shape parameter $\eta$ and correlation matrix $R_t$.

The log-likelihood is therefore composed of 3 parts: the first stage univariate models, the univariate PIT transformation and its inversion given the copula distribution, and the second stage multivariate model. The correlation can be modeled as either constant or dynamic in the copula model. The dynamic correlation is modeled in the same way as in the DCC model, but the residuals are transformed to uniform margins before estimating the correlation. Appendix \ref{sec-appendix-correlation} provides a more detailed overview of the constant correlation case. Essentially, the difference between the standard DCC type model and the one with a copula distribution is the PIT transformation step, which is reflected in the inference (standard errors), forecasting and simulation methods for this model.

**Code Snippet**

```{r,eval=FALSE}
cgarch_modelspec(...)
```

## Generalized Dynamics {#sec-cd1-extensions}

A number of extensions to the DCC model have been proposed in the literature, including asymmetric and generalized dynamics.
@Cappiello2006 generalize the DCC model with the introduction of the Asymmetric Generalized DCC (AGDCC), where the dynamics of $Q_t$ become:

\begin{equation}
Q_t = \left(\bar Q - A'\bar Q A - B'\bar Q B - G'\bar N G\right) + \sum\limits_{j=1}^{q}{A'_j z_{t-j}z'_{t-j}A_j} + \sum\limits_{j=1}^{q}{G'_j{\bar z}_{t-j}{\bar z}'_{t-j}G_j} + \sum\limits_{j=1}^{p}{B'_j Q_{t-j} B_j}
\label{eq:13}
\end{equation}

where $A$, $B$ and $G$ are $N\times N$ parameter matrices, $\bar z_t$ are the zero-threshold standardized residuals (equal to $z_t$ when less than zero and zero otherwise), and $\bar Q$ and $\bar N$ are the unconditional matrices of $z_t$ and $\bar z_t$ respectively. The generalization allows for cross dynamics between the standardized residuals, as well as cross-asymmetric dynamics. However, this comes at the cost of increased computational complexity, making it less feasible for large scale systems. Additionally, covariance targeting in such a high dimensional model, where the parameters are no longer scalars, creates difficulties in imposing positive definiteness constraints. The `tsmarch` package does not implement this generalized model, but instead a scalar version which captures asymmetric dynamics (`adcc`):

\begin{equation}
Q_t = \left(1 - \sum\limits_{j=1}^{q}{a_j} - \sum\limits_{j=1}^{p}{b_j}\right)\bar Q - \sum\limits_{j=1}^{q}{g_j}\bar N + \sum\limits_{j=1}^{q}{a_j\left(z_{t-j}z'_{t-j}\right)} + \sum\limits_{j=1}^{q}{g_j\left(\bar z_{t-j}\bar z'_{t-j}\right)} + \sum\limits_{j=1}^{p}{b_j Q_{t-j}}
\label{eq:adcc}
\end{equation}

**Code Snippet**

```{r,eval=FALSE}
cgarch_modelspec(..., dynamics = 'adcc')
```

## Forecasting and Simulation {#sec-cd1-forecasting}

Because of the non-linearity of the DCC evolution process, the multi-step ahead forecast of the correlation cannot be solved directly. @Engle2001 provide a couple of suggestions which form a reasonable approximation. In the `tsmarch` package we instead use a simulation approach, similar to the other packages in the `tsmodels` universe, and in doing so generate a simulated distribution of the h-step ahead correlation and covariance. This also generalizes to the copula model, which does not have a closed form solution. Formally, for a draw $s$ at time $t$, the simulated correlation is given by:

\begin{equation}
\begin{aligned}
Q^s_t & = \bar Q + \sum\limits_{j=1}^{q}{a_j\left(z^s_{t-j}{z^s}'_{t-j}-\bar Q\right)} + \sum\limits_{j=1}^{p}{b_j\left({Q^s}_{t-j}-\bar Q\right)}\\
R^s_t & = \text{diag}\left({Q^s}_t\right)^{-1/2}{Q^s}_t\text{diag}\left({Q^s}_t\right)^{-1/2}\\
z^s_t & = E \Lambda^{1/2} {\epsilon^s}_t
\end{aligned}
\label{eq:14}
\end{equation}

where ${\epsilon^s}_t$ is a vector of standard Normal random variables, $\Lambda$ is the diagonal matrix of the eigenvalues of ${R^s}_t$ and $E$ is the matrix of eigenvectors of ${R^s}_t$, such that:

\begin{equation}
{R^s}_t = E \Lambda E'
\label{eq:15}
\end{equation}

For the multivariate Student distribution, the draws for $z^s_t$ are generated as:

\begin{equation}
z^s_t = E \Lambda^{1/2} {\epsilon^s}_t \sqrt{\frac{\nu}{W^s}}
\label{eq:16}
\end{equation}

where $W^s$ is a random draw from a chi-squared distribution with $\nu$ degrees of freedom. For ${\epsilon^s}_t$, it is also possible to instead use the empirical standardized and de-correlated residuals, an option which is offered in the package.

**Code Snippet**

```{r,eval=FALSE}
dcc_modelspec(...) |> estimate(...) |> predict(...)
dcc_modelspec(...) |> estimate(...) |> simulate(...)
```
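A minimal sketch of the draw-generation step in equations \eqref{eq:14} and \eqref{eq:15} (an illustrative correlation matrix and Normal innovations; not the package's internal code) is:

```{r,eval=FALSE}
set.seed(3)
R_s <- matrix(c(1, 0.6, 0.3, 0.6, 1, 0.5, 0.3, 0.5, 1), 3, 3)   # simulated R_t^s
ed <- eigen(R_s, symmetric = TRUE)
eps <- rnorm(3)                                      # standard Normal innovations
z_s <- ed$vectors %*% diag(sqrt(ed$values)) %*% eps  # correlated draw with covariance R_s
```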
## Weighted Distribution {#sec:dcc_weighted}

Since all predictions are generated through simulation, the weighted predictive distribution $\bar {y_t}^s$ can be obtained as:

\begin{equation}
\bar {y_t}^s|w = \sum\limits_{i=1}^{n} w_{i,t} {y_{i,t}}^s
\end{equation}

where $s$ is a sample draw from the simulated predictive distribution of the $n$ series, and $w_{i,t}$ is the weight on series $i$ at time $t$.

**Code Snippet**

```{r,eval=FALSE}
dcc_modelspec(...) |> estimate(...) |> tsaggregate(...)
```

## News Impact Surface

@Engle1993 introduced the news impact curve as a tool for evaluating the effect of news on the conditional variance, which @Kroner1998 extended to the multivariate GARCH case. The news impact surface shows the effect of shocks from a pair of assets on their conditional covariance (or correlation). Formally, the correlation $R_{i,j}$ given a pair of shocks $\left(z_i,z_j\right)$ is given by:

\begin{equation}
R_{i,j}\left(z_i,z_j\right) = \frac{\left(1 - a - b\right)\bar{Q}_{i,j} - g\bar N_{i,j} + a z_{i} z_{j} + g \bar z_{i}\bar z_{j} + b \bar Q_{i,j}}{\sqrt{\left[\left(1 - a - b\right)\bar{Q}_{i,i} - g\bar N_{i,i} + a z_{i}^2 + g \bar z_{i}^2 + b \bar Q_{i,i}\right] \cdot \left[\left(1 - a - b\right)\bar{Q}_{j,j} - g\bar N_{j,j} + a z_{j}^2 + g \bar z_{j}^2 + b \bar Q_{j,j}\right]}}
\label{eq:dccnewsimpact}
\end{equation}

where $\bar z$, $\bar Q$ and $\bar N$ are defined as in equation \eqref{eq:13}.

**Code Snippet**

```{r,eval=FALSE}
dcc_modelspec(...) |> estimate(...) |> newsimpact(...) |> plot(...)
```

## Critique {#sec-cd1-critique}

The DCC model has come under a considerable amount of criticism in the literature, particularly by @Caporin2008, who argue that consistency and asymptotic Normality of the estimators have not been fully established, and who propose instead the indirect DCC model (essentially a scalar BEKK) which models the conditional covariance matrix directly and forgoes univariate GARCH dynamics. @Aielli2013 also points out that the estimation of $Q_t$ as the empirical counterpart of the correlation matrix of $z_t$ is inconsistent since $E\left[z_t z'_t\right] = E\left[R_t\right] \neq E\left[Q_t\right]$. They propose a corrected DCC (cDCC) model which includes a corrective step that eliminates this inconsistency, albeit at the cost of no longer allowing correlation targeting.

# Generalized Orthogonal GARCH Model (`gogarch`) {#sec-gogarch}

The GOGARCH model in the `tsmarch` package is based on the approach described in @Chen2007, @Broda2009 and @Ghalanos2015. Unlike the covariance decomposition models, the Generalized Orthogonal GARCH model is akin to a full distributional decomposition, rather than a decomposition of just the covariance, since it makes use of higher moments to achieve independent margins from which univariate models can then be used to estimate the conditional dynamics (see @Schmidt2006 for details on the multivariate affine Generalized Hyperbolic distribution). Formally, the stationary return series $y_t$ follows a linear dynamic model:

\begin{equation}
\begin{aligned}
y_t &= \mu_t + \varepsilon_t\\
\varepsilon_t &= A f_t\\
\end{aligned}
\label{eq:17}
\end{equation}

where $A$ is an invertible and constant mixing matrix, which transforms the independent components $f_t$ back into the residuals $\varepsilon_t$.
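A small illustrative sketch of the mixing step in equation \eqref{eq:17} (a made-up $2\times 2$ mixing matrix and independent unit-variance factors) shows that the unconditional residual covariance is recovered as $AA'$ (equation \eqref{eq:21} below):

```{r,eval=FALSE}
set.seed(4)
A <- matrix(c(1.0, 0.4, -0.3, 0.8), 2, 2)               # illustrative mixing matrix
f <- cbind(rt(1e5, df = 8) / sqrt(8 / 6), rnorm(1e5))   # independent, unit-variance factors
eps <- f %*% t(A)                                       # mixed residuals (rows are observations)
round(cov(eps) - A %*% t(A), 2)                         # approximately zero, up to sampling error
```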
The mixing matrix $A$ can be further decomposed into de-whitening and rotation matrices, $\Sigma^{1/2}$ and $U$ respectively:

\begin{equation}
A = \Sigma^{1/2}U
\label{eq:18}
\end{equation}

where $\Sigma$ is the unconditional covariance matrix of the residuals, which performs the role of de-correlating (whitening) the residuals prior to rotation (independence). It is at this juncture where dimensionality reduction may also be performed, by selecting the first $m \left(\le n\right)$ principal components of the residuals. In that case $\varepsilon_t \approx \hat \varepsilon_t = A f_t$, since the dimensionality reduced system only approximates the original residuals. The rotation matrix $U$ is calculated through Independent Component Analysis (ICA), a method for separating multivariate mixed signals into additive, statistically independent and non-Gaussian components\footnote{Technically, at most 1 component may be Gaussian.} (see @Hyvarinen2000). The algorithm used in the `tsmarch` package for ICA is the RADICAL algorithm of @Learned2003, which uses an efficient entropy estimator (due to @Vasicek1976) that is robust to outliers and requires no strong characterization assumptions about the data generating process.\footnote{To speed up estimation, parts of the ICA algorithm are parallelized in C++ using `RcppParallel` and hence benefit from multi-threading.}

The independent components $f_t$ can be further decomposed as:

\begin{equation}
f_t = H_t^{1/2} z_t
\label{eq:19}
\end{equation}

where $H_t$ is the conditional covariance matrix of the independent components, with $H_t = E[f_t f_t' | \mathfrak{F}_{t-1}]$ being a diagonal matrix with elements $(h_{1,t},\ldots,h_{n,t})$ which are the conditional variances of the factors, and $z_t = (z_{1,t},\ldots,z_{n,t})'$. The random variable $z_{i,t}$ is independent of $z_{j,t-s}$ $\forall j \neq i$ and $\forall s$, with $E[z_{i,t}|\mathfrak{F}_{t-1}]=0$ and $E[z_{i,t}^2]=1$, implying that $E[f_t|\mathfrak{F}_{t-1}] = \mathbf{0}$ and $E[\varepsilon_t|\mathfrak{F}_{t-1}] = \mathbf{0}$. The factor conditional variances, $h_{i,t}$, can be modeled as a GARCH-type process. The unconditional distribution of the factors is characterized by:

\begin{equation}
E[f_t] = \mathbf{0} \quad E[f_t f'_t] = I_n
\label{eq:20}
\end{equation}

which, in turn, implies that:

\begin{equation}
E[\varepsilon_t] = \mathbf{0} \quad E[\varepsilon_t\varepsilon'_t] = A A'.
\label{eq:21}
\end{equation}

It then follows that the return series can be expressed as:

\begin{equation}
y_t = \mu_t + A H_t^{1/2} z_t.
\label{eq:22}
\end{equation}

The conditional covariance matrix, $\Sigma_t \equiv E[(y_t -\mu_t)(y_t - \mu_t)'|\mathfrak{F}_{t-1}]$, is given by $\Sigma_t = A H_t A'$.

The Orthogonal Factor model of @Alexander2002\footnote{When $U$ is restricted to be an identity matrix, the model reduces to the Orthogonal Factor model.}, which uses only the information in the covariance matrix, leads to components which are uncorrelated but not necessarily independent, unless a multivariate Normal distribution is assumed. However, while whitening is not sufficient for independence, it is nevertheless an important pre-processing step in the search for independent factors, since by exhausting the second order information contained in the covariance matrix it becomes easier to infer higher order information, reducing the problem to one of rotation (orthogonalization).

**Code Snippet**

```{r,eval=FALSE}
gogarch_modelspec(...)
```
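To illustrate the whitening and rotation decomposition in equation \eqref{eq:18}, the following base R sketch uses PCA-based whitening followed by an arbitrary rotation (the rotation angle is a placeholder; the package itself estimates $U$ via the RADICAL algorithm):

```{r,eval=FALSE}
set.seed(5)
eps <- matrix(rnorm(2000), ncol = 2) %*% matrix(c(1, 0.5, 0, 0.9), 2, 2)  # correlated residuals
S <- cov(eps)
es <- eigen(S, symmetric = TRUE)
K <- diag(1 / sqrt(es$values)) %*% t(es$vectors)    # whitening matrix
white <- eps %*% t(K)                               # whitened residuals, covariance ~ identity
theta <- pi / 6                                     # arbitrary rotation angle (RADICAL would estimate this)
U <- matrix(c(cos(theta), -sin(theta), sin(theta), cos(theta)), 2, 2)
f <- white %*% t(U)                                 # candidate independent factors
```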
## Estimation

The estimation of the GOGARCH model is performed in 3 steps:

1. Filter the data for conditional mean dynamics ($\mu_{i,t}$). Similar to the DCC model, this is performed outside the system.
2. Perform Independent Component Analysis (ICA) on the residuals to separate out the additive and statistically independent components.
3. Estimate the conditional distribution of each independent component using an admissible univariate model.

In the `tsmarch` package we use the `tsgarch` package to estimate a GARCH model for each factor with either a Generalized Hyperbolic (`gh`) or Normal Inverse Gaussian (`nig`) distribution.\footnote{Technically, the `nig` distribution is a special case of the `gh` distribution with $\lambda = -0.5$, but since it possesses some nice properties, it is included as a separate distribution for speed purposes.} The log-likelihood of the GOGARCH model is composed of the individual log-likelihoods of the univariate independent factor GARCH models and a term for the mixing matrix:

\begin{equation}
LL_t\left(\theta, A\right) = \log\left|A^{-1}\right| + \sum\limits_{i=1}^{n} \log\left(GH_{\lambda_i}\left(\theta_i\right)\right)
\label{eq:23}
\end{equation}

where $GH_{\lambda_i}$ is the Generalized Hyperbolic distribution with shape parameter $\lambda_i$ for the $i$-th factor. Since the inverse of the mixing matrix is the un-mixing matrix, $A^{-1} = W = K'U$, and since the determinant of $U$ is 1 ($|U|=1$), the log-likelihood simplifies to:

\begin{equation}
LL_t\left(\theta, A\right) = \log\left|K\right| + \sum\limits_{i=1}^{n} \log\left(GH_{\lambda_i}\left(\theta_i\right)\right)
\label{eq:24}
\end{equation}

In the presence of dimensionality reduction, $K$ is no longer square and we cannot directly compute its determinant. Instead, we use the determinant of $K K'$ to reflect the combined effect of whitening and dimensionality reduction:

\begin{equation}
LL_t\left(\theta, A\right) = \log\left|K K'\right| + \sum\limits_{i=1}^{m} \log\left(GH_{\lambda_i}\left(\theta_i\right)\right)
\label{eq:25}
\end{equation}

where $m$ is the number of factors. A key advantage of this approach is the ability to estimate multivariate dynamics using only univariate methods, providing for feasible estimation of large scale systems. The speed advantage of this approach does not come for free, as there is a corresponding need for memory to handle the parallelization offered, particularly when generating higher co-moment matrices.

**Code Snippet**

```{r,eval=FALSE}
gogarch_modelspec(...) |> estimate(...)
```
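Since the factor GARCH models are independent of one another, their estimation can be distributed across workers via the `future` framework used throughout the package. The snippet below is a hedged illustration of such a setup; the `distribution` argument value and the `...` placeholders follow the conventions of the snippets in this document, and the exact argument names should be checked against the package documentation:

```{r,eval=FALSE}
library(future)
plan(multisession, workers = 4)   # distribute the factor model estimation across workers
mod <- gogarch_modelspec(..., distribution = 'nig') |> estimate(...)
plan(sequential)                  # reset the plan when done
```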
## Co-Moments

The linear affine representation of the GOGARCH model provides a closed-form expression for the conditional co-moments. The first 3 co-moments are given by:

\begin{equation}
\begin{aligned}
M^2_t &= A M^2_{f,t} A'\\
M^3_t &= A M^3_{f,t} \left(A' \otimes A'\right)\\
M^4_t &= A M^4_{f,t} \left(A' \otimes A' \otimes A'\right)
\end{aligned}
\label{eq:26}
\end{equation}

where $M_{f,t}^2$, $M_{f,t}^3$ and $M_{f,t}^4$ are the $(N \times N)$ conditional covariance matrix, the $(N \times N^2)$ conditional third co-moment matrix and the $(N \times N^3)$ conditional fourth co-moment matrix of the factors, respectively, represented as unfolded tensors.\footnote{The package allows these tensors to be extracted in both folded and unfolded form.} These are calculated from the univariate factor dynamics, and defined as:

\begin{eqnarray}
M_{f,t}^3 & = & \begin{bmatrix} M_{1,f,t}^3,M_{2,f,t}^3,\ldots,M_{n,f,t}^3 \end{bmatrix} \label{eq:27}\\
M_{f,t}^4 & = & \begin{bmatrix} M_{1,1,f,t}^4,M_{1,2,f,t}^4,\ldots,M_{1,n,f,t}^4|\ldots|M_{n,1,f,t}^4,M_{n,2,f,t}^4,\ldots,M_{n,n,f,t}^4 \end{bmatrix}\label{eq:28}
\end{eqnarray}

where $M_{k,f,t}^3, k=1,\ldots,n$ and $M_{k,l,f,t}^4, k,l=1,\ldots,n$ are the $N\times N$ sub-matrices of $M_{f,t}^3$ and $M_{f,t}^4$, respectively, with elements

\begin{equation}
\begin{aligned}
m_{i,j,k,f,t}^3 & = E[f_{i,t}f_{j,t}f_{k,t}|\mathfrak{F}_{t-1}] \\
m_{i,j,k,l,f,t}^4 & = E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}].
\end{aligned}
\label{eq:29}
\end{equation}

Since the factors $f_{i,t}$ can be decomposed as $z_{i,t}\sqrt{h_{i,t}}$, and given the assumptions on $z_{i,t}$, it follows that $E[f_{i,t}f_{j,t}f_{k,t}|\mathfrak{F}_{t-1}] = 0$ whenever the indices are not all equal. It is also true that $E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}] = 0$ for $i \neq j\neq k \neq l$, while for $i=j$ and $k=l$ (with $i \neq k$),

\begin{equation}
E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}] = h_{i,t} h_{k,t}.
\label{eq:30}
\end{equation}

Thus, under the assumption of mutual independence, all elements of the conditional co-moment matrices with at least 3 different indices are zero. Finally, standardizing the conditional co-moments one obtains the conditional co-skewness and co-kurtosis of $y_t$:

\begin{equation}
\begin{aligned}
S_{i,j,k,t} &= \frac{m_{i,j,k,t}^3}{\sigma_{i,t}\sigma_{j,t}\sigma_{k,t}}, \\
K_{i,j,k,l,t} &= \frac{m_{i,j,k,l,t}^4}{\sigma_{i,t}\sigma_{j,t}\sigma_{k,t}\sigma_{l,t}}
\end{aligned}
\label{eq:31}
\end{equation}

where $S_{i,j,k,t}$ represents the co-skewness between elements $i,j,k$ of $y_t$, $\sigma_{i,t}$ the standard deviation of $y_{i,t}$, and in the case of $i=j=k$ represents the skewness of series $i$ at time $t$; similarly for the co-kurtosis tensor $K_{i,j,k,l,t}$. Two natural applications of the return co-moment matrices are Taylor type utility expansions in portfolio allocation and higher moment news impact surfaces.

**Code Snippet**

```{r,eval=FALSE}
tscov(...)
tscor(...)
tscoskew(...)
tscokurt(...)
```

The weighted moments are calculated as follows:

\begin{equation}
\begin{aligned}
\mu_{1,t} &= w_t' M^1_{t}\\
\mu_{2,t} &= w_t' M^2_{t} w_t\\
\mu_{3,t} &= w_t' M^3_{t} \left(w_t \otimes w_t\right)\\
\mu_{4,t} &= w_t' M^4_{t} \left(w_t \otimes w_t \otimes w_t\right)\\
\end{aligned}
\label{eq:32}
\end{equation}

where $w_t$ is the vector of weights at time $t$.
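A compact base R sketch of equation \eqref{eq:32} for a single date and 2 assets follows; the unfolded co-moment matrices are illustrative placeholders (they lack the symmetry of true co-moment tensors) and serve only to show the Kronecker algebra:

```{r,eval=FALSE}
w <- c(0.6, 0.4)                                    # portfolio weights (2 assets)
M2 <- matrix(c(0.040, 0.012, 0.012, 0.025), 2, 2)   # illustrative covariance matrix
M3 <- matrix(rnorm(2 * 4, sd = 1e-3), 2, 4)         # placeholder unfolded third co-moment (2 x 2^2)
M4 <- matrix(rnorm(2 * 8, sd = 1e-4), 2, 8)         # placeholder unfolded fourth co-moment (2 x 2^3)
mu2 <- drop(t(w) %*% M2 %*% w)
mu3 <- drop(t(w) %*% M3 %*% kronecker(w, w))
mu4 <- drop(t(w) %*% M4 %*% kronecker(w, kronecker(w, w)))
```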
Using the vectorization operator $\text{vec}$, the weighted moments can be re-written compactly as:

\begin{equation}
\mu_{k,t} = \text{vec}\left(M^k_{t}\right)'\underbrace{(w_t \otimes \ldots \otimes w_t)}_{k \text{ times}}
\label{eq:33}
\end{equation}

One application of this is in the calculation of the Constant Absolute Risk Aversion (CARA) utility function, using a Taylor series expansion as in @Ghalanos2015:

\begin{equation}
E_{t-1}\left[U\left(W_t\right)\right] = E_{t-1}\left[-\exp\left(-\eta \bar W_t\right)\right] \approx -\exp\left(-\eta \bar W_t\right)\left[1 + \frac{\eta^2}{2}\mu_{2,t} - \frac{\eta^3}{3!}\mu_{3,t} + \frac{\eta^4}{4!}\mu_{4,t}\right]
\label{eq:34}
\end{equation}

where $\eta$ represents the investor's constant absolute risk aversion, with higher (lower) values representing higher (lower) aversion, and $\bar W_t = w'_t E_{t-1}\left[y_t\right]$ is the expected portfolio return at time $t$. Note that this approximation includes two moments more than the mean-variance criterion, with skewness and kurtosis directly related to investor preference for odd moments and dislike of even moments, under certain mild assumptions given in @Scott1980.

**Code Snippet**

```{r,eval=FALSE}
gogarch_modelspec(...) |> estimate(...) |> predict(...) |> tsaggregate(...)
```

## Weighted Distribution

The weighted conditional density of the model may be obtained via the inversion of the Generalized Hyperbolic characteristic function through the fast Fourier transform (FFT) method, as in @Chen2007, or by simulation as in Section \ref{sec:dcc_weighted}. The `tsmarch` package generates both simulated predictive distributions as well as the weighted $p^*$, $d^*$ and $q^*$ functions based on the FFT. It should be noted that while the n-dimensional NIG distribution is closed under convolution, when the distributional higher moment parameters are allowed to differ across assets, as is the case here, this property no longer holds. Provided that $\epsilon_t$ is an n-dimensional vector of innovations, marginally distributed as 1-dimensional standardized GH, the density of the weighted asset returns, $w_{i,t}y_{i,t}$, is:

\begin{equation}
w_{i,t}y_{i,t} - w_{i,t}\mu_{i,t} = \bar w_{i,t}\epsilon_{i,t}\sim \text{maGH}_{\lambda_i}\left(\bar w_{i,t}\bar\mu_{i,t},\left|\bar w_{i,t}\delta_{i,t}\right|,\frac{\alpha_{i,t}}{\left|\bar w_{i,t}\right|}, \frac{\beta_{i,t}}{\left|\bar w_{i,t}\right|} \right)
\label{eq:35}
\end{equation}

where $\bar w_{t} = w'_tAH^{1/2}_t$ and hence $\bar w_{i,t}$ is the $i$-th element of this vector, $\mu_{i,t}$ is the conditional mean of series $i$ at time $t$, and the parameter vector $\left(\bar \mu_{i,t}, \delta_{i,t}, \beta_{i,t}, \alpha_{i,t}, \lambda_i\right)$ represents the Generalized Hyperbolic distributional parameters, converted from the $\left(\rho,\zeta\right)$ parameterization used in the `tsgarch` package estimation routine.
In order to obtain the weighted density, we take the product of the individual characteristic functions of $\bar w_{i,t}\epsilon_{i,t}$ (equivalently, summing their logarithms):

\begin{equation}
\varphi(u) = \prod\limits_{i = 1}^n {\varphi_{\bar w{\epsilon_i}}} (u) = \exp{ \left( \text{i}u\sum\limits_{j = 1}^n \bar\mu_j + \sum\limits_{j = 1}^n \left( \frac{\lambda_j}{2} \log{\left(\frac{\gamma_j}{\upsilon_j}\right)} + \log \left(\frac{K_{\lambda_j}(\bar\delta_j\sqrt{\upsilon_j})}{K_{\lambda_j}(\bar\delta_j \sqrt{\gamma_j})} \right) \right) \right) }
\label{eq:36}
\end{equation}

where $\gamma_j = \bar \alpha_j^2 - \bar \beta_j^2$, $\upsilon_j = \bar \alpha_j^2 - {({\bar \beta_j} + {\text{i}}u)^2}$, and $(\bar \alpha_j, \bar \beta_j, \bar \delta_j, \bar \mu_j)$ are the scaled versions of the parameters $(\alpha_{i}, \beta_{i}, \delta_{i}, \mu_{i})$ as shown in equation \eqref{eq:35}. The density may be accurately approximated by FFT as follows:

\begin{equation}
{f_y}(r) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}{{e^{( - \text{i}u r)}}} \varphi_y (u)\mathrm{d}u \approx \frac{1}{2\pi}\int_{ - s}^s {{e^{( - \text{i}u r)}}} \varphi_y (u)\mathrm{d}u.
\label{eq:37}
\end{equation}

Once the density is obtained, we use smooth approximations to obtain functions for the density ($d^*$), quantile ($q^*$) and distribution ($p^*$), with random number generation provided by the inverse transform method using the quantile function.

**Code Snippet**

```{r,eval=FALSE}
gogarch_modelspec(...) |> estimate(...) |> predict(...) |> tsconvolve(...) |> dfft(...)
```

## Factor News Impact Surface

Consider the equation for the construction of the conditional covariance matrix $\Sigma_t$:

\begin{equation}
\Sigma_t = A H_t A'
\end{equation}

where $H_t$ is the diagonal conditional covariance matrix of the independent components. The factor news impact covariance surface is a measure of the effect of shocks to the independent factors at time $t-1$ on the conditional covariance of the returns of assets $i$ and $j$. To construct it, we first generate the news impact curves for all the factors using a common grid of shock values, and then calculate the effect of these on the conditional covariance of the returns of assets $i$ and $j$. Since the factors are independent, we simply loop through all pairwise combinations of these shocks\footnote{We loop through all possible combinations of shocks for factors $k$ and $l$ which were previously calculated from their respective news impact curves using a common grid of shock values $\{\epsilon_1,\ldots,\epsilon_z\}$.} to calculate the covariance surface, keeping the values of the other factors constant:

\begin{equation}
\Sigma_{i,j} = a_{k,i} \cdot h_{k,k}|\epsilon_k \cdot a_{k,j} + a_{l,i} \cdot h_{l,l}|\epsilon_l \cdot a_{l,j} + \sum_{\substack{m=1 \\ m \neq k, l}}^{n} a_{m,i} \cdot h_{m,m}|{\epsilon_m=0} \cdot a_{m,j}
\label{eq:38}
\end{equation}

where:

1. $a_{k,i}$ is the loading of factor $k$ on asset $i$,
2. $h_{k,k}|\epsilon_k$ is the conditional variance of factor $k$ from the news impact curve given the shock $\epsilon_k$,
3. $a_{k,i} \cdot h_{k,k}|\epsilon_k \cdot a_{k,j}$ captures the contribution of factor $k$ to the covariance between assets $i$ and $j$, based on the shock $\epsilon_k$ applied to factor $k$,
4. $a_{l,i} \cdot h_{l,l}|\epsilon_l \cdot a_{l,j}$ captures the contribution of factor $l$ to the covariance between assets $i$ and $j$, based on the shock $\epsilon_l$ applied to factor $l$, and
5. $\sum_{\substack{m=1 \\ m \neq k, l}}^{n} a_{m,i} \cdot h_{m,m}|{\epsilon_m=0} \cdot a_{m,j}$ captures the contributions of all other factors (except $k$ and $l$), whose variances are held constant at their unconditional levels (i.e., with no shocks).

This means that the covariance will be affected by both the relative sign and magnitude of the factor loadings and the relative magnitude of the factor variances. The news impact surface is not restricted to the covariance matrix and can be extended to the conditional co-skewness and co-kurtosis, but this is only truly meaningful when the higher moment distributional parameters are allowed to vary, as in @Ghalanos2015.

**Code Snippet**

```{r,eval=FALSE}
gogarch_modelspec(...) |> estimate(...) |> newsimpact(...) |> plot(...)
```

\clearpage

# Methods and Classes {#sec-methods}

Figures \ref{fig-dcc-diagram} and \ref{fig-gogarch-diagram} provide a visual summary of the main methods in the package for the DCC and GOGARCH type models. Blue boxes denote a method which produces a custom class that can be further manipulated with other methods\footnote{The `tsfilter` method returns an updated object of the same class as the `estimate` method, hence the arrow going back to that method.}, whilst Appendix \ref{sec-appendix-methods} provides a summary of some of the methods and output classes in the `tsmarch` package and the type of parallelization they use.

\begin{figure}[ht]
\centering
\includegraphics[width=0.6\textwidth]{dcc_diagram.png}
\caption{DCC/CGARCH diagram representing function/method relationships in the package.}
\label{fig-dcc-diagram}
\end{figure}

\begin{figure}[ht]
\centering
\includegraphics[width=0.6\textwidth]{gogarch_diagram.png}
\caption{GOGARCH diagram representing function/method relationships in the package.}
\label{fig-gogarch-diagram}
\end{figure}

\clearpage

# Appendix {#sec-appendix}

## Correlation and Kendall's $\tau$ {#sec-appendix-correlation}

Pearson's product moment correlation $R$ totally characterizes the dependence structure in the multivariate Normal case, where zero correlation also implies independence, but can only characterize the ellipses of equal density when the distribution belongs to the elliptical class. In the latter case, for instance with a distribution such as the multivariate Student, the correlation cannot capture tail dependence determined by the shape parameter. Furthermore, it is not invariant under monotone transformations of the original variables, making it inadequate in many cases. An alternative measure which does not suffer from these drawbacks is Kendall's $\tau$ (see @Kruskal1958), based on rank correlations, which makes no assumption about the marginal distributions but depends only on the copula $C$. It is a pairwise measure of concordance calculated as:

\begin{equation}
\tau(x_i, x_j) = 4 \int_0^1 \int_0^1 C\left(u_i, u_j\right) \, dC\left(u_i, u_j\right) - 1
\label{eq:app1}
\end{equation}

For elliptical distributions, @Lindskog2003 proved that there is a one-to-one relationship between this measure and Pearson's correlation coefficient $\rho$, given by:

\begin{equation}
\tau(x_i, x_j) = \left( 1 - \sum_{x \in \mathbb{R}} \left( \mathbb{P}\{X_i = x\}^2 \right) \right) \frac{2}{\pi} \arcsin \rho_{ij}
\label{eq:app2}
\end{equation}

which, for continuous margins (e.g. in the multivariate Normal case), simplifies to $\frac{2}{\pi} \arcsin \rho_{ij}$. Kendall's $\tau$ is invariant under monotone transformations, making it more suitable when working with non-elliptical distributions.
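A small sketch of this relationship (base R, simulated bivariate Normal data; purely illustrative) converts a sample Kendall's $\tau$ into a correlation estimate:

```{r,eval=FALSE}
set.seed(6)
rho <- 0.7
x <- matrix(rnorm(2000), ncol = 2)
x[, 2] <- rho * x[, 1] + sqrt(1 - rho^2) * x[, 2]    # bivariate Normal with correlation rho
tau_hat <- cor(x[, 1], x[, 2], method = "kendall")
rho_from_tau <- sin(pi / 2 * tau_hat)                # method of moments estimate of rho
```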
A useful application arises in the case of the multivariate Student distribution, where the maximum likelihood approach to the estimation of the correlation matrix $R$ becomes infeasible in large dimensions. Instead, estimating the sample counterpart of Kendall's $\tau$ from the transformed margins and then translating that into a correlation matrix, as detailed in equation \eqref{eq:app2}, provides a method of moments type estimator. The shape parameter $\nu$ is then estimated keeping the correlation matrix constant, with little loss in efficiency.

\clearpage

## Notes on Selected Methods {#sec-appendix-methods}

The table below provides a summary of selected methods, the classes they belong to, the classes they output, whether they are parallelized and the dimensions of the objects involved. For example, the `tsaggregate` method on predicted and simulated objects will return a list of length 4, with each slot holding an object of class `tsmodel.distribution` of size $sim\times h$, representing the simulated predictive distribution of each of the weighted moments generated by the model.\footnote{For the DCC and Copula GARCH models only the first 2 moments are time varying and hence contain a simulated distribution.} The output of the `tsconvolve` method, denoted `list(h):list(sim)`, is a list of length $h$ where each slot contains a list of length $sim$.

\begin{table}[h]
\centering
\begin{tabular}{|l|l|l|c|c|r|}
\hline
\textbf{Method} & \textbf{class (in)} & \textbf{class (out)} & \textbf{future} & \textbf{RcppParallel} & \textbf{dimensions} \\
\hline
\texttt{tscov} & (1,4,7) & \texttt{matrix} & no & no & [n, n, T] \\
\hline
\texttt{tscov} & (2,3,5,6,8,9) & \texttt{matrix} & no & no & [n, n, h, sim] \\
\hline
\texttt{tscoskew} & (1) & \texttt{matrix} & no & yes & [n, n\textsuperscript{2}, T] \\
\hline
\texttt{tscoskew} & (2,3) & \texttt{matrix} & no & yes & [n, n\textsuperscript{2}, h, sim] \\
\hline
\texttt{tscokurt} & (1) & \texttt{matrix} & no & yes & [n, n\textsuperscript{3}, T] \\
\hline
\texttt{tscokurt} & (2,3) & \texttt{matrix} & no & yes & [n, n\textsuperscript{3}, h, sim] \\
\hline
\texttt{tsconvolve} & (1) & \texttt{gogarch.fft} & yes & no & list(T) \\
\hline
\texttt{tsconvolve} & (2,3) & \texttt{gogarch.fftsim} & yes & no & list(h):list(sim) \\
\hline
\texttt{tsaggregate} & (1,4,7) & \texttt{list} & no & yes & 4:n \\
\hline
\texttt{tsaggregate} & (2,3,5,6,8,9) & \texttt{list} & no & yes & 4:(sim x h) \\
\hline
\end{tabular}
\caption{Selected Methods}
\end{table}

\noindent \textbf{Key for class (in):}
\begin{itemize}
\item \textbf{1}: \texttt{gogarch.estimate}
\item \textbf{2}: \texttt{gogarch.predict}
\item \textbf{3}: \texttt{gogarch.simulate}
\item \textbf{4}: \texttt{dcc.estimate}
\item \textbf{5}: \texttt{dcc.predict}
\item \textbf{6}: \texttt{dcc.simulate}
\item \textbf{7}: \texttt{cgarch.estimate}
\item \textbf{8}: \texttt{cgarch.predict}
\item \textbf{9}: \texttt{cgarch.simulate}
\end{itemize}

\clearpage

# References