Some methods for robust inference in econometric factor models and in machine learning
Nikolaev, Nikolay Ivanov
Traditional multivariate statistical theory and its applications often rest on specific parametric assumptions; for example, data are frequently assumed to follow a (nearly) normal distribution. In practice this assumption is rarely satisfied, and the underlying data distribution is often unknown. Violations of normality can be detrimental to inference. Two areas particularly affected are quadratic discriminant analysis (QDA), used in classification, and principal component analysis (PCA), commonly employed in dimension reduction. Both involve computing empirical covariance matrices of the data. In econometric and financial data, non-normality is often associated with heavy-tailed distributions, which can create significant problems for the sample covariance matrix. Moreover, in PCA non-normality may lead to erroneous decisions about the number of components to retain, owing to unexpected behavior of the eigenvalues of the empirical covariance matrix.

In the first part of the dissertation, we consider the so-called number-of-factors problem in econometric and financial data: determining the number of sources of variation (latent factors) that are common to a set of variables observed repeatedly over time, as in a time series. The approach commonly used in the literature is PCA together with an examination of the pattern of the associated eigenvalues. We employ an existing technique for robust principal component analysis, which produces properly estimated eigenvalues that are then used in an automatic inferential procedure for identifying the number of latent factors. In a series of simulation experiments we demonstrate the superiority of our approach over other well-established methods.

In the second part of the dissertation, we discuss a method for normalizing the data empirically so that classical QDA for binary classification can be applied.
In addition, we overcome the usual difficulty of a large dimension-to-sample-size ratio through regularized estimation of the precision matrices. Extensive simulation experiments demonstrate the accuracy advantages of our approach over other classification techniques. We illustrate the effectiveness of our methods in both settings by applying them to real datasets from economics and bioinformatics.
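The eigenvalue-based identification of the number of factors discussed in the first part can be illustrated with a simple, non-robust sketch. The eigenvalue-ratio criterion used below is a standard device from the factor-model literature and stands in for the dissertation's automatic inferential procedure, which differs; the simulated panel and all its dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate T observations of an N-dimensional panel driven by r latent factors.
T, N, r = 500, 30, 3
factors = rng.standard_normal((T, r))    # latent common factors
loadings = rng.standard_normal((N, r))   # factor loadings
noise = rng.standard_normal((T, N))      # idiosyncratic noise
X = factors @ loadings.T + noise

# Eigenvalues of the sample covariance matrix, sorted in decreasing order.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# Eigenvalue-ratio criterion: estimate the number of factors as the index
# that maximizes the ratio of consecutive eigenvalues, up to a cap kmax.
kmax = 10
ratios = eigvals[:kmax] / eigvals[1:kmax + 1]
n_factors = int(np.argmax(ratios)) + 1
print(n_factors)
```

With strong common factors the first r eigenvalues dominate the noise eigenvalues, so the largest consecutive ratio occurs at the true number of factors; heavy-tailed data distort these eigenvalues, which is exactly what motivates the robust PCA step described above.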
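The regularized QDA idea from the second part can likewise be sketched. The ridge-style regularization of the precision matrices below is a generic stand-in for the regularized estimation mentioned in the text (the dissertation's estimator may differ), and the two simulated Gaussian classes, sample sizes, and the regularization parameter are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two classes in p dimensions with different covariances and a modest
# sample size, so the raw sample covariances are poorly conditioned.
p, n = 20, 40
X0 = rng.standard_normal((n, p))
X1 = rng.standard_normal((n, p)) @ np.diag(np.linspace(0.5, 2.0, p)) + 1.0

def ridge_precision(X, lam=0.1):
    """Ridge-regularized precision matrix (S + lam*I)^{-1}.

    A simple stand-in for regularized precision estimation; it keeps the
    estimate invertible even when p is close to the sample size.
    """
    S = np.cov(X, rowvar=False)
    return np.linalg.inv(S + lam * np.eye(X.shape[1]))

def qda_score(x, mean, prec):
    # Quadratic discriminant score (Gaussian log-likelihood up to
    # constants shared by both classes).
    d = x - mean
    _, logdet = np.linalg.slogdet(prec)
    return 0.5 * logdet - 0.5 * d @ prec @ d

m0, P0 = X0.mean(axis=0), ridge_precision(X0)
m1, P1 = X1.mean(axis=0), ridge_precision(X1)

# Classify fresh draws from each class and measure accuracy.
test0 = rng.standard_normal((100, p))
test1 = rng.standard_normal((100, p)) @ np.diag(np.linspace(0.5, 2.0, p)) + 1.0
pred0 = np.array([qda_score(x, m1, P1) > qda_score(x, m0, P0) for x in test0])
pred1 = np.array([qda_score(x, m1, P1) > qda_score(x, m0, P0) for x in test1])
accuracy = (np.sum(~pred0) + np.sum(pred1)) / 200
print(round(accuracy, 2))
```

Because the classes here are Gaussian by construction, classical QDA applies directly; on real heavy-tailed data an empirical normalization step of the kind described above would precede this classifier.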