Deviance matrix factorization and its applications
Embargo Date
2026-09-17
Abstract
Matrix factorization aims to summarize the information in a data matrix with lower-dimensional vectors. With reduced dimensions, those vectors can be leveraged to provide efficient data storage, to facilitate representative visualization, and to make more robust predictions. Recent research has found that the factorization result can be greatly improved by (i) imposing appropriate statistical assumptions, (ii) adding flexible entry-wise factorization weights, and (iii) constraining the space of the factorization components.
Following those recent improvements, we investigate a general matrix factorization for deviance-based data losses, extending the ubiquitous singular value decomposition beyond squared error loss. While similar approaches have been explored before, our method leverages classical statistical methodology from generalized linear models (GLMs) and provides an efficient algorithm that is flexible enough to allow for structural zeros and entry weights. Moreover, by adapting results from GLM theory, we provide support for these decompositions by (i) showing strong consistency under the GLM setup, (ii) checking the adequacy of a chosen exponential family, and (iii) determining the rank of the decomposition.
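To make the idea of a deviance-based factorization with entry weights concrete, the following is a minimal sketch for one exponential family only: a Poisson model with log link, where the mean matrix is exp(U V^T) and a weight of zero marks a structural zero. The function names, the step size, and the use of plain gradient descent are illustrative assumptions; this is not the algorithm developed in the thesis, which builds on GLM methodology.

```python
import numpy as np

def poisson_deviance(x, mu, w):
    """Weighted Poisson deviance; the x*log(x/mu) term is taken as 0 when x == 0."""
    safe_x = np.where(x > 0, x, 1.0)
    term = np.where(x > 0, x * np.log(safe_x / mu), 0.0)
    return 2.0 * np.sum(w * (term - (x - mu)))

def fit_poisson_factorization(x, rank, w=None, steps=500, lr=1e-3, seed=0):
    """Hypothetical helper: fit x ~ Poisson(exp(u @ v.T)) by gradient descent.

    Entries with weight 0 (e.g. structural zeros) do not influence the fit.
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape
    w = np.ones_like(x, dtype=float) if w is None else w
    u = 0.1 * rng.standard_normal((n, rank))
    v = 0.1 * rng.standard_normal((p, rank))
    history = []
    for _ in range(steps):
        mu = np.exp(u @ v.T)
        history.append(poisson_deviance(x, mu, w))
        # Gradient of (deviance / 2) with respect to the linear predictor u @ v.T
        g = w * (mu - x)
        # Simultaneous update: both right-hand sides use the old u and v
        u, v = u - lr * g @ v, v - lr * g.T @ u
    return u, v, history
```

Replacing the Poisson deviance and its gradient with those of another exponential family (for example, squared error for the Gaussian case) changes the loss while leaving the low-rank structure untouched, which is the sense in which such a factorization generalizes the singular value decomposition.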
This general factorization model is then applied to structured data to demonstrate the improvements it brings to different statistical applications. Specifically, our first contribution is a factorization of classification evaluation matrices under multinomial and binomial assumptions. The factorized components are then used to provide an effective evaluation metric for multi-class classification models. Secondly, a joint binomial factorization of the pairwise true positive rate matrix and false positive matrix is shown to provide an ROC-equivalent curve for visualizing model performance in multi-class classification. Both applications are compared against benchmark methodologies to demonstrate superior performance.
Lastly, we seek computational improvements by (i) generalizing the factorization into a factor model and (ii) providing efficient rank transformations. The generalization to the factor model scales readily to million-scale matrix factorization by leveraging recent advances in stochastic optimization. The rank transformation extension allows practitioners to move efficiently among different factorization ranks when prior knowledge of the possible factorization ranks is available. Both methods are validated with extensive simulation experiments that demonstrate their speed improvement and approximation quality. Finally, we apply our DMF method to multi-layer network data and large genetic data to illustrate the improved DMF in network analysis and genome-wide association studies.
License
Attribution-NonCommercial-NoDerivatives 4.0 International