Towards effective and robust transformer-based models for biomedical applications

Date
2025
Abstract
Transformer-based deep neural networks have achieved great success in biomedical research and practice. Leveraging the strength of the self-attention mechanism and self-supervised learning schemes such as Masked Language Modeling (MLM), transformer-based Language Models (LMs) like Clinical BERT have shown impressive performance on clinical Natural Language Processing (NLP) tasks, while Vision Transformers (ViTs) have improved on traditional Convolutional Neural Network (CNN) models in clinical image analysis. These successes indicate the general sequence-modeling power of transformers, which can potentially be extended to effectively analyze biomedical sequential data in other modalities. On the other hand, ViTs trained under Empirical Risk Minimization (ERM) are known to be vulnerable to image perturbations, which limits their applications in biomedicine. This dissertation focuses on developing effective and robust transformer models and learning methods for a variety of biomedical applications. We propose a pre-training method that introduces prior knowledge from a medical knowledge base into transformer-based LMs; it can be applied jointly with traditional MLM and improves the LM's clinical text understanding. Inspired by the strength of MLM, we pre-train and fine-tune transformer-based protein LMs to achieve state-of-the-art MHC-peptide binding predictions. We also propose a causal modeling scheme for general structured Electronic Health Record (EHR) data, which we implement to train a Generative Pre-trained Transformer (GPT) and support real-world, unsupervised novel disease detection. To improve the robustness of transformers, we develop a distributionally robust deep learning framework, which significantly reduces the ViT classifier's error rate under image perturbations and improves stroke diagnosis accuracy under accelerated MRI settings.
Our work not only practically improves the applicability and reliability of Artificial Intelligence (AI) in healthcare, but also provides effective and robust sequence-modeling frameworks for other research domains.
License
Attribution 4.0 International