Similarity-principle-based machine learning method for clinical trials and beyond
The control of type-I error is a focal point of clinical trials. At the same time, it is critical to be able to detect a truly efficacious treatment. With the recent success of supervised learning (classification and regression), artificial intelligence (AI) and machine learning (ML) can play a vital role in identifying efficacious new treatments. However, the high performance of AI methods, particularly deep neural networks, requires much larger datasets than those commonly seen in clinical trials. It is therefore desirable to develop a new ML method that performs well with a small sample size (ranging from 20 to 200) and offers advantages over classic statistical models and the most relevant ML methods.

In this dissertation, we propose a Similarity-Principle-Based Machine Learning (SBML) method built on the similarity principle: identical or similar subjects should behave in a similar manner. SBML introduces attribute-scaling factors at the training stage so that the relative importance of different attributes can be objectively determined in the similarity measures; a gradient method is used during learning/training to update these factors. To our knowledge, the method is novel. We first evaluate SBML for continuous outcomes, especially when the sample size is small, and investigate the effects of various tuning parameters on its performance. Simulations show that, across the situations considered, SBML achieves better predictions in terms of mean squared error or misclassification error rate than conventional statistical methods, such as full linear models, optimal or ridge regressions, and mixed-effects models, as well as ML methods including kernel and decision-tree methods. We also show how SBML can be flexibly extended to binary outcomes.
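To make the idea concrete, the sketch below illustrates one plausible form of similarity-based prediction with learned attribute-scaling factors. The Gaussian similarity kernel, the leave-one-out objective, and the numerical gradient are illustrative assumptions, not the dissertation's exact formulation.

```python
import numpy as np

def similarity(x, z, w):
    # Gaussian-type similarity; w holds the attribute-scaling factors (assumed form)
    return np.exp(-np.sum(w * (x - z) ** 2))

def sbml_predict(X_train, y_train, x_new, w):
    # Outcome of a new subject = similarity-weighted average of training outcomes
    s = np.array([similarity(x_new, xi, w) for xi in X_train])
    return float(np.sum(s * y_train) / np.sum(s))

def loo_loss(X, y, w):
    # Leave-one-out mean squared prediction error, used here as the training objective
    n = len(y)
    errs = [(sbml_predict(np.delete(X, i, 0), np.delete(y, i), X[i], w) - y[i]) ** 2
            for i in range(n)]
    return float(np.mean(errs))

def train_scaling_factors(X, y, lr=0.05, epochs=100, eps=1e-4):
    # Update the attribute-scaling factors by gradient descent on the LOO error
    # (a central-difference numerical gradient keeps the sketch short)
    w = np.ones(X.shape[1])
    for _ in range(epochs):
        grad = np.array([(loo_loss(X, y, w + eps * e) - loo_loss(X, y, w - eps * e))
                         / (2 * eps) for e in np.eye(len(w))])
        w = np.maximum(w - lr * grad, 1e-6)  # keep the factors positive
    return w
```

Because the prediction is a weighted average, attributes that receive larger scaling factors dominate the similarity measure, which is how the relative importance of attributes is determined objectively from the data rather than assumed in advance.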
Through numerical and simulation studies, we confirm that SBML performs well relative to classical statistical methods, even when the sample size is small and in the presence of unmeasured predictors and/or noise variables. Although SBML performs well with small sample sizes, it may not be computationally efficient for large ones. We therefore propose Recursive SBML (RSBML), which saves computing time at some cost in accuracy. In this sense, RSBML can also be viewed as a combination of unsupervised learning (dimension reduction) and supervised learning (prediction). Recursive learning resembles the natural human way of learning and is an efficient way to learn from complicated, large data. Based on the simulation results, RSBML runs much faster than SBML with reasonable accuracy for large sample sizes.
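The computational burden of a full similarity-based prediction grows with the training-set size, since every new subject is compared against all n training subjects. As an illustration of how a coarse-to-fine scheme can cut that cost (this is not RSBML's exact algorithm, which the abstract does not specify), the sketch below first prescreens a neighborhood with plain Euclidean distance and then applies the similarity-weighted prediction only within it.

```python
import numpy as np

def similarity(x, z, w):
    # Gaussian-type similarity with attribute-scaling factors w (assumed form)
    return np.exp(-np.sum(w * (x - z) ** 2))

def sbml_predict(X_train, y_train, x_new, w):
    # Similarity-weighted average over the supplied training subjects
    s = np.array([similarity(x_new, xi, w) for xi in X_train])
    return float(np.sum(s * y_train) / np.sum(s))

def recursive_predict(X_train, y_train, x_new, w, k=50):
    # Coarse pass: keep only the k nearest subjects by plain Euclidean distance,
    # then run the (more expensive) scaled-similarity prediction on that subset.
    d = np.sum((X_train - x_new) ** 2, axis=1)
    idx = np.argsort(d)[:min(k, len(d))]
    return sbml_predict(X_train[idx], y_train[idx], x_new, w)
```

With k fixed, the fine-grained step costs O(k) per prediction regardless of n, which mirrors the abstract's trade-off: faster computation for large samples, with some loss of accuracy when the discarded subjects would have carried non-negligible similarity weight.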
Rights: Attribution 4.0 International