Image-based classification of hoarding clutter using deep learning

Date
2024
Authors
Sun, Zhenghao
Embargo Date
2025-05-23
Abstract
Hoarding disorder (HD) is characterized by difficulty letting go of possessions, resulting in excessive clutter in living spaces, which can lead to significant health and safety risks. Traditionally, HD is assessed in an interview with a practitioner, but this can be complemented by evaluating room clutter from pictures. To formalize the assessment of room clutter, a numerical scale, the Clutter Image Rating (CIR) scale, was developed: CIR = 1 corresponds to an uncluttered room, while CIR = 9 corresponds to a fully cluttered room. CIR assessment is performed by social workers or other trained health or human-service professionals; it is time-consuming (and therefore costly), subjective, and not always repeatable. To address these challenges, deep-learning methods have been developed to estimate CIR automatically from pictures, achieving up to 81% accuracy on a dataset of 1,233 images of room clutter. However, this dataset is relatively small for training large deep-learning models, and its distribution across CIR classes is imbalanced. This thesis addresses the dataset's size and imbalance and also adopts a new deep-learning architecture for CIR scoring. First, data augmentation is applied to enlarge the training dataset, and a novel weighted loss function is introduced to counter the dataset imbalance. Together, these two techniques improve CIR scoring accuracy by 1 percentage point compared to a ResNet-18-based method previously developed by Tezcan et al. Second, a Vision Transformer (ViT) architecture is adopted for CIR scoring, yielding an additional 5-percentage-point improvement in accuracy over ResNet-18. Third, to further address the dataset imbalance, DALL·E, a generative AI tool, is used to synthesize new images of room clutter based on existing natural images in the dataset. This can be regarded as a novel, AI-driven form of data augmentation.
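The exact form of the thesis's novel weighted loss is not given in this abstract. As a point of reference only, a common baseline for class imbalance, cross-entropy with inverse-frequency class weights, can be sketched in pure Python. The weighting w_c = N / (K · n_c) and the function names are assumptions for illustration; this is not the author's loss.

```python
import math
from collections import Counter

# CIR runs from 1 to 9; labels here are assumed 0-indexed (CIR - 1).
NUM_CLASSES = 9

def inverse_frequency_weights(labels, num_classes=NUM_CLASSES):
    """Hypothetical weighting w_c = N / (K * n_c): rarer classes get larger weights.

    Classes absent from `labels` get weight 0.0, so they never contribute to the loss.
    """
    counts = Counter(labels)
    n, k = len(labels), num_classes
    return [n / (k * counts[c]) if counts[c] else 0.0 for c in range(k)]

def weighted_cross_entropy(logits_batch, targets, weights):
    """Class-weighted cross-entropy averaged over the batch.

    The sum is normalized by the total weight of the targets (the same convention
    as PyTorch's CrossEntropyLoss with `weight` and reduction='mean').
    Assumes at least one target has a nonzero weight.
    """
    total, norm = 0.0, 0.0
    for logits, y in zip(logits_batch, targets):
        m = max(logits)  # subtract the max logit for numerical stability
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        nll = log_z - logits[y]  # equals -log softmax(logits)[y]
        total += weights[y] * nll
        norm += weights[y]
    return total / norm
```

With toy labels [0, 0, 0, 0, 1, 1, 2] and K = 3, the rarest class receives weight 7/3 (about 2.33) versus 7/12 (about 0.58) for the majority class, so its misclassifications count roughly four times as much in the loss. In a deep-learning framework, the same per-class weights could be passed to a built-in loss, for example PyTorch's `nn.CrossEntropyLoss(weight=...)`.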
New images are generated for underrepresented CIR classes to reduce the dataset imbalance. This also increases the overall dataset size, which is beneficial for training the ViT model. Extensive experiments with ResNet-18 and ViT models demonstrate that augmenting the original training dataset with AI-generated images improves performance for most underrepresented classes but does not improve overall CIR-estimation accuracy. A detailed comparison of AI-generated and natural clutter images from the dataset, performed using t-SNE visualization, suggests that for some CIR classes the new images behave as outliers relative to the natural images, which likely affects the trained model's performance. While AI-driven data augmentation improves performance for some CIR classes, more research is needed to extend these gains across all classes.
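The abstract does not state how many images were generated per class. Assuming, hypothetically, that each underrepresented class is topped up toward a common target count (by default the size of the largest class), the per-class generation quota and the resulting imbalance ratio can be sketched as follows. The function names, the target rule, and the toy labels are illustrative assumptions, not taken from the thesis or its dataset.

```python
from collections import Counter

def synthetic_quota(labels, target=None):
    """Images to synthesize per CIR class so that each class reaches `target`.

    Hypothetical rule: `target` defaults to the largest class size (full
    rebalancing). Classes already at or above `target` get a quota of 0.
    Only classes that occur in `labels` appear in the result.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {c: max(0, target - n) for c, n in sorted(counts.items())}

def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 = balanced)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

# Toy CIR labels, made up for illustration (not from the thesis dataset).
toy_labels = [1] * 6 + [2] * 3 + [9] * 1
print(imbalance_ratio(toy_labels))   # 6.0
print(synthetic_quota(toy_labels))   # {1: 0, 2: 3, 9: 5}
```

Generating the quoted number of images would bring every present class to six images and the imbalance ratio to 1.0. Because `Counter` only sees classes that occur in `labels`, a CIR class with no images at all would have to be added explicitly; capping `target` below the majority count is one way to limit how much synthetic data enters training.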