Image classification with directional image sensors
Abstract
Traditional electronic implementations of convolutional neural networks (CNNs) suffer from high power consumption and limited processing speed, hindering their deployment in resource-constrained scenarios. Leveraging the power of photonics, an innovative imaging device has been developed in Prof. Paiella's lab that integrates a standard image sensor with a photonic nanostructure (a metasurface). This device has a unique, asymmetric response to the angle of incident light. Combined into an array within an imaging system, it can perform optical spatial filtering analogous to that of the first convolutional layer of a typical CNN tailored to image recognition. This filtering process relates an imaged object to the output of the sensor array through a coherent transfer function (CTF) or an optical transfer function (OTF), under illumination by coherent or spatially incoherent light, respectively. By combining this all-optical convolutional layer with a shallow digital CNN, complexity and power consumption are expected to be significantly reduced compared to an all-digital CNN.
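To make the filtering relation concrete, the NumPy sketch below shows how an OTF would act on an intensity image under spatially incoherent illumination: a pointwise multiplication in the Fourier domain. The odd, derivative-like `otf` array here is a hypothetical placeholder, not the device's measured response.

```python
import numpy as np

def apply_otf(intensity, otf):
    # Under spatially incoherent illumination, the sensor image is the
    # object intensity filtered by the OTF: a pointwise product in the
    # Fourier domain. `otf` is sampled on a centered frequency grid.
    spectrum = np.fft.fft2(intensity)
    filtered = np.fft.ifft2(spectrum * np.fft.ifftshift(otf))
    return filtered.real  # real up to numerical error for a physical OTF

# Toy, Hermitian-symmetric OTF with an odd (angle-asymmetric) profile,
# i.e. a derivative-like filter; NOT the measured device response.
n = 28
fx = np.tile(np.linspace(-1.0, 1.0, n), (n, 1))
otf = 1j * fx

image = np.random.rand(n, n)          # stand-in object intensity
sensor_output = apply_otf(image, otf)
```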
In this thesis, we propose, numerically simulate, and experimentally evaluate two types of device targeting the image-recognition problem. First, we evaluate an angle-selective device, characterized by an OTF, in combination with a fully digital five-layer LeNet CNN. Replacing the first digital convolutional layer of LeNet with the OTF results in a small performance drop (0.1-0.5% reduction in accuracy) but a significant reduction in computational complexity (28.8% fewer multiply-accumulate operations). Further reducing the digital network's complexity (the OTF layer followed only by pooling, an activation function, and one fully-connected layer) yields a much larger reduction in computational complexity (96.0%) at the cost of a slight performance loss (0.6-0.8%). We also evaluate a phase-imaging device characterized by a CTF. We simulate the imaging capabilities of this device based on experimentally measured parameters and test it on a dataset of real cell images. Compared to the fully digital LeNet, the new architecture achieves an accuracy of 96.1% (a 2.5% reduction relative to LeNet) on three classes of cells, with complexity savings of up to 98.4%. Finally, we propose a joint optimization of two parameters of a numerically simulated CTF response together with a single-layer digital network. Although the performance gains over a fixed CTF are small (on average, about 0.5 percentage points in accuracy), we believe this is a promising pathway for further exploring optical-digital system co-design.
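As a concrete illustration of the reduced hybrid architecture, the PyTorch sketch below emulates the optical front-end with a frozen convolution and keeps only pooling, an activation, and a single fully-connected layer in the digital back-end. The kernel values, 28×28 input size, and class count are illustrative assumptions, not the thesis configuration.

```python
import torch
import torch.nn as nn

class HybridOpticalNet(nn.Module):
    """Digital back-end of the reduced architecture: a fixed, non-trainable
    convolution stands in for the optical filtering (which the metasurface
    sensor performs at zero digital cost), followed only by pooling, an
    activation, and one fully-connected layer."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.optical = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
        with torch.no_grad():
            # Sobel-like, angle-asymmetric stand-in for the device response.
            self.optical.weight.copy_(torch.tensor(
                [[[[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]]]]))
        self.optical.weight.requires_grad_(False)  # not trained
        self.pool = nn.MaxPool2d(2)
        self.act = nn.ReLU()
        self.fc = nn.Linear(14 * 14, num_classes)  # for 28x28 inputs

    def forward(self, x):
        x = self.act(self.pool(self.optical(x)))
        return self.fc(x.flatten(1))

model = HybridOpticalNet()
logits = model(torch.randn(8, 1, 28, 28))  # batch of 8 toy images
```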
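The joint optical-digital optimization can be sketched in the same framework. Below, two scalar parameters of a toy CTF (a linear ramp in spatial frequency, assumed here as a stand-in for the simulated device response) are trained end-to-end with a single fully-connected layer under a coherent-imaging model.

```python
import torch
import torch.nn as nn

class JointCTFNet(nn.Module):
    """Two scalar knobs of a simulated CTF are learned jointly with a
    single-layer digital classifier. The linear-ramp parameterization is a
    hypothetical stand-in for the thesis's simulated CTF response."""

    def __init__(self, size=28, num_classes=3):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(1.0))  # CTF slope (assumed knob)
        self.b = nn.Parameter(torch.tensor(0.0))  # CTF offset (assumed knob)
        f = torch.linspace(-1.0, 1.0, size)
        self.register_buffer("fx", f.repeat(size, 1))  # centered freq. grid
        self.fc = nn.Linear(size * size, num_classes)

    def forward(self, field):
        # Coherent imaging: filter the complex field with the CTF in the
        # Fourier domain, then detect intensity (squared magnitude).
        ctf = self.a * self.fx + self.b
        spectrum = torch.fft.fft2(field)
        out = torch.fft.ifft2(spectrum * torch.fft.ifftshift(ctf))
        intensity = out.abs() ** 2
        return self.fc(intensity.flatten(1))

model = JointCTFNet()
phase = torch.rand(8, 28, 28)    # toy phase objects (e.g. cell images)
field = torch.exp(1j * phase)    # unit-amplitude complex field
logits = model(field)
```

In practice, the learned parameter values would still have to be mapped back to physically realizable device settings, which is part of what makes this co-design direction worth exploring further.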
Overall, our numerical experiments, performed using realistic OTF/CTF responses, project significant reductions in system complexity while retaining high accuracy. These results remain to be confirmed by measurements with full sensor arrays, which would pave the way toward efficient CNN-based visual recognition hardware for mobile applications.