Building next-generation deep learning hardware using photonic computing

Date
2024
Authors
Demirkiran, Cansu
Abstract
In recent years, the demand for computational power has skyrocketed, driven by the rapid advancement of artificial intelligence (AI). As we move past Moore's Law, the limitations of traditional digital computing are pushing the exploration of alternative computing paradigms. Among emerging technologies, integrated photonics stands out as a highly promising candidate for the next generation of high-performance AI computing, as it offers low latency, high bandwidth, and high parallelism. Nevertheless, photonic hardware for AI acceleration still faces several challenges: dependence on slower, less efficient electronic circuits and memory units; the lack of efficient nonlinearity in photonics; limited precision; analog noise; and various device non-idealities. In this thesis, we investigate the opportunities and challenges of photonics technology for accelerating state-of-the-art AI workloads from a realistic perspective, evaluate its performance benefits, and propose solutions to the associated challenges.

First, we outline our strategy for designing and evaluating ADEPT, a complete electro-photonic accelerator for deep neural network (DNN) inference. ADEPT combines a photonic computing unit for general matrix-matrix multiplication (GEMM) operations, a vectorized digital electronic application-specific integrated circuit (ASIC) for non-GEMM operations, and static random-access memory (SRAM) arrays for storing DNN parameters and activations. Unlike previous photonic DNN accelerators, we adopt a system-level perspective to provide a more realistic assessment of photonics technology and its applicability to accelerating state-of-the-art DNNs. We detail our design steps and introduce optimizations that minimize the overhead of the electronic devices.
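The division of labor described above can be illustrated with a toy sketch. This is a plain-Python stand-in, not ADEPT's actual dataflow or units: the photonic GEMM unit is modeled merely as a matrix multiply on quantized operands (mimicking limited analog precision), and the digital ASIC's non-GEMM work is represented by a ReLU activation. All function names here are illustrative.

```python
def quantize(x, bits=8):
    """Model limited analog precision: clip to [-1, 1], round to a fixed grid."""
    scale = (1 << (bits - 1)) - 1
    return round(max(-1.0, min(1.0, x)) * scale) / scale

def photonic_gemm(a, b, bits=8):
    """Stand-in for the photonic GEMM unit: matmul on quantized operands."""
    qa = [[quantize(v, bits) for v in row] for row in a]
    qb = [[quantize(v, bits) for v in row] for row in b]
    cols = list(zip(*qb))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in qa]

def electronic_relu(m):
    """Stand-in for the digital ASIC handling non-GEMM ops (here, ReLU)."""
    return [[max(0.0, v) for v in row] for row in m]

def layer(x, w):
    """One DNN layer: GEMM routed to the photonic unit, activation to the ASIC."""
    return electronic_relu(photonic_gemm(x, w))
```

In this simplified picture, the system-level cost that the thesis emphasizes lives at the boundary between the two functions: converting data between the analog GEMM unit and the digital ASIC and memory.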
Our evaluation shows that ADEPT achieves, on average, 5.73× higher throughput per watt than systolic arrays (SAs), and more than 6.8× and 2.5× better throughput per watt than state-of-the-art electronic and photonic accelerators, respectively.

Second, we address the precision limitations of analog computing by using the residue number system (RNS) to compose high-precision operations from multiple low-precision operations. This approach eliminates the need for high-precision data converters and avoids information loss. Our study shows that this technology-agnostic RNS-based approach can achieve ≥ 99% of 32-bit floating-point (FP32) accuracy for state-of-the-art DNN inference using only 6-bit, and for training using only 7-bit, fixed-point (FXP) arithmetic. RNS can therefore significantly reduce the energy consumption of analog accelerators while maintaining the same throughput and precision. In addition, we present a fault-tolerant dataflow that uses the redundant residue number system (RRNS) to protect computations against the noise and errors inherent in analog hardware.

Finally, leveraging this RNS-based framework, we propose Mirage, a photonic DNN training accelerator. Mirage employs a novel micro-architecture that supports modular arithmetic in the analog domain, achieving high energy efficiency without compromising precision. Our study shows that, on average, Mirage achieves FP32 accuracy with 23.8× lower training time and 32.1× lower energy-delay product (EDP) in an iso-energy scenario, and 42.8× less power consumption with comparable or better EDP in an iso-area scenario, compared to SAs.
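The RNS idea above can be sketched in a few lines: represent each integer by its residues modulo a set of pairwise-coprime moduli, carry out every multiply-accumulate independently in each low-precision channel, and recover the full-precision result via the Chinese Remainder Theorem. The moduli and word sizes below are illustrative assumptions, not the configurations evaluated in the thesis.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; each residue fits in 8 bits,
# so each channel needs only low-precision arithmetic.
MODULI = (251, 241, 239)
M = prod(MODULI)  # dynamic range of this RNS

def to_rns(x):
    """Decompose an integer into its residues modulo each channel."""
    return tuple(x % m for m in MODULI)

def from_rns(residues):
    """Reconstruct the integer via the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m): inverse of Mi mod m
    return total % M

def rns_dot(a, b):
    """Dot product computed independently in each low-precision channel."""
    res = [0] * len(MODULI)
    for x, y in zip(a, b):
        for i, m in enumerate(MODULI):
            res[i] = (res[i] + (x % m) * (y % m)) % m
    return tuple(res)

a, b = [123, 456, 789], [1000, 2000, 3000]
assert from_rns(rns_dot(a, b)) == sum(x * y for x, y in zip(a, b))
```

No intermediate value ever exceeds a single modulus in any channel, which is why the analog hardware never needs high-precision data converters; the only requirement is that the final result stays within the dynamic range M.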
License
Attribution 4.0 International