Domain-specific accelerators using optically-addressed phase change memory

Abstract
In recent years, the exponential growth in data generation and the increasing complexity of computational tasks have created a pressing need for more efficient computing solutions. To address this demand, researchers have developed domain-specific accelerators (DSAs) for various applications, including machine learning (ML), combinatorial optimization, and fully homomorphic encryption (FHE). However, traditional electronic accelerators face significant challenges in both performance and energy efficiency, largely due to the memory wall problem and the limitations of complementary metal-oxide-semiconductor (CMOS) technology scaling. As a result, electronic devices are increasingly unable to meet growing computational demands, necessitating the exploration of alternative computing paradigms. Among the candidate technologies, optically-addressed phase change memory (OPCM) has emerged as a promising option, offering high computational and communication throughput along with processing-in-memory (PIM) capabilities. However, OPCM also presents unique challenges, such as high programming overhead, low storage density, and limited computational precision, that differ significantly from those of traditional electronic devices. To fully exploit the potential of OPCM while mitigating these limitations, OPCM-based DSAs must be designed around the distinct characteristics of the technology. Accordingly, this thesis focuses on the design of OPCM-based DSAs, incorporating optimizations at the device, architecture, and algorithm levels.

We first present an ML accelerator using OPCM. OPCM-based PIM systems offer a promising way to mitigate the data movement overhead in deep neural network (DNN) inference. However, prior OPCM-based accelerators have primarily targeted small-scale DNNs that fit entirely within a limited OPCM array, neglecting the impact of programming cost; this assumption does not hold in practical deployments.
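To make the trade-off concrete, the following is a minimal, purely illustrative sketch (not the thesis design) of the PIM idea: once weights are programmed into an array of limited-precision cells, a matrix-vector product is read out in a single analog step, so the programming step itself becomes the recurring cost when a DNN does not fit in the array. The array size, bit width, and rescaling scheme here are assumptions for illustration only.

```python
import numpy as np

def crossbar_matvec(weights, x, bits=4):
    """Illustrative in-memory matvec: quantize weights to the array's
    precision (the 'programming' step), then read out W @ x in one shot."""
    scale = np.abs(weights).max() / (2 ** (bits - 1) - 1)
    w_q = np.round(weights / scale).astype(int)  # programmed cell states
    return (w_q @ x) * scale                     # analog accumulate + rescale

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # one weight tile that fits in the array
x = rng.standard_normal(8)

y = crossbar_matvec(W, x)
print("max abs error vs exact matvec:", np.max(np.abs(y - W @ x)))
```

The readout error here comes only from the cells' limited precision; in a real deployment, every tile swap would additionally pay the OPCM programming latency and energy, which is the cost the proposed design optimizes for.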
To address this, we propose a system-level design that explicitly accounts for OPCM's high programming overhead and demonstrate that this cost becomes the dominant factor in DNN inference performance on OPCM-based PIM architectures. We conduct a thorough design space exploration to identify the most energy-efficient OPCM array size and batch size configurations, and introduce a novel thresholding and weight block reordering technique to further reduce programming overhead. Together, these optimizations achieve up to 65.2× higher throughput than existing photonic accelerators on realistic DNN workloads.

We then present an Ising machine accelerator using OPCM for solving combinatorial optimization problems. Previous Ising machine implementations require the hardware capacity to exceed the problem size; otherwise, their performance degrades significantly. We propose SOPHIE, a Scalable Optical PHase-change-memory based Ising Engine that targets this scalability challenge. SOPHIE's modified algorithm incorporates a symmetric local update technique and a stochastic global synchronization strategy, which together reduce the overall computation demand and the global synchronization overhead. We apply device-level optimizations to support the modified algorithm, including bi-directional OPCM arrays and dual-precision analog-to-digital converters (ADCs). At the architecture level, our symmetric tile mapping method reduces the OPCM array area by approximately half, further enhancing the scalability of the system. SOPHIE is 3× faster than state-of-the-art (SOTA) photonic Ising machines on small graphs and 125× faster than field-programmable gate array (FPGA)-based designs on large problems. By alleviating the hardware capacity constraints of Ising machines, SOPHIE offers a scalable and efficient alternative for solving Ising problems.
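For background, the Ising formulation an Ising machine targets can be sketched as follows. This is the generic greedy single-spin update that such machines build on, not SOPHIE's symmetric or stochastic scheme; the coupling matrix, problem size, and sweep count are illustrative assumptions.

```python
import numpy as np

def energy(J, s):
    """Ising energy H(s) = -sum_{i<j} J_ij s_i s_j, with J symmetric
    and zero-diagonal, written as a quadratic form."""
    return -0.5 * s @ J @ s

def local_update(J, s):
    """Greedy sweep: flip spin i whenever the flip lowers the energy
    (i.e. s_i disagrees with its local field J[i] @ s).
    SOPHIE's algorithm adds symmetric updates and stochasticity on top."""
    for i in range(len(s)):
        if s[i] * (J[i] @ s) < 0:
            s[i] = -s[i]
    return s

rng = np.random.default_rng(1)
n = 16
A = rng.standard_normal((n, n))
J = (A + A.T) / 2            # random symmetric couplings
np.fill_diagonal(J, 0.0)
s = rng.choice([-1, 1], size=n)

e0 = energy(J, s)
for _ in range(50):
    s = local_update(J, s)
print("energy:", e0, "->", energy(J, s))
```

Each accepted flip strictly lowers the energy, so the sweep converges to a local minimum; hardware Ising machines evaluate the local fields `J[i] @ s` in parallel, which is exactly the matrix-vector operation an OPCM array provides.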
Finally, we present our FHE over the torus (TFHE) accelerator using OPCM. FHE enables secure computation on encrypted data, making it a promising solution for privacy-preserving applications in the cloud. However, its high computation and communication overhead, particularly in the fast Fourier transform (FFT) operations required during bootstrapping, limits its practicality for real-world applications. To tackle these challenges, we propose PHAT, a Photonic Accelerator for TFHE that leverages OPCM. OPCM-based PIM systems offer high computational and communication throughput, making them well-suited for accelerating FFT operations in TFHE. Nonetheless, mapping FFT computations onto OPCM introduces new challenges, such as supporting high-precision analog operations and mitigating the latency and energy costs associated with OPCM programming. To address these issues, PHAT introduces a novel electro-photonic architecture that consists of OPCM-based FFT units, a twiddle-stationary dataflow optimized for OPCM, and a scheduling mechanism to improve FFT unit utilization. PHAT achieves 1.39×-1.77× speedup over the SOTA application-specific integrated circuit (ASIC) accelerator across four programmable bootstrapping configurations, and delivers 2.14×-5.10× speedup on real-world TFHE-based machine learning workloads. These results demonstrate that PHAT significantly improves the practicality and efficiency of TFHE, paving the way for scalable, privacy-preserving computation in cloud environments.
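To illustrate why a twiddle-stationary dataflow is a natural fit, the sketch below shows a textbook radix-2 decimation-in-time FFT: the twiddle factors W_N^k are constants of the transform, so they can stay programmed in the OPCM arrays while encrypted data streams through, avoiding repeated reprogramming. This is a generic FFT, not PHAT's actual unit, and the input size is an assumption.

```python
import numpy as np

def fft_radix2(x):
    """Radix-2 decimation-in-time FFT (length must be a power of two).
    The twiddle factors 'tw' depend only on the transform size, not the
    data, which is what a twiddle-stationary dataflow exploits."""
    n = len(x)
    if n == 1:
        return x
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    tw = np.exp(-2j * np.pi * np.arange(n // 2) / n)  # fixed per stage
    return np.concatenate([even + tw * odd, even - tw * odd])

x = np.random.default_rng(2).standard_normal(16)
print("matches np.fft.fft:", np.allclose(fft_radix2(x), np.fft.fft(x)))
```

In TFHE bootstrapping, this transform is applied to every polynomial multiplication, so keeping the data-independent twiddles resident in memory amortizes OPCM's programming cost across the whole workload.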
2025