Using offline routing to implement a low latenc 3D FFT in a multinode FPGA system
MetadataShow full item record
Applications that require highly parallel computing along with low latency communication due to strong scaling, such as a calculating a 3D FFT for Molecular Dynamics simulations, can be problematic for traditional high performance computing (HPC) clusters. A multinode FPGA array is a good solution for these types of problems due to the direct high speed connections and flexible internal fabric inherent in FPGAs. Offline routing uses precomputed routing information to direct packets and can avoid much of the switching and congestion communication overhead. Two architectures are explored here which show the feasibility ofusing offline routing techniques to reduce communication latencies in FPGA systems. The first architecture targets a single FPGA that was built for initial exploration and to show how the powerful and flexible a single FPGA can be. It attained a maximum clock frequency of 102MHz and latencies of 64us and 250us for 3D FFT calculations of 32^3 and 64^3 data points respectively. The second architecture targets an FPGA that is intended to be the model for each node in the array. The best multinode version is based on a multilevel switching architecture. It has a maximum clock frequency of 134MHz. When scaled to a cluster, latencies project to 2.4us and 5.5us for 3D FFT calculations of 32^3 and 64^3 data points respectively. The two designs show the potential for using a single FPGA and multi-FPGA arrays for HPC applications where communication latency is critical to the application.
Thesis (M.S.)--Boston University