Show simple item record

dc.contributor.advisor   Herbordt, Martin C.   en_US
dc.contributor.author    Xiong, Qingqing       en_US
dc.date.accessioned      2019-10-08T15:26:46Z
dc.date.available        2019-10-08T15:26:46Z
dc.date.issued           2019
dc.identifier.uri        https://hdl.handle.net/2144/38211
dc.description.abstract  (en_US)

High-Performance Computing (HPC) necessarily requires computing with a large number of nodes. As computing technology progresses, internode communication becomes an ever more critical performance blocker. The execution time of software communication support is generally critical, often accounting for hundreds of times the latency of the actual time-of-flight. This software support comes in two types. The first is support for core functions as defined in middleware such as the ubiquitous Message Passing Interface (MPI). Over the last few decades this software overhead has been addressed through a number of advances, such as eliminating data copies, improving drivers, and bypassing the operating system. However, an essential core remains, including message matching, data marshaling, and handling collective operations. The second type of communication support is for new services not inherently part of the middleware. The most prominent of these is compression; it brings huge savings in transmission time, but much of this benefit is offset by a new layer of software overhead.

In this dissertation, we address the software overhead in internode communication with elements of the emerging node architectures, which include FPGAs in multiple configurations: closely coupled hardware support, programmable Network Interface Cards (NICs), and routers with programmable accelerators. While there has been substantial work in offloading communication software into hardware, we advance the state of the art in three ways. The first is to use an emerging hardware model that is, for the first time, both realistic and supportive of very high performance gains. Previous studies (and some products) have relied on hardware models that are either of limited benefit (a NIC processor) or not sustainable (a NIC augmented with ASICs). Our hardware model is based on the various emerging CPU-FPGA computing architectures. The second is to improve on previous work. We have found this to be possible through a number of means: taking advantage of configurable hardware, taking advantage of close coupling, and devising novel improvements. The third is to look at problems that have so far been nearly completely unexplored; one of these is hardware acceleration of application-aware, in-line, lossy compression.

In this dissertation, we propose offload approaches and hardware designs for integrated FPGAs to bring communication latency down to ultra-low levels unachievable by today's software/hardware. We focus on improving performance in three respects: 1) accelerating middleware semantics within communication routines, such as message matching and derived datatypes; 2) optimizing complex communication routines, namely collective operations; 3) accelerating operations vital in new communication services independent of the middleware, such as data compression. The last aspect is somewhat broader than the others: it is applicable to HPC communication, but it is also vital to broader system functions such as I/O.
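To make the "message matching" semantics named in the abstract concrete, here is a minimal, illustrative sketch (not taken from the dissertation, and not how its FPGA design is implemented) of the matching logic MPI middleware performs in software: a posted-receive queue matched against arriving messages by (source, tag) with wildcards, plus an unexpected-message queue for messages that arrive before a matching receive. All class and method names here are hypothetical.

```python
from collections import deque

# Wildcards analogous to MPI_ANY_SOURCE / MPI_ANY_TAG.
ANY_SOURCE = -1
ANY_TAG = -1

class MatchEngine:
    """Software model of MPI two-queue message matching."""

    def __init__(self):
        self.posted = deque()      # receives posted before a message arrived
        self.unexpected = deque()  # messages that arrived before a matching receive

    @staticmethod
    def _matches(recv, msg):
        src_ok = recv["source"] in (ANY_SOURCE, msg["source"])
        tag_ok = recv["tag"] in (ANY_TAG, msg["tag"])
        return src_ok and tag_ok

    def post_recv(self, source, tag):
        """Post a receive; return a message if one was already waiting."""
        recv = {"source": source, "tag": tag}
        for msg in self.unexpected:          # search unexpected queue first
            if self._matches(recv, msg):
                self.unexpected.remove(msg)
                return msg
        self.posted.append(recv)             # otherwise enqueue the receive
        return None

    def arrive(self, source, tag, payload):
        """An incoming message; return it if a posted receive matches."""
        msg = {"source": source, "tag": tag, "payload": payload}
        for recv in self.posted:             # match posted receives in order
            if self._matches(recv, msg):
                self.posted.remove(recv)
                return msg
        self.unexpected.append(msg)          # no match: park as unexpected
        return None
```

Because every arrival may walk both queues, this matching is on the latency-critical path of each message, which is why it is a natural candidate for the hardware offload the dissertation pursues.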
dc.language.iso          en_US
dc.subject               Computer engineering   en_US
dc.subject               Communication          en_US
dc.subject               Compression            en_US
dc.subject               FPGA                   en_US
dc.subject               HPC                    en_US
dc.subject               Middleware             en_US
dc.subject               MPI                    en_US
dc.title                 FPGA acceleration of high performance computing communication middleware   en_US
dc.type                  Thesis/Dissertation    en_US
dc.date.updated          2019-09-29T04:01:43Z
etd.degree.name          Doctor of Philosophy   en_US
etd.degree.level         doctoral               en_US
etd.degree.discipline    Electrical & Computer Engineering   en_US
etd.degree.grantor       Boston University      en_US
dc.identifier.orcid      0000-0002-3243-3949


