Date of Graduation
8-2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Engineering (PhD)
Degree Level
Graduate
Department
Computer Science & Computer Engineering
Advisor/Mentor
Andrews, David L.
Committee Member
Di, Jia
Second Committee Member
Nelson, Alexander H.
Third Committee Member
Chen, Zhong
Keywords
Accelerator; Array processor; Deep-learning; FPGA; Processing-in-memory; Reconfigurable overlay
Abstract
Deep learning has become a dominant computing paradigm across a broad range of application domains. Deep-network architectures such as CNNs, MLPs, and RNNs have emerged as the prominent machine-learning approaches for today's applications. These architectures are heavily data-dependent and require frequent access to memory, so they suffer most from the memory bottleneck of von Neumann architectures. There is a pressing need for memory-centric architectures for memory-intensive deep-learning and big-data analytics applications. Modern Field Programmable Gate Arrays (FPGAs) are ideal programmable substrates for creating customized Processor-in/near-Memory (PIM) accelerators. Modern FPGAs contain hundreds of megabits of dual-ported SRAM in the form of disaggregated, configurable Block RAMs (BRAMs), and these BRAMs offer terabytes per second of internal bandwidth. Unfortunately, developing FPGA-based accelerators for deep learning is not a simple task: it demands specialized tools provided by the FPGA vendors and expertise in low-level hardware microarchitecture design, which most researchers in the field of deep learning do not have. Even with ongoing improvements in High-Level Synthesis (HLS) tools, the need for hardware-specific design knowledge cannot be completely eliminated. This research developed a new reconfigurable memory-centric architecture and design approach that opens the advantages of FPGAs and Processor-in-Memory architectures to memory-intensive applications. Owing to its high-performance, scalable, memory-centric design, this architecture can deliver the highest speed and the lowest latency achievable from an FPGA, overcoming the memory bottleneck.
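The "terabytes per second" figure follows from simple arithmetic over the BRAM resources. The sketch below is a back-of-envelope estimate using assumed, illustrative device parameters (block count, port width, and clock frequency are representative values, not taken from the dissertation):

```python
# Back-of-envelope estimate of aggregate on-chip BRAM bandwidth on an FPGA.
# All device parameters below are assumptions for illustration only.

num_brams      = 2000    # BRAM blocks on the device (assumed)
bram_kbits     = 36      # capacity per block, in Kb
ports_per_bram = 2       # each BRAM is dual-ported
bits_per_port  = 72      # maximum port width, in bits
clock_hz       = 500e6   # assumed BRAM clock frequency

total_capacity_mbits = num_brams * bram_kbits / 1000
aggregate_bw_bytes_s = num_brams * ports_per_bram * (bits_per_port / 8) * clock_hz

print(f"Total BRAM capacity : {total_capacity_mbits:.0f} Mb")
print(f"Aggregate bandwidth : {aggregate_bw_bytes_s / 1e12:.1f} TB/s")
# -> about 72 Mb of BRAM and ~18 TB/s of aggregate internal bandwidth for these
#    assumed numbers; larger devices (and those with UltraRAM) push on-chip SRAM
#    into the hundreds-of-megabits range cited in the abstract.
```

The key point the estimate illustrates is that the bandwidth scales with the number of independently accessible BRAM ports, which is exactly what a memory-centric overlay can exploit and what a single external-memory interface cannot.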
Citation
Kabir, M. (2024). ReMoDeL-FPGA: Reconfigurable Memory-centric Array Processor Architecture for Deep-Learning Acceleration on FPGA. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/5483