Author ORCID Identifier

https://orcid.org/0009-0007-5329-8233

Date of Graduation

5-2026

Document Type

Thesis

Degree Name

Master of Science in Computer Engineering (MSCmpE)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Andrews, David

Committee Member

Nelson, Alexander

Second Committee Member

Huang, Miaoqing

Keywords

BRAM; FPGA; GEMV; Memory Wall; PIM; Processing-in-Memory

Abstract

Traditional computer systems are hitting the Memory Wall as machine learning applications are bottlenecked by the bandwidth between separate memory and compute units. Current FPGA architectures are able to bypass this bottleneck with block RAMs (BRAMs) that provide on-chip, in-fabric storage. However, the potential of BRAMs as the foundation of computing components is often overlooked; machine learning accelerators implemented on FPGAs face additional delays when transferring data between memory and compute units. These penalties arise from BRAM bandwidth limitations and the movement of data through the reconfigurable fabric. To support FPGA-based accelerators at the edge and break the Memory Wall, reconfigurable architectures must adopt a Processor-in-Memory (PIM) aligned paradigm. This work presents MC2-BRAM, a Memory-Centric Compute BRAM architecture that integrates processing and networking directly within the FPGA’s memory blocks, enabling efficient execution of machine learning workloads. MC2-BRAM includes networking that integrates zero-copy, in-operation data movement within and between BRAMs. As a result, MC2-BRAM delivers the lowest inter-PIM accumulation latency among all PIM BRAMs to date. MC2-BRAM is the first PIM BRAM architecture to not only maintain but exceed the BRAM’s original clock frequency, advancing the state of BRAM-based Processor-in-Memory designs. This speed enables MC2-BRAM to deliver superior general matrix-vector multiplication (GEMV) latencies at low precisions. MC2-BRAM achieves all this while using 18% less area than the previously smallest PIM design, making it the smallest and fastest FPGA PIM architecture reported to date.
