Date of Graduation

9-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Engineering (PhD)

Degree Level

Graduate

Department

Electrical Engineering and Computer Science

Advisor/Mentor

Miaoqing Huang

Second Committee Member

Alexander Nelson

Third Committee Member

David Andrews

Fourth Committee Member

Zhong Chen

Keywords

DNN, data storage

Abstract

Deep neural networks (DNNs) are widely used in applications such as classification, prediction, and regression. Various DNN architectures, including convolutional neural networks (CNNs), multilayer perceptrons (MLPs), long short-term memory (LSTM) networks, recurrent neural networks (RNNs), and transformers, have become leading machine learning techniques in these applications. They require significant computational resources and substantial memory due to intensive matrix-matrix multiplications and complex data flows. Hence, efficient utilization of on-chip computational and memory resources is essential to maximize parallelism and minimize latency, and designing a tiling scheme that aligns well with the underlying architecture is equally important. Modern FPGAs provide Block Random Access Memory (BRAM) for storing data such as weights, intermediate results, and configurations during processing, and Digital Signal Processing (DSP) blocks for high-speed arithmetic operations such as multiplication and accumulation, both of which are essential for deep learning tasks. Utilizing these resources efficiently and in parallel while maintaining operating frequency is crucial for low-latency inference. Moreover, achieving high performance and accuracy across diverse applications often requires an optimized network architecture, typically developed through iterative experimentation and evaluation across various network topologies. However, custom hardware accelerators lack scalability and flexibility because they cannot adapt to different topologies at runtime, and designing custom hardware with FPGA vendor tools and programming languages is a time-intensive process that demands a deep understanding of hardware architecture. This research developed versatile accelerators that address diverse computational and memory demands, supporting various DNNs across multiple applications while maximizing processing-unit utilization to achieve low latency. Through high parallelism, efficient tiling, and careful coding techniques, these accelerators outperformed most custom accelerators and general-purpose processors.
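To illustrate the tiling concept the abstract refers to, the sketch below shows a generic tiled matrix-matrix multiplication in C++. It is a minimal illustration, not the dissertation's actual scheme: the tile sizes TM, TN, and TK and the function name tiled_matmul are assumptions chosen for clarity. In an FPGA accelerator, each tile would be staged into BRAM and the innermost multiply-accumulate loop mapped onto DSP blocks.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative tile sizes (hypothetical; a real design would match
// them to the FPGA's BRAM depth and available DSP blocks).
constexpr std::size_t TM = 32, TN = 32, TK = 32;

// Tiled matrix-matrix multiplication C += A * B, with A (M x K),
// B (K x N), and C (M x N) stored row-major. C must be
// zero-initialized by the caller. Each (i0, j0, k0) iteration works
// on a TM x TK tile of A and a TK x TN tile of B, the working set
// an accelerator would hold on chip while the multiply-accumulates
// execute in parallel.
void tiled_matmul(const std::vector<float>& A,
                  const std::vector<float>& B,
                  std::vector<float>& C,
                  std::size_t M, std::size_t N, std::size_t K) {
    for (std::size_t i0 = 0; i0 < M; i0 += TM)
        for (std::size_t j0 = 0; j0 < N; j0 += TN)
            for (std::size_t k0 = 0; k0 < K; k0 += TK)
                // Multiply-accumulate within one tile triple; the
                // std::min bounds handle edge tiles when the matrix
                // dimensions are not multiples of the tile sizes.
                for (std::size_t i = i0; i < std::min(i0 + TM, M); ++i)
                    for (std::size_t j = j0; j < std::min(j0 + TN, N); ++j) {
                        float acc = 0.0f;
                        for (std::size_t k = k0; k < std::min(k0 + TK, K); ++k)
                            acc += A[i * K + k] * B[k * N + j];
                        C[i * N + j] += acc;
                    }
}
```

The loop ordering and tile sizes determine how much data is reused on chip per off-chip access, which is why, as the abstract notes, a tiling scheme must be matched to the architecture's memory and compute resources.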
