Date of Graduation

12-2022

Document Type

Thesis

Degree Name

Master of Science in Computer Science (MS)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Qinghua Li

Committee Member

Brajendra Panda

Second Committee Member

Lu Zhang

Keywords

Artificial intelligence, Machine learning, Privacy

Abstract

Machine learning has become a widely used technology for decision making on high-dimensional data. As datasets have grown, so too have the neural networks that learn the complex patterns hidden within them. This expansion has continued to the point that it may be infeasible to train a model on a single device because of the computational or memory limitations of the underlying hardware. Purpose-built computing clusters for training large models are commonplace, yet networks of heterogeneous devices are still typically more accessible. In addition, with the rise of 5G networks, with computation at the edge becoming more commonplace, and inspired by the success of the Folding@home project in using crowdsourced computation, we find the scenario of crowdsourcing the computation required to train a neural network particularly appealing. Distributed learning promises to bridge the widening gap between single-device performance and the computational requirements of large-scale models, but unfortunately, current distributed learning techniques do not maintain the privacy of both the model and the input without an accuracy or computational tradeoff. In response, we present Divide and Conquer Learning (DCL), an innovative approach that enables quantifiable privacy guarantees while offloading the computational burden of training to a network of devices. A user can divide the training computation of its neural network into neuron-sized computation tasks and distribute them to devices based on their available resources. The results are returned to the user and aggregated in an iterative process to obtain the final neural network model. To protect the privacy of the user's data and model, both the data and the neural network model are shuffled before the computation tasks are distributed to devices.
Our strict adherence to the order of operations allows a user to verify the correctness of performed computations by assigning a task to multiple devices and cross-validating their results. This protects against network churn and detects faulty or misbehaving devices.
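
The ideas above can be illustrated with a minimal Python sketch. It is not the thesis's actual protocol: it covers only the forward pass of one dense layer, it models shuffling as a random permutation of the input features and of the order in which neurons are dispatched, and every name in it (`neuron_task`, `honest_device`, `faulty_device`, `cross_validate`, `layer_forward`) is hypothetical. Each neuron becomes one task, the same task is sent to several devices, and a result is accepted only if a strict majority of devices report exactly the same value, which is possible because every device performs the operations in the same order.

```python
import random
from collections import Counter

def neuron_task(weights, bias, inputs):
    """One neuron-sized task: weighted sum plus bias, then ReLU."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, z)

def honest_device(task):
    """Hypothetical device that computes the task correctly."""
    return neuron_task(*task)

def faulty_device(task):
    """Hypothetical misbehaving device that returns a wrong value."""
    return neuron_task(*task) + 1.0

def cross_validate(results):
    """Accept the value reported by a strict majority of devices."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority agreement among devices")
    return value

def layer_forward(weights, biases, inputs, devices, rng):
    """Forward pass of one dense layer, offloaded neuron by neuron."""
    # Assumption: shuffling is a random permutation of the input features
    # and of the order in which neurons are handed out.
    feature_perm = list(range(len(inputs)))
    rng.shuffle(feature_perm)
    neuron_perm = list(range(len(weights)))
    rng.shuffle(neuron_perm)

    shuffled_inputs = [inputs[j] for j in feature_perm]
    outputs = [0.0] * len(weights)
    for i in neuron_perm:
        shuffled_weights = [weights[i][j] for j in feature_perm]
        task = (shuffled_weights, biases[i], shuffled_inputs)
        # Every device receives the identical task, so honest results match exactly.
        results = [device(task) for device in devices]
        # The user knows the permutation and stores the result at the true index.
        outputs[i] = cross_validate(results)
    return outputs

weights = [[1.0, 2.0, 3.0], [-1.0, 0.5, 0.0]]
biases = [0.5, 0.0]
inputs = [1.0, 2.0, 3.0]
devices = [honest_device, honest_device, faulty_device]
outputs = layer_forward(weights, biases, inputs, devices, random.Random(0))
```

With two honest devices and one faulty device, the faulty result is outvoted for every neuron. With only one honest and one faulty device there is no majority, and `cross_validate` raises an error instead of accepting either value.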
