Date of Graduation

5-2020

Document Type

Thesis

Degree Name

Bachelor of Science in Computer Engineering

Degree Level

Undergraduate

Department

Computer Science and Computer Engineering

Advisor/Mentor

Li, Qinghua

Committee Member/Reader

Li, Qinghua

Committee Member/Second Reader

Parkerson, James

Committee Member/Third Reader

Gauch, Susan

Abstract

When technology (tech) companies realized that usage data from people's activities, from mobile applications to the internet, could be sold to advertisers for a profit, the Big Data era began, in which tech companies collect as much data as possible from users. One benefit of this new era is the creation of new types of jobs, such as data scientists and Big Data engineers. However, this new era has also made data privacy one of the most hotly debated topics. Numerous complaints have been raised about data privacy, such as how much access most mobile applications require to function correctly, from a user's contact list to media files. Furthermore, the level of tracking has reached new heights, from mobile phone location and search engine activity to phone battery percentage. However much data is collected, tech companies are within their rights to collect it because they provide a privacy policy that informs the user of the type of data they collect, how they use that data, and how they share that data. In addition, we find that all privacy policies used in this research state that by using the mobile application, the user agrees to its terms and conditions. Most alarmingly, research on privacy policies has found that only 9% of mobile app users read legal terms and conditions [2] because they are too long. Therefore, in this thesis, we present two summarization programs that take privacy policy text as input and produce a shorter, summarized version of the privacy policy. The results show that both implementations achieve averages of at least 50%, 90%, and 85% on the same sentence, clear sentence, and summary score grading metrics, respectively.
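To make the input and output of such a program concrete, the following is a minimal sketch of one common extractive approach: score each sentence by the frequency of its words and keep the highest-scoring sentences in their original order. It is not the method used in the thesis, whose two implementations are not described in this abstract. The function names, the stop-word list, and the `max_sentences` parameter are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not taken from the thesis).
STOP_WORDS = {
    "a", "an", "and", "the", "to", "of", "we", "our", "your", "you", "is",
    "are", "in", "with", "for", "or", "that", "this", "may", "it", "by", "on", "as",
}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase a sentence and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return up to max_sentences sentences with the highest average word frequency."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    policy = (
        "We collect your location data. "
        "We share location data with advertisers. "
        "Contact us with questions. "
        "Thank you for reading."
    )
    print(summarize(policy, max_sentences=2))
```

Because the summary is built only from sentences that already appear in the policy, it never introduces wording the policy does not contain, which matters for legal text.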

Keywords

Privacy Policy; Natural Language Processing; Summarization; Summarization Algorithms; Edmundson algorithm
