Date of Graduation
8-2022
Document Type
Thesis
Degree Name
Master of Science in Computer Science (MS)
Degree Level
Graduate
Department
Computer Science & Computer Engineering
Advisor/Mentor
Li, Qinghua
Committee Member
Thompson, Dale R.
Second Committee Member
Panda, Brajendra N.
Keywords
Cyber security; Knowledge graphs; Extraction language models
Abstract
Given the rate at which malware spreads today, cybersecurity analysts must be able to extract relevant information about new and active threats in a timely and effective manner. Manually reading through articles and blog posts on the internet is time-consuming and usually means sifting through much repeated information. Knowledge graphs, a structured representation of relationship information, are an effective way to visually condense information presented in large amounts of unstructured text for human readers. They are therefore useful for sifting through the abundance of cybersecurity information released through web-based security articles and blogs. This thesis presents a pipeline for extracting these relationships using supervised deep learning with recent state-of-the-art transformer-based neural architectures for sequence processing tasks. To this end, a corpus of text from a range of prominent cybersecurity-focused media outlets was manually annotated. An algorithm is also presented that prevents potentially redundant relationships from being added to an existing knowledge graph by applying a cosine-similarity metric to pre-trained word embeddings.
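The redundancy check described in the abstract can be illustrated with a minimal sketch. The sketch assumes that each relationship is a (head, relation, tail) triple and that each triple is embedded by averaging the pre-trained vectors of its words. A candidate is rejected when its cosine similarity to any triple already in the graph reaches a threshold. The averaging scheme, the 0.9 threshold, the function names, and the tiny 4-dimensional `TOY_EMBEDDINGS` table are all hypothetical. The table stands in for real pre-trained embeddings such as word2vec or GloVe. None of this is the thesis's actual algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A relationship extracted from text: (head entity, relation, tail entity).
Triple = Tuple[str, str, str]

# Hypothetical 4-d vectors standing in for real pre-trained word embeddings.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "emotet": [1.0, 0.0, 0.0, 0.0],
    "drops": [0.0, 1.0, 0.0, 0.0],
    "delivers": [0.0, 0.95, 0.05, 0.0],
    "trickbot": [0.0, 0.0, 1.0, 0.0],
    "ryuk": [0.0, 0.0, 0.0, 1.0],
    "encrypts": [0.0, 0.2, 0.0, 0.8],
    "files": [0.0, 0.0, 0.1, 0.9],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed_triple(triple: Triple, emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Hypothetical scheme: average the vectors of every known word in the triple."""
    vecs = [emb[tok] for part in triple for tok in part.lower().split() if tok in emb]
    if not vecs:
        return [0.0] * dim  # no known words: zero vector, cosine 0.0 with everything
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def add_if_novel(
    graph: List[Triple],
    candidate: Triple,
    emb: Dict[str, List[float]],
    dim: int,
    threshold: float = 0.9,  # hypothetical cut-off
) -> bool:
    """Append candidate unless it is too similar to a triple already in the graph."""
    cand_vec = embed_triple(candidate, emb, dim)
    for existing in graph:
        if cosine(cand_vec, embed_triple(existing, emb, dim)) >= threshold:
            return False  # likely redundant: skip it
    graph.append(candidate)
    return True


if __name__ == "__main__":
    kg: List[Triple] = []
    print(add_if_novel(kg, ("Emotet", "drops", "TrickBot"), TOY_EMBEDDINGS, 4))     # True
    print(add_if_novel(kg, ("Emotet", "delivers", "TrickBot"), TOY_EMBEDDINGS, 4))  # False
    print(add_if_novel(kg, ("Ryuk", "encrypts", "files"), TOY_EMBEDDINGS, 4))       # True
```

In this toy run, "Emotet delivers TrickBot" is skipped because its averaged vector is nearly parallel to "Emotet drops TrickBot". The Ryuk triple is added because it points in a different direction. A linear scan keeps the sketch readable. A large graph would more plausibly cache triple vectors or use an approximate nearest-neighbour index.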
Citation
Boudreau, P. R. (2022). Effective Knowledge Graph Aggregation for Malware-Related Cybersecurity Text. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/4604