Date of Graduation
Master of Science in Computer Science (MS)
Computer Science & Computer Engineering
Dale R. Thompson
Second Committee Member
Decision Trees, Internet Protocol Version 6 (IPV6), Machine Learning, Neural Networks, Operating System Identification, Random Forests
Operating system (OS) identification tools, sometimes called fingerprinting tools, are essential for the reconnaissance phase of penetration testing. While OS identification is traditionally performed by passive or active tools that use fingerprint databases, very little work has focused on using machine learning techniques. Moreover, significantly more work has focused on IPv4 than IPv6. We introduce a collaborative neural network ensemble that uses a unique voting system and a random forest ensemble to deliver accurate predictions. This approach uses IPv6 features as well as packet metadata features for OS identification. Our experiment shows that our approach is valid and we achieve a neural network ensemble average accuracy of 85% over 100 sets of neural networks with a highest accuracy of 96%. Furthermore, we explore the impact of additional training for poor neural network accuracy, and we show that our system can achieve an average accuracy of 93%, which is an 8% improvement over the previous approach. A random forest of 30 decision trees attains an average accuracy of 93.6% and a best accuracy of 96% when given a dataset of Windows and Linux packets. Finally, as packets from the Mac OS is introduced into the dataset, the random forested performed with an average accuracy of 89.6% and a best accuracy of 93.2%.
Ordorica, Adrian, "Operating System Identification by IPv6 Communication using Machine Learning Ensembles" (2017). Theses and Dissertations. 2413.