Date of Graduation

5-2020

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Engineering (PhD)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Xintao Wu

Committee Member

Qinghua Li

Second Committee Member

Song Yang

Third Committee Member

Lu Zhang

Keywords

Dirichlet process, Fraud Detection, Machine Learning, Mixture Model, Sequential Model, Survival Analysis

Abstract

The impacts of the information revolution are omnipresent, from daily life to work. Web services have significantly changed our lifestyles, with Facebook used for communication and Wikipedia for knowledge acquisition. In addition, various information systems, such as data management systems and management information systems, help us work more efficiently. However, these technologies are a double-edged sword. As web services have grown in popularity, related security issues have arisen, such as fake news on Facebook and vandalism on Wikipedia, which pose severe security threats to online social networks (OSNs) and their legitimate participants. Likewise, office automation raises another challenging security issue, the insider threat, which may involve the theft of confidential information, the theft of intellectual property, or the sabotage of computer systems. A recent survey reports that 27% of all cybercrime incidents are suspected to have been committed by insiders. As a result, flagging these malicious web users and insiders is an urgent problem. The rapid development of machine learning (ML) techniques offers an unprecedented opportunity to build ML models that help humans automatically detect individuals who engage in misbehavior. However, unlike static outlier detection scenarios, where ML models have achieved promising performance, the malicious behaviors of humans are often dynamic. Such dynamic behaviors lead to several unique challenges for dynamic fraud detection:

Unavailability of sufficient labeled data - traditional machine learning approaches usually require a balanced training dataset consisting of normal and abnormal samples. In practice, however, there are far fewer labeled abnormal samples than normal ones.

Lack of high-quality labels - in labeled training records, there is often a time gap between when fraudulent users commit fraudulent actions and when they are suspended by the platform.

Time-evolving nature - users continually change their behaviors over time.

To address the aforementioned challenges, this dissertation presents a systematic study of dynamic fraud detection, focusing on: (1) Unavailability of labeled data: we present (a) a few-shot learning framework to handle extremely imbalanced datasets in which abnormal samples are far fewer than normal ones and (b) a one-class fraud detection method that uses a complementary GAN (Generative Adversarial Network) to adaptively generate potential abnormal samples; (2) Lack of high-quality labels: we develop a neural survival analysis model for early fraud detection that deals with the time gap; (3) Time-evolving nature: we propose (a) a hierarchical neural temporal point process model and (b) a dynamic Dirichlet marked Hawkes process model for fraud detection.
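The Hawkes process named above is a self-exciting point process: each past event temporarily raises the rate of future events, which suits bursty user activity. As a minimal sketch of the standard textbook building block only (a univariate Hawkes process with an exponential kernel, not the dissertation's dynamic Dirichlet marked model), the conditional intensity λ(t) = μ + α Σ_{t_i < t} exp(-β(t - t_i)) can be computed as below. The function name `hawkes_intensity` and the parameters `mu` (baseline rate), `alpha` (excitation per past event), and `beta` (decay rate) are illustrative assumptions.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, history: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity of a univariate Hawkes process (exponential kernel).

    Illustrative sketch only (hypothetical helper, not the dissertation's model):
        lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))
    """
    if mu < 0 or alpha < 0 or beta <= 0:
        raise ValueError("require mu >= 0, alpha >= 0, beta > 0")
    # Only events strictly before t excite the process; their influence decays with age.
    excitation = sum(math.exp(-beta * (t - ti)) for ti in history if ti < t)
    return mu + alpha * excitation


# Example: a user's past action timestamps; activity right after a burst has a raised rate.
events = [1.0, 1.5, 2.0]
rate_soon = hawkes_intensity(2.1, events, mu=0.5, alpha=0.8, beta=2.0)
rate_later = hawkes_intensity(6.0, events, mu=0.5, alpha=0.8, beta=2.0)
```

With no past events the intensity reduces to the baseline `mu`; shortly after a burst it is elevated, and it decays back toward `mu` as time passes.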