Incorporating Pre-Training in Long Short-Term Memory Networks for Tweets Classification


Keywords: Training, Logistics, Data models, Twitter, Semantics, Tagging, Logic gates, Artificial intelligence, Pattern classification, Recurrent neural nets, Regression analysis, Social networking, LSTM, Tweets classification, Pre-training, Deep learning, Long short-term memory networks, Binary classification, Long-term dependencies, Semantic tweet representation, Logistic regression, Tweet label, LSTM-TC model, Well-labeled training data, Weakly-labeled data, Hashtag information, Tweet representation, Logistic regression classifier, Weakly-labeled tweets


The paper presents deep learning models for the binary classification of tweets. Our approach is based on the Long Short-Term Memory (LSTM) recurrent neural network and is therefore expected to capture long-term dependencies among words. We develop two models for tweet classification. The basic model, called LSTM-TC, takes word embeddings as input, uses an LSTM layer to derive a semantic tweet representation, and applies logistic regression to predict the tweet's label. Like other deep learning models, LSTM-TC requires a large amount of well-labeled training data to achieve good performance. To address this challenge, we further develop an improved model, called LSTM-TC*, that incorporates a large amount of weakly-labeled data for classifying tweets. We present two approaches to constructing the weakly-labeled data: one is based on hashtag information, and the other is based on the prediction output of a traditional classifier that does not need a large amount of well-labeled training data. LSTM-TC* first learns tweet representations from the weakly-labeled data and then trains the logistic regression classifier on the small amount of well-labeled data. Experimental results show that (1) the proposed method can be successfully used for tweet classification and outperforms existing state-of-the-art methods, and (2) pre-training the tweet representation on weakly-labeled tweets can significantly improve the accuracy of tweet classification.
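The two-stage pipeline summarized above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The tweet representation is assumed to be the LSTM's last hidden state. The hashtag sets `POSITIVE_TAGS` and `NEGATIVE_TAGS`, all helper names, and the rule of stripping label-bearing hashtags from the text are hypothetical. The pre-training step itself (fitting the LSTM on weakly-labeled tweets by backpropagation) is omitted, so randomly initialized, fixed weights stand in for pre-trained ones. The logistic regression is fit by plain batch gradient descent while the encoder stays fixed; the abstract does not say whether LSTM-TC* also fine-tunes the LSTM in the second stage.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def embed(tokens, table, dim):
    # Look up word embeddings; unknown words map to a zero vector (assumption).
    return [table.get(t.lower(), [0.0] * dim) for t in tokens]

class LSTMEncoder:
    """Single-layer LSTM; the tweet representation is the last hidden state (assumption)."""

    def __init__(self, emb_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input i, forget f, output o, candidate g.
        # Random fixed weights stand in for weights pre-trained on weakly-labeled tweets.
        self.params = {
            gate: (rand_matrix(hidden_dim, emb_dim, rng),
                   rand_matrix(hidden_dim, hidden_dim, rng),
                   [0.0] * hidden_dim)
            for gate in "ifog"
        }

    def _gate(self, name, x, h, act):
        w, u, b = self.params[name]
        return [act(p + q + r) for p, q, r in zip(matvec(w, x), matvec(u, h), b)]

    def encode(self, embeddings):
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for x in embeddings:
            # All four gates read the previous hidden state h.
            i = self._gate("i", x, h, sigmoid)
            f = self._gate("f", x, h, sigmoid)
            o = self._gate("o", x, h, sigmoid)
            g = self._gate("g", x, h, math.tanh)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

def predict(w, b, rep):
    # Logistic regression: P(label = 1 | tweet representation).
    return sigmoid(sum(wk * rk for wk, rk in zip(w, rep)) + b)

def train_logistic_regression(reps, labels, lr=0.5, epochs=500):
    # Batch gradient descent on cross-entropy loss; the encoder is not updated.
    dim, n = len(reps[0]), len(reps)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for rep, y in zip(reps, labels):
            err = predict(w, b, rep) - y  # d(loss)/d(logit) for cross-entropy
            grad_w = [gw + err * r for gw, r in zip(grad_w, rep)]
            grad_b += err
        w = [wk - lr * gw / n for wk, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

POSITIVE_TAGS = {"#happy", "#love"}  # hypothetical class-1 hashtags
NEGATIVE_TAGS = {"#sad", "#angry"}   # hypothetical class-0 hashtags

def weak_label(tokens):
    # Weak label from hashtags; tweets with no or conflicting tags are dropped.
    tags = {t.lower() for t in tokens if t.startswith("#")}
    pos, neg = bool(tags & POSITIVE_TAGS), bool(tags & NEGATIVE_TAGS)
    if pos == neg:
        return None
    # Remove the label-bearing hashtags so the label is not visible in the text.
    text = [t for t in tokens if t.lower() not in POSITIVE_TAGS | NEGATIVE_TAGS]
    return text, 1 if pos else 0

if __name__ == "__main__":
    # Stage 1 data: weakly-labeled tweets (pre-training the encoder on them is omitted).
    raw = [["what", "a", "game", "#happy"], ["lost", "again", "#sad"], ["no", "tags"]]
    weak = [pair for pair in map(weak_label, raw) if pair is not None]
    print(weak)
    # Stage 2: fit logistic regression on a small well-labeled set (toy data).
    dim, rng = 3, random.Random(42)
    table = {t: [rng.uniform(-1, 1) for _ in range(dim)] for t in ["great", "game", "awful", "loss"]}
    encoder = LSTMEncoder(emb_dim=dim, hidden_dim=4)
    labeled = [(["great", "game"], 1), (["awful", "loss"], 0)]
    reps = [encoder.encode(embed(toks, table, dim)) for toks, _ in labeled]
    w, b = train_logistic_regression(reps, [y for _, y in labeled])
    print(predict(w, b, encoder.encode(embed(["great", "game"], table, dim))))
```

For the second weak-labeling route, `weak_label` would be replaced by the predictions of a traditional classifier (for example, one trained on the small well-labeled set); the rest of the sketch is unchanged.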


Principal Investigator: Xintao Wu

Acknowledgements: The authors acknowledge the support from the 973 Program of China (2014CB340404), the National Natural Science Foundation of China (71571136), the Basic Research Program of Shanghai (16JC1403000), and the China Scholarship Council to Shuhan Yuan and Yang Xiang, and from the National Science Foundation (1646654) to Xintao Wu.