Evaluating the Impact of Removing Less Important Terms on Sentiment Analysis

MIMOS Berhad


Salhana Amad Darwis, Duc Nghia Pham, Ang Jia Pheng, and Ong Hong Hoe



Sentiment analysis is an important task in Natural Language Processing (NLP) that analyses and predicts people's opinions from textual data. It is a complex process because it draws on computer science, linguistics, psychology and the social sciences, and there is no straightforward rule for analysing and predicting sentiment. Supervised learning methods, which learn from human-labelled examples, are widely used by NLP researchers and experts to predict sentiment. However, this approach is tricky because it is difficult to ensure the quality of the manually labelled training dataset. In this study, we investigated the use of linguistic factors to improve the model's accuracy. We gathered two datasets: (i) 125,000 annotated sentences from Amazon product reviews, and (ii) 11,250 annotated sentences from financial news articles. We then pre-processed the data and identified the less important terms in the datasets, the linguistic features, and their effect on the correctness of the predicted sentiment. Our experimental results showed that separating punctuation and removing supporting part-of-speech (POS) words improve precision and accuracy on the larger, generic dataset rather than on the smaller, context-sensitive dataset.
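As an illustration only, the two pre-processing steps named in the abstract, punctuation separation and removal of supporting POS words, might look like the following Python sketch. The abstract does not say which tagger the authors used or which POS tags they treated as "supporting"; the lexicon `CLOSED_CLASS_TAGS` and the tag set `SUPPORTING_TAGS` (determiners, prepositions, `to`, coordinating conjunctions) are hypothetical stand-ins for a trained POS tagger, and the function names are likewise illustrative.

```python
import re

# Hypothetical stand-in for a POS tagger: a tiny closed-class lexicon mapping
# lower-cased words to Penn Treebank-style tags. A real pipeline would tag
# tokens in context with a trained model instead.
CLOSED_CLASS_TAGS = {
    "the": "DT", "a": "DT", "an": "DT", "this": "DT", "that": "DT",
    "of": "IN", "in": "IN", "on": "IN", "for": "IN", "with": "IN", "to": "TO",
    "and": "CC", "or": "CC", "but": "CC",
}

# Assumption: the set of "supporting" POS tags is not given in the abstract;
# determiners, prepositions, "to" and coordinating conjunctions are used here.
SUPPORTING_TAGS = {"DT", "IN", "TO", "CC"}


def separate_punctuation(text: str) -> str:
    """Put spaces around punctuation (apostrophes excluded), e.g. 'great!' -> 'great !'."""
    spaced = re.sub(r"([^\w\s'])", r" \1 ", text)
    # Collapse the extra whitespace introduced above.
    return " ".join(spaced.split())


def remove_supporting_words(tokens: list[str]) -> list[str]:
    """Drop tokens whose looked-up tag is in SUPPORTING_TAGS; unknown words are kept."""
    return [t for t in tokens if CLOSED_CLASS_TAGS.get(t.lower()) not in SUPPORTING_TAGS]


def preprocess(sentence: str) -> list[str]:
    """Apply punctuation separation, whitespace tokenisation, then supporting-word removal."""
    tokens = separate_punctuation(sentence).split()
    return remove_supporting_words(tokens)


if __name__ == "__main__":
    print(preprocess("The battery life of this phone is great, but the screen is dim!"))
```

Running the script prints `['battery', 'life', 'phone', 'is', 'great', ',', 'screen', 'is', 'dim', '!']`. The lexicon lookup keeps the sketch dependency-free, but it ignores context: a word such as "that" can be a determiner, a pronoun or a conjunction, which is why a contextual tagger would normally be preferred.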



6th International Conference on Artificial Intelligence and Computer Science (AICS2019), Wuhan, Hubei, China