A Novel Approach for Sentiment Analysis on Social Data

Gathering public opinion by analyzing big social data has attracted wide attention due to its interactive and real-time nature. To this end, recent studies have combined social media and sentiment analysis to follow major events by tracking people’s behavior. In this paper, we propose an adaptable sentiment analysis approach that analyzes social media posts and extracts users’ opinions in real time.

The proposed approach first constructs a dynamic dictionary of word polarity based on a selected set of hashtags related to a given topic, and then classifies the tweets into several classes by introducing new features that fine-tune the polarity degree of a post. To validate our approach, we classified tweets related to the 2016 US election. In prototype tests, the approach achieved good accuracy in detecting the positive and negative classes and their sub-classes.

Introduction

Social media and its corresponding applications allow millions of users to express and spread their opinions about a topic and to show their attitudes by liking or disliking content. All these constantly accumulating actions on social media generate high-volume, high-velocity, high-variety, high-value, high-variability data termed big social data. In general, this kind of data refers to a massive set of opinions that can be processed to determine people’s tendencies in the digital realm.

Several researchers have shown a keen interest in exploiting big social data to describe, determine and predict human behavior in several domains [10, 26]. Processing this kind of data involves various research avenues, particularly text analysis. Almost 80% of internet data is text [23]; text analysis has therefore become a key element in eliciting public sentiment and opinion. Sentiment analysis, also called opinion mining, aims to determine people’s sentiment about a topic by analyzing their posts and other actions on social media. It then classifies the polarity of each post into opposing sentiments such as positive and negative.

Sentiment analysis methods can be divided into two main categories:

Lexicon analysis aims to calculate the polarity of a document from the semantic orientation of the words or phrases it contains. However, applications based on lexicon analysis do not take the context under study into account.
Machine learning (ML) involves building models from a labeled training dataset (instances of texts or sentences) in order to determine the orientation of a document. Studies using this type of method have generally been carried out on a specific topic.
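
To make the lexicon category concrete, the following is a minimal sketch of lexicon-based polarity scoring. It is an illustration only, not the approach proposed in this paper: the lexicon `TOY_LEXICON`, its scores, and the function names `tokenize`, `lexicon_polarity` and `classify` are all hypothetical, and the scoring rule (a plain sum of word orientations) is an assumption chosen for simplicity.

```python
import re

# Hypothetical toy lexicon: word -> semantic orientation in [-1, 1].
# A real lexicon would hold thousands of entries; these values are made up.
TOY_LEXICON = {
    "great": 1.0,
    "win": 0.5,
    "bad": -1.0,
    "sick": -0.5,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Sum the orientation of every known word; unknown words count as 0."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


def classify(text, lexicon=TOY_LEXICON):
    """Map the summed score to a coarse polarity class."""
    score = lexicon_polarity(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["What a great win", "That beat is sick", "Nothing to report"]:
        print(post, "->", classify(post))
```

With this toy lexicon, "That beat is sick" is scored as negative even though the slang usage is positive. The fixed dictionary assigns one orientation per word regardless of topic, which is the context limitation noted above.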

These two analysis methods have been widely used on big social data to gather public opinion and assess internet users’ satisfaction with a subject (services, products, events, topics or persons) in several domains including politics [3], marketing [4] and health [7]. However, the results vary: some studies reach a reasonable degree of accuracy and others do not. The failures are generally due to opinion mining challenges such as the semantic orientation of a word, which can change depending on the context. In this paper, we aim to tackle semantic analysis by introducing a novel adaptable approach that relies on social media posts and a big data architecture to analyze internet users’ behaviors and feelings toward a subject in real time. The proposed approach is based on three stages, as shown in Fig. 1.

To validate the proposed approach, we built a prototype and conducted a study analyzing tweets related to the 2016 US election to find out which candidate was favored.

The remainder of this paper is organized as follows. The second section explores work related to analyzing social media data and its correlation with trends. The third section highlights the theoretical basis of textual analysis. Section four presents an overview of the proposed method. The experimental methodology and results are presented in the “Methods and experience: US presidential elections” and “Results” sections respectively.

See the source article in the Journal of Big Data.