Workflow

Let us walk you through all the steps,
from acquiring data to visualising results.

Acquiring Data

We created our dataset through the Twitter API. We retrieved only the information relevant to our project: the date on which the tweet was created, the language used, the name of the user, the location from which the tweet was sent, the full text of the tweet, links, hashtags and mentions, as well as the retweet and favourite counts. Check the full code in the Acquiring tab of our Code subpage.
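The sketch below shows one way this step can be done; it assumes Tweepy 3.x and the standard v1.1 search endpoint, and the credentials and the search query are placeholders rather than our actual code.

import tweepy

# Placeholder credentials: replace with your own Twitter API keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

tweets = []
# "EU" is an illustrative query, not the one we actually used.
for status in tweepy.Cursor(api.search, q="EU", tweet_mode="extended").items(100):
    tweets.append({
        "created_at": status.created_at.isoformat(),   # date the tweet was created
        "lang": status.lang,                           # language used
        "user": status.user.screen_name,               # name of the user
        "location": status.user.location,              # location declared by the user
        "text": status.full_text,                      # full text of the tweet
        "urls": [u["expanded_url"] for u in status.entities["urls"]],
        "hashtags": [h["text"] for h in status.entities["hashtags"]],
        "mentions": [m["screen_name"] for m in status.entities["user_mentions"]],
        "retweets": status.retweet_count,
        "favourites": status.favorite_count,
    })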

Cleaning Data

In the preprocessing phase we cleaned links, emoji, hashtags and mentions out of the full text by parsing it, so that it could be correctly analysed by Sentilo. Check out the Cleaning tab in the Code subpage to learn more about the code we used for the cleaning.
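A simplified version of this cleaning could look as follows; the regular expressions are illustrative, and the emoji pattern only covers the most common Unicode blocks.

import re

URL_RE     = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
EMOJI_RE   = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]",
    flags=re.UNICODE,
)

def clean_text(text):
    # Remove links, mentions, hashtags and emoji, then collapse leftover whitespace.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("Great debate tonight! @someuser #EU https://t.co/xyz"))
# -> "Great debate tonight!"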

Translating Tweets

Since the tool we used to perform the sentiment analysis accepts only sentences in English, it was necessary to translate our tweets, written in the 24 official languages of the EU - check the list - into English. To translate the huge amount of tweets we had gathered automatically, we utilized the Yandex.Translate API. See the full code for the translation in the Translating tab in our Code subpage.
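A hedged sketch of such a call is shown below; it assumes the JSON endpoint of the Yandex.Translate HTTP API (v1.5) that was available at the time, and the API key is a placeholder.

import requests

YANDEX_URL = "https://translate.yandex.net/api/v1.5/tr.json/translate"
API_KEY = "YOUR_YANDEX_API_KEY"   # placeholder

def translate_to_english(text, source_lang=None):
    # If source_lang is None, the service detects the source language itself.
    lang = f"{source_lang}-en" if source_lang else "en"
    response = requests.post(YANDEX_URL, data={"key": API_KEY, "text": text, "lang": lang})
    response.raise_for_status()
    return response.json()["text"][0]

print(translate_to_english("Le elezioni europee si avvicinano", source_lang="it"))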

Analysing Sentiment

The tool we used to perform the sentiment analysis is called Sentilo. Sentilo performs sentence-based sentiment analysis: it takes a sentence as input and gives back positive, negative and average scores. Additionally, Sentilo relies on FRED to automatically mine a knowledge graph for the sentence. FRED is a successful machine reader for the Semantic Web. The knowledge graphs are returned in various possible formats, such as PNG, RDF, XML, JSON, Turtle, N3 and NT.
We needed a way to analyse the parsed text of every tweet we had previously collected automatically. For this reason, we created a script that sends a GET request to Sentilo with the tweet text and retrieves the knowledge graph. Check the script in the Analysing tab under our Code subpage.
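The idea of that request is sketched below; the endpoint URL is a placeholder rather than the actual Sentilo address, and the parameter name and Accept header are assumptions for illustration only.

import requests

SENTILO_ENDPOINT = "http://example.org/sentilo/service"   # placeholder URL

def analyse_sentiment(sentence):
    # Send one cleaned tweet to the service and get its knowledge graph back as RDF.
    response = requests.get(
        SENTILO_ENDPOINT,
        params={"text": sentence},
        headers={"Accept": "text/turtle"},
    )
    response.raise_for_status()
    return response.text

graph_ttl = analyse_sentiment("Great debate tonight!")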

Structuring Information

The output provided by Sentilo was processed to keep only the data relevant to our needs. As Sentilo returned the sentiment analysis for each tweet, the tweets and their useful information were gradually integrated into the final Knowledge Graph. The sentiment analysis of the tweets, their cleaning and the merging of the individual per-tweet graphs into one single named graph are all handled by the same code, which you can find in the Analysing tab under our Code subpage.
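A minimal sketch of the merging step, assuming rdflib, is given below; the example triples and the graph URI are placeholders standing in for the per-tweet output returned by Sentilo.

from rdflib import Dataset, URIRef

# In reality these Turtle strings come from the Sentilo calls above.
per_tweet_graphs = [
    "@prefix ex: <http://example.org/> . ex:tweet1 ex:hasScore 0.7 .",
    "@prefix ex: <http://example.org/> . ex:tweet2 ex:hasScore -0.3 .",
]

dataset = Dataset()
named_graph = dataset.graph(URIRef("http://example.org/graph/tweets"))  # placeholder URI

for ttl in per_tweet_graphs:
    named_graph.parse(data=ttl, format="turtle")   # add each tweet's triples to the named graph

dataset.serialize(destination="tweets.trig", format="trig")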

Visualising Results 

In addition to the ten SPARQL queries that we created to explore our dataset, which you can find in the "Other SPARQL queries" subsection of the Queries subpage, we created and used three further SPARQL queries to build some visualisations.
The visualisations provide a more intuitive representation of the results of our sentiment analysis; check them out in the Queries subsection of the Queries subpage. We used Chart.js and D3SPARQL, which is based on D3.js, to build the charts.
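As an illustration of how such chart data can be retrieved, the sketch below runs a SPARQL query from Python with SPARQLWrapper; the endpoint URL and the property names are placeholders, not our actual vocabulary, and on the site itself the equivalent queries are executed by d3sparql in the browser.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")   # placeholder endpoint
sparql.setQuery("""
    PREFIX ex: <http://example.org/>
    SELECT ?lang (AVG(?score) AS ?avgScore)
    WHERE {
        ?tweet ex:language ?lang ;
               ex:averageScore ?score .
    }
    GROUP BY ?lang
""")
sparql.setReturnFormat(JSON)

# Print the average sentiment score per language, the kind of data a bar chart needs.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["lang"]["value"], row["avgScore"]["value"])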

© Copyright 2019 Severin Josef Burg, Eleonora Peruch