Data

For the sake of transparency, all the data processed during our project can be accessed in our GitHub repository. The table below gives an overview of all the languages we considered and of the number of tweets we were able to acquire for the period 27 May to 2 June. The number of tweets we actually analysed is lower because of technical problems with Sentilo.


Language	ISO 639-1 code	Search term	Tweets acquired	Tweets analysed
Bulgarian	bg	Европейски избори	25	20
Croatian	hr	Europski izbori	99	29
Czech	cs	Evropské volby	126	109
Danish	da	EU-valget	72	58
Dutch	nl	Europese verkiezingen	1967	623
English	en	European elections	28787	4708
Estonian	et	Euroopa valimised	9	0
Finnish	fi	EU-vaalit	44	10
French	fr	Élections européennes	22991	1827
German	de	Europawahl	16730	10784
Greek	el	Ευρωεκλογές	5381	3523
Hungarian	hu	Európai választások	14	3
Irish	ga	Na toghcháin Eorpacha	3	0
Italian	it	Elezioni europee	12099	3191
Latvian	lv	Eiropas vēlēšanas	74	15
Lithuanian	lt	EP rinkimai	6	3
Maltese	mt	L-elezzjonijiet Ewropej	1	0
Polish	pl	Wybory europejskie	196	46
Portuguese	pt	Eleições europeias	1432	491
Romanian	ro	Alegerile europene	37	17
Slovak	sk	Európske voľby	4	3
Slovenian	sl	Evropske volitve	94	34
Spanish	es	Elecciones europeas	7185	2048
Swedish	sv	EU-valet	1596	562
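The two count columns can be checked programmatically. The following is a minimal sketch, not part of the project code: the counts are copied by hand from the table, and the names COUNTS, summarize and unanalysed_languages are hypothetical. It totals both columns, computes the share of acquired tweets that were analysed, and lists the languages for which no tweet could be analysed.

```python
# (ISO 639-1 code, tweets acquired, tweets analysed), copied from the table above.
COUNTS = [
    ("bg", 25, 20), ("hr", 99, 29), ("cs", 126, 109), ("da", 72, 58),
    ("nl", 1967, 623), ("en", 28787, 4708), ("et", 9, 0), ("fi", 44, 10),
    ("fr", 22991, 1827), ("de", 16730, 10784), ("el", 5381, 3523), ("hu", 14, 3),
    ("ga", 3, 0), ("it", 12099, 3191), ("lv", 74, 15), ("lt", 6, 3),
    ("mt", 1, 0), ("pl", 196, 46), ("pt", 1432, 491), ("ro", 37, 17),
    ("sk", 4, 3), ("sl", 94, 34), ("es", 7185, 2048), ("sv", 1596, 562),
]


def summarize(counts: list[tuple[str, int, int]]) -> tuple[int, int, float]:
    """Return (total acquired, total analysed, analysed / acquired)."""
    acquired = sum(a for _, a, _ in counts)
    analysed = sum(n for _, _, n in counts)
    return acquired, analysed, analysed / acquired


def unanalysed_languages(counts: list[tuple[str, int, int]]) -> list[str]:
    """Language codes for which no tweet made it through the analysis."""
    return [code for code, _, n in counts if n == 0]


if __name__ == "__main__":
    acquired, analysed, share = summarize(COUNTS)
    print(f"acquired={acquired} analysed={analysed} ({share:.1%})")
    print("no tweets analysed:", unanalysed_languages(COUNTS))
```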


Twitter Data.

For each tweet, the Twitter API returns a large amount of data: a long list of attributes associated with the user, the text of the tweet, its source, its language, the place associated with the tweet, how many times it has been retweeted, quoted and liked by other users, and entities such as hashtags, URLs, user mentions, media and symbols.

We decided to keep the following information:

date of creation of the tweet (created_at)
language of the tweet as identified by Twitter (lang)
screen name of the user (screen_name)
place the tweet was posted from (location)
text of the tweet (full_text)
links contained in the tweet (urls)
hashtags used in the tweet (tags)
users mentioned with @ in the tweet (mentions)
number of times the tweet has been retweeted (retweet_count)
number of times the tweet has been liked (favorite_count)
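The field selection above can be sketched as a small Python function. This is an illustration under assumptions, not the project's actual code: it assumes each tweet arrives as a Twitter API v1.1 JSON object requested with tweet_mode=extended (which is what makes `full_text` available), and it assumes `location` is the free-text location from the user's profile (`user.location`); if the geotagged `place` object was meant instead, `place["full_name"]` would be the field to read. The function name select_fields is hypothetical.

```python
from typing import Any


def select_fields(tweet: dict[str, Any]) -> dict[str, Any]:
    """Keep only the attributes listed above from a v1.1 tweet object."""
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    return {
        "created_at": tweet.get("created_at"),
        "lang": tweet.get("lang"),
        "screen_name": user.get("screen_name"),
        # Assumption: profile location, not the tweet's geotagged place.
        "location": user.get("location"),
        # 'full_text' replaces 'text' when tweet_mode=extended is used.
        "full_text": tweet.get("full_text", tweet.get("text")),
        "urls": [u.get("expanded_url") for u in entities.get("urls", [])],
        "tags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "retweet_count": tweet.get("retweet_count", 0),
        "favorite_count": tweet.get("favorite_count", 0),
    }
```

Reading the hashtags and mentions from `entities` rather than re-parsing the text reuses the segmentation Twitter has already done.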

During preprocessing we added the following information:

text of the tweet stripped of hashtags and mentions, so that Sentilo can analyse it correctly (parsed_text)
emojis contained in the tweet, used to enrich the final graph (emoji)
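The two preprocessing steps above might look roughly like this. It is a sketch under assumptions, not the project's implementation: the patterns and the function names parse_text and extract_emoji are hypothetical, and the emoji pattern only covers a few common Unicode blocks (Miscellaneous Symbols, Dingbats, and U+1F300 to U+1FAFF), so flags and multi-codepoint sequences are not handled as single emojis.

```python
import re

# Hypothetical pattern: '#' or '@' followed by Unicode word characters,
# so hashtags in any of the languages above (e.g. Greek) are matched.
HASHTAG_OR_MENTION = re.compile(r"[#@]\w+")

# Approximate emoji coverage: Misc Symbols + Dingbats (U+2600-U+27BF) and
# the pictograph blocks U+1F300-U+1FAFF. Not exhaustive.
EMOJI = re.compile("[\u2600-\u27BF\U0001F300-\U0001FAFF]")


def parse_text(full_text: str) -> str:
    """Remove hashtags and mentions, then collapse leftover whitespace."""
    without_tags = HASHTAG_OR_MENTION.sub("", full_text)
    return " ".join(without_tags.split())


def extract_emoji(full_text: str) -> list[str]:
    """Return the emoji characters found in the tweet, in order."""
    return EMOJI.findall(full_text)
```

Note that the naive `[#@]\w+` pattern would also cut the part after @ in an e-mail address; a stricter pattern may be needed in practice.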

Finally, each tweet was enriched with the average positive and/or average negative score assigned by Sentilo, so that we could carry out the sentiment analysis through SPARQL queries and represent the results in our graph.
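Sentilo's output format is not described in this section, so the following is only a hypothetical sketch of the enrichment step. It assumes Sentilo yields a list of signed scores for a tweet (positive values for positive sentiment, negative values for negative sentiment, zero for neutral) and stores the averages under the invented keys avg_positive and avg_negative. It shows why a tweet can end up with a positive average, a negative one, both, or neither.

```python
from typing import Any


def add_sentilo_averages(tweet: dict[str, Any], scores: list[float]) -> dict[str, Any]:
    """Return a copy of `tweet` with avg_positive and/or avg_negative added.

    Assumption: `scores` are per-element polarity scores for this tweet;
    positive and negative values are averaged separately, zeros are ignored.
    """
    enriched = dict(tweet)
    positive = [s for s in scores if s > 0]
    negative = [s for s in scores if s < 0]
    # An average is attached only if at least one score of that polarity exists.
    if positive:
        enriched["avg_positive"] = sum(positive) / len(positive)
    if negative:
        enriched["avg_negative"] = sum(negative) / len(negative)
    return enriched
```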
