This tutorial is based on a term paper I wrote for the course "Data Analysis & Evaluation" as part of my Information Science studies at Humboldt University Berlin. Even though countless resources on this topic are already available online, I decided to publish this one anyway, since examples and explanations like these have often helped me out in the past.
Introduction
In today's world, politics, science and society are aware of the dangers surrounding disinformation and "fake news" and are working on solutions - a topic that plays an important role especially in the information sciences. This tutorial focuses on this current debate, motivated by the following tweet by Donald Trump from December 11, 2014:
Policy towards our enemies: Hit them hard, hit them fast, hit them often & then tell them it was because they are the enemy!
— Donald J. Trump (@realDonaldTrump) December 11, 2014
The fact that Trump later repeatedly referred to the (primarily non-conservative) news media as the enemy of the American population is highlighted in the article "Defining the Enemy: How Donald Trump Frames the News Media" [Meeks 2019]. In general, news and media are a frequently addressed topic in Trump's tweets [Meeks 2019, p. 17] [Wang et al. 2016], so an analysis might produce relevant and novel results. Our media system is currently undergoing a transformation from controlled top-down processes with journalistic gatekeepers to a more decentralized "hybrid media system" (according to A. Chadwick, "The Hybrid Media System: Politics and Power") in which digital and printed media coexist. Social media facilitate addressing target groups and disseminating information to a potentially unlimited audience, so they also support the "development of communities of populist, ethnonationalist, and anti-establishment sentiment" [Wells et al. 2020].
Trump also uses so-called “framing” as a persuasion strategy: “the presence or absence of certain keywords, stock phrases, stereotyped images, sources of information, and sentences that provide thematically reinforcing clusters of facts or judgments” [Entman 1993, p. 52, quoted from Meeks (2019), p. 3]. Twitter's retweet function additionally supports framing:
“Twitter gives users an easy way to repeat Trump’s frames verbatim via the retweet function. […] Twitter’s ‘shareability’ enables Trump’s frames and influence to spread outward across peer networks, adding momentum to his framing.”
Meeks 2019, p. 7.
Therefore, the information producer is no longer solely responsible for his or her content; responsibility also lies with the platform that enables these diverse functionalities for networking and information diffusion between actors.
Preparation
This tutorial uses the programming language Python. The code snippets are extracted from a Jupyter Notebook, which can be accessed via GitHub.
To analyze the Twitter data I used a whole range of different Python libraries, for instance for formatting and processing the data as well as for statistics and visualizations. In a first step, these libraries need to be installed with a package manager such as pip or conda.
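If you are working in a Jupyter Notebook, this can be done directly from a code cell; the PyPI package names below are inferred from the imports that follow (tweet-preprocessor provides the preprocessor module):
# install the required libraries once (PyPI package names)
!pip install pandas numpy scipy researchpy matplotlib plotly seaborn nltk textblob tweet-preprocessor
Once installed, the libraries can be imported as follows: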
# import all necessary modules
# general
import json
import pandas as pd
# numbers and time
import numpy as np
from collections import Counter
from datetime import datetime
# text
import re # regular expressions
import preprocessor as p # from the tweet-preprocessor library, see https://pypi.org/project/tweet-preprocessor/
from textblob import TextBlob # text processing library, here used to extract sentiment, see also https://textblob.readthedocs.io/en/dev/
from nltk import ngrams
# stats
from scipy import stats
import researchpy as rp
# visualizations
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.colors import n_colors
import seaborn as sns
The dataset used here was retrieved from the Trump Twitter Archive. This collection consists of more than 50,000 tweets from Donald Trump's personal Twitter account @realDonaldTrump, and I can recommend reusing these data for this specific use case. Contrary to the Twitter API (see Tweet Availability), the Trump Twitter Archive gives access to the majority of deleted tweets (a recommended blog post on that subject is Tweets and Deletes). The dataset was generated on August 4, 2020 by searching for the following keywords (exact word search):
- news
- media
- press
- fact
- facts
- information
- cnn
- nbc
- abc
- cbs
- nytimes
- newyorktimes
- ny times
- fox
Preprocessing
The next step is data preparation and cleaning. This also means adding new columns by extracting information already contained in our dataset. First things first - we need to load our dataset (currently a JSON file) and convert it into a pandas DataFrame. This will make later analysis much easier.
#read json file
with open("dataset_media.json", 'r',encoding="utf8") as f:
datastore = json.load(f)
# convert json to dataframe
df = pd.json_normalize(datastore)
df
| | source | text | created_at | retweet_count | favorite_count | is_retweet | id_str |
|---|---|---|---|---|---|---|---|
| 0 | Twitter for iPhone | RT @WhiteHouse: LIVE: President @realDonaldTru… | Mon Aug 03 21:36:46 +0000 2020 | 4971 | 0 | True | 1290401249251270663 |
| 1 | Twitter for iPhone | My visits last week to Texas and Florida had m… | Mon Aug 03 15:27:41 +0000 2020 | 24241 | 111167 | False | 1290308363872538624 |
| 2 | Twitter for iPhone | RT @realDonaldTrump: FAKE NEWS IS THE ENEMY OF… | Mon Aug 03 13:53:10 +0000 2020 | 78121 | 0 | True | 1290284578578419712 |
| 3 | Twitter for iPhone | Wow! Really bad TV Ratings for Morning Joe (@J… | Mon Aug 03 12:57:35 +0000 2020 | 15131 | 72174 | False | 1290270589945430016 |
| 4 | Twitter for iPhone | My visits last week to Texas and Frorida had m… | Mon Aug 03 12:46:02 +0000 2020 | 12976 | 59426 | False | 1290267685117460481 |
| … | … | … | … | … | … | … | … |
| 4238 | Twitter Web Client | Donald Trump appearing today on CNN Internatio… | Wed Feb 10 15:17:56 +0000 2010 | 7 | 1 | False | 8905123688 |
| 4239 | Twitter Web Client | Celebrity Apprentice returns to NBC, Sunday, 3… | Tue Jan 12 18:05:08 +0000 2010 | 20 | 3 | False | 7677152231 |
| 4240 | Twitter Web Client | Reminder: The Miss Universe competition will b… | Sun Aug 23 21:12:37 +0000 2009 | 1 | 4 | False | 3498743628 |
| 4241 | Twitter Web Client | Watch the Miss Universe competition LIVE from … | Fri Aug 21 14:32:45 +0000 2009 | 1 | 3 | False | 3450626731 |
| 4242 | Twitter Web Client | Read a great interview with Donald Trump that … | Wed May 20 22:29:47 +0000 2009 | 4 | 3 | False | 1864367186 |
Additionally, I filtered out retweets because they are not relevant for this analysis (Trump's own statements should be in focus).
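A minimal sketch of this filtering step (assuming is_retweet was parsed as a boolean by json.load; the result is stored as df_media, which all following snippets work with):
# remove retweets so that only Trump's own tweets remain
df_media = df[df["is_retweet"] == False].copy()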
A final preprocessing step concerns the time format. The datetime values, which Twitter delivers as UTC strings, were transformed into a standard format, and the individual years were extracted in a similar manner (a sketch follows below).
# transform col "created_at" into recognizable UTC time values as dates/date type
# see also https://stackoverflow.com/questions/7703865/going-from-twitter-date-to-python-datetime-date
# define function, using the datetime module
def change_dates(series):
    # reformat each timestamp from 'Mon Aug 03 21:36:46 +0000 2020' to '2020-08-03 21:36:46'
    for i in range(len(series)):
        series.iloc[i] = datetime.strftime(datetime.strptime(series.iloc[i], '%a %b %d %H:%M:%S +0000 %Y'), '%Y-%m-%d %H:%M:%S')
# call function for col "created_at"
change_dates(df_media["created_at"])
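The year column used by later snippets can be derived from the reformatted timestamps, for example like this (a sketch; the original notebook may extract it differently):
# extract the year (first four characters of the reformatted timestamp) into a new column
df_media["year"] = df_media["created_at"].str[:4].astype(int)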
Text Processing
Using regular expressions, hashtags and mentions can be identified and saved in separate columns for analysis and evaluation purposes.
# search for hashtags and mentions, store them in their own columns
# hashtags
hashtag_re = re.compile(r"#(\w+)")
df_media['hashtags'] = np.where(df_media.text.str.contains(hashtag_re), df_media.text.str.findall(hashtag_re), "")
# mentions
mention_re = re.compile(r"@(\w+)")
df_media['mentions'] = np.where(df_media.text.str.contains(mention_re), df_media.text.str.findall(mention_re), "")
Next, a simplified sentiment analysis is performed on all tweets. For this, the tweets need to be cleaned of noise such as hashtags, mentions and URLs using the preprocessor module, so that these elements cannot influence the results.
df_media['cleaned_text'] = df_media["text"] # copy col "text" to "cleaned_text"
p.set_options(p.OPT.HASHTAG, p.OPT.URL, p.OPT.EMOJI, p.OPT.MENTION) # cleaning options: remove hashtags, mentions, URLs and emojis
# define function clean_text() which applies the predefined options to each row
def clean_text(series):
    for i in range(len(series)):
        series.iloc[i] = p.clean(series.iloc[i])
# call clean_text() for col "cleaned_text"
clean_text(df_media["cleaned_text"])
The following example uses the library TextBlob for a sentiment analysis.
# create df containing polarity values for each tweet in the dataset
tweet_text = df_media["cleaned_text"].tolist() # collect all cleaned tweets in a list
sentiment_objects = [TextBlob(tweet) for tweet in tweet_text] # apply TextBlob to each tweet
sentiment_values = [[tweet.sentiment.polarity, str(tweet)] for tweet in sentiment_objects] # create a new list of polarity values and tweet text
sentiment_df = pd.DataFrame(sentiment_values, columns=["polarity", "tweet"]) # transform list into dataframe and sort the values accordingly
sentiment_df.sort_values("polarity", ascending=True)
We receive a list of all tweets, sorted by their polarity values:
| | polarity | tweet |
|---|---|---|
| 1756 | -1.0 | find the leakers within the FBI itself. Classi… |
| 2119 | -1.0 | FMR PRES of Mexico, Vicente Fox horribly used … |
| 2620 | -1.0 | If you look at the horrible picture on the fro… |
| 2908 | -1.0 | The media is pathetic. Our embassies are savag… |
| 1316 | -1.0 | Some people HATE the fact that I got along wel… |
| … | … | … |
| 472 | 1.0 | Triggered, a great book by my son, Don. Now nu… |
| 972 | 1.0 | Great news! |
| 969 | 1.0 | Finally great news at the Border! |
| 1501 | 1.0 | Great news, as a result of our TAX CUTS & … |
| 696 | 1.0 | Jesse & Emma, Great News. Congratulations!… |
To be able to compare a tweet's polarity with other attributes of our dataframe, the polarity values must be added as an additional column.
df_media = df_media.reset_index(drop=True)
df_media['sentiment_values'] = pd.Series(sentiment_df['polarity'])
Alternatively, we can convert all polarity values into descriptive labels (we still need to find fitting thresholds; in this code example I used: negative for values <= -0.33, neutral for values in (-0.33, 0.33], and positive above that).
df_media["sentiment"] = "" # new col
# define function to devide sentiment_values into sentiments ("negative","neutral","positive")
def get_sentiment(dataframe, index):
for tweet in dataframe:
if df_media.sentiment_values.iloc[index] <= -0.33:
df_media.sentiment.iloc[index] = "negative"
elif (df_media.sentiment_values.iloc[index] > -0.33) & (df_media.sentiment_values.iloc[index] <=0.33):
df_media.sentiment.iloc[index] = "neutral"
else:
df_media.sentiment.iloc[index] = "positive"
index += 1
# apply get_sentiment()
get_sentiment(df_media["sentiment_values"], 0)
Data Analysis
Descriptive Analysis
By using the pandas.DataFrame.describe() function, we can generate first key figures for minimum, maximum, average, standard deviation and quantiles.
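Applied to the metric columns of our dataframe, this looks as follows:
# key figures (count, mean, std, min/max, quartiles) for the numerical columns
df_media[["retweet_count", "favorite_count", "sentiment_values"]].describe()
To visualize the distributions, we can implement a histogram or box plot.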
The histograms already indicate a right-skewed distribution for retweets and likes in this dataset. For the polarity values, we can observe an approximately normal distribution (although a majority of the values lie close to 0, which might be due to the dictionary-based approach). The following example shows how such a histogram can be created using matplotlib.pyplot.hist().
# plot data
fig, ax = plt.subplots()
ax.hist(df_media.retweet_count, color="blue", alpha=0.5, bins=20, label="retweet_count")
ax.hist(df_media.favorite_count, color="aquamarine", alpha=0.5, bins=60, label="favorite_count")
# add labels, title, legend
ax.set_ylabel('Frequency')
ax.set_xlabel('Number of retweets/likes')
ax.set_title('Distributions of retweet_count/favorite_count')
ax.legend()
# show plot
plt.show()
Box plots are another method to visualize the distribution of numerical values. The figure shows examples for the same variables, created with plotly.graph_objects.Box.
# boxplot favorite_count
data = df_media.favorite_count
# plot
fig = go.Figure()
fig.add_trace(go.Box(y=data, name="favorite_count", marker_color="lightseagreen", boxpoints="all", jitter=0.6,
pointpos=-2, marker=dict(opacity=0.5), marker_size=5, width=40))
# update layout
fig.update_layout(width=400, height=800, font=dict(color="black"), plot_bgcolor="white", xaxis = dict(gridcolor = "#ededed", gridwidth=3), yaxis=dict(gridcolor = "#ededed", gridwidth=3))
#show plot
fig.show()
The visualizations presented so far are applicable to metric variables. When dealing with nominal or ordinal data, one may need to use alternative, simpler methods.
This sunburst chart contains all hashtags from our dataset, which were grouped into more general, heuristically built categories ("media", "media criticism", "political", "show", "other", "election campaign"). In addition to the expected hashtags for media ("media", "media criticism", "show"), many hashtags are included that were assigned to the category "election campaign" (e.g. #maga, #trump2016).
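The dataframe hashtags_df used below is not built in the snippets shown so far; a minimal sketch of how it could be constructed, with a deliberately abbreviated, hypothetical category mapping (in practice the mapping covered all hashtags):
# count all hashtags (lowercased) across the dataset
hashtag_counts = Counter(tag.lower() for tags in df_media["hashtags"] for tag in tags)
hashtags_df = pd.DataFrame(list(hashtag_counts.items()), columns=["hashtag", "frequency"])
# hypothetical manual mapping; unmapped hashtags fall into the category "other"
category_map = {"maga": "election campaign", "trump2016": "election campaign", "fakenews": "media criticism"}
hashtags_df["category"] = hashtags_df["hashtag"].map(category_map).fillna("other")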
# plot
fig = px.sunburst(hashtags_df, path=["category", "hashtag"], values='frequency', color="frequency", color_continuous_scale=[[0,"white"],[1.0, "blue"]])
# update layout
fig.update_layout(width=900, height=900, uniformtext=dict(minsize=13, mode='hide'),coloraxis_showscale=False) # labels of hashtags with frequency=1 are not shown (by "minsize=13, mode='hide'")
# show plot
fig.show()
The bar chart represents the 10 most frequently mentioned accounts (mentions). It mainly consists of the accounts of those news channels that were already used as search terms when generating the dataset (@cnn, @nytimes, @foxnews, @nbc, @foxandfriends, @abc, @nbcnews and also @washingtonpost).
First, all mentions are collected in a list; then the individual frequencies of these mentions are counted; finally, the results are saved in a new dataframe and sorted by frequency (the procedure for the hashtag example above is analogous).
mentions = df_media["mentions"].tolist() # collect all mention lists
# change nested list to flat list (lowercased)
# see https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists
flat_list = [item.lower() for sublist in mentions for item in sublist]
# create a dict of mentions by counting the occurrences of each item, then transform mentions and frequencies into a dataframe (mentions_df)
mentions_dict = Counter(flat_list)
mentions_df = pd.DataFrame(list(mentions_dict.items()), columns=["mention", "frequency"])
# sort mentions in descending order
mentions_df = mentions_df.sort_values(by="frequency", ascending=False).reset_index(drop=True)
Subsequently, the first ten mentions can be visualized accordingly:
data = mentions_df.iloc[:10] # the ten most frequent mentions
# plot
fig = px.bar(data, x="mention", y="frequency")
fig.update_traces(marker_color='lightseagreen', opacity=.6, texttemplate='%{y}', textposition='outside')
# update layout
fig.update_layout(width=800, height=500, font=dict(size=12, color="black"), plot_bgcolor="white", xaxis = dict(gridcolor = "#ededed", gridwidth=1), yaxis=dict(gridcolor = "#ededed", gridwidth=1))
# show plot
fig.show()
Relationships Between Variables
Scatter plots can be employed to show relationships between variables. For these visualizations, I used the library Plotly once again. Calculations of statistical values were done with the Python library researchpy, which returns various statistical summaries and is based on the statistical functions of scipy.stats.
The figure (created with the code below) shows the attributes source (device/application) and created_at (publication date/time of the tweet) in a scatter plot. The colours represent the polarity values of each tweet. We can assume that the three most frequently used channels (Twitter for Android, Twitter Web Client and Twitter for iPhone) differ mainly in their temporal usage. This temporal subdivision according to the "tweet sources" used for sending a tweet corresponds to the observations of [Clarke/Grieve 2019, p. 5f.] and their extensive data.
df = df_media
# plot
fig = px.scatter(df, x="created_at", y="source",color="sentiment_values", color_continuous_scale=["orangered","greenyellow","blue"], opacity=.5)
# update layout
fig.update_layout(font=dict(size=12, color="black"), plot_bgcolor="white", xaxis = dict(gridcolor = "#ededed", gridwidth=2), yaxis=dict(gridcolor = "#ededed", gridwidth=2), width=1200)
# show plot
fig.show()
Since source is a nominal variable, the chi-squared test is applied here. It should be ensured that the expected frequencies are above 5 for each cell; in any case, "the proportion of expected frequencies that are less than 5 should not exceed 20%" [Bortz 2005, p. 177, freely translated]. In this example, the dataset was reduced by rarely occurring values (all sources except Twitter Web Client, Twitter for Android and Twitter for iPhone; the year 2009) to fulfill this requirement.
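A sketch of this reduction (source names as they appear in the dataset):
# keep only the three most frequent sources and drop the sparsely populated year 2009
main_sources = ["Twitter Web Client", "Twitter for Android", "Twitter for iPhone"]
df_adjusted = df_media[df_media["source"].isin(main_sources) & (df_media["year"] != 2009)]
The adjusted dataframe can then be passed to researchpy in place of df_media. The resulting test statistics: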
| dataset | Pearson chi-squared | p-value | Cramér's V |
|---|---|---|---|
| original (df = 143) | 4984.11 | 0.00 | 0.38 |
| adjusted (df = 20) | 3236.34 | 0.00 | 0.73 |
For the adjusted data (df = 20), a strong effect (V = 0.73) can be observed. This suggests that there is at least an association between the most frequently used devices and the time intervals in years. The chi-squared test was performed using the researchpy.crosstab() function.
crosstab, res, expected = rp.crosstab(df_media.source, df_media.year,prop="cell",test="chi-square",correction=True, cramer_correction=True, expected_freqs=True)
These scatter plots show the distributions of retweets and likes over time. While at first glance they appear very similar, the values for favorite_count reach considerably higher levels. Most striking is the strong increase of both retweets and likes from 2016 onwards, the time of Trump's presidential candidacy. The following table summarizes the correlation values, calculated with parametric (Pearson) and non-parametric (Spearman, Kendall) methods for the variable pairs retweet_count x year, favorite_count x year and favorite_count x retweet_count.
| variables | method | r-value | p-value |
|---|---|---|---|
| retweet_count x year | Pearson | 0.69 | 0.00 |
| | Spearman | 0.75 | 0.00 |
| | Kendall | 0.59 | 0.00 |
| favorite_count x year | Pearson | 0.71 | 0.00 |
| | Spearman | 0.78 | 0.00 |
| | Kendall | 0.64 | 0.00 |
| favorite_count x retweet_count | Pearson | 0.96 | 0.00 |
| | Spearman | 0.98 | 0.00 |
| | Kendall | 0.88 | 0.00 |
The correlation coefficients can be calculated with researchpy.corr_pair():
rp.corr_pair(df_media[["retweet_count","favorite_count"]], method="kendall")
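The other entries of the table can be obtained the same way by swapping the column pair and the method, for example:
# correlations of retweet_count with year, computed with all three methods
for method in ["pearson", "spearman", "kendall"]:
    print(rp.corr_pair(df_media[["retweet_count", "year"]], method=method))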
The values recorded in the table for retweet_count x year and favorite_count x year imply a strong, positive correlation. However, the extremely high correlation between retweet_count and favorite_count could be an indication of a spurious relationship:
"A relationship between two variables that is caused by a statistical artifact or a factor, not included in the model, that is related to both variables."
Downey 2014, p. 143
This means that the correlation does not necessarily imply causation. A confounding variable ("confounder"), such as the number of followers increasing over time, is probably responsible for the similarly high numbers of retweets and likes [see Wang et al. 2016, p. 721].
Additional Analysis
The selection of examined relationships between variables presented here suggests a certain correlation between different variables and the temporal component (e.g. through increasing follower numbers or strategic adjustments over time). The temporal dependency of linguistic change in Trump's tweets has already been discussed in detail [Clarke/Grieve 2019]. Further analysis could focus on determining how the content Trump publishes on Twitter varies and how the Twitter community reacts to it. One research question could therefore be:
What are the most frequent statements (measured in n-grams) in the dataset and how are they distributed over time?
To address this question, all tweets first need to be segmented into so-called n-grams, here 5-grams ("pentagrams"). For this step I used the ngrams function from NLTK (Natural Language Toolkit).
# copy the df for this step so that df_media stays unchanged (deep=True)
df_ngrams = df_media.copy(deep=True)
# create a new column "ngrams"
df_ngrams["ngrams"] = ""
# define function to iterate through the df and collect all 5-grams within "cleaned_text"
def get_ngrams(series, n=5):
    for i in range(len(series)):
        # tokenize the cleaned tweet and build all n-grams of length n
        series.iloc[i] = list(ngrams(df_ngrams.cleaned_text.iloc[i].split(), n))
# call get_ngrams()
get_ngrams(df_ngrams["ngrams"])
In order to actually evaluate the n-grams, the nested lists need to be flattened. However, splitting them with pandas.DataFrame.explode() did not solve the problem here, because it split the contained lists until nothing but separate words was left over. Dividing a list only at its top level can be achieved with the tidy_split() function (see on GitHub):
# call function tidy_split() and separate all ngrams per tweet from each other
df_grams = tidy_split(df_ngrams,"ngrams",sep="), (")
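Alternatively, the frequencies can be counted directly on the nested lists with Counter, assuming the ngrams column of df_ngrams still holds lists of tuples at this point:
# count all pentagrams across the dataset and display the most frequent ones
pentagram_counts = Counter(gram for tweet_grams in df_ngrams["ngrams"] for gram in tweet_grams)
pentagram_counts.most_common(10)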
The following table presents the most frequent pentagrams.
| pentagram | count |
|---|---|
| 'the', 'failing', 'new', 'york', 'times' | 30 |
| 'the', 'enemy', 'of', 'the', 'people!' | 25 |
| 'the', 'fake', 'news', 'media', 'is' | 19 |
| 'be', 'on', 'fox', '&', 'friends' | 15 |
| 'will', 'be', 'on', 'fox', '&' | 15 |
| 'i', 'will', 'be', 'having', 'a' | 13 |
| 'be', 'doing', 'fox', '&', 'friends' | 12 |
| 'is', 'the', 'enemy', 'of', 'the' | 12 |
| 'the', 'history', 'of', 'our', 'country.' | 11 |
Reflecting upon these most frequent text fragments in the context of their publication dates, one aspect becomes even clearer: at least after Trump's inauguration, the language and content of his tweets seem to change as well. The results of Clarke/Grieve (2019) coincide with this observation:
"All four dimensions showed clear temporal patterns and most major shifts in style align to a small number of indisputably important points in the Trump timeline, especially the 2011 Birther controversy, the 2012 election, his 2015 declaration, his 2016 Republican nomination, the 2016 election, and his 2017 inauguration, as well as the seasons of his television series The Apprentice."
Clarke/Grieve 2019, p. 19
It should be noted that these most frequent pentagrams represent only a very small fraction of all n-grams (a total of 75,756 n-grams were extracted from the dataset). Nevertheless, it can be assumed that earlier tweets are more likely to deal with media in the sense of entertainment television (here: Fox & Friends), while the number of distancing and hostile statements towards the media (keywords: "enemy", "fake news media", "failing") increases noticeably, especially from 2017 onwards. For example, Trump antagonizes "the fake news" or the "fake news media" (he often calls them "the enemy of the people"):
CNN and others in the Fake News Business keep purposely and inaccurately reporting that I said the “Media is the Enemy of the People.” Wrong! I said that the “Fake News (Media) is the Enemy of the People,” a very big difference. When you give out false information - not good!
— Donald J. Trump (@realDonaldTrump) October 30, 2018
Of course, these observations alone are not sufficient proof, and further analysis would have to be carried out in order to consolidate the corresponding assumptions.
Conclusion
In conclusion, I would like to summarize some key findings.
1. Trump pursues a clear communication strategy on Twitter. Indicators include the temporal dependency of various aspects - for instance, the devices/applications used to publish tweets as well as the diverging content of the tweets. In general, a distinct increase in tweets can be observed from 2016 onwards. Important events are likely to have an impact on the content and style of the tweets and could serve as the subject of later studies [Clarke/Grieve 2019].
2. Trump's media criticism on Twitter is of a strongly political nature and is used for election campaign purposes. Hashtags and terms extracted from the tweets contain critical and/or attacking statements against the media and news as well as slogans such as #makeamericagreatagain / #maga and #trump2016, later #kag ("Keep America Great"). By repeating certain phrases such as "'fake news', 'failing' and 'enemy of the American People'" [Meeks 2019, p. 5], he frames distinct media channels and even the entire media system. In this way, Trump receives a lot of feedback on social media as well as in more traditional media.
3. By exploiting the hybrid media system, Trump aims for broad media coverage. The sharing of information on social media such as Twitter ensures that users can give their approval to specific information or opinions. Information can be passed on in ideologically oriented communities, but also beyond them - which can then attract the attention of traditional news media [Wells et al. 2020, p. 664f.].
"Furthermore, Trump tweeted more at times when he had recently garnered less of a relative advantage in news attention, suggesting he strategically used Twitter to trigger coverage."
Wells et al. 2020, p. 559
Accordingly, Trump not only attacks political opponents with his tweets, but also the media [Clarke/Grieve 2019, p. 20] [Wang et al. 2016, p. 719] [Wells et al. 2020, p. 661], using frames to reinforce an already existing mistrust among the population [Meeks 2019, p. 5]. Ultimately, this media image created by Trump and disseminated via Twitter could also have an impact on the public perception of media worldwide [Meeks 2019, p. 19].
Literature
Bortz, Jürgen (2005): Statistik für Human- und Sozialwissenschaftler, 6th ed., Berlin/Heidelberg: Springer.
Clarke, Isobelle & Grieve, Jack (2019): Stylistic variation on the Donald Trump Twitter account: A linguistic analysis of tweets posted between 2009 and 2018, in: PLoS One 14(9), pp. 1-27. https://doi.org/10.1371/journal.pone.0222062
Downey, Allen B. (2014): Think Stats: Exploratory Data Analysis, 2nd ed., O'Reilly Media, Inc.
Entman, Robert M. (1993): Framing: Toward clarification of a fractured paradigm, in: Journal of Communication 43(4), pp. 51-58. https://doi.org/10.1111/j.1460-2466.1993.tb01304.x
Meeks, Lindsey (2019): Defining the Enemy: How Donald Trump Frames the News Media, in: Journalism & Mass Communication Quarterly 97(1), pp. 211-234. https://doi.org/10.1177/1077699019857676
Wang, Yu; Luo, Jiebo; Niemi, Richard; Li, Yuncheng & Hu, Tianran (2016): Catching Fire via "Likes": Inferring Topic Preferences of Trump Followers on Twitter, in: Proceedings of the Tenth International AAAI Conference on Web and Social Media (ICWSM 2016), pp. 719-722.
Wells, Chris; Shah, Dhavan; Lukito, Josephine; Pelled, Ayellet; Pevehouse, Jon C. W. & Yang, Jung Hwan (2020): Trump, Twitter, and news media responsiveness: A media system approach, in: new media & society 22(4), pp. 659-682. https://doi.org/10.1177/1461444819893987