May 16 2018 cs.CY
In today's news ecosystem, news sources emerge frequently and can vary widely in intent. This intent can range from benign to malicious, with many tactics being used to achieve their goals. One lesser studied tactic is content republishing, which can be used to make specific stories seem more important, create uncertainty around an event, or create a perception of credibility for unreliable news sources. In this paper, we take a first step in understanding this tactic by exploring verbatim content copying across 92 news producers of various characteristics. We find that content copying occurs more frequently between like-audience sources (eg. alternative news, mainstream news, etc.), but there consistently exists sparse connections between these communities. We also find that despite articles being verbatim, the headlines are often changed. Specifically, we find that mainstream sources change more structural features, while alternative sources change many more content features, often changing the emotional tone and bias of the titles. We conclude that content republishing networks can help identify and label the intent of brand-new news sources using the tight-knit community they belong to. In addition, it is possible to use the network to find important content producers in each community, producers that are used to amplify messages of other sources, and producers that distort the messages of other sources.
Mar 28 2018 cs.CY
The complexity and diversity of today's media landscape provides many challenges for researchers studying news producers. These producers use many different strategies to get their message believed by readers through the writing styles they employ, by repetition across different media sources with or without attribution, as well as other mechanisms that are yet to be studied deeply. To better facilitate systematic studies in this area, we present a large political news data set, containing over 136K news articles, from 92 news sources, collected over 7 months of 2017. These news sources are carefully chosen to include well-established and mainstream sources, maliciously fake sources, satire sources, and hyper-partisan political blogs. In addition to each article we compute 130 content-based and social media engagement features drawn from a wide range of literature on political bias, persuasion, and misinformation. With the release of the data set, we also provide the source code for feature computation. In this paper, we discuss the first release of the data set and demonstrate 4 use cases of the data and features: news characterization, engagement characterization, news attribution and content copying, and discovering news narratives.
May 09 2017 cs.SI
Increasingly people form opinions based on information they consume on online social media. As a result, it is crucial to understand what type of content attracts people's attention on social media and drive discussions. In this paper we focus on online discussions. Can we predict which comments and what content gets the highest attention in an online discussion? How does this content differ from community to community? To accomplish this, we undertake a unique study of Reddit involving a large sample comments from 11 popular subreddits with different properties. We introduce a large number of sentiment, relevance, content analysis features including some novel features customized to reddit. Through a comparative analysis of the chosen subreddits, we show that our models are correctly able to retrieve top replies under a post with great precision. In addition, we explain our findings with a detailed analysis of what distinguishes high scoring posts in different communities that differ along the dimensions of the specificity of topic and style, audience and level of moderation.
Mar 31 2017 cs.SI
Today, users are reading the news through social platforms. These platforms are built to facilitate crowd engagement, but not necessarily disseminate useful news to inform the masses. Hence, the news that is highly engaged with may not be the news that best informs. While predicting news popularity has been well studied, it has not been studied in the context of crowd manipulations. In this paper, we provide some preliminary results to a longer term project on crowd and platform manipulations of news and news popularity. In particular, we choose to study known features for predicting news popularity and how those features may change on reddit.com, a social platform used commonly for news aggregation. Along with this, we explore ways in which users can alter the perception of news through changing the title of an article. We find that news on reddit is predictable using previously studied sentiment and content features and that posts with titles changed by reddit users tend to be more popular than posts with the original article title.
The problem of fake news has gained a lot of attention as it is claimed to have had a significant impact on 2016 US Presidential Elections. Fake news is not a new problem and its spread in social networks is well-studied. Often an underlying assumption in fake news discussion is that it is written to look like real news, fooling the reader who does not check for reliability of the sources or the arguments in its content. Through a unique study of three data sets and features that capture the style and the language of articles, we show that this assumption is not true. Fake news in most cases is more similar to satire than to real news, leading us to conclude that persuasion in fake news is achieved through heuristics rather than the strength of arguments. We show overall title structure and the use of proper nouns in titles are very significant in differentiating fake from real. This leads us to conclude that fake news is targeted for audiences who are not likely to read beyond titles and is aimed at creating mental associations between entities and claims.