As people increasingly turn to social media for news, the unmoderated nature of these platforms enables the diffusion of hoaxes, which in turn jeopardises the credibility of the information gathered from them. To mitigate this problem, we study the development of a hoax detection system that can distinguish between true and false reports early on. We introduce a semi-automated approach that leverages the Wikidata knowledge base to build large-scale datasets for veracity classification, which enables us to create a dataset of 4,007 reports comprising over 13 million tweets, 15% of which are fake. We describe a method for learning class-specific word representations using word embeddings, which we call multiw2v. Our approach achieves competitive results, with F1 scores above 72% within 10 minutes of the first tweet being posted, outperforming the baselines. Our dataset represents a realistic scenario with a real distribution of true and false stories, and we release it for use as a benchmark in future research.
Rumour stance classification, defined as classifying the stance of specific social media posts as supporting, denying, querying or commenting on an earlier post, is attracting increasing interest from researchers. While most previous work has used individual tweets as classifier inputs, here we report on the performance of sequential classifiers that exploit the discourse features inherent in social media interactions, or 'conversational threads'. We test the effectiveness of four sequential classifiers, namely Hawkes Processes, Linear-Chain Conditional Random Fields (Linear CRF), Tree-Structured Conditional Random Fields (Tree CRF) and Long Short-Term Memory networks (LSTM), on eight datasets associated with breaking news stories, and examine different types of local and contextual features; in doing so, our work sheds new light on the development of accurate stance classifiers. We show that sequential classifiers that exploit the discourse properties of social media conversations, while using only local features, outperform non-sequential classifiers. Furthermore, we show that an LSTM using a reduced set of features can outperform the other sequential classifiers, and that this performance is consistent across datasets and across types of stances. Finally, our work analyses the different features under study, identifying those that best help characterise and distinguish between stances; for example, supporting tweets are more likely to be accompanied by evidence than denying tweets. We also set forth a number of directions for future research.
Sep 28 2017 cs.DL
Social media datasets are increasingly shared by researchers, but with the caveat that they are not always completely replicable. To comply with the requirements of platforms such as Twitter, researchers cannot release the raw data and instead have to release a list of unique identifiers, which others can then use to recollect the data from the platform themselves. As a result, subsets of the data may no longer be available, since content can be deleted or user accounts deactivated. To quantify the long-term impact of content deletion on the replicability of datasets, we perform a longitudinal analysis of the persistence of 30 Twitter datasets, which include over 147 million tweets. Taking the original datasets, collected between 2012 and 2016, and recollecting them later using the tweet IDs, we look at four factors that quantify the extent to which recollected datasets resemble the original ones: completeness, representativity, similarity and changingness. Even though the ratio of available tweets keeps decreasing as a dataset gets older, we find that the textual content of the recollected subset is still largely representative of the whole dataset that was originally collected. The representativity of the metadata, however, keeps decreasing over time, both because the dataset shrinks and because certain metadata, such as users' follower counts, keep changing. Our study has important implications for researchers sharing and using publicly shared Twitter datasets in their research.
Media is full of false claims. Even Oxford Dictionaries named "post-truth" its word of the year for 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the kind of discourse there is around it. RumourEval is a SemEval shared task that aims to identify and handle rumours, and reactions to them, in text. We present an annotation scheme and a large dataset covering multiple topics, each with its own families of claims and replies, and use these to pose two concrete challenges; we also report the results achieved by participants on these challenges.
Despite the increasing use of social media platforms for information and news gathering, their unmoderated nature often leads to the emergence and spread of rumours, i.e., pieces of information that are unverified at the time of posting. At the same time, the openness of social media platforms provides opportunities to study how users share and discuss rumours, and to explore how natural language processing and data mining techniques may be used to determine their veracity. In this survey we introduce and discuss two types of rumours that circulate on social media: long-standing rumours that circulate for long periods of time, and newly emerging rumours spawned during fast-paced events such as breaking news, where reports are released piecemeal and often with an unverified status in their early stages. We provide an overview of research into social media rumours towards the ultimate goal of developing a rumour classification system that consists of four components: rumour detection, rumour tracking, rumour stance classification and rumour veracity classification. We delve into the approaches presented in the scientific literature for the development of each of these four components. We summarise the efforts and achievements so far towards the development of rumour classification systems and conclude with suggestions for avenues for future research in social media mining for the detection and resolution of rumours.
Social media and data mining are increasingly being used to analyse political and societal issues. Here we undertake the classification of social media users as supporting or opposing ongoing independence movements in their territories. Independence movements occur in territories whose citizens have conflicting national identities; users with opposing national identities will then support or oppose the sense of being part of an independent nation that differs from the officially recognised country. We describe a methodology that relies on users' self-reported location to build datasets for three territories (Catalonia, the Basque Country and Scotland), and we test language-independent classifiers using four types of features. We show the effectiveness of the approach for building large annotated datasets, and its ability to achieve accurate, language-independent classification, with performance ranging from 85% to 97% for the three territories under study. A data analysis shows the existence of echo chambers that isolate opposing national identities from each other.
Social media and user-generated content (UGC) are increasingly important features of journalistic work in a number of different ways. However, their use presents major challenges, not least because information posted on social media is not always reliable and therefore its veracity needs to be checked before it can be considered fit for use in news reporting. We report on the results of a series of in-depth ethnographic studies of journalists' work practices, undertaken as part of the requirements gathering for a prototype social media verification 'dashboard' and its subsequent evaluation. We conclude with some reflections upon the broader implications of our findings for the design of tools to support journalistic work.
Breaking news leads to situations of fast-paced reporting in social media, producing all kinds of updates related to news stories, albeit with the caveat that some of those early updates tend to be rumours, i.e., information with an unverified status at the time of posting. Flagging information that is unverified can help avoid the spread of information that may turn out to be false. Detection of rumours can also feed a rumour tracking system that ultimately determines their veracity. In this paper we introduce a novel approach to rumour detection that learns from the sequential dynamics of reporting during breaking news in social media to detect rumours in new stories. Using Twitter datasets collected during five breaking news stories, we experiment with Conditional Random Fields as a sequential classifier that leverages context learnt during an event for rumour detection, and compare it with the state-of-the-art rumour detection system as well as other baselines. In contrast to existing work, our classifier does not need to observe tweets querying a piece of information to deem it a rumour; instead, it detects rumours from the tweet alone by exploiting context learnt during the event. Our classifier achieves competitive performance, beating the state-of-the-art classifier that relies on querying tweets in both precision and recall, as well as outperforming our best baseline with an improvement of nearly 40% in F1 score. The scale and diversity of our experiments reinforce the generalisability of our classifier.
Rumour stance classification, the task of determining whether each tweet in a collection discussing a rumour is supporting, denying, questioning or simply commenting on the rumour, has been attracting substantial interest. Here we introduce a novel approach that makes use of the sequence of transitions observed in tree-structured conversation threads on Twitter. The conversation threads are formed by harvesting users' replies to one another, which results in a nested tree-like structure. Previous work addressing the stance classification task has treated each tweet as a separate unit. Here we analyse tweets in terms of their position in a sequence and test two sequential classifiers, Linear-Chain CRF and Tree CRF, each of which makes different assumptions about the conversational structure. We experiment with eight Twitter datasets, collected during breaking news, and show that exploiting the sequential structure of Twitter conversations achieves significant improvements over non-sequential methods. Our work is the first to model Twitter conversations as a tree structure in this manner, introducing a novel way of tackling NLP tasks on Twitter conversations.
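Of the four factors, completeness is the most direct: the ratio of originally collected tweets that can still be recollected. The following is a minimal sketch in Python, assuming each dataset is available as a collection of tweet IDs. The second function is a hypothetical illustration of metadata drift (the share of recollected tweets whose author's follower count has changed); it is not one of the measures as defined in the study.

```python
from typing import Dict, Iterable


def completeness(original_ids: Iterable[str], recollected_ids: Iterable[str]) -> float:
    """Fraction of the originally collected tweet IDs that could be recollected."""
    original = set(original_ids)
    if not original:
        return 0.0
    # Only IDs present in both collections count as still available.
    return len(original & set(recollected_ids)) / len(original)


def follower_change_ratio(original: Dict[str, int], recollected: Dict[str, int]) -> float:
    """Hypothetical drift measure: share of recollected tweets whose author's
    follower count differs from the value stored at original collection time.
    Both dicts map tweet ID -> follower count of the tweet's author."""
    shared = original.keys() & recollected.keys()
    if not shared:
        return 0.0
    changed = sum(1 for tid in shared if original[tid] != recollected[tid])
    return changed / len(shared)


if __name__ == "__main__":
    print(completeness(["1", "2", "3", "4"], ["1", "3", "4"]))  # 0.75
    print(follower_change_ratio({"1": 10, "3": 50, "4": 7}, {"1": 12, "3": 50, "4": 7}))
```

Set intersection keeps the completeness computation independent of ordering and of duplicate IDs in either list.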
Social media tend to be rife with rumours while new reports are released piecemeal during breaking news. Interestingly, one can mine multiple reactions expressed by social media users in those situations, exploring their stance towards rumours, ultimately enabling the flagging of highly disputed rumours as being potentially false. In this work, we set out to develop an automated, supervised classifier that uses multi-task learning to classify the stance expressed in each individual tweet in a rumourous conversation as either supporting, denying or questioning the rumour. Using a classifier based on Gaussian Processes, and exploring its effectiveness on two datasets with very different characteristics and varying distributions of stances, we show that our approach consistently outperforms competitive baseline classifiers. Our classifier is especially effective in estimating the distribution of different types of stance associated with a given rumour, which we set forth as a desired characteristic for a rumour-tracking system that will warn both ordinary users of Twitter and professional news practitioners when a rumour is being rebutted.
The selection of a suitable document representation approach plays a crucial role in the performance of a document clustering task. Being able to pick out representative words within a document can lead to substantial improvements in document clustering. In the case of web documents, the HTML markup that defines the layout of the content provides additional structural information that can be further exploited to identify representative words. In this paper we introduce a fuzzy term weighting approach that makes the most of the HTML structure for document clustering. We set forth and build on the hypothesis that a good representation can take advantage of how humans skim through documents to extract the most representative words. The authors of web pages use HTML tags to convey the most important message of a web page through page elements that attract the readers' attention, such as page titles or emphasized elements. We define a set of criteria to exploit the information provided by these page elements, and introduce a fuzzy combination of these criteria that we evaluate on a web page clustering task. Our proposed approach, called Abstract Fuzzy Combination of Criteria (AFCC), can adapt to datasets whose features are distributed differently, achieving good results compared to other similar fuzzy-logic-based approaches and to TF-IDF across different datasets.
In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, a task so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained on historical tweets can still be leveraged for the classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for accurate country-level classification of tweets. We find that relying on a single feature, such as tweet content alone (the most widely used feature in previous work), leaves much to be desired. Choosing an appropriate combination of tweet content and metadata can lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful for determining the country of origin. We also experiment with applying a model trained on historical tweets to classify new tweets, finding that choosing a combination of features whose utility does not fade over time can lead to comparable performance, avoiding the need to retrain. However, accurate classification becomes slightly more difficult for countries with multiple commonalities, especially English- and Spanish-speaking countries.
Nov 25 2015 cs.SI
As breaking news unfolds, people increasingly rely on social media to stay abreast of the latest updates. The use of social media in such situations comes with the caveat that new information being released piecemeal may encourage rumours, many of which remain unverified long after their point of release. Little is known, however, about the dynamics of the life cycle of a social media rumour. In this paper we present a methodology that has enabled us to collect, identify and annotate a dataset of 330 rumour threads (4,842 tweets) associated with 9 newsworthy events. We analyse this dataset to understand how users spread, support, or deny rumours that are later proven true or false, by distinguishing two levels of status in a rumour life cycle, i.e., before and after its veracity status is resolved. The identification of rumours associated with each event, as well as the tweet that resolved each rumour as true or false, was performed by a team of journalists who tracked the events in real time. Our study shows that rumours that are ultimately proven true tend to be resolved faster than those that turn out to be false. Whilst one can readily see users denying rumours once they have been debunked, users appear to be less capable of distinguishing true from false rumours when their veracity remains in question. In fact, we show that the prevalent tendency for users is to support every unverified rumour. We also analyse the role of different types of users, finding that highly reputable users such as news organisations endeavour to post well-grounded statements, which appear to be certain and accompanied by evidence. Nevertheless, these often prove to be unverified pieces of information that give rise to false rumours. Our study reinforces the need for developing robust machine learning techniques that can provide assistance in assessing the veracity of rumours.
Inspired by a European project, PHEME, that requires the close analysis of Twitter-based conversations in order to look at the spread of rumors via social media, this paper has two objectives. The first of these is to take the analysis of microblogs back to first principles and lay out what microblog analysis should look like as a foundational programme of work. The other is to describe how this is of fundamental relevance to Human-Computer Interaction's interest in grasping the constitution of people's interactions with technology within the social order. Our critical finding is that, despite some surface similarities, Twitter-based conversations are a wholly distinct social phenomenon requiring an independent analysis that treats them as unique phenomena in their own right, rather than as another species of conversation that can be handled within the framework of existing Conversation Analysis. This motivates the argument that Microblog Analysis be established as a foundationally independent programme, examining the organizational characteristics of microblogging from the ground up. We articulate how aspects of this approach have already begun to shape our design activities within the PHEME project.
Aug 25 2015 cs.SI
This work is motivated by the dearth of research dealing with social media content created in the Basque Country or written in the Basque language. While social fingerprints during events have been analysed in numerous other locations and languages, this article aims to fill this gap so as to initiate a much-needed research area within the Basque scientific community. To this end, we describe the methodology we followed to collect tweets posted during the quintessential exhibition race in support of the Basque language, and present the results of the analysis of these tweets. Our analysis shows that the most eventful moments lead to spikes in tweeting activity. Furthermore, we emphasize the importance of having an official account for the event in question, which helps improve the visibility of the event on the social network as well as the dissemination of information to the Basque community. Along with the official account, journalists and news organisations play a crucial role in the diffusion of information.
The spread of false rumours during emergencies can jeopardise the well-being of citizens as they monitor the stream of news on social media to stay abreast of the latest updates. In this paper, we describe the methodology we have developed within the PHEME project for the collection and sampling of conversational threads, as well as the tool we have developed to facilitate the annotation of these threads so as to identify rumourous ones. We describe the annotation task conducted on threads collected during the 2014 Ferguson unrest, and we present and analyse our findings. Our results show that we can effectively collect social media rumours and identify multiple rumours associated with a range of stories that would have been hard to identify using existing techniques that require manual input of rumour-specific keywords.
Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques for social spam detection have focused primarily on the identification of spam accounts using extensive historical and network-based data. In this paper we focus on the detection of spam tweets, relying only on tweet-inherent features, which minimises the amount of data that needs to be gathered. This enables the spam detection system to be applied to a large set of tweets in a timely fashion, potentially in a real-time or near real-time setting. Using two large hand-labelled datasets of tweets containing spam, we study the suitability of five classification algorithms and four different feature sets for the social spam detection task. Our results show that, by using the limited set of features readily available in a tweet, we can achieve encouraging results that are competitive with existing spammer detection systems that make use of additional, costly user features. Our study is the first to attempt to generalise conclusions on the optimal classifiers and feature sets for social spam detection across different datasets.
Social media users give rise to social trends as they share content about common interests, and these trends can be triggered by different reasons. In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with the following four types: 'news', 'ongoing events', 'memes', and 'commemoratives'. While previous research has analyzed trending topics over the long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This would make it possible to provide a filtered subset of trends to end users. We analyze and experiment with a set of straightforward language-independent features based on the social spread of trends to categorize them into the introduced typology. Our method provides an efficient way to accurately categorize trending topics without the need for external data, enabling news organizations to discover breaking news in real time, or to quickly identify viral memes that might enrich marketing decisions, among other applications. The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter, as many were likely sent from mobile devices, or memes having more retweets originating from a few trend-setters.
Although Twitter provides an unprecedented opportunity to learn about breaking news and current events as they happen, it often produces skepticism among users, as not all the information is accurate and hoaxes are sometimes spread. While avoiding the diffusion of hoaxes is a major concern during fast-paced events such as natural disasters, how users trust and verify information from tweets in these contexts has so far received little attention. We survey users on their credibility perceptions of witness pictures posted on Twitter in relation to Hurricane Sandy. By examining credibility perceptions of features suggested for information verification in the field of Epistemology, we evaluate how accurately users determine whether pictures were real or fake, compared to professional evaluations performed by experts. Our study yields insights into tweet presentation, as well as into the features that users should look at when assessing the veracity of tweets in the context of fast-paced events. Among our main findings, author details not readily available on Twitter feeds should be emphasized in order to facilitate verification of tweets, whereas showing multiple tweets corroborating a fact misleads users into trusting what is actually a hoax. We contrast some of the behavioral patterns found in tweets with the literature in Psychology research.
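To illustrate what language-independent features based on the social spread of a trend might look like, here is a minimal sketch. The particular features (average tweet length, share of retweets, and share of retweets whose original author is among the top-k most retweeted users) are hypothetical choices loosely inspired by the patterns mentioned above, not the feature set used in the study, and the `Tweet` type with its `retweeted_user` field is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Tweet:
    text: str
    # Hypothetical field: author of the original tweet if this is a retweet, else None.
    retweeted_user: Optional[str] = None


def spread_features(tweets: List[Tweet], top_k: int = 3) -> Dict[str, float]:
    """Hypothetical language-independent features over the earliest tweets of a trend."""
    n = len(tweets)
    if n == 0:
        return {"avg_length": 0.0, "retweet_ratio": 0.0, "top_k_retweet_share": 0.0}
    sources = [t.retweeted_user for t in tweets if t.retweeted_user is not None]
    counts = Counter(sources)
    # Concentration: share of retweets originating from the top-k most retweeted users.
    top_share = (
        sum(c for _, c in counts.most_common(top_k)) / len(sources) if sources else 0.0
    )
    return {
        "avg_length": sum(len(t.text) for t in tweets) / n,  # characters per tweet
        "retweet_ratio": len(sources) / n,
        "top_k_retweet_share": top_share,
    }


if __name__ == "__main__":
    sample = [Tweet("goal!"), Tweet("RT funny", "alice"), Tweet("RT funny", "alice"), Tweet("RT lol", "bob")]
    print(spread_features(sample, top_k=1))
```

None of these features depends on the language of the tweets, which is the property the abstract emphasises; a classifier over such vectors would then assign each trend to one of the four types.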
Feb 21 2013 cs.IR
Users of the social bookmarking service Delicious have recently become able to stack web pages in addition to tagging them. Stacking enables users to group web pages around specific themes with the aim of recommending them to others. However, users still stack only a small subset of what they tag, and thus many web pages remain unstacked. This paper presents early research towards automatically clustering web pages from their tags to find stacks and extend recommendations.
In our daily lives, organizing resources into a set of categories is a common task. Categorization becomes more useful as a collection of resources grows. Large collections of books, movies, and web pages, for instance, are cataloged in libraries, organized in databases and classified in directories, respectively. However, the sheer size of these collections makes organizing them manually an enormous and costly endeavor. Recent research is moving towards developing automated classifiers that reduce the increasing costs and effort of the task. Little work has analyzed the appropriateness of, or explored how to harness, the annotations provided by users of social tagging systems as a data source. Users of these systems save resources as bookmarks in a social environment by attaching annotations in the form of tags. It has been shown that these tags facilitate retrieval of resources not only for the annotators themselves but also for the whole community. Likewise, these tags provide meaningful metadata that refers to the content of the resources. In this thesis, we make use of these user-provided tags in search of the most accurate classification of resources, as compared to expert-driven categorizations. To the best of our knowledge, this is the first research work performing actual classification experiments utilizing social tags. By exploring the characteristics and nature of these systems and the underlying folksonomies, this thesis sheds new light on how to get the most out of social tags for automated resource classification tasks. We therefore believe that the contributions of this work are of utmost interest to future researchers in the field, as well as to the scientific community at large, in order to better understand these systems and further utilize the knowledge garnered from social tags.
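As an illustration only, pages could be grouped from their tags by representing each page as a bag of tags and applying a simple single-pass (leader) clustering over cosine similarity. This is a hypothetical sketch, not the method investigated in this work; the similarity threshold and the input format (a mapping from URL to tag list) are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(pages: Dict[str, List[str]], threshold: float = 0.5) -> List[List[str]]:
    """Single-pass clustering: each page joins the first cluster whose leader
    (first page of the cluster) is at least `threshold` similar; otherwise it
    starts a new cluster. The threshold value is a hypothetical default."""
    leaders: List[Counter] = []
    clusters: List[List[str]] = []
    for url, tags in pages.items():
        vec = Counter(tags)
        for i, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[i].append(url)
                break
        else:  # no sufficiently similar leader found
            leaders.append(vec)
            clusters.append([url])
    return clusters


if __name__ == "__main__":
    pages = {
        "a": ["python", "programming"],
        "b": ["python", "tutorial", "programming"],
        "c": ["recipes", "cooking"],
    }
    print(leader_cluster(pages))  # [['a', 'b'], ['c']]
```

Each resulting cluster could be read as a candidate stack; a single pass is cheap but order-dependent, so a method such as agglomerative clustering may be preferable when quality matters more than speed.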
This paper explores the real-time summarization of scheduled events such as soccer games from torrential flows of Twitter streams. We propose and evaluate an approach that substantially shrinks the stream of tweets in real-time, and consists of two steps: (i) sub-event detection, which determines if something new has occurred, and (ii) tweet selection, which picks a representative tweet to describe each sub-event. We compare the summaries generated in three languages for all the soccer games in "Copa America 2011" to reference live reports offered by Yahoo! Sports journalists. We show that simple text analysis methods which do not involve external knowledge lead to summaries that cover 84% of the sub-events on average, and 100% of key types of sub-events (such as goals in soccer). Our approach should be straightforwardly applicable to other kinds of scheduled events such as other sports, award ceremonies, keynote talks, TV shows, etc.
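The two-step pipeline can be sketched as follows. This is a minimal sketch under stated assumptions, not the approach evaluated in the paper: sub-event detection is approximated by flagging time windows whose tweet count exceeds a multiple of the average over the preceding windows (the `history` and `factor` parameters are hypothetical), and tweet selection picks the tweet whose words are, on average, most frequent within the flagged window.

```python
from collections import Counter
from typing import List


def detect_subevents(counts: List[int], history: int = 3, factor: float = 2.0) -> List[int]:
    """Flag window i as a sub-event if its tweet count exceeds `factor` times
    the mean count of the previous `history` windows (hypothetical rule)."""
    flagged = []
    for i in range(history, len(counts)):
        baseline = sum(counts[i - history:i]) / history
        if baseline > 0 and counts[i] > factor * baseline:
            flagged.append(i)
    return flagged


def select_tweet(window_tweets: List[str]) -> str:
    """Pick the tweet whose words are, on average, most frequent in the window.
    `window_tweets` must be non-empty."""
    freq = Counter(w for t in window_tweets for w in t.lower().split())

    def score(t: str) -> float:
        words = t.lower().split()
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return max(window_tweets, key=score)


if __name__ == "__main__":
    print(detect_subevents([10, 12, 11, 40, 13, 12]))  # [3]
    print(select_tweet(["Goal by Suarez", "Suarez scores a goal", "what a game"]))  # Suarez scores a goal
```

Both steps use only counts and word frequencies from the stream itself, in line with the abstract's point that simple text analysis without external knowledge can suffice.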
Social tagging has become an interesting approach to improving search and navigation on the current Web, since it aggregates, in a collaborative way, the tags added by different users to the same resource. This results in a list of weighted tags describing each resource. Combined with a classical taxonomic classification system, such as that of Wikipedia, social tags can enhance document navigation and search. On the one hand, social tags suggest alternative ways of navigating, including pivot-browsing, popularity-driven navigation, and filtering. On the other hand, they provide new metadata, sometimes not covered by a document's content, that can substantially improve document search. In this work, we propose adding an interface for user-defined tags describing Wikipedia articles, as a way to improve article navigation and retrieval. To this end, we propose a prototype that applies tags to Wikipedia in order to evaluate the effectiveness of this approach.
Recent research has shown the usefulness of social tags as a data source to feed resource classification. Little is known, however, about the effect of system settings on the folksonomies created in social tagging systems. In this work, we consider the settings of social tagging systems to further understand tag distributions in folksonomies. We analyze in depth the tag distributions in three large-scale social tagging datasets, and analyze their effect on a resource classification task. To this end, we study the appropriateness of applying weighting schemes based on the well-known TF-IDF for resource classification. We show that settings play a major role in altering tag distributions. Among those settings, tag suggestions produce very different folksonomies, which condition the success of the employed weighting schemes. Our findings and analyses are relevant for researchers studying tag-based resource classification, user behavior in social networks, the structure of folksonomies and tag distributions, as well as for developers of social tagging systems in search of an appropriate setting.
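For reference, a plain TF-IDF weighting over tags, treating each resource's tag list as a document, might look like the minimal sketch below. It assumes term frequency normalised by the resource's total number of tag assignments and an unsmoothed idf of log(N / df); it does not reproduce the specific TF-IDF-based variants studied in this work. Under this scheme, a tag assigned to every resource receives zero weight.

```python
import math
from collections import Counter
from typing import Dict, List


def tag_tfidf(resources: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Map each resource ID to {tag: tf * idf}, where
    tf = tag count / total tag assignments on the resource and
    idf = log(N / number of resources carrying the tag)."""
    n = len(resources)
    df: Counter = Counter()
    for tags in resources.values():
        df.update(set(tags))  # document frequency counts each tag once per resource
    weights: Dict[str, Dict[str, float]] = {}
    for rid, tags in resources.items():
        counts = Counter(tags)
        total = sum(counts.values())
        weights[rid] = {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
    return weights


if __name__ == "__main__":
    w = tag_tfidf({"r1": ["python", "python", "web"], "r2": ["web", "design"]})
    print(w["r1"]["web"])  # 0.0, because "web" appears on every resource
```

The resulting per-resource vectors could then be fed to any standard classifier.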