Understanding the causes of crime is a longstanding item on researchers' agenda. While it is hard to extract causality from data, several linear models have been proposed to predict crime through the existing correlations between crime and urban metrics. However, because of non-Gaussian distributions and multicollinearity in urban indicators, it is common to find conflicting conclusions about the influence of some urban indicators on crime. Ensemble-based machine learning algorithms can handle such problems well. Here, we use a random forest regressor to predict crime and quantify the influence of urban indicators on homicides. Our approach achieves up to $97\%$ accuracy in crime prediction, and the importance of urban indicators is ranked and clustered into groups of equal influence, which are robust under slight changes in the data sample analyzed. Our results determine the ranking of urban indicators by their importance for predicting crime, unveiling that unemployment and illiteracy are the most important variables for describing homicides in Brazilian cities. We further believe that our approach helps in producing more robust conclusions regarding the effects of urban indicators on crime, with potential applications for guiding public policies for crime control.
Just as natural river networks are known to be globally self-similar, recent research has shown that human-built urban networks, such as road networks, are also functionally self-similar and have fractal topology with power-law node-degree distributions ($p(k) = a k^{-\gamma}$). Here we show, for the first time, that other urban infrastructure networks (sanitary and storm-water sewers), which sustain flows of critical services for urban citizens, also show scale-free functional topologies. For road and drainage networks, we compared functional topological metrics derived from high-resolution data (70,000 nodes) for a large US city providing services to about 900,000 citizens over an area of about 1,000 km$^2$. For the whole city and for subnets of different sizes, we also examined these networks in terms of geospatial co-location (roads and sewers). Our analyses reveal functional topological homogeneity among all the subnets within the city, in spite of differences in several urban attributes. The functional topologies of all subnets of both infrastructure types resemble power-law distributions, with tails that follow a power law more closely as the subnet area increases. Our findings hold implications for assessing the vulnerability of these critical infrastructure networks to cascading shocks based on spatial interdependency, and for improved design and maintenance of urban infrastructure networks.
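As a rough illustration of how a power-law exponent $\gamma$ in $p(k) \propto k^{-\gamma}$ might be estimated from network data, the sketch below builds the empirical degree distribution from an undirected edge list and applies the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman, $\hat\gamma \approx 1 + n\,[\sum_i \ln(k_i/(k_{\min}-1/2))]^{-1}$, over the $n$ nodes with $k_i \ge k_{\min}$. The function names, the edge-list input format and the default $k_{\min}$ are assumptions for illustration; this is not presented as the procedure used in the study.

```python
import math
from collections import Counter


def degree_sequence(edges):
    """Degree of every node in an undirected edge list of (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return list(degree.values())


def degree_distribution(degrees):
    """Empirical p(k): fraction of nodes that have degree k."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees).items())}


def powerlaw_exponent(degrees, k_min=1):
    """Approximate discrete MLE of gamma for p(k) ~ k**(-gamma), k >= k_min >= 1."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Toy example: a tiny star-like graph, purely illustrative.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
degrees = degree_sequence(edges)
print(degree_distribution(degrees))  # {1: 0.6, 2: 0.2, 3: 0.2}
print(powerlaw_exponent(degrees))
```

On a real network with tens of thousands of nodes, $k_{\min}$ would normally be chosen from the data (for example by minimizing a goodness-of-fit distance), because the estimate is sensitive to where the tail is taken to begin.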
Scholars have increasingly investigated "crowdsourcing" as an alternative to expert-based judgment or purely data-driven approaches to predicting the future. Under certain conditions, scholars have found that crowdsourcing can outperform these other approaches. However, despite interest in the topic and a series of successful use cases, relatively few studies have applied empirical model thinking to evaluate the accuracy and robustness of crowdsourcing in real-world contexts. In this paper, we offer three novel contributions. First, we explore a dataset of over 600,000 predictions from over 7,000 participants in a multi-year tournament to predict the decisions of the Supreme Court of the United States. Second, we develop a comprehensive crowd construction framework that allows for the formal description and application of crowdsourcing to real-world data. Third, we apply this framework to our data to construct more than 275,000 crowd models. We find that in out-of-sample historical simulations, crowdsourcing robustly outperforms the commonly accepted null model, yielding the highest-known performance for this context at 80.8% case-level accuracy. To our knowledge, this dataset and analysis represent one of the largest explorations of recurring human prediction to date, and our results provide additional empirical support for the use of crowdsourcing as a prediction method.
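The simplest conceivable crowd model is an unweighted majority vote over all participants' predictions for each case, scored by case-level accuracy. The sketch below assumes predictions arrive as hypothetical `(participant_id, case_id, outcome)` tuples and breaks ties by taking the smallest outcome label. The study's crowd construction framework is far richer, and none of its specific weighting, selection or out-of-sample rules are reproduced here; a faithful historical simulation would also restrict each vote to predictions made before the decision was known.

```python
from collections import Counter, defaultdict


def majority_crowd(predictions):
    """Unweighted majority vote per case.

    predictions: iterable of (participant_id, case_id, outcome) tuples
    (a hypothetical layout). Ties go to the smallest outcome label.
    """
    votes = defaultdict(Counter)
    for _participant, case, outcome in predictions:
        votes[case][outcome] += 1
    crowd = {}
    for case, counts in votes.items():
        top = max(counts.values())
        crowd[case] = min(o for o, c in counts.items() if c == top)
    return crowd


def case_accuracy(crowd, truth):
    """Fraction of cases with a known outcome that the crowd got right."""
    scored = [case for case in crowd if case in truth]
    if not scored:
        raise ValueError("no overlap between crowd cases and known outcomes")
    return sum(crowd[case] == truth[case] for case in scored) / len(scored)


predictions = [
    ("p1", "case1", "reverse"), ("p2", "case1", "reverse"), ("p3", "case1", "affirm"),
    ("p1", "case2", "affirm"), ("p2", "case2", "affirm"), ("p3", "case2", "reverse"),
]
truth = {"case1": "reverse", "case2": "reverse"}
crowd = majority_crowd(predictions)
print(crowd)                        # {'case1': 'reverse', 'case2': 'affirm'}
print(case_accuracy(crowd, truth))  # 0.5
```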
Online social networks have an increasing influence on our society; they may play decisive roles in politics and can be crucial for the fate of companies. Such services compete with each other, and some may even break down rapidly. Using social network datasets, we show the main factors leading to such a dramatic collapse. At an early stage, mostly the loosely bound users disappear; later, collective effects play the main role, leading to cascading failures. We present a theory based on a generalised threshold model to explain the findings and show how the collapse time can be estimated in advance using the dynamics of the churning users. Our results shed light on possible mechanisms of instabilities in other competing social processes.
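A threshold cascade of the kind described can be sketched as follows, assuming each user has a hypothetical personal threshold: a user stays only while at least that many of their friends are still active. Users with few ties leave first, and their departure can push better-connected users below threshold in later rounds. The synchronous round-based update and the integer thresholds are simplifying assumptions; this is not the paper's generalised threshold model.

```python
def threshold_cascade(adjacency, thresholds, initially_left=()):
    """Synchronous threshold cascade on an undirected friendship graph.

    adjacency: dict mapping each user to an iterable of friends.
    thresholds: dict mapping each user to the minimum number of active
    friends needed to stay (hypothetical integer thresholds).
    Returns (users still active, list of users leaving in each round).
    """
    active = set(adjacency) - set(initially_left)
    rounds = []
    while True:
        # Everyone below threshold in the current state leaves at once.
        leaving = {
            user for user in active
            if sum(1 for friend in adjacency[user] if friend in active) < thresholds[user]
        }
        if not leaving:
            return active, rounds
        rounds.append(sorted(leaving))
        active -= leaving


# A triangle (0, 1, 2) with a pendant chain 0-3-4.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
k = {user: 2 for user in graph}
print(threshold_cascade(graph, k))                      # ({0, 1, 2}, [[4], [3]])
print(threshold_cascade(graph, k, initially_left=[1]))  # (set(), [[2, 4], [0, 3]])
```

In the first run, the loosely bound user 4 leaves, which then pulls user 3 below threshold while the triangle survives; removing a single triangle member at the start instead collapses the whole toy network in two rounds.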
Long-range correlation, a property of time series exhibiting long-term memory, is mainly studied in statistical physics and has been reported to exist in natural language. Using a state-of-the-art method for such analysis, long-range correlation is first shown to occur in long CHILDES data sets. To understand why, Bayesian generative models of language originally proposed in cognitive science are investigated. Among representative models, the Simon model was found to exhibit surprisingly good long-range correlation, whereas the Pitman-Yor model did not. Since the Simon model is known not to reflect the vocabulary growth of natural language correctly, a simple new model is devised as a combination of the Simon and Pitman-Yor models, such that long-range correlation holds with a correct vocabulary growth rate. Overall, the investigation suggests that uniform sampling is one cause of long-range correlation and could thus be related to actual linguistic processes.
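In the Simon model, each new token is either a brand-new word (with probability $a$) or a copy of a token drawn uniformly at random from everything generated so far; this uniform sampling over the history is the mechanism the abstract points to. The minimal sketch below generates such a sequence and tracks vocabulary growth, which for this model is linear in expectation (about $a t$ distinct words after $t$ tokens), unlike the sublinear growth described by Heaps' law for natural language. Function and parameter names are illustrative; the Pitman-Yor model, the combined model and the long-range correlation analysis are not reproduced.

```python
import random


def simon_sequence(length, new_word_prob, seed=None):
    """Generate a token sequence from the Simon model.

    Words are integer ids. The first token is always a new word; afterwards
    a new word appears with probability new_word_prob, otherwise a past
    token is copied uniformly at random (frequency-proportional reuse).
    """
    rng = random.Random(seed)
    tokens = []
    next_id = 0
    for _ in range(length):
        if not tokens or rng.random() < new_word_prob:
            tokens.append(next_id)
            next_id += 1
        else:
            tokens.append(rng.choice(tokens))  # uniform sampling over history
    return tokens


def vocabulary_growth(tokens):
    """Number of distinct words seen after each token."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


tokens = simon_sequence(10_000, new_word_prob=0.1, seed=42)
growth = vocabulary_growth(tokens)
print(growth[-1] / len(tokens))  # should be close to new_word_prob
```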
The formation and stability of social hierarchies is a question of general relevance. Here, we propose a simple model for establishing social hierarchy via pair-wise interactions between individuals and investigate its stability. In each interaction or fight, the probability of "winning" depends solely on the relative societal status of the participants, and the winner gains status while the loser suffers an equal loss. The interactions are characterized by two parameters. The first parameter represents how much can be lost, and the second represents the degree to which even a small difference in status can guarantee a win for the higher-status individual. Depending on the parameters, the resulting status distributions either reach a continuous unimodal form or lead to a totalitarian end state with one dominant high-status individual and all other individuals having zero status. However, we find that in the latter case long-lived intermediary distributions often exist, which can give the illusion of a stable society. Moreover, by implementing a simple but realistic rule that restricts interactions to individuals of sufficiently similar status, the stable or long-lived distributions acquire a high-status structure corresponding to a dominant class. We compare our model predictions to human societies using household income as a proxy for societal status and find agreement over the entire range, from the low-to-middle-status parts to the characteristic high-status "tail". We discuss how the model provides a conceptual framework for understanding the origin of social hierarchy and the factors that lead to the preservation or deterioration of the societal structure.
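A minimal simulation in the spirit of this model is sketched below. The functional forms are assumptions, since the abstract does not give them: the hypothetical `loss_fraction` is the share of the loser's status that changes hands (standing in for "how much can be lost"), `steepness` sets a logistic win probability in the status difference (how strongly a small advantage guarantees a win), and the optional `max_gap` implements the rule restricting fights to individuals of similar status. Total status is conserved by construction, since whatever the loser gives up goes to the winner.

```python
import math
import random


def win_probability(s_i, s_j, steepness):
    """Hypothetical logistic chance that i beats j, computed without overflow."""
    x = steepness * (s_i - s_j)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_status(n_agents, n_fights, loss_fraction, steepness,
                    max_gap=None, seed=None):
    """Pairwise status exchange; every agent starts with status 1.0."""
    rng = random.Random(seed)
    status = [1.0] * n_agents
    for _ in range(n_fights):
        i, j = rng.sample(range(n_agents), 2)
        if max_gap is not None and abs(status[i] - status[j]) > max_gap:
            continue  # only sufficiently similar-status individuals fight
        if rng.random() < win_probability(status[i], status[j], steepness):
            winner, loser = i, j
        else:
            winner, loser = j, i
        stake = loss_fraction * status[loser]  # loser's loss equals winner's gain
        status[winner] += stake
        status[loser] -= stake
    return status


status = simulate_status(100, 20_000, loss_fraction=0.2, steepness=5.0, seed=1)
print(max(status), sum(status))
```

Sweeping `loss_fraction` and `steepness` in such a sketch is one way to look for the two regimes the abstract describes, though whether this particular parameterization reproduces them is not established here.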
We consider constrained hierarchical opinion dynamics in the case of competition between leaders, with complete information among leaders. Each group of leaders tries to drive the followers' opinion towards a desired state according to a specific strategy. Using the Boltzmann-type control approach, we analyze the best-reply strategy for each leader population. Deriving the corresponding Fokker-Planck model allows us to investigate the asymptotic behaviour of the solution. Heterogeneous follower populations are then considered, in which knowledge affects the leaders' credibility and modifies the outcome of the leaders' competition.