Genealogical networks, also known as family trees or population pedigrees, are commonly studied by genealogists wanting to know about their ancestry, but they also provide a valuable resource for disciplines such as digital demography, genetics, and computational social science. These networks are typically constructed by hand through a very time-consuming process, which requires comparing large numbers of historical records manually. We develop computational methods for automatically inferring large-scale genealogical networks. A comparison with human-constructed networks attests to the accuracy of the proposed methods. To demonstrate the applicability of the inferred large-scale genealogical networks, we present a longitudinal analysis of the mating patterns observed in a network. This analysis shows a consistent tendency of people to choose a spouse with a similar socioeconomic status, a phenomenon known as assortative mating. Interestingly, we do not observe this tendency to consistently decrease (or increase) over our study period of 150 years.
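As a rough illustration of the longitudinal assortative-mating analysis described above, the sketch below correlates the status scores of spouses within fixed time periods. The abstract does not say which measure was used: the Pearson correlation, the numeric status scores, the 50-year periods, and the function names are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def assortativity_by_period(couples, period_length=50):
    """Correlate spouses' status scores separately for each time period.

    couples: iterable of (marriage_year, status_a, status_b) tuples, where
    the status values are hypothetical numeric socioeconomic scores.
    Returns {period_start_year: correlation}. Every period must contain
    at least two couples, otherwise pearson() raises ValueError.
    """
    groups = {}
    for year, status_a, status_b in couples:
        start = (year // period_length) * period_length
        xs, ys = groups.setdefault(start, ([], []))
        xs.append(status_a)
        ys.append(status_b)
    return {start: pearson(xs, ys) for start, (xs, ys) in sorted(groups.items())}
```

A value near +1 in a period would indicate strong assortative mating, and a roughly flat series across periods would correspond to the absence of a consistent trend reported above.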
Gene gains and losses have shaped the gene repertoires of species from the last universal common ancestor to the species alive today. Genes in extant species were gained at different times through de novo creation of new genes, duplication of existing genes, or horizontal gene transfer (HGT) from other species, and are gradually lost. With the increasing number of sequenced genomes, several comparative analyses have quantified the evolutionary history of gene gains and losses in restricted lineages such as vertebrates, insects, fungi and plants. Here, we have constructed and analyzed over 10,000 gene family trees to reconstruct the gene content of ancestral genomes at an unprecedented scale, covering hundreds of genomes across all domains of life. This is the most comprehensive genome-wide analysis of all events in gene evolutionary histories. Our results are largely consistent with earlier, less complete comparative studies on specific lineages such as the vertebrates, but differ significantly in places, especially in recent evolutionary histories. We also find that the rate of gene gain varies widely among branches of the species tree, and that some periods of rapid gene duplication are associated with mass extinctions in geological history.
Emerging evidence shows that cognitive deficits in Alzheimer's disease (AD) are associated with disruptions in brain functional connectivity. Thus, the identification of alterations in AD functional networks has become a topic of increasing interest. However, the extent to which AD disrupts the balance of local and global information processing in the human brain remains elusive. The main objective of this study is to explore the dynamic topological changes of AD networks in terms of brain network segregation and integration. We used electroencephalography (EEG) data recorded from 20 participants (10 AD patients and 10 healthy controls) during resting state. Functional brain networks were reconstructed using EEG source connectivity computed in different frequency bands. Graph theoretical analyses were performed to assess differences between the two groups. Results revealed that AD networks, compared to networks of age-matched healthy controls, are characterized by lower global information processing (integration) and higher local information processing (segregation). Results also showed a significant correlation between the alterations in the AD patients' functional brain networks and their cognitive scores. These findings may contribute to the development of an EEG network-based test that could strengthen results obtained from currently used neurophysiological tests in neurodegenerative diseases.
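To make the notions of segregation and integration concrete, the sketch below computes two standard graph measures on a small unweighted network: the average clustering coefficient (often used for segregation) and global efficiency (often used for integration). The abstract does not name the measures used in the study, and EEG connectivity networks are typically weighted, so the choice of measures and the binary adjacency representation are assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    Nodes with fewer than two neighbours contribute 0 to the average.
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among the neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def global_efficiency(adj):
    """Mean inverse shortest-path length over all ordered node pairs.

    Unreachable pairs contribute 0. Distances come from breadth-first search.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))
```

Under this reading, the pattern reported above would correspond to AD networks showing a higher clustering coefficient and a lower global efficiency than control networks.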
The Gene Ontology aims to define the universe of functions known for gene products, at the molecular, cellular and organism levels. While the ontology is designed to cover all aspects of biology in a "species independent manner", the fact remains that many if not most biological functions are restricted in their taxonomic range. This is simply because functions evolve, i.e. like other biological characteristics they are gained and lost over evolutionary time. Here we introduce a general method of representing the evolutionary gain and loss of biological functions within the Gene Ontology. We then apply a variety of techniques, including manual curation, logical reasoning over the ontology structure, and previously published "taxon constraints" to assign evolutionary gain and loss events to the majority of terms in the GO. These gain and loss events now almost triple the number of terms with taxon constraints, and currently cover a total of 76% of GO terms, including 40% of molecular function terms, 78% of cellular component terms, and 89% of biological process terms. Database URL: GOTaxon is freely available at https://github.com/haimingt/GOTaxonConstraint
Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular for estimating sparse models, yet standard implementations do not specifically address the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features. Here we present WHInter, a working set algorithm to solve large l1-regularised problems with two-way interactions for binary design matrices. The novelty of WHInter stems from a new bound that efficiently identifies working sets without scanning all features, and from fast computations inspired by solutions to the maximum inner product search problem. We apply WHInter to simulated and real genetic data and show that it is more scalable and two orders of magnitude faster than the state of the art.
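The quadratic explosion mentioned above can be seen in a naive baseline that scans every pair of binary features to find the interaction most correlated with a residual vector. This is not WHInter; it is a sketch of the exhaustive scan that working-set methods aim to avoid, and the function names and data layout are assumptions made for this example.

```python
from itertools import combinations


def n_interactions(p):
    """Number of candidate two-way interactions among p features."""
    return p * (p - 1) // 2


def best_interaction_bruteforce(X, r):
    """Exhaustively find the interaction with the largest |inner product| with r.

    X: list of rows, each a list of 0/1 entries (binary design matrix).
    r: list of residual values, one per row.
    The interaction of columns j and k is their elementwise product, which
    for binary columns equals a logical AND.
    Returns ((j, k), score). Cost is O(n * p^2), hence the scaling problem.
    """
    n = len(X)
    p = len(X[0])
    best_pair, best_score = None, -1.0
    for j, k in combinations(range(p), 2):
        score = abs(sum(r[i] for i in range(n) if X[i][j] and X[i][k]))
        if score > best_score:
            best_pair, best_score = (j, k), score
    return best_pair, best_score
```

With 100,000 features, `n_interactions` gives roughly five billion candidate pairs, which is why an exhaustive scan at every iteration is impractical.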
The ecologist H. T. Odum introduced a principle of physics, called Maximum Empower, in order to explain self-organization in a system (e.g. physical, biological, social, economic, mathematical, ...). The concept of empower relies on emergy, which is a second notion introduced by Odum for comparing energy systems on the same basis. The roots of these notions trace back to the 1950s (with the work of H. T. Odum and R. C. Pinkerton), and emergy is now becoming an important sustainability indicator in the ecologist community. In 2012, Le Corre and Truffet developed a recursive method, based on max-plus algebra, to compute the emergy of a system. Recently, using this max-plus algebra approach, it has been shown that the Maximum Empower Principle can be formalized as a new combinatorial optimization problem (called the Maximum Empower Problem). In this paper we show that the Maximum Empower Problem can be solved by finding a maximum weighted clique in a cograph, which leads to an exponential-time algorithm in the worst case. We also provide a polynomial-time algorithm when there is no cycle in the graph modeling the system. Finally, we prove that the Maximum Empower Problem is #P-hard in the general case, i.e. it is at least as hard as computing the permanent of a matrix.
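The clique step mentioned above can be illustrated with the standard recursion for maximum weighted cliques on a cograph given as a cotree: a disjoint-union node takes the best child, and a join node combines the best cliques of all children. The sketch assumes non-negative vertex weights and a cotree supplied as nested tuples; how the cograph is built from an emergy system is not shown here and is not taken from the paper.

```python
def max_weight_clique(cotree):
    """Return (weight, vertices) of a maximum-weight clique of a cograph.

    The cograph is described by its cotree, using nested tuples:
      ("leaf", name, weight)   a single vertex with a non-negative weight
      ("union", [children])    disjoint union: no edges between children
      ("join", [children])     join: every edge between different children
    """
    kind = cotree[0]
    if kind == "leaf":
        _, name, weight = cotree
        return weight, [name]
    results = [max_weight_clique(child) for child in cotree[1]]
    if kind == "union":
        # A clique cannot span two components, so keep the best child.
        return max(results, key=lambda res: res[0])
    if kind == "join":
        # All children are fully connected to each other, so their best
        # cliques can be merged into one larger clique.
        weight = sum(w for w, _ in results)
        vertices = [v for _, vs in results for v in vs]
        return weight, vertices
    raise ValueError("unknown cotree node kind: %r" % (kind,))
```

The recursion visits each cotree node once, so it is linear in the size of the cotree it is given.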
Ring oscillators are biochemical circuits consisting of a ring of interactions capable of sustained oscillations. The non-linear interactions between genes hinder analytical insight into their function, so their study usually requires computational exploration. Here we show that, despite the apparent complexity, the stability of the unique steady state in an incoherent feedback ring depends only on the degradation rates and a single parameter summarizing the feedback of the circuit. Concretely, we show that the range of regulatory parameters that yield oscillatory behaviour is maximized when the degradation rates are equal. Strikingly, this result holds independently of the regulatory functions used or the number of genes. We also derive properties of the oscillations as a function of the degradation rates and the number of nodes forming the ring. Finally, we explore the role of mRNA dynamics by applying the generic results to the specific case with two naturally different degradation time scales.
Here we address the challenge of profiling causal properties and tracking the transformation of chemical compounds from an algorithmic perspective. We explore the potential of applying a computational interventional calculus based on the principles of algorithmic probability to chemical structure networks. We profile the sensitivity of the elements and covalent bonds in a chemical structure network algorithmically, asking whether reprogrammability affords information about thermodynamic and chemical processes involved in the transformation of different compound classes. We arrive at numerical results suggesting a correspondence between some physical, structural and functional properties. Our methods are capable of separating chemical classes that reflect functional and natural differences without considering any information about atomic and molecular properties. We conclude that these methods, with their links to chemoinformatics via algorithmic probability, hold promise for future research.
Han Chinese experienced substantial population migrations and admixture in history, yet little is known about the evolutionary process of Chinese dialects. Here, we used phylogenetic approaches and admixture inference to explicitly decompose the underlying structure of the diversity of Chinese dialects, based on the total phoneme inventories of 140 dialect samples from seven traditional dialect groups: Mandarin, Wu, Xiang, Gan, Hakka, Min and Yue. We found a north-south gradient of phonemic differences in Chinese dialects induced by historical population migrations. We also quantified extensive horizontal language transfers among these dialects, corresponding to the complicated socio-genetic history in China. Finally, we found that the middle-latitude dialects of Xiang, Gan and Hakka were formed by admixture with the other four dialects. Accordingly, the middle-latitude areas in China were a linguistic melting pot of northern and southern Han populations. Our study provides a detailed phylogenetic and historical context against the family-tree model in China.
The concentration of biochemical oxygen demand, BOD5, was studied in order to evaluate the water quality of the Igapó I Lake, in Londrina, Paraná State, Brazil. The simulation was conducted by means of the discretization in curvilinear coordinates of the geometry of Igapó I Lake, together with finite difference and finite element methods. The evaluation of the proposed numerical model for water quality was performed by comparing the experimental values of BOD5 with the numerical results. The evaluation of the model showed quantitative results compatible with the actual behavior of Igapó I Lake in relation to the simulated parameter. The qualitative analysis of the numerical simulations provided a better understanding of the dynamics of the BOD5 concentration at Igapó I Lake, showing that such concentrations in the central regions of the lake have values above those allowed by Brazilian law. The results can help to guide choices by public officials, such as how to (i) improve the mechanisms for identifying pollutant emitters on Igapó I Lake, (ii) contribute to the optimal treatment and recovery of the polluted environment and (iii) provide a better quality of life for regular visitors to the lake as well as for lakeside residents.
The distribution of fitness effects for mutations is often believed to be key to predicting microbial evolution. However, fitness effects alone may be insufficient to predict evolutionary dynamics if mutations produce nontrivial ecological effects which depend on the composition of the population. Here we show that variation in multiple growth traits, such as lag times and growth rates, creates higher-order effects such that the relative competition between two strains is fundamentally altered by the presence of a third strain. These effects produce a range of ecological phenomena: an unlimited number of strains can coexist, potentially with a single keystone strain stabilizing the community; strains that coexist in pairs do not coexist all together; and the champion of all pairwise competitions may not dominate in a mixed community. This occurs with competition for only a single finite resource and no other interactions. Since variation in multiple growth traits is ubiquitous in microbial populations due to pleiotropy and non-genetic variation, these higher-order effects may also be widespread, especially in laboratory ecology and evolution experiments. Our results underscore the importance of considering the distribution of ecological effects from mutations in predicting microbial evolution.
On 2018-01-17 two electron crystallography structures (PDB entries 6AXZ and 6BTK) of a prion protofibril of bank vole PrP(168-176) (a segment in the PrP $\beta$2-$\alpha$2 loop) were released in the Protein Data Bank (PDB). The paper [Nat Struct Mol Biol 25(2):131-134 (2018)] reports some polar clasps for these two crystal structures, and "an intersheet hydrogen bond between Tyr169 and the backbone carbonyl of Asn171 on an opposing strand." However, on revisiting the polar clasps, we cannot confirm this very important intersheet hydrogen bond; instead we found another hydrogen bond (Asn171:H-Gln172:OE1, between the strand of one sheet and the opposing strand of the mating sheet) in its place.
We develop a method for reconstructing regulatory interconnection networks between variables evolving according to a linear dynamical system. The work is motivated by the problem of gene regulatory network inference, that is, finding causal effects between genes from gene expression time series data. In biological applications, the typical problem is that the sampling frequency is low, and consequently the system identification problem is ill-posed. The low sampling frequency also makes it impossible to estimate derivatives directly from the data. We take a Bayesian approach to the problem, as it offers a natural way to incorporate prior information to deal with the ill-posedness, through the introduction of a sparsity-promoting prior for the underlying dynamics matrix. It also provides a framework for modelling both the process and measurement noise. We develop Markov Chain Monte Carlo samplers for the discrete-valued zero-structure of the dynamics matrix, and for the continuous-time trajectory of the system.
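The remark about derivatives can be illustrated on the simplest linear system, the scalar decay dx/dt = -a x, whose exact solution is x(t) = x0 exp(-a t). The sketch below estimates a from two noise-free samples with a forward difference. The scalar model, the function name and the parameter values are assumptions chosen for illustration and are not taken from the paper.

```python
from math import exp


def finite_difference_rate(a, dt, x0=1.0):
    """Estimate the decay rate of dx/dt = -a*x from two samples dt apart.

    The second sample is generated from the exact solution, so any error
    in the returned estimate comes from the forward difference alone.
    """
    x1 = x0 * exp(-a * dt)      # exact state one sampling interval later
    dxdt = (x1 - x0) / dt       # forward-difference derivative estimate
    return -dxdt / x0           # implied decay rate
```

With a = 1, a short interval such as dt = 0.001 recovers the rate closely, whereas dt = 2 gives an estimate below half the true value even without noise, which is why methods for sparsely sampled data avoid differentiating the measurements directly.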
In this work, we improve a previous minimalistic tree-grass savanna model by taking into account water availability, in addition to fire, since both factors are known to be important for shaping savanna physiognomies along a climatic gradient. As in our previous models, we consider two nonlinear functions of grass and tree biomasses to take into account, respectively, grass-fire feedbacks and the response of trees to fire of a given intensity. The novelty is that rainfall is taken into account in the tree and grass growth functions and in the biomass carrying capacities. We then carry out a qualitative analysis of the ODE model, showing the existence of equilibria and studying their stability conditions. We also construct a two-dimensional bifurcation diagram based on rainfall and fire frequency. This allows us to summarize the different scenarios for the model, including multi-stabilities that are proven possible. Next, to bring more realism to the model, pulsed fire events are modelled as part of an IDE (impulsive differential equations) system analogous to the ODE system. Numerical simulations are provided and we discuss some important ecological outcomes that our ODE and IDE models are able to predict. Notably, the expansion of forest into tree-poor physiognomies (grassland and savanna) is systematically predicted when the fire return period increases, especially in mesic and humid climatic areas.