# Quantitative Biology (q-bio)

• We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high-dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, $\sim$40, 20, and 9 percent of the total variance for the three traits. For example, predicted heights correlate $\sim$0.65 with actual height, and the actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis and appears to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common-SNP portion of the "missing heritability" problem, i.e., the gap between prediction R-squared and SNP heritability. The $\sim$20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, which comprises almost 500k individual genotypes with multiple phenotypes. We also use other datasets, as well as SNPs found in earlier GWAS, for out-of-sample validation of our results.
• This work couples an epidemic model with vaccination to opinion dynamics. Our objective was to study how disease risk perception can influence opinions about vaccination and, therefore, the spread of the disease. Unlike previous works, we consider continuous opinions. The epidemic spreading is governed by an SIS-like model with an extra vaccinated state. In our model, individuals vaccinate with a probability proportional to their opinions, and opinions change through peer influence in pairwise interactions. The epidemic feeds back into the opinion dynamics as an external field that increases the vaccination probability. We performed Monte Carlo simulations in fully connected populations. Interestingly, we observed the emergence of a first-order phase transition in addition to the usual active-absorbing phase transition present in the SIS model. Our simulations also show that, for certain combinations of parameters, increasing the initial fraction of the population that is pro-vaccine has a twofold effect: it can lead to smaller epidemic outbreaks in the short term, but it also contributes to the survival of the chain of infections in the long term. Our results also suggest that more effective vaccines may decrease long-term vaccine coverage. This outcome is counterintuitive, but it is in line with empirical observations that vaccines can become victims of their own success.
• Directed fibroblast migration is central to highly proliferative processes in regenerative medicine and developmental biology, such as wound healing and embryogenesis. However, the mechanisms by which single fibroblasts affect each other's directional decisions while chemotaxing through microscopic tissue pores are not well understood. We therefore explored the effects of two types of relevant social interactions, cell sequence and mitosis, on PDGF-BB-induced fibroblast migration in microfluidic tissue-mimicking mazes. Surprisingly, we found that in both cases the cells behave in ways that contradict the chemoattractant gradient established in the maze. Regarding sequence, when faced with a bifurcation, cells tend to avoid taking the same path through the maze as their predecessor. Instead, they tend to alternate: if a leading cell takes the shorter (steeper-gradient) path, the cell following it chooses the longer (weaker-gradient) path, and vice versa. Additionally, we found that when a mother cell divides, its two daughters migrate in opposite directions, even if this means migrating against the chemoattractant gradient and overcoming ongoing cell traffic. Fibroblasts therefore modify each other's directional decisions in a manner that runs counter to what classical chemotaxis theory predicts. Accounting for these effects could lead to a better understanding of tissue generation in vivo and to more advanced engineered tissue products in vitro.
• The Spliced Alignment Problem (SAP), which consists of finding an optimal semi-global alignment of a spliced RNA sequence on an unspliced genomic sequence, has been widely studied for predicting and annotating gene structures in genomes. Here, we revisit it for the purpose of identifying CDS ortholog groups within a set of CDS from homologous genes and for computing multiple CDS alignments. We introduce a new constrained version of the spliced alignment problem, together with an algorithm that exploits full information on the exon-intron structure of the input RNA and gene sequences to compute high-coverage, accurate alignments. We show how pairwise spliced alignments between the CDS and the gene sequences of a gene family can be used directly to cluster the CDS of the gene family into ortholog groups. We also introduce an extension of the spliced alignment problem called the Multiple Spliced Alignment Problem (MSAP), which consists of simultaneously aligning several RNA sequences on several genes from the same gene family, and we develop a heuristic algorithmic solution for it. We show how to exploit multiple spliced alignments for clustering homologous CDS into ortholog and close paralog groups, and for constructing multiple CDS alignments. An implementation of the method in Python is available on request from SFA@USherbrooke.ca. Keywords: Spliced alignment, CDS ortholog groups, Multiple CDS alignment, Gene structure, Gene family.
• So far, fingerprinting studies have focused on identifying features from single-modality MRI data, which capture individual characteristics in terms of brain structure, function, or white matter microstructure. However, due to the lack of a framework for comparing information across multiple modalities, multi-modal fingerprinting studies remain scarce. This paper presents a multi-modal analysis of genetically related subjects to compare and contrast the information provided by various MRI modalities. The proposed framework represents MRI scans as bags of SIFT features and uses these features in a nearest-neighbor graph to measure subject similarity. Experiments using the T1/T2-weighted MRI and diffusion MRI data of 861 Human Connectome Project subjects demonstrate strong links between the proposed similarity measure and genetic proximity.
• Because fiber extraction from dMRI data typically produces a large number of fibers, it is common to group fibers into bundles. To this end, many specialized distance measures, such as MCP, have been used to quantify fiber similarity. However, these distance-based approaches require point-wise correspondence and consider only the geometry of the fibers. Recent publications have highlighted that using microstructure measures along fibers improves tractography analysis. Moreover, many neurodegenerative diseases affecting white matter require studying microstructure measures as well as white matter geometry. Motivated by these observations, we propose a novel computational model for fibers, called functional varifolds, characterized by a metric that considers both the geometry and a microstructure measure (e.g., GFA) along the fiber pathway. We use it to cluster fibers within a dictionary learning and sparse coding framework, and present a preliminary analysis using HCP data.
• This paper explores discrete Dynamic Causal Modeling (DDCM) and its relationship with Directed Information (DI). We prove the conditional equivalence between DDCM and DI in characterizing the causal relationship between two brain regions. The theoretical results are demonstrated using fMRI data obtained under both resting-state and stimulus-based conditions. Our numerical analysis is consistent with results reported in a previous study.
• We present the concept of fiber-flux density for locally quantifying white matter (WM) fiber bundles. By combining scalar diffusivity measures (e.g., fractional anisotropy) with fiber-flux measurements, we define new local descriptors called Fiber-Flux Diffusion Density (FFDD) vectors. Applying each descriptor throughout fiber bundles allows along-tract coupling of a specific diffusion measure with geometrical properties, such as fiber orientation and coherence. A key step in the proposed framework is the construction of an FFDD dissimilarity measure for sub-voxel alignment of fiber bundles, based on the fast marching method (FMM). The resulting aligned WM tract profiles enable meaningful inter-subject comparisons and group-wise statistical analysis. We demonstrate our method on two different datasets of contact sports players. Along-tract pairwise comparisons, as well as group-wise analysis relative to non-player healthy controls, reveal significant and spatially consistent FFDD anomalies. Compared with along-tract FA analysis, our method shows improved sensitivity to subtle structural anomalies in football players.
• Motivation: Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein function. Deep learning, which has been successfully applied to fields such as image classification and speech recognition, offers a new opportunity to significantly improve secondary structure prediction accuracy. Although several deep-learning methods have been developed for secondary structure prediction, there is still room for improvement; MUFold-SS was developed to address this gap. Results: Here, a very deep neural network, the deep inception-inside-inception network (Deep3I), is proposed for protein secondary structure prediction, and a software tool was implemented using this network. The network takes two inputs: a protein sequence and a profile generated by PSI-BLAST. The output is the predicted eight-state (Q8) or three-state (Q3) secondary structure. Deep3I not only achieves state-of-the-art performance but also runs faster than other tools, reaching Q3 and Q8 accuracies of 82.8% and 71.1%, respectively, on the CB513 benchmark.
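The height abstract ties a correlation of about 0.65 to about 40 percent of variance captured: for a linear predictor evaluated on held-out data, the fraction of variance explained is roughly the squared correlation, and $0.65^2 \approx 0.42$. The abstract does not name its estimator. A common high-dimensional choice is L1-penalized regression (LASSO), whose nonzero coefficients play the role of "activated" SNPs. The sketch below is an assumption, not the authors' pipeline: it fits a LASSO by cyclic coordinate descent to simulated 0/1/2 genotypes, and the function names, the penalty `lam`, the allele frequency, and the five causal SNPs are all hypothetical.

```python
import random

def soft_threshold(z: float, gamma: float) -> float:
    """L1 proximal operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the vector y are already centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP: nothing to fit
            # Correlation of SNP j with the partial residual that excludes SNP j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual = y - X @ beta
                beta[j] = new_bj
    return beta

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5

def simulate_cohort(n, p, effects, rng, maf=0.3):
    """Hypothetical cohort: 0/1/2 allele counts and an additive trait plus unit-variance noise."""
    X = [[float((rng.random() < maf) + (rng.random() < maf)) for _ in range(p)] for _ in range(n)]
    y = [sum(X[i][j] * w for j, w in effects.items()) + rng.gauss(0.0, 1.0) for i in range(n)]
    return X, y

def run_demo(seed=0, lam=0.1):
    rng = random.Random(seed)
    p = 40
    effects = {3: 1.0, 11: -1.0, 17: 1.0, 25: 1.0, 33: -1.0}  # made-up causal SNPs
    X_tr, y_tr = simulate_cohort(300, p, effects, rng)
    X_te, y_te = simulate_cohort(300, p, effects, rng)  # held-out validation sample
    means = [sum(row[j] for row in X_tr) / len(X_tr) for j in range(p)]
    y_mean = sum(y_tr) / len(y_tr)
    Xc = [[row[j] - means[j] for j in range(p)] for row in X_tr]
    beta = lasso_cd(Xc, [v - y_mean for v in y_tr], lam)
    activated = {j for j, b in enumerate(beta) if b != 0.0}
    preds = [sum((row[j] - means[j]) * beta[j] for j in range(p)) for row in X_te]
    r = pearson(preds, y_te)
    return activated, r, r ** 2

activated, r, r2 = run_demo()
print(sorted(activated), round(r, 2), round(r2, 2))
```

In this toy setup each causal SNP has genotype variance $2 \times 0.3 \times 0.7 = 0.42$, so the simulated heritability is $2.1 / 3.1 \approx 0.68$; the held-out `r2` can approach but not exceed that ceiling, which mirrors, at small scale, the comparison between prediction R-squared and SNP heritability drawn in the abstract.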
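The vaccination abstract describes its ingredients (SIS-like dynamics with a vaccinated state, vaccination probability proportional to opinion, pairwise peer influence, an epidemic-driven external field, Monte Carlo updates in a fully connected population) but not the exact update rules. The sketch below is a hypothetical toy version, not the authors' model: the compromise-style opinion rule, the field term scaled by prevalence, the imperfect-vaccine `efficacy` parameter, and every parameter name and value are assumptions chosen for illustration.

```python
import random

SUSCEPTIBLE, INFECTED, VACCINATED = 0, 1, 2

def simulate_sisv(n=1000, steps=100, beta=0.6, alpha=0.2, gamma=0.05,
                  efficacy=0.9, eps=0.1, field=0.5,
                  init_infected=0.05, init_pro=0.3, seed=1):
    """Monte Carlo sketch of an SIS-like model with vaccination and continuous opinions.

    All update rules and parameter names are hypothetical illustrations.
    Returns a list of (s, i, v) population fractions, one per Monte Carlo step.
    """
    rng = random.Random(seed)
    state = [SUSCEPTIBLE] * n
    n_inf = int(init_infected * n)
    for k in rng.sample(range(n), n_inf):
        state[k] = INFECTED
    # Opinions in [0, 1]; a fraction init_pro starts pro-vaccine (opinion above 0.5).
    opinion = [rng.uniform(0.6, 1.0) if rng.random() < init_pro else rng.uniform(0.0, 0.4)
               for _ in range(n)]
    history = []
    for _ in range(steps):
        for _ in range(n):  # one Monte Carlo step = n random single-agent updates
            i, j = rng.randrange(n), rng.randrange(n)
            prevalence = n_inf / n
            # Pairwise peer influence (compromise toward j) plus an external field
            # that pushes opinion toward pro-vaccine in proportion to prevalence.
            o = (opinion[i] + eps * (opinion[j] - opinion[i])
                 + eps * field * prevalence * (1.0 - opinion[i]))
            opinion[i] = min(1.0, max(0.0, o))
            if state[i] == SUSCEPTIBLE:
                # Fully connected: infection pressure depends only on global prevalence.
                if rng.random() < beta * prevalence:
                    state[i] = INFECTED
                    n_inf += 1
                elif rng.random() < gamma * opinion[i]:  # vaccination prob. proportional to opinion
                    state[i] = VACCINATED
            elif state[i] == VACCINATED:
                # Imperfect vaccine: susceptibility reduced by the efficacy factor.
                if rng.random() < (1.0 - efficacy) * beta * prevalence:
                    state[i] = INFECTED
                    n_inf += 1
            elif rng.random() < alpha:  # infected agent recovers to susceptible (SIS)
                state[i] = SUSCEPTIBLE
                n_inf -= 1
        n_vac = state.count(VACCINATED)
        history.append(((n - n_inf - n_vac) / n, n_inf / n, n_vac / n))
    return history

s, i, v = simulate_sisv()[-1]
print(f"final fractions: S={s:.3f} I={i:.3f} V={v:.3f}")
```

Zero prevalence is absorbing here, as in the SIS model: once no agent is infected, no new infections occur. Sweeping `efficacy` or `init_pro` and recording the long-run fractions is one way to probe the effects the abstract reports, though this simplified version is not guaranteed to reproduce them.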
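The Deep3I abstract reports Q3 and Q8 accuracies, i.e., the fraction of residues whose predicted three-state or eight-state label matches the observed one. The sketch below shows how such per-residue accuracies are computed. The 8-to-3 mapping (H, G, I to helix; E, B to strand; everything else to coil) is one common DSSP convention and is an assumption here, as are the function names and the toy strings; the abstract does not state which reduction was used.

```python
# Hypothetical 8-to-3 reduction: one common DSSP convention, not necessarily Deep3I's.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",            # helices
    "E": "E", "B": "E",                      # strands and bridges
    "T": "C", "S": "C", "C": "C", "-": "C",  # turns, bends, coil
}

def to_q3(ss8: str) -> str:
    """Map an eight-state DSSP string to three states."""
    return "".join(Q8_TO_Q3[c] for c in ss8)

def q_accuracy(predicted: str, observed: str) -> float:
    """Per-residue accuracy: fraction of positions where the states match."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

def pooled_accuracy(pairs) -> float:
    """Accuracy over a benchmark, pooling all residues of all proteins."""
    matches = total = 0
    for predicted, observed in pairs:
        if len(predicted) != len(observed):
            raise ValueError("length mismatch")
        matches += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues")
    return matches / total

observed8 = "HHGEETC"
predicted8 = "HHHEEBC"
print(q_accuracy(predicted8, observed8))                # Q8 = 5/7
print(q_accuracy(to_q3(predicted8), to_q3(observed8)))  # Q3 = 6/7
```

Q3 is never lower than Q8 for the same prediction under this mapping, because confusions within a class (G predicted as H here) count as correct after reduction.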

