# Genomics (q-bio.GN)

• DNA rearrangement processes recombine gene segments that are organized on the chromosome in a variety of ways: the segments can overlap, interleave, or one may be a subsegment of another. We use directed graphs to represent segment organizations on a given locus, where contigs containing rearranged segments are the vertices and the edges correspond to the relationships between segments. Using graph properties, we associate each graph with a point in a higher-dimensional Euclidean space, so that cluster formation and analysis can be performed with methods from topological data analysis. The method is applied to the recently sequenced model organism *Oxytricha trifallax*, a ciliate with a highly scrambled genome that undergoes a massive rearrangement process after conjugation. The analysis shows emerging star-like graph structures, indicating that the segments of a single gene can interleave with, or even contain, all of the segments of fifteen or more other genes between its own segments. We also observe that as many as six genes can have mutually interleaving or overlapping segments.
• In biomedical research, studies commonly collect data in which an outcome measure contains informative excess zeros. For example, when measuring the burden of neuritic plaques in brain pathology studies, subjects who show none still contribute to our understanding of neurodegenerative disease. Such an outcome may be characterized by a mixture distribution with one component being a "structural zero" and the other a Poisson distribution. We propose a novel variance components score test of genetic association between a set of genetic markers and a zero-inflated count outcome from a mixture distribution. This test shares advantageous properties with SNP-set tests previously devised for standard continuous or binary outcomes, such as the Sequence Kernel Association Test (SKAT). In particular, our method has superior statistical power compared to competing methods, especially when there is correlation within the group of markers and when the SNPs are associated with both the mixing proportion and the rate of the Poisson distribution. We apply the method to Alzheimer's data from the Rush University Religious Orders Study and Memory and Aging Project (ROSMAP), where, as proof of principle, we find highly significant associations with the APOE gene in both the "structural zero" and "count" parameters of a zero-inflated neuritic plaque count outcome.
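
The graph construction described in the first abstract (genes as vertices, segment relationships as edges, graph properties as coordinates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: each gene is assumed to be a list of inclusive `(start, end)` segment intervals on one locus; the rules that label a pair as `overlap`, `contains` or `interleave` are a hypothetical reading of the abstract; overlapping and interleaving pairs are encoded as edges in both directions; and the feature vector (vertex count, edge count, maximum out-degree, maximum in-degree, density) is an illustrative choice rather than the properties used in the paper. All function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical representation: a gene is a list of inclusive (start, end)
# segment intervals on a single locus.
Interval = tuple[int, int]


def overlaps(x: Interval, y: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return x[0] <= y[1] and y[0] <= x[1]


def relation(a: list[Interval], b: list[Interval]) -> str | None:
    """Classify gene b relative to gene a using illustrative (assumed) rules."""
    if any(overlaps(x, y) for x in a for y in b):
        return "overlap"
    a_lo, a_hi = min(s for s, _ in a), max(e for _, e in a)
    b_lo, b_hi = min(s for s, _ in b), max(e for _, e in b)
    if a_hi < b_lo or b_hi < a_lo:
        return None  # the genes occupy disjoint stretches of the locus
    # With no overlaps, any segment of one gene that meets the other gene's
    # span lies strictly inside it; containment means no such segment exists.
    if a_lo < b_lo and b_hi < a_hi and not any(b_lo < s and e < b_hi for s, e in a):
        return "contains"  # all of b sits in one gap between a's segments
    if b_lo < a_lo and a_hi < b_hi and not any(a_lo < s and e < a_hi for s, e in b):
        return "contained"
    return "interleave"


def build_graph(genes: dict[str, list[Interval]]) -> set[tuple[str, str]]:
    """Directed edges: a -> b when a contains b; both directions otherwise."""
    edges: set[tuple[str, str]] = set()
    for u, v in combinations(genes, 2):
        rel = relation(genes[u], genes[v])
        if rel == "contains":
            edges.add((u, v))
        elif rel == "contained":
            edges.add((v, u))
        elif rel in ("overlap", "interleave"):
            edges.update({(u, v), (v, u)})
    return edges


def graph_features(nodes, edges: set[tuple[str, str]]) -> tuple[float, ...]:
    """Map a graph to a point in R^5 from simple, illustrative graph properties."""
    nodes = list(nodes)
    n = len(nodes)
    out_deg = {v: 0 for v in nodes}
    in_deg = {v: 0 for v in nodes}
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    density = len(edges) / (n * (n - 1)) if n > 1 else 0.0
    return (
        float(n),
        float(len(edges)),
        float(max(out_deg.values(), default=0)),
        float(max(in_deg.values(), default=0)),
        density,
    )


# Toy locus: A's two segments enclose all of B and interleave with C.
genes = {"A": [(0, 10), (50, 60)], "B": [(20, 25), (30, 35)], "C": [(40, 45), (70, 80)]}
point = graph_features(genes, build_graph(genes))
```

Under these assumed rules, a gene whose segments enclose those of many other genes becomes a vertex with a large out-degree, which is one way a star-like structure would show up in the feature vector; the resulting points could then be handed to a clustering or topological data analysis tool.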
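
The zero-inflated outcome in the second abstract is a two-component mixture: with probability $\pi$ the count is a structural zero, and otherwise it is drawn from a Poisson distribution with rate $\lambda$, so $P(Y=0)=\pi+(1-\pi)e^{-\lambda}$ and $P(Y=k)=(1-\pi)e^{-\lambda}\lambda^k/k!$ for $k\ge 1$. The sketch below only evaluates this standard zero-inflated Poisson (ZIP) likelihood; it does not implement the proposed variance components score test or the SKAT kernel. The logit link for $\pi$ and log link for $\lambda$ in `zip_params` are common ZIP regression choices assumed here for illustration, and the function names are hypothetical.

```python
import math


def zip_pmf(y: int, pi: float, lam: float) -> float:
    """Zero-inflated Poisson probability of observing count y.

    pi is the structural-zero (mixing) probability; lam is the Poisson rate.
    """
    poisson = math.exp(-lam) * lam**y / math.factorial(y)
    if y == 0:
        # A zero can arise from either the structural-zero or the Poisson component.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


def zip_params(zero_lin: float, count_lin: float) -> tuple[float, float]:
    """Map linear predictors to (pi, lam) via logit and log links (assumed links)."""
    return 1 / (1 + math.exp(-zero_lin)), math.exp(count_lin)


def zip_loglik(ys: list[int], pis: list[float], lams: list[float]) -> float:
    """Log-likelihood of independent ZIP counts with per-subject parameters."""
    return sum(math.log(zip_pmf(y, p, lam)) for y, p, lam in zip(ys, pis, lams))
```

In a set-based association setting, the markers would enter one or both linear predictors, which is how an association with the mixing proportion, the Poisson rate, or both would be expressed.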