Aug 01 2017 cs.DS
We study hierarchical clusterings of metric spaces that change over time. This is a natural geometric primitive for the analysis of dynamic data sets. Specifically, we introduce and study the problem of finding a temporally coherent sequence of hierarchical clusterings from a sequence of unlabeled point sets. We encode the clustering objective by embedding each point set into an ultrametric space, which naturally induces a hierarchical clustering of the set of points. We enforce temporal coherence among the embeddings by finding correspondences between successive pairs of ultrametric spaces which exhibit small distortion in the Gromov-Hausdorff sense. We present both upper and lower bounds on the approximability of the resulting optimization problems.
Injuries have a great impact on professional soccer, due to their large influence on team performance and the considerable costs of rehabilitation for players. Existing studies in the literature provide just a preliminary understanding of which factors mostly affect injury risk, while an evaluation of the potential of statistical models in forecasting injuries is still missing. In this paper, we propose a multidimensional approach to injury prediction in professional soccer which is based on GPS measurements and machine learning. By using GPS tracking technology, we collect data describing the training workload of players in a professional soccer club during a season. We show that our injury predictors are both accurate and interpretable by providing a set of case studies of interest to soccer practitioners. Our approach opens a novel perspective on injury prevention, providing a set of simple and practical rules for evaluating and interpreting the complex relations between injury risk and training performance in professional soccer.
May 23 2017 cs.DM
The Maximum Balanced Biclique Problem (MBBP) is a prominent model with numerous applications. Yet, the problem is NP-hard and thus computationally challenging. We propose novel ideas for designing effective exact algorithms for MBBP. Firstly, we introduce an Upper Bound Propagation procedure to pre-compute an upper bound involving each vertex. Then we extend an existing branch-and-bound algorithm by integrating the pre-computed upper bounds. We also present a set of new valid inequalities induced from the upper bounds to tighten an existing mathematical formulation for MBBP. Lastly, we investigate another exact algorithm scheme which enumerates a subset of balanced bicliques based on our upper bounds. Experiments show that compared to existing approaches, the proposed algorithms and formulations are more efficient in solving a set of random graphs and large real-life instances.
Apr 21 2017 cs.DS
We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological surveys and contagious disease surveillance. In this more general setting existing algorithms for classical (i.e.~static) clustering problems are not applicable anymore. We propose a set of optimization problems which we collectively refer to as 'temporal clustering'. The quality of a solution to a temporal clustering instance can be quantified using three parameters: the number of clusters $k$, the spatial clustering cost $r$, and the maximum cluster displacement $\delta$ between consecutive time steps. We consider spatial clustering costs which generalize the well-studied $k$-center, discrete $k$-median, and discrete $k$-means objectives of classical clustering problems. We develop new algorithms that achieve trade-offs between the three objectives $k$, $r$, and $\delta$. Our upper bounds are complemented by inapproximability results.
This report provides an introduction to some Machine Learning tools within the most common development environments. It mainly focuses on practical problems, skipping any theoretical introduction. It is oriented to both students trying to approach Machine Learning and experts looking for new frameworks.
Jan 11 2017 cs.LG
In this document we shows a first implementation and some preliminary results of a new theory, facing Machine Learning problems in the frameworks of Classical Mechanics and Variational Calculus. We give a general formulation of the problem and then we studies basic behaviors of the model on simple practical implementations.
The lack of open-source tools for hyperspectral data visualization and analysiscreates a demand for new tools. In this paper we present the new PlanetServer,a set of tools comprising a web Geographic Information System (GIS) and arecently developed Python Application Programming Interface (API) capableof visualizing and analyzing a wide variety of hyperspectral data from differentplanetary bodies. Current WebGIS open-source tools are evaluated in orderto give an overview and contextualize how PlanetServer can help in this mat-ters. The web client is thoroughly described as well as the datasets availablein PlanetServer. Also, the Python API is described and exposed the reason ofits development. Two different examples of mineral characterization of differenthydrosilicates such as chlorites, prehnites and kaolinites in the Nili Fossae areaon Mars are presented. As the obtained results show positive outcome in hyper-spectral analysis and visualization compared to previous literature, we suggestusing the PlanetServer approach for such investigations.
Jan 05 2017 cs.LG
We analyze a new approach to Machine Learning coming from a modification of classical regularization networks by casting the process in the time dimension, leading to a sort of collapse of dimensionality in the problem of learning the model parameters. This approach allows the definition of a online learning algorithm that progressively accumulates the knowledge provided in the input trajectory. The regularization principle leads to a solution based on a dynamical system that is paired with a procedure to develop a graph structure that stores the input regularities acquired from the temporal evolution. We report an extensive experimental exploration on the behavior of the parameter of the proposed model and an evaluation on artificial dataset.
In image registration, a proper transformation should be topology preserving. Especially for landmark-based image registration, if the displacement of one landmark is larger enough than those of neighbourhood landmarks, topology violation will be occurred. This paper aim to analyse the topology preservation of some Radial Basis Functions (RBFs) which are used to model deformations in image registration. Matérn functions are quite common in the statistic literature (see, e.g. \citeMatern86,Stein99). In this paper, we use them to solve the landmark-based image registration problem. We present the topology preservation properties of RBFs in one landmark and four landmarks model respectively. Numerical results of three kinds of Matérn transformations are compared with results of Gaussian, Wendland's, and Wu's functions.
Apr 04 2014 cs.DS
A popular graph clustering method is to consider the embedding of an input graph into R^k induced by the first k eigenvectors of its Laplacian, and to partition the graph via geometric manipulations on the resulting metric space. Despite the practical success of this methodology, there is limited understanding of several heuristics that follow this framework. We provide theoretical justification for one such natural and computationally efficient variant. Our result can be summarized as follows. A partition of a graph is called strong if each cluster has small external conductance, and large internal conductance. We present a simple greedy spectral clustering algorithm which returns a partition that is provably close to a suitably strong partition, provided that such a partition exists. A recent result shows that strong partitions exist for graphs with a sufficiently large spectral gap between the k-th and (k+1)-th eigenvalues. Taking this together with our main theorem gives a spectral algorithm which finds a partition close to a strong one for graphs with large enough spectral gap. We also show how this simple greedy algorithm can be implemented in near-linear time for any fixed k and error guarantee. Finally, we evaluate our algorithm on some real-world and synthetic inputs.
Considerable efforts have been made in recent years to produce detailed topologies of the Internet. Although Internet topology data have been brought to the attention of a wide and somewhat diverse audience of scholars, so far they have been overlooked by economists. In this paper, we suggest that such data could be effectively treated as a proxy to characterize the size of the "digital economy" at country level and outsourcing: thus, we analyse the topological structure of the network of trade in digital services (trade in bits) and compare it with that of the more traditional flow of manufactured goods across countries. To perform meaningful comparisons across networks with different characteristics, we define a stochastic benchmark for the number of connections among each country-pair, based on hypergeometric distribution. Original data are thus filtered by means of different thresholds, so that we only focus on the strongest links, i.e., statistically significant links. We find that trade in bits displays a sparser and less hierarchical network structure, which is more similar to trade in high-skill manufactured goods than total trade. Lastly, distance plays a more prominent role in shaping the network of international trade in physical goods than trade in digital services.