Apr 21 2017 cs.DS
We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological surveys and contagious disease surveillance. In this more general setting existing algorithms for classical (i.e.~static) clustering problems are not applicable anymore. We propose a set of optimization problems which we collectively refer to as 'temporal clustering'. The quality of a solution to a temporal clustering instance can be quantified using three parameters: the number of clusters $k$, the spatial clustering cost $r$, and the maximum cluster displacement $\delta$ between consecutive time steps. We consider spatial clustering costs which generalize the well-studied $k$-center, discrete $k$-median, and discrete $k$-means objectives of classical clustering problems. We develop new algorithms that achieve trade-offs between the three objectives $k$, $r$, and $\delta$. Our upper bounds are complemented by inapproximability results.
This report provides an introduction to some Machine Learning tools within the most common development environments. It mainly focuses on practical problems, skipping any theoretical introduction. It is oriented to both students trying to approach Machine Learning and experts looking for new frameworks.
Jan 11 2017 cs.LG
In this document we shows a first implementation and some preliminary results of a new theory, facing Machine Learning problems in the frameworks of Classical Mechanics and Variational Calculus. We give a general formulation of the problem and then we studies basic behaviors of the model on simple practical implementations.
In this paper we present the new PlanetServer, a set of tools comprising a web Geographic Information System (GIS) and a recently developed Python API capable of analyzing a wide variety of hyperspectral data from different planetary bodies. The research case studies are focusing on 1) the characterization of different hydrosilicates such as chlorites, prehnites and kaolinites in the Nili Fossae area on Mars, and 2) the characterization of ice (CO 2 and H 2 O ice) in two different areas of Mars where ice was reported in a nearly pure state. Results show positive outcome in hyperspectral analysis and visualization compared to previous literature, therefore we suggest using PlanetServer for such investigations.
Jan 05 2017 cs.LG
We analyze a new approach to Machine Learning coming from a modification of classical regularization networks by casting the process in the time dimension, leading to a sort of collapse of dimensionality in the problem of learning the model parameters. This approach allows the definition of a online learning algorithm that progressively accumulates the knowledge provided in the input trajectory. The regularization principle leads to a solution based on a dynamical system that is paired with a procedure to develop a graph structure that stores the input regularities acquired from the temporal evolution. We report an extensive experimental exploration on the behavior of the parameter of the proposed model and an evaluation on artificial dataset.
In image registration, a proper transformation should be topology preserving. Especially for landmark-based image registration, if the displacement of one landmark is larger enough than those of neighbourhood landmarks, topology violation will be occurred. This paper aim to analyse the topology preservation of some Radial Basis Functions (RBFs) which are used to model deformations in image registration. Matérn functions are quite common in the statistic literature (see, e.g. \citeMatern86,Stein99). In this paper, we use them to solve the landmark-based image registration problem. We present the topology preservation properties of RBFs in one landmark and four landmarks model respectively. Numerical results of three kinds of Matérn transformations are compared with results of Gaussian, Wendland's, and Wu's functions.
Apr 04 2014 cs.DS
A popular graph clustering method is to consider the embedding of an input graph into R^k induced by the first k eigenvectors of its Laplacian, and to partition the graph via geometric manipulations on the resulting metric space. Despite the practical success of this methodology, there is limited understanding of several heuristics that follow this framework. We provide theoretical justification for one such natural and computationally efficient variant. Our result can be summarized as follows. A partition of a graph is called strong if each cluster has small external conductance, and large internal conductance. We present a simple greedy spectral clustering algorithm which returns a partition that is provably close to a suitably strong partition, provided that such a partition exists. A recent result shows that strong partitions exist for graphs with a sufficiently large spectral gap between the k-th and (k+1)-th eigenvalues. Taking this together with our main theorem gives a spectral algorithm which finds a partition close to a strong one for graphs with large enough spectral gap. We also show how this simple greedy algorithm can be implemented in near-linear time for any fixed k and error guarantee. Finally, we evaluate our algorithm on some real-world and synthetic inputs.
Considerable efforts have been made in recent years to produce detailed topologies of the Internet. Although Internet topology data have been brought to the attention of a wide and somewhat diverse audience of scholars, so far they have been overlooked by economists. In this paper, we suggest that such data could be effectively treated as a proxy to characterize the size of the "digital economy" at country level and outsourcing: thus, we analyse the topological structure of the network of trade in digital services (trade in bits) and compare it with that of the more traditional flow of manufactured goods across countries. To perform meaningful comparisons across networks with different characteristics, we define a stochastic benchmark for the number of connections among each country-pair, based on hypergeometric distribution. Original data are thus filtered by means of different thresholds, so that we only focus on the strongest links, i.e., statistically significant links. We find that trade in bits displays a sparser and less hierarchical network structure, which is more similar to trade in high-skill manufactured goods than total trade. Lastly, distance plays a more prominent role in shaping the network of international trade in physical goods than trade in digital services.