We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological surveys and contagious disease surveillance. In this more general setting existing algorithms for classical (i.e.~static) clustering problems are not applicable anymore. We propose a set of optimization problems which we collectively refer to as 'temporal clustering'. The quality of a solution to a temporal clustering instance can be quantified using three parameters: the number of clusters $k$, the spatial clustering cost $r$, and the maximum cluster displacement $\delta$ between consecutive time steps. We consider spatial clustering costs which generalize the well-studied $k$-center, discrete $k$-median, and discrete $k$-means objectives of classical clustering problems. We develop new algorithms that achieve trade-offs between the three objectives $k$, $r$, and $\delta$. Our upper bounds are complemented by inapproximability results.
Submitted 20 Apr 2017 to Data Structures and Algorithms
Published 21 Apr 2017
Author comments: 27 pages, 10 figureshttp://arxiv.org/abs/1704.05964http://arxiv.org/pdf/1704.05964.pdf