-
Improved Learning via k-DTW: A Novel Dissimilarity Measure for Curves
Authors:
Amer Krivošija,
Alexander Munteanu,
André Nusser,
Chris Schwiegelshohn
Abstract:
This paper introduces $k$-Dynamic Time Warping ($k$-DTW), a novel dissimilarity measure for polygonal curves. $k$-DTW has stronger metric properties than Dynamic Time Warping (DTW) and is more robust to outliers than the Fréchet distance, which are the two gold standards of dissimilarity measures for polygonal curves. We show interesting properties of $k$-DTW and give an exact algorithm as well as a $(1+\varepsilon)$-approximation algorithm for $k$-DTW by a parametric search for the $k$-th largest matched distance. We prove the first dimension-free learning bounds for curves and further learning theoretic results. $k$-DTW not only admits smaller sample size than DTW for the problem of learning the median of curves, where some factors depending on the curves' complexity $m$ are replaced by $k$, but we also show a surprising separation on the associated Rademacher and Gaussian complexities: $k$-DTW admits strictly smaller bounds than DTW, by a factor $\tilde{\Omega}(\sqrt{m})$ when $k\ll m$. We complement our theoretical findings with an experimental illustration of the benefits of using $k$-DTW for clustering and nearest neighbor classification.
Submitted 29 May, 2025;
originally announced May 2025.
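To make the relation between the three measures concrete, here is a minimal sketch, not the authors' algorithm. DTW and the discrete Fréchet distance are computed by the standard dynamic program over vertex pairs. For $k$-DTW, the sketch assumes the measure is the minimum, over all traversals, of the sum of the $k$ largest matched distances; this definition is an assumption read off the abstract's mention of the "$k$-th largest matched distance". The helper names (`pairwise`, `warp`, `k_dtw`) are hypothetical, and the brute-force search over thresholds is a generic top-$k$ trick, not the parametric search the paper describes.

```python
import math

def pairwise(P, Q):
    """Euclidean distances between all vertex pairs of two polygonal curves."""
    return [[math.dist(p, q) for q in Q] for p in P]

def warp(cost, combine):
    """Dynamic program over monotone traversals; `combine` is + (DTW) or max (Fréchet)."""
    n, m = len(cost), len(cost[0])
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                D[i][j] = cost[0][0]
                continue
            best_prev = min(
                D[i - 1][j] if i > 0 else math.inf,
                D[i][j - 1] if j > 0 else math.inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            D[i][j] = combine(best_prev, cost[i][j])
    return D[-1][-1]

def dtw(P, Q):
    return warp(pairwise(P, Q), lambda a, b: a + b)

def discrete_frechet(P, Q):
    return warp(pairwise(P, Q), max)

def k_dtw(P, Q, k):
    """ASSUMED definition: min over traversals of the sum of the k largest matched distances.

    Uses the identity  top_k_sum(d) = min_t [k*t + sum_i max(d_i - t, 0)],
    where t can be restricted to 0 and the pairwise distances themselves.
    Brute force over thresholds, so far slower than DTW itself.
    """
    d = pairwise(P, Q)
    thresholds = {0.0} | {x for row in d for x in row}
    best = math.inf
    for t in thresholds:
        shifted = [[max(x - t, 0.0) for x in row] for row in d]
        best = min(best, k * t + warp(shifted, lambda a, b: a + b))
    return best

P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (1, 1), (2, 5)]  # last vertex acts as an outlier
print(discrete_frechet(P, Q), k_dtw(P, Q, 2), dtw(P, Q))
```

Under the assumed definition, $k=1$ coincides with the discrete Fréchet distance and any $k$ at least as large as the traversal length coincides with DTW, so $k$ interpolates between the two.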
-
Scalable Learning of Item Response Theory Models
Authors:
Susanne Frick,
Amer Krivošija,
Alexander Munteanu
Abstract:
Item Response Theory (IRT) models aim to assess latent abilities of $n$ examinees along with latent difficulty characteristics of $m$ test items from categorical data that indicates the quality of their corresponding answers. Classical psychometric assessments are based on a relatively small number of examinees and items, say a class of $200$ students solving an exam comprising $10$ problems. More recent global large scale assessments such as PISA, or internet studies, may lead to significantly increased numbers of participants. Additionally, in the context of Machine Learning where algorithms take the role of examinees and data analysis problems take the role of items, both $n$ and $m$ may become very large, challenging the efficiency and scalability of computations. To learn the latent variables in IRT models from large data, we leverage the similarity of these models to logistic regression, which can be approximated accurately using small weighted subsets called coresets. We develop coresets for their use in alternating IRT training algorithms, facilitating scalable learning from large data.
Submitted 15 August, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
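The similarity to logistic regression mentioned in the abstract can be illustrated with a small sketch. It assumes the common two-parameter logistic (2PL) IRT model, which the abstract does not name explicitly; the function names and the `weights` argument are hypothetical. With the abilities held fixed, the response probability for an item is a logistic regression in the item parameters, and a weighted negative log-likelihood is the kind of objective a weighted subset (coreset) would approximate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def irt_2pl(theta, a, b):
    """ASSUMED 2PL model: probability that an examinee with ability theta
    answers an item with discrimination a and difficulty b correctly."""
    return sigmoid(a * (theta - b))

def as_logistic_regression(theta, a, b):
    """Same probability written as logistic regression in the item parameters:
    features x = (theta, 1), weights w = (a, -a*b)."""
    x = (theta, 1.0)
    w = (a, -a * b)
    return sigmoid(w[0] * x[0] + w[1] * x[1])

def weighted_nll(responses, thetas, a, b, weights=None):
    """Negative log-likelihood of one item's 0/1 responses with abilities fixed.
    `weights` lets a small weighted subset stand in for the full data."""
    if weights is None:
        weights = [1.0] * len(responses)
    total = 0.0
    for y, theta, w in zip(responses, thetas, weights):
        p = irt_2pl(theta, a, b)
        total -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total

print(irt_2pl(0.5, 1.2, -0.3), as_logistic_regression(0.5, 1.2, -0.3))
```

An alternating scheme would fix the abilities to update item parameters and then fix the item parameters to update abilities; each half-step has this logistic form under the 2PL assumption.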
-
Computing the Fréchet distance of trees and graphs of bounded tree width
Authors:
Maike Buchin,
Amer Krivošija,
Alexander Neuhaus
Abstract:
We give algorithms to compute the Fréchet distance of trees and graphs with bounded tree width. Our algorithms run in $O(n^2)$ time for trees of bounded degree, and $O(n^2\sqrt{n \log n})$ time for trees of arbitrary degree. For graphs of bounded tree width we show one can compute the Fréchet distance in FPT (fixed parameter tractable) time.
Submitted 28 January, 2020;
originally announced January 2020.
-
On the complexity of the middle curve problem
Authors:
Maike Buchin,
Nicole Funk,
Amer Krivošija
Abstract:
For a set of curves, Ahn et al. introduced the notion of a middle curve and gave algorithms computing these with run time exponential in the number of curves. Here we study the computational complexity of this problem: we show that it is NP-complete and give approximation algorithms.
Submitted 28 January, 2020;
originally announced January 2020.
-
Probabilistic smallest enclosing ball in high dimensions via subgradient sampling
Authors:
Amer Krivošija,
Alexander Munteanu
Abstract:
We study a variant of the median problem for a collection of point sets in high dimensions. This generalizes the geometric median as well as the (probabilistic) smallest enclosing ball (pSEB) problems. Our main objective and motivation is to improve the previously best algorithm for the pSEB problem by reducing its exponential dependence on the dimension to linear. This is achieved via a novel combination of sampling techniques for clustering problems in metric spaces with the framework of stochastic subgradient descent. As a result, the algorithm becomes applicable to shape fitting problems in Hilbert spaces of unbounded dimension via kernel functions. We present an exemplary application by extending the support vector data description (SVDD) shape fitting method to the probabilistic case. This is done by simulating the pSEB algorithm implicitly in the feature space induced by the kernel function.
Submitted 28 February, 2019;
originally announced February 2019.
-
Probabilistic embeddings of the Fréchet distance
Authors:
Anne Driemel,
Amer Krivošija
Abstract:
The Fréchet distance is a popular distance measure for curves which naturally lends itself to fundamental computational tasks, such as clustering, nearest-neighbor searching, and spherical range searching in the corresponding metric space. However, its inherent complexity poses considerable computational challenges in practice. To address this problem we study distortion of the probabilistic embedding that results from projecting the curves to a randomly chosen line. Such an embedding could be used in combination with, e.g. locality-sensitive hashing. We show that in the worst case and under reasonable assumptions, the discrete Fréchet distance between two polygonal curves of complexity $t$ in $\mathbb{R}^d$, where $d\in\lbrace 2,3,4,5\rbrace$, degrades by a factor linear in $t$ with constant probability. We show upper and lower bounds on the distortion. We also evaluate our findings empirically on a benchmark data set. The preliminary experimental results stand in stark contrast with our lower bounds. They indicate that highly distorted projections happen very rarely in practice, and only for strongly conditioned input curves.
Keywords: Fréchet distance, metric embeddings, random projections
Submitted 6 August, 2018;
originally announced August 2018.
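The embedding studied in this abstract, projecting curves to a randomly chosen line, can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, the random direction is drawn by normalizing a Gaussian vector (one common choice, assumed here), and the discrete Fréchet distance is computed with the standard quadratic dynamic program. Because projection onto a unit vector never increases pairwise distances, the projected distance cannot exceed the original one; the distortion is how much it shrinks.

```python
import math
import random

def discrete_frechet(P, Q, dist):
    """Standard dynamic program for the discrete Fréchet distance."""
    n, m = len(P), len(Q)
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c = dist(P[i], Q[j])
            if i == 0 and j == 0:
                D[i][j] = c
                continue
            prev = min(
                D[i - 1][j] if i > 0 else math.inf,
                D[i][j - 1] if j > 0 else math.inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            D[i][j] = max(prev, c)
    return D[-1][-1]

def random_unit_vector(d, rng):
    """Direction of the random line: a normalized Gaussian vector (assumed choice)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(curve, u):
    """Map each vertex to its scalar coordinate along the line spanned by u."""
    return [sum(a * b for a, b in zip(p, u)) for p in curve]

def projected_vs_original(P, Q, rng):
    u = random_unit_vector(len(P[0]), rng)
    original = discrete_frechet(P, Q, math.dist)
    projected = discrete_frechet(project(P, u), project(Q, u), lambda a, b: abs(a - b))
    return original, projected

rng = random.Random(0)
P = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
Q = [(0.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
print(projected_vs_original(P, Q, rng))
```

Repeating the call with fresh random directions gives an empirical picture of how often the projection shrinks the distance substantially for a given pair of curves.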
-
Clustering time series under the Fréchet distance
Authors:
Anne Driemel,
Amer Krivošija,
Christian Sohler
Abstract:
The Fréchet distance is a popular distance measure for curves. We study the problem of clustering time series under the Fréchet distance. In particular, we give $(1+\varepsilon)$-approximation algorithms for variations of the following problem with parameters $k$ and $\ell$. Given $n$ univariate time series $P$, each of complexity at most $m$, we find $k$ time series, not necessarily from $P$, which we call \emph{cluster centers} and which each have complexity at most $\ell$, such that (a) the maximum distance of an element of $P$ to its nearest cluster center or (b) the sum of these distances is minimized. Our algorithms have running time near-linear in the input size for constant $\varepsilon$, $k$ and $\ell$. To the best of our knowledge, our algorithms are the first clustering algorithms for the Fréchet distance which achieve an approximation factor of $(1+\varepsilon)$ or better.
Keywords: time series, longitudinal data, functional data, clustering, Fréchet distance, dynamic time warping, approximation algorithms.
Submitted 14 December, 2015;
originally announced December 2015.