Showing 1–21 of 21 results for author: Roos, T

Searching in archive cs.
  1. arXiv:2505.17810  [pdf, ps, other]

    cs.LG cs.IR

    VIBE: Vector Index Benchmark for Embeddings

    Authors: Elias Jääsaari, Ville Hyvönen, Matteo Ceccarello, Teemu Roos, Martin Aumüller

    Abstract: Approximate nearest neighbor (ANN) search is a performance-critical component of many machine learning pipelines. Rigorous benchmarking is essential for evaluating the performance of vector indexes for ANN search. However, the datasets of the existing benchmarks are no longer representative of the current applications of ANN search. Hence, there is an urgent need for an up-to-date set of benchmark…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 25 pages

  2. arXiv:2412.13554  [pdf, other]

    cs.CY

    An XAI Social Media Platform for Teaching K-12 Students AI-Driven Profiling, Clustering, and Engagement-Based Recommending

    Authors: Nicolas Pope, Juho Kahila, Henriikka Vartiainen, Mohammed Saqr, Sonsoles Lopez-Pernas, Teemu Roos, Jari Laru, Matti Tedre

    Abstract: This paper, submitted to the special track on resources for teaching AI in K-12, presents an explainable AI (XAI) education tool designed for K-12 classrooms, particularly for students in grades 4-9. The tool was designed for interventions on the fundamental processes behind social media platforms, focusing on four AI- and data-driven core concepts: data collection, user profiling, engagement metr…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: 9 pages, 8 figures, accepted to AAAI 25 conference

  3. arXiv:2410.18926  [pdf, other]

    cs.LG

    LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search

    Authors: Elias Jääsaari, Ville Hyvönen, Teemu Roos

    Abstract: Approximate nearest neighbor (ANN) search is a key component in many modern machine learning pipelines; recent use cases include retrieval-augmented generation (RAG) and vector databases. Clustering-based ANN algorithms that use score computation methods based on product quantization (PQ) are often used in industrial-scale applications due to their scalability and suitability for distributed and…

    Submitted 24 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024

  4. arXiv:2410.11807  [pdf, other]

    physics.ao-ph cs.LG

    Regional Ocean Forecasting with Hierarchical Graph Neural Networks

    Authors: Daniel Holmberg, Emanuela Clementi, Teemu Roos

    Abstract: Accurate ocean forecasting systems are vital for understanding marine dynamics, which play a crucial role in environmental management and climate adaptation strategies. Traditional numerical solvers, while effective, are computationally expensive and time-consuming. Recent advancements in machine learning have revolutionized weather forecasting, offering fast and energy-efficient alternatives. Bui…

    Submitted 20 November, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: 28 pages, 35 figures. Accepted to the Tackling Climate Change with Machine Learning workshop at NeurIPS 2024

  5. arXiv:2408.14935  [pdf, other]

    cs.LG cs.AI

    Quotient Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures

    Authors: Tomi Silander, Janne Leppä-aho, Elias Jääsaari, Teemu Roos

    Abstract: We introduce an information theoretic criterion for Bayesian network structure learning which we call quotient normalized maximum likelihood (qNML). In contrast to the closely related factorized normalized maximum likelihood criterion, qNML satisfies the property of score equivalence. It is also decomposable and completely free of adjustable hyperparameters. For practical computations, we identify…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted to AISTATS 2018

    Journal ref: PMLR 84:948-957, 2018

  6. arXiv:2402.14400  [pdf, other]

    cs.CV cs.LG eess.IV

    Learning Developmental Age from 3D Infant Kinetics Using Adaptive Graph Neural Networks

    Authors: Daniel Holmberg, Manu Airaksinen, Viviana Marchi, Andrea Guzzetta, Anna Kivi, Leena Haataja, Sampsa Vanhatalo, Teemu Roos

    Abstract: Reliable methods for the neurodevelopmental assessment of infants are essential for early detection of problems that may need prompt interventions. Spontaneous motor activity, or 'kinetics', is shown to provide a powerful surrogate measure of upcoming neurodevelopment. However, its assessment is by and large qualitative and subjective, focusing on visually identified, age-specific gestures. In thi…

    Submitted 4 December, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 15 pages, 9 figures. Code repository available via https://github.com/deinal/infant-aagcn

    MSC Class: 68T06 ACM Class: I.2; I.4; J.3

  7. arXiv:2402.01813  [pdf, other]

    cs.CY

    An Educational Tool for Learning about Social Media Tracking, Profiling, and Recommendation

    Authors: Nicolas Pope, Juho Kahila, Jari Laru, Henriikka Vartiainen, Teemu Roos, Matti Tedre

    Abstract: This paper introduces an educational tool for classroom use, based on explainable AI (XAI), designed to demystify key social media mechanisms - tracking, profiling, and content recommendation - for novice learners. The tool provides a familiar, interactive interface that resonates with learners' experiences with popular social media platforms, while also offering the means to "peek under the hood"…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 5 pages, 5 figures, submitted to ICALT 2024

  8. arXiv:2307.09469  [pdf, other]

    physics.plasm-ph cs.LG

    Graph Representation of the Magnetic Field Topology in High-Fidelity Plasma Simulations for Machine Learning Applications

    Authors: Ioanna Bouri, Fanni Franssila, Markku Alho, Giulia Cozzani, Ivan Zaitsev, Minna Palmroth, Teemu Roos

    Abstract: Topological analysis of the magnetic field in simulated plasmas allows the study of various physical phenomena in a wide range of settings. One such application is magnetic reconnection, a phenomenon related to the dynamics of the magnetic field topology, which is difficult to detect and characterize in three dimensions. We propose a scalable pipeline for topological data analysis and spatiotempor…

    Submitted 26 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: 6 pages, 3 figures, Accepted at the ICML 2023 Workshop on Machine Learning for Astrophysics

  9. arXiv:2103.12068  [pdf, other]

    cs.CV cs.LG

    Transfer Learning with Ensembles of Deep Neural Networks for Skin Cancer Detection in Imbalanced Data Sets

    Authors: Aqsa Saeed Qureshi, Teemu Roos

    Abstract: Several machine learning techniques for accurate detection of skin cancer from medical images have been reported. Many of these techniques are based on pre-trained convolutional neural networks (CNNs), which enable training the models based on limited amounts of training data. However, the classification accuracy of these models still tends to be severely limited by the scarcity of representative…

    Submitted 17 May, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

  10. arXiv:2004.02569  [pdf, other]

    cs.LG cond-mat.mtrl-sci physics.comp-ph stat.ML

    Gradient-Based Training and Pruning of Radial Basis Function Networks with an Application in Materials Physics

    Authors: Jussi Määttä, Viacheslav Bazaliy, Jyri Kimari, Flyura Djurabekova, Kai Nordlund, Teemu Roos

    Abstract: Many applications, especially in physics and other sciences, call for easily interpretable and robust machine learning techniques. We propose a fully gradient-based technique for training radial basis function networks with an efficient and scalable open-source implementation. We derive novel closed-form optimization criteria for pruning the models for continuous as well as binary data which arise…

    Submitted 6 April, 2020; originally announced April 2020.

    Journal ref: Neural Networks 133, 123 (2021)

  11. arXiv:1910.08322  [pdf, other]

    cs.LG stat.ML

    A Multilabel Classification Framework for Approximate Nearest Neighbor Search

    Authors: Ville Hyvönen, Elias Jääsaari, Teemu Roos

    Abstract: Both supervised and unsupervised machine learning algorithms have been used to learn partition-based index structures for approximate nearest neighbor (ANN) search. Existing supervised algorithms formulate the learning task as finding a partition in which the nearest neighbors of a training set point belong to the same partition element as the point itself, so that the nearest neighbor candidates…

    Submitted 13 October, 2022; v1 submitted 18 October, 2019; originally announced October 2019.

    Comments: To appear in the proceedings of Conference on Neural Information Processing Systems (NeurIPS) 2022

    ACM Class: G.3; H.3.3

  12. arXiv:1908.08484  [pdf, ps, other]

    stat.ME cs.IT cs.LG stat.ML

    Minimum Description Length Revisited

    Authors: Peter Grünwald, Teemu Roos

    Abstract: This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since…

    Submitted 18 December, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: to appear in International Journal of Mathematics for Industry

  13. Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

    Authors: Elias Jääsaari, Ville Hyvönen, Teemu Roos

    Abstract: Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy--speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algo…

    Submitted 18 December, 2018; originally announced December 2018.

    Comments: Accepted for the 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2019

    Journal ref: Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11440. Springer, Cham. pp. 590-602

  14. arXiv:1811.11811  [pdf, other]

    cs.IT cs.DC

    An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation

    Authors: Utsav Sheth, Sanghamitra Dutta, Malhar Chaudhari, Haewon Jeong, Yaoqing Yang, Jukka Kohonen, Teemu Roos, Pulkit Grover

    Abstract: We propose a novel application of coded computing to the problem of nearest neighbor estimation using MatDot Codes [Fahim et al. 2017], which are known to be optimal for matrix multiplication in terms of recovery threshold under storage constraints. In approximate nearest neighbor algorithms, it is common to construct efficient in-memory indexes to improve query response time. One such strateg…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: Accepted for publication at the IEEE Big Data 2018

  15. arXiv:1708.02497  [pdf, other]

    cs.LG cs.IT stat.ML

    Learning non-parametric Markov networks with mutual information

    Authors: Janne Leppä-aho, Santeri Räisänen, Xiao Yang, Teemu Roos

    Abstract: We propose a method for learning Markov network structures for continuous data without invoking any assumptions about the distribution of the variables. The method makes use of previous work on a non-parametric estimator for mutual information which is used to create a non-parametric test for multivariate conditional independence. This independence test is then combined with an efficient constrain…

    Submitted 8 August, 2017; originally announced August 2017.

  16. Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood

    Authors: Janne Leppä-aho, Johan Pensar, Teemu Roos, Jukka Corander

    Abstract: We propose a Bayesian approximate inference method for learning the dependence structure of a Gaussian graphical model. Using pseudo-likelihood, we derive an analytical expression to approximate the marginal likelihood for an arbitrary graph structure without invoking any assumptions about decomposability. The majority of the existing methods for learning Gaussian graphical models are either restr…

    Submitted 25 February, 2016; originally announced February 2016.

  17. Fast k-NN search

    Authors: Ville Hyvönen, Teemu Pitkänen, Sotiris Tasoulis, Elias Jääsaari, Risto Tuomainen, Liang Wang, Jukka Corander, Teemu Roos

    Abstract: Efficient index structures for fast approximate nearest neighbor queries are required in many applications such as recommendation systems. In high-dimensional spaces, many conventional methods suffer from excessive usage of memory and slow response times. We propose a method where multiple random projection trees are combined by a novel voting scheme. The key idea is to exploit the redundancy in a…

    Submitted 19 August, 2016; v1 submitted 23 September, 2015; originally announced September 2015.

    Journal ref: IEEE International Conference on Big Data 2016, p. 881-888

  18. arXiv:1401.7116  [pdf, other]

    cs.IT cs.LG stat.ML

    Bayesian Properties of Normalized Maximum Likelihood and its Fast Computation

    Authors: Andrew Barron, Teemu Roos, Kazuho Watanabe

    Abstract: The normalized maximum likelihood (NML) provides the minimax regret solution in universal data compression, gambling, and prediction, and it plays an essential role in the minimum description length (MDL) method of statistical modeling and estimation. Here we show that the normalized maximum likelihood has a Bayes-like representation as a mixture of the component models, even in finite samples,…

    Submitted 28 January, 2014; originally announced January 2014.

    Comments: Submitted to ISIT-2014 conference

  19. arXiv:1401.0561  [pdf, other]

    cs.CR cs.HC

    User-Generated Free-Form Gestures for Authentication: Security and Memorability

    Authors: Michael Sherman, Gradeigh Clark, Yulong Yang, Shridatt Sugrim, Arttu Modig, Janne Lindqvist, Antti Oulasvirta, Teemu Roos

    Abstract: This paper studies the security and memorability of free-form multitouch gestures for mobile authentication. Towards this end, we collected a dataset with a generate-test-retest paradigm where participants (N=63) generated free-form gestures, repeated them, and were later retested for memory. Half of the participants decided to generate one-finger gestures, and the other half generated multi-finge…

    Submitted 2 January, 2014; originally announced January 2014.

  20. arXiv:1102.5225  [pdf, other]

    cs.IT cs.HC physics.bio-ph q-bio.NC

    Let Us Dance Just a Little Bit More --- On the Information Capacity of the Human Motor System

    Authors: Teemu Roos, Antti Oulasvirta, Laura Leppänen, Arttu Modig

    Abstract: Fitts' law is a fundamental tool in measuring the capacity of the human motor system. However, it is, by definition, limited to aimed movements toward spatially expanded targets. We revisit its information-theoretic basis with the goal of generalizing it into unconstrained trained movement such as dance and sports. The proposed new measure is based on a subject's ability to accurately reproduce a…

    Submitted 13 February, 2012; v1 submitted 25 February, 2011; originally announced February 2011.

    Comments: Presented at the 2012 Information Theory and Applications Workshop, San Diego, CA

  21. MDL Denoising Revisited

    Authors: Teemu Roos, Petri Myllymäki, Jorma Rissanen

    Abstract: We refine and extend an earlier MDL denoising criterion for wavelet-based denoising. We start by showing that the denoising problem can be reformulated as a clustering problem, where the goal is to obtain separate clusters for informative and non-informative wavelet coefficients, respectively. This suggests two refinements, adding a code-length for the model index, and extending the model in ord…

    Submitted 25 September, 2006; originally announced September 2006.

    Comments: Submitted to IEEE Transactions on Information Theory, June 2006