Search | arXiv e-print repository

doi 10.1109/TSP.2025.3532102

Single-Source Localization as an Eigenvalue Problem

Authors: Martin Larsson, Viktor Larsson, Kalle Åström, Magnus Oskarsson

Abstract: This paper introduces a novel method for solving the single-source localization problem, specifically addressing the case of trilateration. We formulate the problem as a weighted least-squares problem in the squared distances and demonstrate how suitable weights are chosen to accommodate different noise distributions. By transforming this formulation into an eigenvalue problem, we leverage existing eigensolvers to achieve a fast, numerically stable, and easily implemented solver. Furthermore, our theoretical analysis establishes that the globally optimal solution corresponds to the largest real eigenvalue, drawing parallels to the existing literature on the trust-region subproblem. Unlike previous works, we give special treatment to degenerate cases, where multiple and possibly infinitely many solutions exist. We provide a geometric interpretation of the solution sets and design the proposed method to handle these cases gracefully. Finally, we validate against a range of state-of-the-art methods using synthetic and real data, demonstrating how the proposed method is among the fastest and most numerically stable.

Submitted 25 February, 2025; originally announced February 2025.

Comments: Copyright 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

MSC Class: 51K05

Journal ref: IEEE Transactions on Signal Processing, p. 574-583, Vol. 73, 2025

arXiv:2411.13179 [pdf, other]

SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio

Authors: Erik Tegler, Magnus Oskarsson, Kalle Åström

Abstract: Time delay estimation or Time-Difference-Of-Arrival estimates is a critical component for multiple localization applications such as multilateration, direction of arrival, and self-calibration. The task is to estimate the time difference between a signal arriving at two different sensors. For the audio sensor modality, most current systems are based on classical methods such as the Generalized Cross-Correlation Phase Transform (GCC-PHAT) method. In this paper we demonstrate that learning based methods can, even based on synthetic data, significantly outperform GCC-PHAT on novel real world data. To overcome the lack of data with ground truth for the task, we train our model on a simulated dataset which is sufficiently large and varied, and that captures the relevant characteristics of the real world problem. We provide our trained model, SONNET (Simulation Optimized Neural Network Estimator of Timeshifts), which is runnable in real-time and works on novel data out of the box for many real data applications, i.e. without re-training. We further demonstrate greatly improved performance on the downstream task of self-calibration when using our model compared to classical methods.

Submitted 20 November, 2024; originally announced November 2024.

arXiv:2411.04596 [pdf, other]

The Impact of Semi-Supervised Learning on Line Segment Detection

Authors: Johanna Engman, Karl Åström, Magnus Oskarsson

Abstract: In this paper we present a method for line segment detection in images, based on a semi-supervised framework. Leveraging the use of a consistency loss based on differently augmented and perturbed unlabeled images with a small amount of labeled data, we show comparable results to fully supervised methods. This opens up application scenarios where annotation is difficult or expensive, and for domain specific adaptation of models. We are specifically interested in real-time and online applications, and investigate small and efficient learning backbones. Our method is to our knowledge the first to target line detection using modern state-of-the-art methodologies for semi-supervised learning. We test the method on both standard benchmarks and domain specific scenarios for forestry applications, showing the tractability of the proposed method.

Submitted 7 November, 2024; originally announced November 2024.

Comments: 9 pages, 6 figures, 7 tables

arXiv:2408.17166 [pdf, other]

Learning Multi-Target TDOA Features for Sound Event Localization and Detection

Authors: Axel Berg, Johanna Engman, Jens Gulin, Karl Åström, Magnus Oskarsson

Abstract: Sound event localization and detection (SELD) systems using audio recordings from a microphone array rely on spatial cues for determining the location of sound events. As a consequence, the localization performance of such systems is to a large extent determined by the quality of the audio features that are used as inputs to the system. We propose a new feature, based on neural generalized cross-correlations with phase-transform (NGCC-PHAT), that learns audio representations suitable for localization. Using permutation invariant training for the time-difference of arrival (TDOA) estimation problem enables NGCC-PHAT to learn TDOA features for multiple overlapping sound events. These features can be used as a drop-in replacement for GCC-PHAT inputs to a SELD-network. We test our method on the STARSS23 dataset and demonstrate improved localization performance compared to using standard GCC-PHAT or SALSA-Lite input features.

Submitted 30 August, 2024; originally announced August 2024.

Comments: DCASE 2024

arXiv:2408.15771 [pdf, other]

doi 10.1109/IPIN62893.2024.10786105

wav2pos: Sound Source Localization using Masked Autoencoders

Authors: Axel Berg, Jens Gulin, Mark O'Connor, Chuteng Zhou, Karl Åström, Magnus Oskarsson

Abstract: We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that such a formulation allows for accurate localization of the sound source, by reconstructing coordinates masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of audio recordings and microphone coordinates are missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning based localization methods.

Submitted 28 August, 2024; originally announced August 2024.

Comments: IPIN 2024

arXiv:2212.02319 [pdf, other]

Robust and Accurate Cylinder Triangulation

Authors: Anna Gummeson, Magnus Oskarsson

Abstract: In this paper we present methods for triangulation of infinite cylinders from image line silhouettes. We show numerically that linear estimation of a general quadric surface is inherently a badly posed problem. Instead we propose to constrain the conic section to a circle, and give algebraic constraints on the dual conic, that models this manifold. Using these constraints we derive a fast minimal solver based on three image silhouette lines, that can be used to bootstrap robust estimation schemes such as RANSAC. We also present a constrained least squares solver that can incorporate all available image lines for accurate estimation. The algorithms are tested on both synthetic and real data, where they are shown to give accurate results, compared to previous methods.

Submitted 5 December, 2022; originally announced December 2022.

Comments: To be published in proceedings of the Scandinavian Conference on Image Analysis (SCIA) 2023

arXiv:2208.04654 [pdf, other]

doi 10.21437/Interspeech.2022-524

Extending GCC-PHAT using Shift Equivariant Neural Networks

Authors: Axel Berg, Mark O'Connor, Kalle Åström, Magnus Oskarsson

Abstract: Speaker localization using microphone arrays depends on accurate time delay estimation techniques. For decades, methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for this purpose. Recently, the GCC-PHAT has also been used to provide input features to neural networks in order to remove the effects of noise and reverberation, but at the cost of losing theoretical guarantees in noise-free conditions. We propose a novel approach to extending the GCC-PHAT, where the received signals are filtered using a shift equivariant neural network that preserves the timing information contained in the signals. By extensive experiments we show that our model consistently reduces the error of the GCC-PHAT in adverse environments, with guarantees of exact time delay recovery in ideal conditions.

Submitted 9 August, 2022; originally announced August 2022.

Comments: Proceedings of INTERSPEECH

Journal ref: Proc. Interspeech 2022, 1791-1795

arXiv:2205.11299 [pdf, other]

Multiple Offsets Multilateration: a new paradigm for sensor network calibration with unsynchronized reference nodes

Authors: Luca Ferranti, Kalle Åström, Magnus Oskarsson, Jani Boutellier, Juho Kannala

Abstract: Positioning using wave signal measurements is used in several applications, such as GPS systems, structure from sound and Wifi based positioning. Mathematically, such problems require the computation of the positions of receivers and/or transmitters as well as time offsets if the devices are unsynchronized. In this paper, we expand the previous state-of-the-art on positioning formulations by introducing Multiple Offsets Multilateration (MOM), a new mathematical framework to compute the receivers positions with pseudoranges from unsynchronized reference transmitters at known positions. This could be applied in several scenarios, for example structure from sound and positioning with LEO satellites. We mathematically describe MOM, determining how many receivers and transmitters are needed for the network to be solvable, a study on the number of possible distinct solutions is presented and stable solvers based on homotopy continuation are derived. The solvers are shown to be efficient and robust to noise both for synthetic and real audio data.

Submitted 23 May, 2022; originally announced May 2022.

Comments: accepted to ICASSP2022

arXiv:2204.03957 [pdf, other]

Points to Patches: Enabling the Use of Self-Attention for 3D Shape Recognition

Authors: Axel Berg, Magnus Oskarsson, Mark O'Connor

Abstract: While the Transformer architecture has become ubiquitous in the machine learning field, its adaptation to 3D shape recognition is non-trivial. Due to its quadratic computational complexity, the self-attention operator quickly becomes inefficient as the set of input points grows larger. Furthermore, we find that the attention mechanism struggles to find useful connections between individual points on a global scale. In order to alleviate these problems, we propose a two-stage Point Transformer-in-Transformer (Point-TnT) approach which combines local and global attention mechanisms, enabling both individual points and patches of points to attend to each other effectively. Experiments on shape classification show that such an approach provides more useful features for downstream tasks than the baseline Transformer, while also being more computationally efficient. In addition, we also extend our method to feature matching for scene reconstruction, showing that it can be used in conjunction with existing scene reconstruction pipelines.

Submitted 8 April, 2022; originally announced April 2022.

Comments: Accepted to the 26th International Conference on Pattern Recognition

arXiv:2108.00667 [pdf, other]

Homotopy Continuation for Sensor Networks Self-Calibration

Authors: Luca Ferranti, Kalle Åström, Magnus Oskarsson, Jani Boutellier, Juho Kannala

Abstract: Given a sensor network, TDOA self-calibration aims at simultaneously estimating the positions of receivers and transmitters, and transmitters time offsets. This can be formulated as a system of polynomial equations. Due to the elevated number of unknowns and the nonlinearity of the problem, obtaining an accurate solution efficiently is nontrivial. Previous work has shown that iterative algorithms are sensitive to initialization and little noise can lead to failure in convergence. Hence, research has focused on algebraic techniques. Stable and efficient algebraic solvers have been proposed for some network configurations, but they do not work for smaller networks. In this paper, we use homotopy continuation to solve four previously unsolved configurations in 2D TDOA self-calibration, including a minimal one. As a theoretical contribution, we investigate the number of solutions of the new minimal configuration, showing this is much lower than previous estimates. As a more practical contribution, we also present new subminimal solvers, which can be used to achieve unique accurate solutions in previously unsolvable configurations. We demonstrate our solvers are stable both with clean and noisy data, even without nonlinear refinement afterwards. Moreover, we demonstrate the suitability of homotopy continuation for sensor network calibration problems, opening prospects to new applications.

Submitted 2 August, 2021; originally announced August 2021.

Comments: accepted to EUSIPCO2021

arXiv:2012.06178 [pdf, other]

Detailed 3D Human Body Reconstruction from Multi-view Images Combining Voxel Super-Resolution and Learned Implicit Representation

Authors: Zhongguo Li, Magnus Oskarsson, Anders Heyden

Abstract: The task of reconstructing detailed 3D human body models from images is interesting but challenging in computer vision due to the high freedom of human bodies. In order to tackle the problem, we propose a coarse-to-fine method to reconstruct a detailed 3D human body from multi-view images combining voxel super-resolution based on learning the implicit representation. Firstly, the coarse 3D models are estimated by learning an implicit representation based on multi-scale features which are extracted by multi-stage hourglass networks from the multi-view images. Then, taking the low resolution voxel grids which are generated by the coarse 3D models as input, the voxel super-resolution based on an implicit representation is learned through a multi-stage 3D convolutional neural network. Finally, the refined detailed 3D human body models can be produced by the voxel super-resolution which can preserve the details and reduce the false reconstruction of the coarse 3D models. Benefiting from the implicit representation, the training process in our method is memory efficient and the detailed 3D human body produced by our method from multi-view images is the continuous decision boundary with high-resolution geometry. In addition, the coarse-to-fine method based on voxel super-resolution can remove false reconstructions and preserve the appearance details in the final reconstruction, simultaneously. In the experiments, our method quantitatively and qualitatively achieves the competitive 3D human body reconstructions from images with various poses and shapes on both the real and synthetic datasets.

Submitted 11 December, 2020; originally announced December 2020.

arXiv:2012.06109 [pdf, other]

A novel joint points and silhouette-based method to estimate 3D human pose and shape

Authors: Zhongguo Li, Anders Heyden, Magnus Oskarsson

Abstract: This paper presents a novel method for 3D human pose and shape estimation from images with sparse views, using joint points and silhouettes, based on a parametric model. Firstly, the parametric model is fitted to the joint points estimated by deep learning-based human pose estimation. Then, we extract the correspondence between the parametric model of pose fitting and silhouettes on 2D and 3D space. A novel energy function based on the correspondence is built and minimized to fit parametric model to the silhouettes. Our approach uses sufficient shape information because the energy function of silhouettes is built from both 2D and 3D space. This also means that our method only needs images from sparse views, which balances data used and the required prior information. Results on synthetic data and real data demonstrate the competitive performance of our approach on pose and shape estimation of the human body.

Submitted 10 December, 2020; originally announced December 2020.

Comments: Accepted to ICPR 2020 3DHU workshop

arXiv:2006.15864 [pdf, other]

doi 10.1109/ICPR48806.2021.9412608

Deep Ordinal Regression with Label Diversity

Authors: Axel Berg, Magnus Oskarsson, Mark O'Connor

Abstract: Regression via classification (RvC) is a common method used for regression problems in deep learning, where the target variable belongs to a set of continuous values. By discretizing the target into a set of non-overlapping classes, it has been shown that training a classifier can improve neural network accuracy compared to using a standard regression approach. However, it is not clear how the set of discrete classes should be chosen and how it affects the overall solution. In this work, we propose that using several discrete data representations simultaneously can improve neural network learning compared to a single representation. Our approach is end-to-end differentiable and can be added as a simple extension to conventional learning methods, such as deep neural networks. We test our method on three challenging tasks and show that our method reduces the prediction error compared to a baseline RvC approach while maintaining a similar model complexity.

Submitted 29 June, 2020; originally announced June 2020.

Comments: Accepted to ICPR2020

Journal ref: 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 2740-2747

arXiv:2005.10298 [pdf, ps, other]

Sensor Networks TDOA Self-Calibration: 2D Complexity Analysis and Solutions

Authors: Luca Ferranti, Kalle Åström, Magnus Oskarsson, Jani Boutellier, Juho Kannala

Abstract: Given a network of receivers and transmitters, the process of determining their positions from measured pseudoranges is known as network self-calibration. In this paper we consider 2D networks with synchronized receivers but unsynchronized transmitters and the corresponding calibration techniques, known as Time-Difference-Of-Arrival (TDOA) techniques. Despite previous work, TDOA self-calibration is computationally challenging. Iterative algorithms are very sensitive to the initialization, causing convergence issues. In this paper, we present a novel approach, which gives an algebraic solution to two previously unsolved scenarios. We also demonstrate that our solvers produce an excellent initial value for non-linear optimisation algorithms, leading to a full pipeline robust to noise.

Submitted 22 October, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

arXiv:1811.04494 [pdf, other]

Massive MIMO-based Localization and Mapping Exploiting Phase Information of Multipath Components

Authors: Xuhong Li, Erik Leitinger, Magnus Oskarsson, Kalle Åström, Fredrik Tufvesson

Abstract: In this paper, we present a robust multipath-based localization and mapping framework that exploits the phases of specular multipath components (MPCs) using a massive multiple-input multiple-output (MIMO) array at the base station. Utilizing the phase information related to the propagation distances of the MPCs enables the possibility of localization with extraordinary accuracy even with limited bandwidth. The specular MPC parameters along with the parameters of the noise and the dense multipath component (DMC) are tracked using an extended Kalman filter (EKF), which enables to preserve the distance-related phase changes of the MPC complex amplitudes. The DMC comprises all non-resolvable MPCs, which occur due to finite measurement aperture. The estimation of the DMC parameters enhances the estimation quality of the specular MPCs and therefore also the quality of localization and mapping. The estimated MPC propagation distances are subsequently used as input to a distance-based localization and mapping algorithm. This algorithm does not need prior knowledge about the surrounding environment and base station position. The performance is demonstrated with real radio-channel measurements using an antenna array with 128 ports at the base station side and a standard cellular signal bandwidth of 40 MHz. The results show that high accuracy localization is possible even with such a low bandwidth.

Submitted 12 June, 2019; v1 submitted 11 November, 2018; originally announced November 2018.

Comments: 14 pages (two columns), 13 figures. This work has been submitted to the IEEE Transaction on Wireless Communications for possible publication

arXiv:1805.10705 [pdf, ps, other]

A fast minimal solver for absolute camera pose with unknown focal length and radial distortion from four planar points

Authors: Magnus Oskarsson

Abstract: In this paper we present a fast minimal solver for absolute camera pose estimation from four known points that lie in a plane. We assume a perspective camera model with unknown focal length and unknown radial distortion. The radial distortion is modelled using the division model with one parameter. We show that the solutions to this problem can be found from a univariate six-degree polynomial. This results in a very fast and numerically stable solver.

Submitted 5 June, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

arXiv:1803.04360 [pdf, other]

Beyond Gröbner Bases: Basis Selection for Minimal Solvers

Authors: Viktor Larsson, Magnus Oskarsson, Kalle Åström, Alge Wallis, Zuzana Kukelova, Tomas Pajdla

Abstract: Many computer vision applications require robust estimation of the underlying geometry, in terms of camera motion and 3D structure of the scene. These robust methods often rely on running minimal solvers in a RANSAC framework. In this paper we show how we can make polynomial solvers based on the action matrix method faster, by careful selection of the monomial bases. These monomial bases have traditionally been based on a Gröbner basis for the polynomial ideal. Here we describe how we can enumerate all such bases in an efficient way. We also show that going beyond Gröbner bases leads to more efficient solvers in many cases. We present a novel basis sampling scheme that we evaluate on a number of problems.

Submitted 12 March, 2018; originally announced March 2018.

Showing 1–17 of 17 results for author: Oskarsson, M