-
Safety and optimality in learning-based control at low computational cost
Authors:
Dominik Baumann,
Krzysztof Kowalczyk,
Cristian R. Rojas,
Koen Tiels,
Pawel Wachel
Abstract:
Applying machine learning methods to physical systems that are supposed to act in the real world requires providing safety guarantees. However, methods that include such guarantees often come at a high computational cost, making them inapplicable to large datasets and embedded devices with low computational power. In this paper, we propose CoLSafe, a computationally lightweight safe learning algorithm whose computational complexity grows sublinearly with the number of data points. We derive both safety and optimality guarantees and showcase the effectiveness of our algorithm on a seven-degrees-of-freedom robot arm.
Submitted 12 May, 2025;
originally announced May 2025.
-
PainFormer: a Vision Foundation Model for Automatic Pain Assessment
Authors:
Stefanos Gkikas,
Raul Fernandez Rojas,
Manolis Tsiknakis
Abstract:
Pain is a manifold condition that impacts a significant percentage of the population. Accurate and reliable pain evaluation for those affected is crucial to developing effective and advanced pain management protocols. Automatic pain assessment systems provide continuous monitoring and support decision-making processes, ultimately aiming to alleviate distress and prevent functional decline. This study introduces PainFormer, a vision foundation model based on multi-task learning principles and trained simultaneously on 14 tasks/datasets with a total of 10.9 million samples. Functioning as an embedding extractor for various input modalities, the foundation model provides feature representations to the Embedding-Mixer, a transformer-based module that performs the final pain assessment. Extensive experiments employing behavioral modalities (including RGB, synthetic thermal, and estimated depth videos) and physiological modalities such as ECG, EMG, GSR, and fNIRS revealed that PainFormer effectively extracts high-quality embeddings from diverse input modalities. The proposed framework is evaluated on two pain datasets, BioVid and AI4Pain, and directly compared to 75 different methodologies documented in the literature. Experiments conducted in unimodal and multimodal settings demonstrate state-of-the-art performance across modalities and pave the way toward general-purpose models for automatic pain assessment.
Submitted 18 May, 2025; v1 submitted 2 May, 2025;
originally announced May 2025.
-
On Word-of-Mouth and Private-Prior Sequential Social Learning
Authors:
Andrea Da Col,
Cristian R. Rojas,
Vikram Krishnamurthy
Abstract:
Social learning provides a fundamental framework in economics and social sciences for studying interactions among rational agents who observe each other's actions but lack direct access to individual beliefs. This paper investigates a specific social learning paradigm known as Word-of-Mouth (WoM), where a series of agents seeks to estimate the state of a dynamical system. The first agent receives noisy measurements of the state, while each subsequent agent relies solely on a degraded version of her predecessor's estimate. A defining feature of WoM is that the final agent's belief is publicly broadcast and adopted by all agents, in place of their own. We analyze this setting both theoretically and through numerical simulations, showing that some agents benefit from using the public belief broadcast by the last agent, while others suffer from performance deterioration.
Submitted 7 April, 2025; v1 submitted 3 April, 2025;
originally announced April 2025.
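The Python sketch below makes the chain structure concrete. It rests on strong simplifying assumptions that are ours, not the paper's:

- the state is a static scalar Gaussian rather than a dynamical system;
- every degradation is additive Gaussian noise;
- each agent uses the linear MMSE estimate.

The helper `word_of_mouth_chain` is hypothetical. It tracks how much of the true state survives in each agent's estimate and shows the error growing along the chain. It does not model the public broadcast of the final belief, which is the effect the paper analyzes.

```python
from dataclasses import dataclass


@dataclass
class ChainResult:
    gains: list[float]       # g_k: weight of the true state x in agent k's estimate
    noise_vars: list[float]  # s_k: variance of the noise part of agent k's estimate
    mse: list[float]         # E[(x - xhat_k)^2] for each agent


def word_of_mouth_chain(var_x, var_meas, var_relay, n_agents):
    """Hypothetical static, scalar Gaussian version of the WoM chain.

    x ~ N(0, var_x). Agent 1 observes x + v with Var(v) = var_meas; agent k > 1
    observes xhat_{k-1} + w_k with Var(w_k) = var_relay. Every agent forms the
    linear MMSE estimate xhat_k = c_k * (its observation), so each estimate has
    the form g_k * x + (independent Gaussian noise with variance s_k).
    """
    gains, noise_vars, mse = [], [], []
    g, s = 1.0, var_meas  # agent 1's observation: 1 * x + noise
    for k in range(n_agents):
        if k > 0:
            s += var_relay  # relaying adds fresh noise to the predecessor's estimate
        c = g * var_x / (g * g * var_x + s)  # Cov(x, z_k) / Var(z_k)
        g, s = c * g, c * c * s
        gains.append(g)
        noise_vars.append(s)
        mse.append((1.0 - g) ** 2 * var_x + s)
    return ChainResult(gains, noise_vars, mse)
```

For example, `word_of_mouth_chain(1.0, 1.0, 1.0, 2).mse` is approximately `[0.5, 0.833]`. The second agent sees only a noisy copy of the first agent's estimate, so it does worse than the first.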
-
Deep Learning on Hester Davis Scores for Inpatient Fall Prediction
Authors:
Hojjat Salehinejad,
Ricky Rojas,
Kingsley Iheasirim,
Mohammed Yousufuddin,
Bijan Borah
Abstract:
Fall risk prediction among hospitalized patients is a critical aspect of patient safety in clinical settings, and accurate models can help prevent adverse events. The Hester Davis Score (HDS) is commonly used to assess fall risk, with current clinical practice relying on a threshold-based approach. In this method, a patient is classified as high-risk when their HDS exceeds a predefined threshold. However, this approach may fail to capture dynamic patterns in fall risk over time. In this study, we model the threshold-based approach and propose two machine learning approaches for enhanced fall prediction: one-step-ahead fall prediction and sequence-to-point fall prediction. The one-step-ahead model uses the HDS at the current timestamp to predict the risk at the next timestamp, while the sequence-to-point model leverages all preceding HDS values to predict fall risk using deep learning. We compare these approaches to assess their accuracy in fall risk prediction, demonstrating that deep learning can outperform the traditional threshold-based method by capturing temporal patterns and improving prediction reliability. These findings highlight the potential for data-driven approaches to enhance patient safety through more reliable fall prevention strategies.
Submitted 10 January, 2025;
originally announced January 2025.
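The three approaches differ mainly in what each prediction is allowed to see. The Python sketch below assumes per-timestamp HDS values and binary fall labels; `threshold`, `labels`, and all function names are hypothetical, and no clinical threshold value is implied. It shows the threshold rule and how training samples could be built for the two learned models. The deep network itself is omitted.

```python
def threshold_classify(hds_scores, threshold):
    """Threshold baseline: flag a timestamp as high risk when HDS exceeds `threshold`."""
    return [score > threshold for score in hds_scores]


def one_step_pairs(hds, labels):
    """One-step-ahead samples: (HDS at t, label at t + 1)."""
    return [(hds[t], labels[t + 1]) for t in range(len(hds) - 1)]


def seq_to_point_pairs(hds, labels):
    """Sequence-to-point samples: (all HDS values up to t, label at t + 1)."""
    return [(hds[: t + 1], labels[t + 1]) for t in range(len(hds) - 1)]
```

Take `hds = [5, 9, 12]` and `labels = [0, 0, 1]`. Then `one_step_pairs` yields `[(5, 0), (9, 1)]`, while `seq_to_point_pairs` yields `[([5], 0), ([5, 9], 1)]`, giving the sequence model the full history.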
-
Unconventional Universal Computation in Babbage's Analytical Engine
Authors:
Raul Rojas
Abstract:
This paper shows that the programming model of Babbage's Analytical Engine, although unconventional, can be harnessed in order to simulate indirect addressing, a capability that was not included in the original instruction set. That is, in a theoretical sense, the Analytical Engine was as universal as the computers we have today. We show how to implement indirect addressing for a working memory of fixed size; this makes it possible to simulate a Turing machine with a finite tape. The result is, of course, only of theoretical and historical interest, without any practical implications.
Submitted 20 July, 2024;
originally announced August 2024.
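A machine whose instructions name only fixed addresses can still simulate indirect addressing. The classic trick is to test the pointer against every possible address in turn. This is a finite program only when the memory has a fixed size.

The Python sketch below illustrates that generic idea. It is not a reconstruction of the paper's Analytical Engine code, and `make_indirect_loader` is a hypothetical name. Each branch reads a single address that is bound when the branch is built.

```python
def make_indirect_loader(memory_size):
    """Build an emulated `load memory[memory[pointer_cell]]` for a fixed-size memory.

    Each branch reads exactly one address that is fixed when the branch is created,
    mimicking a machine whose instructions can only name constant addresses.
    """
    branches = [(addr, (lambda mem, a=addr: mem[a])) for addr in range(memory_size)]

    def indirect_load(memory, pointer_cell):
        target = memory[pointer_cell]  # direct read at a constant address
        for addr, read_fixed in branches:  # chain of equality tests, one per address
            if target == addr:
                return read_fixed(memory)
        raise IndexError("pointer does not name a cell of the fixed-size memory")

    return indirect_load
```

With `memory = [2, 0, 42, 7]`, the call `make_indirect_loader(4)(memory, 0)` follows the pointer in cell 0 to cell 2 and returns 42. An indirect store could use the same chain of tests, with each branch writing to one fixed address.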
-
GroundGrid: LiDAR Point Cloud Ground Segmentation and Terrain Estimation
Authors:
Nicolai Steinke,
Daniel Göhring,
Raúl Rojas
Abstract:
Precise point cloud ground segmentation is a crucial prerequisite for virtually all perception tasks for LiDAR sensors in autonomous vehicles. In particular, clustering and extracting objects from a point cloud usually relies on accurate removal of ground points. Correct estimation of the surrounding terrain is important for assessing the drivability of a surface, for path planning, and for obstacle prediction. In this article, we propose our system GroundGrid, which relies on 2D elevation maps to solve the terrain estimation and point cloud ground segmentation problems. We evaluate the ground segmentation and terrain estimation performance of GroundGrid and compare it to other state-of-the-art methods using the SemanticKITTI dataset and a novel evaluation method relying on airborne LiDAR scanning. The results show that GroundGrid is capable of outperforming other state-of-the-art systems with an average IoU of 94.78% while maintaining a high run-time performance of 171 Hz. The source code is available at https://github.com/dcmlr/groundgrid
Submitted 24 May, 2024;
originally announced May 2024.
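The following Python sketch gives a rough idea of how a 2D elevation map supports ground segmentation. It works in three steps:

1. bin the points into grid cells;
2. take the lowest point per cell as the terrain elevation;
3. label points close to that elevation as ground.

This is a deliberately simplified stand-in, not GroundGrid's algorithm (the linked repository has that). `cell_size` and `height_tol` are hypothetical parameters.

```python
import math
from collections import defaultdict


def segment_ground(points, cell_size=0.5, height_tol=0.2):
    """Toy elevation-map ground segmentation (not GroundGrid's algorithm).

    points: iterable of (x, y, z) coordinates. Each point is binned into a 2D grid
    cell; the cell's lowest z is taken as its terrain elevation, and points at
    most `height_tol` above that elevation are labelled as ground.
    """
    points = list(points)

    def cell_of(x, y):
        return (math.floor(x / cell_size), math.floor(y / cell_size))

    elevation = defaultdict(lambda: math.inf)
    for x, y, z in points:
        cell = cell_of(x, y)
        elevation[cell] = min(elevation[cell], z)

    is_ground = [z - elevation[cell_of(x, y)] <= height_tol for x, y, z in points]
    return dict(elevation), is_ground
```

Taking the minimum per cell fails wherever a cell contains no ground returns, for example under a vehicle. A practical system would therefore need additional filtering or interpolation across neighboring cells.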
-
Kernel-based learning with guarantees for multi-agent applications
Authors:
Krzysztof Kowalczyk,
Paweł Wachel,
Cristian R. Rojas
Abstract:
This paper addresses a kernel-based learning problem for a network of agents locally observing a latent multidimensional, nonlinear phenomenon in a noisy environment. We propose a learning algorithm that requires only mild a priori knowledge about the phenomenon under investigation and delivers a model with corresponding non-asymptotic high probability error bounds. Both non-asymptotic analysis of the method and numerical simulation results are presented and discussed in the paper.
Submitted 15 April, 2024;
originally announced April 2024.
-
Enhancing Testing at Meta with Rich-State Simulated Populations
Authors:
Nadia Alshahwan,
Arianna Blasi,
Kinga Bojarczuk,
Andrea Ciancone,
Natalija Gucevska,
Mark Harman,
Simon Schellaert,
Inna Harper,
Yue Jia,
Michał Królikowski,
Will Lewis,
Dragos Martac,
Rubmary Rojas,
Kate Ustiuzhanina
Abstract:
This paper reports the results of the deployment of Rich-State Simulated Populations at Meta for both automated and manual testing. We use simulated users (aka test users) to mimic user interactions and acquire state in much the same way that real user accounts acquire state. For automated testing, we present empirical results from deployment on the Facebook, Messenger, and Instagram apps for the iOS and Android platforms. These apps consist of tens of millions of lines of code, communicating with hundreds of millions of lines of backend code, and are used by over 2 billion people every day. Our results reveal that rich state increases average code coverage by 38% and endpoint coverage by 61%. More importantly, it also yields an average increase of 115% in the faults found by automated testing. The rich-state test user populations are also deployed in a (continually evolving) Test Universe, a web-enabled simulation platform for privacy-safe manual testing, which has been used by over 21,000 Meta engineers since its deployment in November 2022.
Submitted 22 March, 2024;
originally announced March 2024.
-
Unraveling the Control Engineer's Craft with Neural Networks
Authors:
Braghadeesh Lakshminarayanan,
Federico Dettù,
Cristian R. Rojas,
Simone Formentin
Abstract:
Many industrial processes require suitable controllers to meet their performance requirements. Often, a sophisticated digital twin is available: a highly complex virtual representation of a given physical process, whose parameters may not be properly tuned to capture the variations in the physical process. In this paper, we present a sim2real, direct data-driven controller tuning approach, where the digital twin is used to generate input-output data and suitable controllers for several perturbations in its parameters. State-of-the-art neural-network architectures are then used to learn the controller tuning rule that maps input-output data onto the controller parameters, based on artificially generated data from perturbed versions of the digital twin. In this way, as far as we are aware, we tackle for the first time the problem of re-calibrating the controller by meta-learning the tuning rule directly from data, thus practically replacing the control engineer with a machine learning model. The benefits of this methodology are illustrated via numerical simulations for several choices of neural-network architectures.
Submitted 20 November, 2023;
originally announced November 2023.
-
How Charles Babbage invented the Computer
Authors:
Raul Rojas
Abstract:
This paper provides an overview of the successive stages in the development of Charles Babbage's Analytical Engine, based on the blueprints held in the Babbage Papers Archive, accessible online through the Science Museum in London. The first person to decipher these schematics was Allan Bromley, whose contributions in the 1980s and 1990s significantly advanced our understanding of Babbage's pioneering work. The Science Museum's digitization of the Babbage Papers enables a chronological exploration of the evolution of Babbage's machines. The focus is on the Analytical Engine, shedding light on its lesser-known but crucial transitional phases.
Submitted 7 November, 2023;
originally announced November 2023.
-
Algorithms for Proportional Representation in Parliament in Divisor and Multiplicative Form
Authors:
Raul Rojas
Abstract:
We consider three algorithms for allocating parliamentary seats by proportional representation. The usual approach to describing such algorithms is to compute a quota of votes that each party uses to "acquire" representatives. This kind of description follows a divisor method, since the number of representatives for a party is equal to the number of votes for that party, divided by the quota. We show that a simple multiplicative form with different rounding methods produces algorithms equivalent to the divisor methods. The multiplicative form is intuitive and easier to understand for a wider audience.
Submitted 3 November, 2023;
originally announced November 2023.
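The equivalence can be checked on two well-known divisor methods, D'Hondt (rounding down) and Sainte-Laguë (standard rounding). The abstract does not name its three algorithms, so this choice is ours, and all function names below are hypothetical.

In the Python sketch, `divisor_method` hands out seats one by one by highest average, the usual way to compute a divisor-method result. `multiplier_for` then finds a factor f such that rounding votes × f reproduces the same seats, with f playing the role of 1/quota.

```python
import math
from fractions import Fraction


def divisor_method(votes, house_size, offset):
    """Highest-averages computation of a divisor method.

    Each seat goes to the party with the largest votes / (seats_won + 1 - offset);
    offset = 0 gives D'Hondt, offset = 1/2 gives Sainte-Laguë. Ties go to the
    lower party index.
    """
    seats = [0] * len(votes)
    for _ in range(house_size):
        winner = max(range(len(votes)),
                     key=lambda i: Fraction(votes[i]) / (seats[i] + 1 - offset))
        seats[winner] += 1
    return seats


def multiplicative_seats(votes, factor, offset):
    """Multiplicative form: seats_i = floor(votes_i * factor + offset)."""
    return [math.floor(Fraction(v) * factor + offset) for v in votes]


def multiplier_for(votes, seats, offset):
    """Return a factor reproducing `seats` in multiplicative form.

    floor(v * f + offset) == s  <=>  (s - offset) / v <= f < (s + 1 - offset) / v,
    so the feasible factors form an interval; we return its midpoint.
    Assumes every party has a positive vote count.
    """
    lo = max((Fraction(s) - offset) / v for v, s in zip(votes, seats))
    hi = min((Fraction(s) + 1 - offset) / v for v, s in zip(votes, seats))
    if lo >= hi:
        raise ValueError("no multiplier reproduces these seats (e.g. a tie for the last seat)")
    return (lo + hi) / 2


votes = [100_000, 80_000, 30_000, 20_000]
half = Fraction(1, 2)
dhondt = divisor_method(votes, 8, 0)              # [4, 3, 1, 0]
sainte_lague = divisor_method(votes, 8, half)     # [3, 3, 1, 1]
for seats, offset in [(dhondt, 0), (sainte_lague, half)]:
    f = multiplier_for(votes, seats, offset)
    assert multiplicative_seats(votes, f, offset) == seats
```

For these votes, any D'Hondt quota strictly above 20,000 and up to 25,000 works. The sketch returns the midpoint multiplier f = 9/200,000, which corresponds to a quota of about 22,222.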
-
Explainable Depression Detection via Head Motion Patterns
Authors:
Monika Gahalawat,
Raul Fernandez Rojas,
Tanaya Guha,
Ramanathan Subramanian,
Roland Goecke
Abstract:
While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker. This study demonstrates the utility of fundamental head-motion units, termed kinemes, for depression detection by adopting two distinct approaches and employing distinctive features: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes. Employing machine learning methods, we evaluate depression classification performance on the BlackDog and AVEC2013 datasets. Our findings indicate that: (1) head motion patterns are effective biomarkers for detecting depressive symptoms, and (2) explanatory kineme patterns consistent with prior findings can be observed for the two classes. Overall, we achieve peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 for binary classification over episodic thin-slices, and a peak F1 of 0.72 over videos for AVEC2013.
Submitted 23 July, 2023;
originally announced July 2023.
-
DRCFS: Doubly Robust Causal Feature Selection
Authors:
Francesco Quinzan,
Ashkan Soleymani,
Patrick Jaillet,
Cristian R. Rojas,
Stefan Bauer
Abstract:
Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly non-linear and high-dimensional problems.
Submitted 5 July, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Decentralized diffusion-based learning under non-parametric limited prior knowledge
Authors:
Paweł Wachel,
Krzysztof Kowalczyk,
Cristian R. Rojas
Abstract:
We study the problem of diffusion-based network learning of a nonlinear phenomenon, $m$, from local agents' measurements collected in a noisy environment. For a decentralized network with information spreading only between directly neighboring nodes, we propose a non-parametric learning algorithm that avoids raw data exchange and requires only mild a priori knowledge about $m$. Non-asymptotic estimation error bounds are derived for the proposed method. Its potential applications are illustrated through simulation experiments.
Submitted 5 May, 2023;
originally announced May 2023.
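The Python sketch below illustrates information spreading only between directly neighboring nodes. Each agent repeatedly replaces its local value with a weighted combination of its own value and its neighbors' values, so no raw measurements ever leave a node.

The Metropolis weighting and the scalar values are our assumptions. We chose these weights because they make all nodes converge to the network average on a connected undirected graph. This is not the paper's estimator, and the function names are hypothetical.

```python
def metropolis_weights(neighbors):
    """Symmetric, doubly stochastic combination weights for an undirected graph.

    neighbors: dict mapping each node to the set of its neighbors (no self-loops).
    """
    degree = {i: len(nbrs) for i, nbrs in neighbors.items()}
    weights = {}
    for i, nbrs in neighbors.items():
        for j in nbrs:
            weights[i, j] = 1.0 / (1 + max(degree[i], degree[j]))
        weights[i, i] = 1.0 - sum(weights[i, j] for j in nbrs)
    return weights


def diffuse(local_values, neighbors, n_steps):
    """Each step, node i combines only its own value and its neighbors' values."""
    weights = metropolis_weights(neighbors)
    values = dict(local_values)
    for _ in range(n_steps):
        values = {
            i: weights[i, i] * values[i] + sum(weights[i, j] * values[j] for j in neighbors[i])
            for i in values
        }
    return values
```

On the path graph `{0: {1}, 1: {0, 2}, 2: {1}}` with initial values 0, 3, and 6, every node approaches the average 3. Each node only ever exchanged values with its direct neighbors.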
-
Diagnosing and Augmenting Feature Representations in Correctional Inverse Reinforcement Learning
Authors:
Inês Lourenço,
Andreea Bobu,
Cristian R. Rojas,
Bo Wahlberg
Abstract:
Robots have become increasingly capable of doing tasks for humans by learning from their feedback, but they still often suffer from model misalignment due to missing or incorrectly learned features. When the features the robot needs to learn to perform its task are missing or do not generalize well to new settings, the robot will not be able to learn the task the human wants and, even worse, may learn a completely different and undesired behavior. Prior work shows how the robot can detect when its representation is missing some feature and can, thus, ask the human to be taught about the new feature; however, these works do not differentiate between features that are completely missing and those that exist but do not generalize to new environments. In the latter case, the robot would detect misalignment and simply learn a new feature, leading to an arbitrarily growing feature representation that can, in turn, lead to spurious correlations and incorrect learning down the line. In this work, we propose separating the two sources of misalignment: we propose a framework for determining whether a feature the robot needs is incorrectly learned and does not generalize to new environment setups, or is entirely missing from the robot's representation. Once we detect the source of error, we show how the human can initiate the realignment process for the model: if the feature is missing, we follow prior work for learning new features; however, if the feature exists but does not generalize, we use data augmentation to expand its training and, thus, complete the correction. We demonstrate the proposed approach in experiments with a simulated 7DoF robot manipulator and physical human corrections.
Submitted 13 April, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Optimal Transport for Correctional Learning
Authors:
Rebecka Winqvist,
Inês Lourenço,
Francesco Quinzan,
Cristian R. Rojas,
Bo Wahlberg
Abstract:
The contribution of this paper is a generalized formulation of correctional learning using optimal transport, which studies how to optimally transport one mass distribution to another. Correctional learning is a framework developed to enhance the accuracy of parameter estimation processes by means of a teacher-student approach. In this framework, an expert agent, referred to as the teacher, modifies the data used by a learning agent, known as the student, to improve its estimation process. The objective of the teacher is to alter the data such that the student's estimation error is minimized, subject to a fixed intervention budget. Compared to existing formulations of correctional learning, our novel optimal transport approach provides several benefits. It allows for the estimation of more complex characteristics as well as the consideration of multiple intervention policies for the teacher. We evaluate our approach on two theoretical examples, and on a human-robot interaction application in which the teacher's role is to improve the robot's performance in an inverse reinforcement learning setting.
Submitted 4 April, 2023;
originally announced April 2023.
-
The First Computer Program
Authors:
Raúl Rojas
Abstract:
In 1837, the first computer program in history was sketched by the renowned mathematician and inventor Charles Babbage. It was a program for the Analytical Engine. The program consists of a sequence of arithmetical operations and the necessary variable addresses (memory locations) of the arguments and the result, displayed in tabular fashion, like a program trace. The program computes the solutions for a system of two linear equations in two unknowns.
Submitted 23 March, 2023;
originally announced March 2023.
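Babbage's table lists one arithmetic operation per row, together with the numbered variables it reads and writes. The Python sketch below is a modern illustration of that format, not a transcription of Babbage's 1837 table; his exact operation order and variable numbering are not reproduced.

It solves a1*x + b1*y = c1, a2*x + b2*y = c2 by Cramer's rule, written as a flat list of single operations on variables V1 to V17, and records a trace. `solve_2x2_traced` is a hypothetical name.

```python
from fractions import Fraction


def solve_2x2_traced(a1, b1, c1, a2, b2, c2):
    """Cramer's rule as a flat list of single arithmetic operations on numbered
    variables V1..V17, recorded row by row like a program trace."""
    V = {1: a1, 2: b1, 3: c1, 4: a2, 5: b2, 6: c2}
    program = [
        ("*", 1, 5, 7),     # V7  = a1 * b2
        ("*", 4, 2, 8),     # V8  = a2 * b1
        ("-", 7, 8, 9),     # V9  = determinant
        ("*", 3, 5, 10),    # V10 = c1 * b2
        ("*", 6, 2, 11),    # V11 = c2 * b1
        ("-", 10, 11, 12),  # V12 = numerator of x
        ("*", 1, 6, 13),    # V13 = a1 * c2
        ("*", 4, 3, 14),    # V14 = a2 * c1
        ("-", 13, 14, 15),  # V15 = numerator of y
        ("/", 12, 9, 16),   # V16 = x
        ("/", 15, 9, 17),   # V17 = y
    ]
    ops = {
        "*": lambda p, q: p * q,
        "-": lambda p, q: p - q,
        "/": lambda p, q: Fraction(p) / q,  # exact division; fails if determinant is 0
    }
    trace = []
    for op, src1, src2, dst in program:
        V[dst] = ops[op](V[src1], V[src2])
        trace.append(f"V{dst} = V{src1} {op} V{src2} = {V[dst]}")
    return V[16], V[17], trace
```

Consider the system 2x + y = 5 and x - y = 1. The call `solve_2x2_traced(2, 1, 5, 1, -1, 1)` returns x = 2 and y = 1, along with an 11-row trace.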
-
Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation
Authors:
Arturo Oncevay,
Kervy Dante Rivas Rojas,
Liz Karen Chavez Sanchez,
Roberto Zariquiey
Abstract:
Language modelling and machine translation tasks mostly use subword or character inputs, but syllables are seldom used. Syllables provide shorter sequences than characters, require less-specialised extracting rules than morphemes, and their segmentation is not impacted by the corpus size. In this study, we first explore the potential of syllables for open-vocabulary language modelling in 21 languages. We use rule-based syllabification methods for six languages and address the rest with hyphenation, which works as a syllabification proxy. With a comparable perplexity, we show that syllables outperform characters and other subwords. Moreover, we study the importance of syllables for neural machine translation in a non-related and low-resource language pair (Spanish–Shipibo-Konibo). In pairwise and multilingual systems, syllables outperform unsupervised subwords, and further morphological segmentation methods, when translating into a highly synthetic language with a transparent orthography (Shipibo-Konibo). Finally, we perform some human evaluation, and discuss limitations and opportunities.
Submitted 5 October, 2022;
originally announced October 2022.
-
Measuring Cognitive Workload Using Multimodal Sensors
Authors:
Niraj Hirachan,
Anita Mathews,
Julio Romero,
Raul Fernandez Rojas
Abstract:
This study aims to identify a set of indicators to estimate cognitive workload using a multimodal sensing approach and machine learning. A set of three cognitive tests was conducted to induce cognitive workload in twelve participants at two levels of task difficulty (Easy and Hard). Four sensors were used to measure the participants' physiological changes: electrocardiogram (ECG), electrodermal activity (EDA), respiration (RESP), and blood oxygen saturation (SpO2). To understand the perceived cognitive workload, NASA-TLX was administered after each test and analysed using a chi-square test. Three well-known classifiers (LDA, SVM, and DT) were trained and tested independently using the physiological data. The statistical analysis showed that participants' perceived cognitive workload was significantly different (p<0.001) between the tests, which demonstrated the validity of the experimental conditions to induce different cognitive levels. Classification results showed that a fusion of ECG and EDA presented good discriminating power (acc=0.74) for cognitive workload detection. This study provides preliminary results in the identification of a possible set of indicators of cognitive workload. Future work needs to be carried out to validate the indicators using more realistic scenarios and with a larger population.
Submitted 5 May, 2022;
originally announced May 2022.
-
A Teacher-Student Markov Decision Process-based Framework for Online Correctional Learning
Authors:
Inês Lourenço,
Rebecka Winqvist,
Cristian R. Rojas,
Bo Wahlberg
Abstract:
A classical learning setting typically concerns an agent/student who collects data, or observations, from a system in order to estimate a certain property of interest. Correctional learning is a type of cooperative teacher-student framework where a teacher, who has partial knowledge about the system, has the ability to observe and alter (correct) the observations received by the student in order to improve the accuracy of its estimate. In this paper, we show how the variance of the estimate of the student can be reduced with the help of the teacher. We formulate the corresponding online problem, where the teacher has to decide at each time instant whether or not to change the observations subject to a limited budget, as a Markov decision process, from which the optimal policy is derived using dynamic programming. We validate the framework in numerical experiments, and compare the optimal online policy with the one from the batch setting.
Submitted 29 March, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Asymptotically Optimal Bandits under Weighted Information
Authors:
Matias I. Müller,
Cristian R. Rojas
Abstract:
We study the problem of regret minimization in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round by spreading the resources usually allocated to only one arm. At each iteration the agent selects a normalized power profile and receives a Gaussian vector as outcome, where the unknown variance of each sample is inversely proportional to the power allocated to that arm. The reward corresponds to a linear combination of the power profile and the outcomes, resembling a linear bandit. By spreading the power, the agent can choose to collect information much faster than in a traditional multi-armed bandit, at the price of reducing the accuracy of the samples. This setup is fundamentally different from that of a linear bandit: the regret is known to scale as $\Theta(\sqrt{T})$ for linear bandits, while in this setup the agent receives much more detailed feedback, for which we derive a tight $\log(T)$ problem-dependent lower bound. We propose a Thompson-Sampling-based strategy, called Weighted Thompson Sampling (WTS), that designs the power profile as its posterior belief of each arm being the best arm, and show that its upper bound matches the derived logarithmic lower bound. Finally, we apply this strategy to a problem of control and system identification, where the goal is to estimate the maximum gain (also called $\mathcal{H}_\infty$-norm) of a linear dynamical system based on batches of input-output samples.
Submitted 28 May, 2021;
originally announced May 2021.
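A minimal sketch of the Weighted Thompson Sampling idea, under stated assumptions: each arm's mean has an independent Gaussian posterior (prior mean 0, small prior precision), the noise scale `sigma` is known, each arm's probability of being best is estimated by Monte Carlo, and the reward is taken as the power-weighted sum of outcomes. The names `prob_best` and `run_wts` and all default parameters are hypothetical; this is not the authors' implementation.

```python
import math
import random


def prob_best(post_mean, post_prec, num_samples, rng):
    """Monte Carlo estimate of P(arm k has the largest mean) under independent Gaussian posteriors."""
    k = len(post_mean)
    counts = [0] * k
    for _ in range(num_samples):
        draws = [rng.gauss(m, 1.0 / math.sqrt(p)) for m, p in zip(post_mean, post_prec)]
        counts[max(range(k), key=draws.__getitem__)] += 1
    return [c / num_samples for c in counts]


def run_wts(true_means, horizon, sigma=1.0, prior_prec=1e-2, num_samples=200, seed=0):
    """Hypothetical WTS loop: the power profile is the posterior probability of each arm being best."""
    rng = random.Random(seed)
    k = len(true_means)
    prec = [prior_prec] * k  # posterior precision of each arm's mean (prior mean 0)
    wsum = [0.0] * k         # precision-weighted sum of observations per arm
    power = [1.0 / k] * k
    total_reward = 0.0
    for _ in range(horizon):
        post_mean = [s / p for s, p in zip(wsum, prec)]
        power = prob_best(post_mean, prec, num_samples, rng)
        for a, w in enumerate(power):
            if w == 0.0:
                continue  # an arm with no power yields no information this round
            # Outcome variance is inversely proportional to the allocated power: sigma^2 / w.
            y = rng.gauss(true_means[a], sigma / math.sqrt(w))
            obs_prec = w / sigma ** 2
            prec[a] += obs_prec
            wsum[a] += obs_prec * y
            total_reward += w * y  # assumed reward: power-weighted sum of outcomes
    return power, total_reward
```

Weighting each observation by its precision `w / sigma**2` is what lets low-power (noisy) samples still contribute, which is the information-spreading effect the abstract describes.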
-
Pain Assessment based on fNIRS using Bidirectional LSTMs
Authors:
Raul Fernandez Rojas,
Julio Romero,
Jehu Lopez-Aparicio,
Keng-Liang Ou
Abstract:
Assessing pain in patients unable to speak (also called non-verbal patients) is extremely complicated and is often done by clinical judgement. However, this method is not reliable, since patients' vital signs can fluctuate significantly due to other underlying medical conditions. To date, no objective diagnostic test exists that can assist medical practitioners in the diagnosis of pain. In this study, we propose the use of functional near-infrared spectroscopy (fNIRS) and deep learning for the assessment of human pain. The aim of this study is to explore the use of deep learning to automatically learn features from raw fNIRS data, reducing the level of subjectivity and domain knowledge required in the design of hand-crafted features. Four deep learning models were evaluated: a multilayer perceptron (MLP), forward and backward long short-term memory networks (LSTM), and a bidirectional LSTM (Bi-LSTM). The results showed that the Bi-LSTM model achieved the highest accuracy (90.6%) and was faster than the other three models. These results advance knowledge in pain assessment using neuroimaging as a method of diagnosis and represent a step closer to developing a physiologically based diagnosis of human pain that will benefit vulnerable populations who cannot self-report pain.
Submitted 27 December, 2020; v1 submitted 24 December, 2020;
originally announced December 2020.
-
Revisiting Neural Language Modelling with Syllables
Authors:
Arturo Oncevay,
Kervy Rivas Rojas
Abstract:
Language modelling is regularly analysed at word, subword or character units, but syllables are seldom used. Syllables provide shorter sequences than characters, they can be extracted with rules, and their segmentation typically requires less specialised effort than identifying morphemes. We reconsider syllables for an open-vocabulary generation task in 20 languages. We use rule-based syllabification methods for five languages and address the rest with a hyphenation tool, whose behaviour as a syllable proxy is validated. With a comparable perplexity, we show that syllables outperform characters, annotated morphemes and unsupervised subwords. Finally, we also study the overlap of syllables with other subword pieces and discuss some limitations and opportunities.
Submitted 24 October, 2020;
originally announced October 2020.
-
Efficient strategies for hierarchical text classification: External knowledge and auxiliary tasks
Authors:
Kervy Rivas Rojas,
Gina Bustamante,
Arturo Oncevay,
Marco A. Sobrevilla Cabezudo
Abstract:
In hierarchical text classification, we perform a sequence of inference steps to predict the category of a document from top to bottom of a given class taxonomy. Most studies have focused on developing novel neural network architectures to deal with the hierarchical structure, but we prefer to look for efficient ways to strengthen a baseline model. We first define the task as a sequence-to-sequence problem. Afterwards, we propose an auxiliary synthetic task of bottom-up classification. Then, from external dictionaries, we retrieve textual definitions for the classes of all the hierarchy's layers and map them into the word vector space. We use the class-definition embeddings as an additional input to condition the prediction of the next layer and in an adapted beam search. Whereas the modified search did not provide large gains, the combination of the auxiliary task and the additional input of class definitions significantly enhances the classification accuracy. With our efficient approaches, we outperform previous studies on two well-known English datasets, using a drastically reduced number of parameters.
Submitted 22 May, 2020; v1 submitted 5 May, 2020;
originally announced May 2020.
-
Exploring Maximum Entropy Distributions with Evolutionary Algorithms
Authors:
Raul Rojas
Abstract:
This paper shows how to numerically evolve the maximum entropy probability distributions for a given set of constraints, which is a problem in the calculus of variations. An evolutionary algorithm can obtain approximations to some well-known analytical results, but it is even more flexible and can find distributions for which a closed formula cannot be readily stated. The numerical approach handles distributions over finite intervals. We show that there are two ways of conducting the procedure: by direct optimization of the Lagrangian of the constrained problem, or by optimizing the entropy among the subset of distributions that fulfill the constraints. An incremental evolutionary strategy easily obtains the uniform, exponential, Gaussian, log-normal and Laplace distributions, among others, once the constrained problem is solved with either of the two methods. Solutions for mixed ("chimera") distributions can also be found. We explain why many of the distributions are symmetrical and continuous, but some are not.
Submitted 5 February, 2020;
originally announced February 2020.
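A toy sketch of the second route described above (optimizing entropy only among feasible distributions), assuming the interval is discretized into equal bins and the only constraint is normalization, which is re-imposed after every mutation. It uses a simple elitist (1+1) evolution strategy with multiplicative mutations; the function names and parameters are hypothetical and this is not the paper's incremental strategy. With normalization as the sole constraint, the known maximum entropy solution on a finite interval is the uniform distribution, so the result should approach it.

```python
import math
import random


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def evolve_max_entropy(n_bins=20, iterations=10000, step=0.05, seed=0):
    """(1+1) evolution strategy over bin probabilities on a finite interval.

    The only constraint is normalization, enforced by renormalizing every
    candidate, so the search never leaves the feasible set.
    """
    rng = random.Random(seed)
    p = [rng.random() + 1e-3 for _ in range(n_bins)]
    total = sum(p)
    p = [q / total for q in p]
    best = entropy(p)
    for _ in range(iterations):
        # Multiplicative log-normal mutation keeps every probability positive.
        child = [q * math.exp(step * rng.gauss(0.0, 1.0)) for q in p]
        total = sum(child)
        child = [q / total for q in child]
        h = entropy(child)
        if h >= best:  # elitist selection: keep the child if it is at least as good
            p, best = child, h
    return p, best
```

Additional moment constraints (a fixed mean, for example) would need either a penalty/Lagrangian term in the fitness or a projection onto the constraint set; both are left out of this sketch.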
-
A Finite-Sample Deviation Bound for Stable Autoregressive Processes
Authors:
Rodrigo A. González,
Cristian R. Rojas
Abstract:
In this paper, we study non-asymptotic deviation bounds of the least squares estimator in Gaussian AR($n$) processes. By relying on martingale concentration inequalities and a tail bound for $\chi^2$-distributed variables, we provide a concentration bound for the sample covariance matrix of the process output. With this, we present a problem-dependent finite-time bound on the deviation probability of any fixed linear combination of the estimated parameters of the AR($n$) process. We discuss extensions and limitations of our approach.
Submitted 25 May, 2020; v1 submitted 17 December, 2019;
originally announced December 2019.
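For concreteness, a minimal sketch of the least squares estimator studied above, assuming the convention y_t = a_1 y_{t-1} + ... + a_n y_{t-n} + e_t with Gaussian e_t. The matrix `R` returned below is the sample covariance of the regressors, the quantity whose concentration the abstract bounds. Function names, the AR(2) coefficients in the test, and the small Gaussian-elimination solver are illustrative choices, not taken from the paper.

```python
import random


def simulate_ar(coeffs, n_samples, noise_std=1.0, burn_in=500, seed=0):
    """Simulate y_t = sum_k coeffs[k-1] * y_{t-k} + e_t, discarding a burn-in period."""
    rng = random.Random(seed)
    n = len(coeffs)
    y = [0.0] * n
    for _ in range(burn_in + n_samples):
        past = y[-n:][::-1]  # [y_{t-1}, ..., y_{t-n}]
        y.append(sum(a * v for a, v in zip(coeffs, past)) + rng.gauss(0.0, noise_std))
    return y[-n_samples:]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in reversed(range(m)):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


def ar_least_squares(y, n):
    """Return (theta_hat, R) where R is the sample covariance of the regressors."""
    N = len(y) - n
    R = [[0.0] * n for _ in range(n)]
    r = [0.0] * n
    for t in range(n, len(y)):
        phi = [y[t - k] for k in range(1, n + 1)]  # regressor [y_{t-1}, ..., y_{t-n}]
        for i in range(n):
            r[i] += phi[i] * y[t]
            for j in range(n):
                R[i][j] += phi[i] * phi[j]
    R = [[v / N for v in row] for row in R]
    r = [v / N for v in r]
    return solve(R, r), R  # normal equations: R theta = r
```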
-
Bayesian Model Selection for Change Point Detection and Clustering
Authors:
Othmane Mazhar,
Cristian R. Rojas,
Carlo Fischione,
Mohammad R. Hesamzadeh
Abstract:
We address the new problem of estimating a piecewise constant signal with the purpose of detecting its change points and the levels of its clusters. Our approach is to model it as a nonparametric penalized least-squares model selection on a family of models indexed over the collection of partitions of the design points, and we propose a computationally efficient algorithm to approximately solve it. Statistically, minimizing such a penalized criterion yields an approximation to the maximum a posteriori probability (MAP) estimator. The criterion is then analyzed and an oracle inequality is derived using a Gaussian concentration inequality. The oracle inequality is used to derive, on the one hand, conditions for consistency and, on the other hand, an adaptive upper bound on the expected square risk of the estimator, which statistically motivates our approximation. Finally, we apply our algorithm to simulated data to experimentally validate the statistical guarantees and illustrate its behavior.
Submitted 3 December, 2019;
originally announced December 2019.
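The criterion above ranges over all partitions of the design points. A simpler, well-known special case restricts the partitions to contiguous segments (pure change-point detection, without merging segments into shared cluster levels); that case can be minimized exactly by dynamic programming, usually called optimal partitioning. The sketch below assumes a constant per-segment `penalty` chosen by hand; it is not the authors' approximate algorithm.

```python
import math


def optimal_partition(y, penalty):
    """Exact minimizer of (within-segment squared error) + penalty * (number of segments).

    Returns the sorted change-point indices (start index of every segment except the first).
    """
    n = len(y)
    cs = [0.0]   # prefix sums of y
    cs2 = [0.0]  # prefix sums of y^2
    for v in y:
        cs.append(cs[-1] + v)
        cs2.append(cs2[-1] + v * v)

    def seg_cost(s, t):
        # Squared error of y[s:t] around its own mean.
        total = cs[t] - cs[s]
        return (cs2[t] - cs2[s]) - total * total / (t - s)

    best = [0.0] + [math.inf] * n  # best[t] = optimal cost for the prefix y[:t]
    last = [0] * (n + 1)           # last[t] = start of the final segment in that optimum
    for t in range(1, n + 1):
        for s in range(t):
            c = best[s] + seg_cost(s, t) + penalty
            if c < best[t]:
                best[t], last[t] = c, s
    change_points = []
    t = n
    while t > 0:
        s = last[t]
        if s > 0:
            change_points.append(s)
        t = s
    return sorted(change_points)
```

This exact recursion costs O(n^2) segment evaluations; extending it to shared cluster levels across non-adjacent segments is what makes the general problem combinatorial.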
-
Testing Randomness in Quantum Mechanics
Authors:
Aldo C. Martínez,
Aldo Solís,
Rafael Díaz Hernández Rojas,
Alfred B. U'Ren,
Jorge G. Hirsch,
Isaac Pérez Castillo
Abstract:
Pseudo-random number generators are widely used in many branches of science, mainly in applications related to Monte Carlo methods, although they are deterministic in design and, therefore, unsuitable for tackling fundamental problems in security and cryptography. The natural laws of the microscopic realm provide a fairly simple method to generate non-deterministic sequences of random numbers, based on measurements of quantum states. In practice, however, the experimental devices on which quantum random number generators are based are often unable to pass some tests of randomness. In this review, we briefly discuss two such tests, point out the challenges that we have encountered and finally present a fairly simple method that successfully generates non-deterministic maximally random sequences.
Submitted 19 October, 2018;
originally announced October 2018.
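The abstract does not name the two tests it discusses. As a generic illustration only (not necessarily one of them), here is the frequency (monobit) test from NIST SP 800-22: map bits to ±1, sum them to S_n, and compute the p-value erfc(|S_n| / sqrt(2n)); a sequence is usually rejected as non-random when the p-value falls below 0.01. The function names are hypothetical.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test p-value for a sequence of 0/1 bits."""
    if not bits:
        raise ValueError("need at least one bit")
    s = sum(1 if b else -1 for b in bits)  # S_n: each 1 counts +1, each 0 counts -1
    return math.erfc(abs(s) / math.sqrt(2 * len(bits)))


def passes_monobit(bits, alpha=0.01):
    """True if the sequence is not rejected at significance level alpha."""
    return monobit_p_value(bits) >= alpha
```

A balanced but obviously patterned sequence such as 0101... passes this test, which is why test suites combine many complementary tests.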
-
Dancing Honey bee Robot Elicits Dance-Following and Recruits Foragers
Authors:
Tim Landgraf,
David Bierbach,
Andreas Kirbach,
Rachel Cusing,
Michael Oertel,
Konstantin Lehmann,
Uwe Greggers,
Randolf Menzel,
Raúl Rojas
Abstract:
The honey bee dance communication system is one of the most popular examples of animal communication. Forager bees communicate the flight vector towards food, water, or resin sources to nestmates by performing a stereotypical motion pattern on the comb surface in the darkness of the hive. Bees that actively follow the circles of the dancer, so-called dance-followers, may decode the message and fly according to the indicated vector, which refers to the sun compass and their visual odometer. We investigated the dance communication system with a honeybee robot that reproduced the waggle dance pattern for a flight vector chosen by the experimenter. The dancing robot, called RoboBee, generated multiple cues contained in the biological dance pattern and elicited natural dance-following behavior in live bees. By using harmonic radar to track the flight trajectories of bees departing after they had followed the dancing robot, we confirmed that bees used information obtained from the robotic dance to adjust their flight path. This is the first report of successful dance following and subsequent flight performance of bees recruited by a biomimetic robot.
Submitted 19 March, 2018;
originally announced March 2018.
-
Automatic detection and decoding of honey bee waggle dances
Authors:
Fernando Wario,
Benjamin Wild,
Raúl Rojas,
Tim Landgraf
Abstract:
The waggle dance is one of the most popular examples of animal communication. Forager bees direct their nestmates to profitable resources via a complex motor display. Essentially, the dance encodes the polar coordinates of the resource in the field. Unemployed foragers follow the dancer's movements and then search for the advertised spots in the field. Throughout the last decades, biologists have employed different techniques to measure key characteristics of the waggle dance and decode the information it conveys. Early techniques involved the use of protractors and stopwatches to measure the dance orientation and duration directly from the observation hive. Recent approaches employ digital video recordings and manual measurements on screen. However, manual approaches are very time-consuming. Most studies, therefore, consider only small numbers of animals over short periods of time. We have developed a system capable of automatically detecting, decoding and mapping communication dances in real time. In this paper, we describe our recording setup, the image processing steps performed for dance detection and decoding, and an algorithm to map dances to the field. The proposed system achieves a detection accuracy of 90.07%. The decoded waggle orientation has an average error of -2.92° (± 7.37°), well within the range of human error. To evaluate and exemplify the system's performance, a group of bees was trained to an artificial feeder, and all dances in the colony were automatically detected, decoded and mapped. The system presented here is the first of its kind made publicly available, including source code and hardware specifications. We hope this will foster quantitative analyses of the honey bee waggle dance.
Submitted 4 December, 2017; v1 submitted 22 August, 2017;
originally announced August 2017.
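To make "polar coordinates of the resource" concrete, here is a minimal sketch of the classical interpretation of a decoded dance: the waggle orientation, measured clockwise from vertically upward on the comb, gives the flight bearing measured clockwise from the sun's azimuth, and the waggle duration grows with distance. The linear factor `metres_per_second` is a hypothetical calibration constant (real duration-to-distance calibrations are colony-specific), and the function name is illustrative; this is not the mapping algorithm of the system described above.

```python
import math


def dance_to_field_vector(waggle_angle_deg, waggle_duration_s, sun_azimuth_deg,
                          metres_per_second=1000.0):
    """Convert a decoded waggle run into an (east, north) offset in metres.

    metres_per_second is a hypothetical linear duration-to-distance calibration.
    """
    # Angle from vertical on the comb equals the angle from the sun's azimuth in the field.
    bearing = math.radians((sun_azimuth_deg + waggle_angle_deg) % 360.0)
    distance = metres_per_second * waggle_duration_s
    east = distance * math.sin(bearing)   # bearing is measured clockwise from north
    north = distance * math.cos(bearing)
    return east, north
```

For example, a straight-up waggle run while the sun stands due east (azimuth 90°) points the recruits due east.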
-
Deepest Neural Networks
Authors:
Raul Rojas
Abstract:
This paper shows that a long chain of perceptrons (that is, a multilayer perceptron, or MLP, with many hidden layers of width one) can be a universal classifier. The classification procedure is not necessarily computationally efficient, but the technique throws some light on the kind of computations possible with narrow and deep MLPs.
Submitted 9 July, 2017;
originally announced July 2017.
-
A Class of Nonconvex Penalties Preserving Overall Convexity in Optimization-Based Mean Filtering
Authors:
Mohammadreza Malek-Mohammadi,
Cristian R. Rojas,
Bo Wahlberg
Abstract:
$\ell_1$ mean filtering is a conventional, optimization-based method to estimate the positions of jumps in a piecewise constant signal perturbed by additive noise. In this method, the $\ell_1$ norm of the first-order derivative of the signal is penalized to promote its sparsity. Theoretical results, however, show that in some situations, which can occur frequently in practice, the conventional method identifies false change points even when the jump amplitudes tend to $\infty$. This issue is referred to as the stair-casing problem and limits the practical importance of $\ell_1$ mean filtering. In this paper, sparsity is penalized more tightly than with the $\ell_1$ norm by exploiting a certain class of nonconvex functions, while the strict convexity of the resulting optimization problem is preserved. This results in better performance in detecting change points. To theoretically justify the performance improvements over $\ell_1$ mean filtering, deterministic and stochastic sufficient conditions for exact change point recovery are derived. In particular, theoretical results show that in the stair-casing problem, our approach might be able to exclude the false change points, while $\ell_1$ mean filtering may fail. Numerical simulations demonstrate the superiority of our method over $\ell_1$ mean filtering and over another state-of-the-art algorithm that promotes sparsity more tightly than the $\ell_1$ norm. Specifically, it is shown that our approach can consistently detect change points when the jump amplitudes become sufficiently large, while the two other competitors cannot.
Submitted 22 April, 2016;
originally announced April 2016.
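For reference, the conventional $\ell_1$ mean filter that this paper improves upon solves min over x of ½ Σ (y_i − x_i)² + λ Σ |x_{i+1} − x_i|. The sketch below is a minimal solver for that convex baseline only (not the paper's nonconvex penalty), using projected gradient descent on the dual problem: with D the first-difference operator, the primal solution is x = y − Dᵀz, where z minimizes ½‖y − Dᵀz‖² subject to |z_i| ≤ λ. A fixed step of 1/4 is valid because the largest eigenvalue of DDᵀ is below 4. Names and iteration counts are illustrative.

```python
def l1_mean_filter(y, lam, iterations=5000, step=0.25):
    """Solve min_x 0.5*||y - x||^2 + lam * sum_i |x[i+1] - x[i]| via projected dual gradient."""
    n = len(y)
    z = [0.0] * (n - 1)  # dual variable, one entry per first difference

    def primal(z):
        # x = y - D^T z, with (D^T z)_j = z_{j-1} - z_j and z_{-1} = z_{n-1} = 0.
        return [y[j] - ((z[j - 1] if j > 0 else 0.0) - (z[j] if j < n - 1 else 0.0))
                for j in range(n)]

    for _ in range(iterations):
        x = primal(z)
        # The dual gradient is -D x: step along D x, then clip back onto the box [-lam, lam].
        z = [min(lam, max(-lam, z[i] + step * (x[i + 1] - x[i]))) for i in range(n - 1)]
    return primal(z)
```

On the noiseless step [0]*10 + [4]*10 with λ = 1, each plateau moves toward the other by λ/10, giving levels of about 0.1 and 3.9: the jump location is kept, but its amplitude is shrunk, the kind of bias that tighter nonconvex penalties aim to reduce.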
-
The Design Principles of Konrad Zuse's Mechanical Computers
Authors:
Raul Rojas
Abstract:
Konrad Zuse built the Z1, a mechanical programmable computing machine, between 1935/36 and 1937/38. The Z1 was a binary floating-point computing device. The individual logical gates were constructed using metallic plates and interconnection rods. This paper describes the design principles Zuse followed in order to complete a complex calculating machine such as the Z1. Zuse called his basic switching elements "mechanical relays" in analogy to the electrical relays used in telephony.
Submitted 8 March, 2016;
originally announced March 2016.
-
Estimator Selection: End-Performance Metric Aspects
Authors:
Dimitrios Katselis,
Cristian R. Rojas,
Carolyn L. Beck
Abstract:
Recently, a framework for application-oriented optimal experiment design has been introduced. In this context, the distance of the estimated system from the true one is measured in terms of a particular end-performance metric. This treatment leads to better estimates of the unknown system than classical experiment designs based on the usual pointwise functional distances of the estimated system from the true one. Within this new framework, the system estimator is separated from the experiment design by choosing and fixing the estimation method to either a maximum likelihood (ML) approach or a Bayesian estimator such as the minimum mean square error (MMSE) estimator. Since the MMSE estimator delivers a system estimate with lower mean square error (MSE) than the ML estimator for finite-length experiments, it is usually considered the best choice in practice in signal processing and control applications. Within the application-oriented framework, a related meaningful question is: are there end-performance metrics for which the ML estimator outperforms the MMSE estimator when the experiment is finite-length? In this paper, we answer this question affirmatively based on a simple linear Gaussian regression example.
Submitted 26 July, 2015;
originally announced July 2015.
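A Monte Carlo sketch of the MSE claim above, assuming a scalar linear Gaussian regression y_i = θ x_i + e_i with prior θ ~ N(0, prior_var) and known noise variance. Under these assumptions the ML estimate is the least squares estimate Σxy/Σx², and the MMSE estimate is the posterior mean Σxy/(Σx² + noise_var/prior_var). This only illustrates the lower MSE of the MMSE estimator; it does not reproduce the paper's end-performance metrics, under which ML can win. The function name and parameter values are hypothetical.

```python
import random


def compare_ml_mmse(x, noise_var=1.0, prior_var=1.0, trials=20000, seed=0):
    """Return the empirical MSEs (ML, MMSE) of a scalar regression coefficient."""
    rng = random.Random(seed)
    sxx = sum(v * v for v in x)
    se_ml = se_mmse = 0.0
    for _ in range(trials):
        theta = rng.gauss(0.0, prior_var ** 0.5)  # draw the true parameter from the prior
        y = [theta * v + rng.gauss(0.0, noise_var ** 0.5) for v in x]
        sxy = sum(a * b for a, b in zip(x, y))
        theta_ml = sxy / sxx                              # least squares = ML for Gaussian noise
        theta_mmse = sxy / (sxx + noise_var / prior_var)  # posterior mean under the Gaussian prior
        se_ml += (theta_ml - theta) ** 2
        se_mmse += (theta_mmse - theta) ** 2
    return se_ml / trials, se_mmse / trials
```

For x = (1, 1, 1) and unit variances, theory gives MSEs of noise_var/Σx² = 1/3 for ML and 1/(Σx²/noise_var + 1/prior_var) = 1/4 for MMSE, which the simulation should approximate.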
-
Evaluation of Spectral Learning for the Identification of Hidden Markov Models
Authors:
Robert Mattila,
Cristian R. Rojas,
Bo Wahlberg
Abstract:
Hidden Markov models have successfully been applied as models of discrete time series in many fields. Often, when they are applied in practice, the parameters of these models have to be estimated. The currently predominant identification methods, such as maximum-likelihood estimation and especially expectation-maximization, are iterative and prone to problems with local minima. A non-iterative method employing a spectral subspace-like approach has recently been proposed in the machine learning literature. This paper evaluates the performance of this algorithm and compares it to that of the expectation-maximization algorithm on a number of numerical examples. We find that the performance is mixed: it successfully identifies some systems with relatively few available observations, but fails completely for some systems even when a large number of observations is available. An open question is how this discrepancy can be explained. We provide some indications that it could be related to how well-conditioned some system parameters are.
Submitted 22 July, 2015;
originally announced July 2015.
-
Successive Concave Sparsity Approximation for Compressed Sensing
Authors:
Mohammadreza Malek-Mohammadi,
Ali Koochakzadeh,
Massoud Babaie-Zadeh,
Magnus Jansson,
Cristian R. Rojas
Abstract:
In this paper, based on a successively accuracy-increasing approximation of the $\ell_0$ norm, we propose a new algorithm for recovery of sparse vectors from underdetermined measurements. The approximations are realized with a certain class of concave functions that aggressively induce sparsity and whose closeness to the $\ell_0$ norm can be controlled. We prove that the sequence of approximations asymptotically coincides with the $\ell_1$ and $\ell_0$ norms when the approximation accuracy changes from the worst fitting to the best fitting. When measurements are noise-free, an optimization scheme is proposed which leads to a number of weighted $\ell_1$ minimization programs, whereas, in the presence of noise, we propose two iterative thresholding methods that are computationally appealing. A convergence guarantee for the iterative thresholding method is provided, and, for a particular function in the class of the approximating functions, we derive the closed-form thresholding operator. We further present some theoretical analyses via the restricted isometry, null space, and spherical section properties. Our extensive numerical simulations indicate that the proposed algorithm closely follows the performance of the oracle estimator for a range of sparsity levels wider than those of the state-of-the-art algorithms.
Submitted 26 April, 2016; v1 submitted 26 May, 2015;
originally announced May 2015.
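As an illustration only (the abstract does not specify its class of concave functions, so the exponential surrogate below is an assumption), one function that is concave in each |x_i| and whose closeness to the $\ell_0$ norm is controlled by a single parameter σ > 0 shows the two limits described above; the $\ell_1$ limit holds after rescaling by σ, since σ(1 − e^{−|x|/σ}) = |x| − |x|²/(2σ) + O(σ^{−2}):

```latex
F_\sigma(\mathbf{x}) = \sum_{i=1}^{n} \left(1 - e^{-|x_i|/\sigma}\right),
\qquad
\lim_{\sigma \to 0^{+}} F_\sigma(\mathbf{x}) = \|\mathbf{x}\|_0,
\qquad
\lim_{\sigma \to \infty} \sigma\, F_\sigma(\mathbf{x}) = \|\mathbf{x}\|_1 .
```

Decreasing σ step by step therefore moves the surrogate from $\ell_1$-like (worst fitting to $\ell_0$) toward $\ell_0$ (best fitting), which is the kind of successive approximation the abstract refers to.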
-
Upper Bounds on the Error of Sparse Vector and Low-Rank Matrix Recovery
Authors:
Mohammadreza Malek-Mohammadi,
Cristian R. Rojas,
Magnus Jansson,
Massoud Babaie-Zadeh
Abstract:
Suppose that a solution $\widetilde{\mathbf{x}}$ to an underdetermined linear system $\mathbf{b} = \mathbf{A} \mathbf{x}$ is given. The vector $\widetilde{\mathbf{x}}$ is approximately sparse, meaning that it has a few large components compared to its other, small entries. However, the total number of nonzero components of $\widetilde{\mathbf{x}}$ is large enough to violate any condition for the uniqueness of the sparsest solution. On the other hand, if only the dominant components are considered, then it satisfies the uniqueness conditions. One intuitively expects that $\widetilde{\mathbf{x}}$ should not be far from the true sparse solution $\mathbf{x}_0$. We show that this intuition is correct by providing an upper bound on $\| \widetilde{\mathbf{x}} - \mathbf{x}_0\|$ that is a function of the magnitudes of the small components of $\widetilde{\mathbf{x}}$ but independent of $\mathbf{x}_0$. This result is extended to the case where $\mathbf{b}$ is perturbed by noise. Additionally, we generalize the upper bounds to the low-rank matrix recovery problem.
Submitted 26 June, 2015; v1 submitted 13 April, 2015;
originally announced April 2015.
-
A Tutorial Introduction to the Lambda Calculus
Authors:
Raul Rojas
Abstract:
This paper is a concise and painless introduction to the $λ$-calculus. This formalism was developed by Alonzo Church as a tool for studying the mathematical properties of effectively computable functions. The formalism became popular and has provided a strong theoretical foundation for the family of functional programming languages. This tutorial shows how to perform arithmetical and logical computations using the $λ$-calculus and how to define recursive functions, even though $λ$-calculus functions are unnamed and thus cannot refer explicitly to themselves.
Submitted 27 March, 2015;
originally announced March 2015.
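A small Python sketch of the ideas in this tutorial's abstract: Church numerals and booleans for arithmetic and logic, and a fixed-point combinator so that an unnamed function can recurse. Because Python evaluates arguments eagerly, the sketch uses the Z combinator (the strict-evaluation variant of Y). The factorial uses Python integers inside the recursion for brevity; all names here are illustrative, not the tutorial's own notation.

```python
# Church numerals: the numeral n applies a function f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))

# Church booleans select one of their two arguments.
TRUE = lambda a: lambda b: a
FALSE = lambda a: lambda b: b
AND = lambda p: lambda q: p(q)(p)


def to_int(n):
    """Decode a Church numeral by counting applications of +1 starting from 0."""
    return n(lambda k: k + 1)(0)


# Z combinator: a fixed-point combinator that works under eager evaluation.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Recursion happens through the parameter `self`, never through the name `fact`.
fact = Z(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

one = succ(zero)
two = succ(one)
three = add(two)(one)
```

For example, `to_int(mul(two)(three))` applies the successor function 2 × 3 times, and `AND(TRUE)(FALSE)` behaves like `FALSE`, selecting its second argument.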
-
Bayesian Learning for Low-Rank matrix reconstruction
Authors:
Martin Sundin,
Cristian R. Rojas,
Magnus Jansson,
Saikat Chatterjee
Abstract:
We develop latent variable models for Bayesian-learning-based low-rank matrix completion and reconstruction from linear measurements. For underdetermined systems, the developed methods are shown to reconstruct low-rank matrices when neither the rank nor the noise power is known a priori. We derive relations between the latent variable models and several low-rank promoting penalty functions. The relations justify the use of Kronecker-structured covariance matrices in a Gaussian-based prior. In the methods, we use evidence approximation and expectation-maximization to learn the model parameters. The performance of the methods is evaluated through extensive numerical simulations.
Submitted 23 January, 2015;
originally announced January 2015.
-
Alternating Strategies Are Good For Low-Rank Matrix Reconstruction
Authors:
Kezhi Li,
Martin Sundin,
Cristian R. Rojas,
Saikat Chatterjee,
Magnus Jansson
Abstract:
This article focuses on the problem of reconstructing low-rank matrices from underdetermined measurements using alternating optimization strategies. We endeavour to combine an alternating least-squares based estimation strategy with ideas from the alternating direction method of multipliers (ADMM) to recover structured low-rank matrices, such as Hankel-structured ones. We show that merging these two alternating strategies leads to better performance than the existing alternating least squares (ALS) strategy. The performance is evaluated via numerical simulations.
Submitted 12 July, 2014;
originally announced July 2014.
-
Relevance Singular Vector Machine for low-rank matrix sensing
Authors:
Martin Sundin,
Saikat Chatterjee,
Magnus Jansson,
Cristian R. Rojas
Abstract:
In this paper we develop a new Bayesian inference method for low-rank matrix reconstruction. We call the new method the Relevance Singular Vector Machine (RSVM); in it, appropriate priors are defined on the singular vectors of the underlying matrix to promote low rank. To accelerate computations, a numerically efficient approximation is developed. The proposed algorithms are applied to matrix completion and matrix reconstruction problems, and their performance is studied numerically.
Submitted 30 June, 2014;
originally announced July 2014.
-
The Z1: Architecture and Algorithms of Konrad Zuse's First Computer
Authors:
Raul Rojas
Abstract:
This paper provides the first comprehensive description of the Z1, the mechanical computer built by the German inventor Konrad Zuse in Berlin from 1936 to 1938. The paper describes the main structural elements of the machine, the high-level architecture, and the dataflow between components. The computer could perform the four basic arithmetic operations using floating-point numbers. Instructions were read from punched tape. A program consisted of a sequence of arithmetic operations, intermixed with memory store and load instructions and possibly interrupted by input and output operations. Numbers were stored in a mechanical memory. The instruction set did not include conditional branching. While the architecture of the Z1 is similar to that of the Z3, the relay computer Zuse finished in 1941, there are some significant differences. The Z1 implements operations as sequences of microinstructions, as the Z3 does, but it does not use rotary switches as micro-steppers. The Z1 uses a digital incrementer and a set of conditions, which are translated into microinstructions for the exponent and mantissa units, as well as for the memory blocks. Microinstructions select one out of 12 layers in a machine with a 3D mechanical structure of binary mechanical elements. The exception circuits for a zero mantissa, necessary for normalized floating-point arithmetic, were lacking; they were first implemented in the Z3. The information for this article was extracted from a careful study of the blueprints drawn by Zuse for the reconstruction of the Z1 for the German Technology Museum in Berlin, from some letters, and from sketches in notebooks. Although the machine has been on exhibition (non-operational) since 1989, no detailed high-level description of its architecture had been available. This paper fills that gap.
Submitted 7 June, 2014;
originally announced June 2014.
-
Piecewise Toeplitz Matrices-based Sensing for Rank Minimization
Authors:
Kezhi Li,
Cristian R. Rojas,
Saikat Chatterjee,
Håkan Hjalmarsson
Abstract:
This paper proposes a set of piecewise Toeplitz matrices as the linear mapping/sensing operator $\mathcal{A}: \mathbf{R}^{n_1 \times n_2} \rightarrow \mathbf{R}^M$ for recovering low-rank matrices from few measurements. We prove that such operators efficiently encode the information, so that a unique reconstruction matrix exists under mild assumptions. This work provides a significant extension of compressed sensing and rank minimization theory. It achieves a tradeoff: the memory required to store the sampling operator is reduced from $\mathcal{O}(n_1n_2M)$ to $\mathcal{O}(\max(n_1,n_2)M)$, at the expense of increasing the number of measurements by $r$. Simulation results show that the proposed operator can recover low-rank matrices efficiently, with reconstruction performance close to that obtained with random unstructured operators.
Submitted 1 June, 2014;
originally announced June 2014.
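The storage claim in the abstract above follows from the fact that a Toeplitz matrix is constant along its diagonals, so an $n_1 \times n_2$ Toeplitz matrix is fully described by $n_1 + n_2 - 1 \le 2\max(n_1, n_2)$ numbers, whereas a dense sensing matrix needs $n_1 n_2$. The Python sketch below is a simplified illustration under stated assumptions: every measurement matrix $A_k$ is a single full Toeplitz matrix with random generating values (the paper's piecewise construction and its recovery guarantees are not reproduced), and the helper names `toeplitz_from_generator` and `measure` and the sizes are hypothetical.

```python
import numpy as np

def toeplitz_from_generator(t, n1, n2):
    """Return the n1 x n2 Toeplitz matrix with entry (i, j) = t[i - j + n2 - 1].

    The matrix is constant along each diagonal, so the n1 + n2 - 1 values
    in t (one per diagonal) describe it completely.
    """
    i = np.arange(n1)[:, None]
    j = np.arange(n2)[None, :]
    return t[i - j + n2 - 1]

def measure(X, generators):
    """Compute y_k = <A_k, X> = sum(A_k * X), rebuilding each Toeplitz A_k on the fly."""
    n1, n2 = X.shape
    return np.array([np.sum(toeplitz_from_generator(t, n1, n2) * X) for t in generators])

n1, n2, M = 100, 80, 500
dense_storage = M * n1 * n2           # unstructured: M dense n1 x n2 matrices -> 4,000,000 numbers
toeplitz_storage = M * (n1 + n2 - 1)  # Toeplitz: M generating vectors -> 89,500 numbers
# Since n1 + n2 - 1 <= 2 * max(n1, n2), the Toeplitz storage is O(max(n1, n2) * M).

rng = np.random.default_rng(0)
generators = rng.standard_normal((M, n1 + n2 - 1))               # hypothetical random generators
X = rng.standard_normal((n1, 3)) @ rng.standard_normal((3, n2))  # a rank-3 test matrix
y = measure(X, generators)                                       # M linear measurements
```

Rebuilding each $A_k$ from its generator trades a little computation for a roughly 45-fold memory reduction at these sizes; the measurements are identical to those of the equivalent dense operator.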
-
On the Design of Channel Estimators for given Signal Estimators and Detectors
Authors:
Dimitrios Katselis,
Cristian R. Rojas,
Håkan Hjalmarsson,
Mats Bengtsson,
Mikael Skoglund
Abstract:
The fundamental task of a digital receiver is to decide on the transmitted symbols in the best possible way, i.e., with respect to an appropriately defined performance metric. Common performance metrics are the probability of error and the Mean Square Error (MSE) of a symbol estimator. In a coherent receiver, symbol decisions are based on a channel estimate. This paper examines whether common estimators, such as the minimum variance unbiased (MVU) and minimum mean square error (MMSE) estimators, are optimal for these metrics, and proposes better estimators whenever necessary. For illustration purposes, the study is performed on a toy channel model, namely a single-input single-output (SISO) flat fading channel with additive white Gaussian noise (AWGN). In this way, the paper highlights how the design of channel estimators depends on the target performance metric.
Submitted 18 March, 2013;
originally announced March 2013.
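For the SISO flat fading AWGN model mentioned above, the two baseline estimators have simple closed forms. The Python sketch below assumes pilots $y = h\,x + n$ with a known pilot vector $x$ of $N$ symbols, a Rayleigh prior $h \sim \mathcal{CN}(0, \sigma_h^2)$, and noise $n \sim \mathcal{CN}(0, \sigma_n^2 I)$; these modeling choices and all parameter values are assumptions for illustration. It compares the two estimators only on channel-estimate MSE, where theory gives $\sigma_n^2/\|x\|^2$ for MVU and $\sigma_h^2\sigma_n^2/(\sigma_h^2\|x\|^2 + \sigma_n^2)$ for MMSE; the paper's point is precisely that other target metrics, such as symbol error probability, may call for different estimators.

```python
import numpy as np

def crandn(rng, shape, var):
    """Circularly symmetric complex Gaussian samples with variance var."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def mvu_estimate(Y, x):
    """MVU (here also least-squares) estimate h_hat = x^H y / ||x||^2, one per row of Y."""
    return Y @ x.conj() / np.sum(np.abs(x) ** 2)

def mmse_estimate(Y, x, var_h, var_n):
    """MMSE estimate for h ~ CN(0, var_h): var_h x^H y / (var_h ||x||^2 + var_n)."""
    energy = np.sum(np.abs(x) ** 2)
    return var_h * (Y @ x.conj()) / (var_h * energy + var_n)

rng = np.random.default_rng(0)
N, trials = 4, 200_000
var_h, var_n = 1.0, 0.5
x = np.ones(N, dtype=complex)    # hypothetical pilot sequence with energy N
h = crandn(rng, trials, var_h)   # one channel realization per Monte Carlo trial
Y = h[:, None] * x[None, :] + crandn(rng, (trials, N), var_n)

mse_mvu = np.mean(np.abs(mvu_estimate(Y, x) - h) ** 2)
mse_mmse = np.mean(np.abs(mmse_estimate(Y, x, var_h, var_n) - h) ** 2)
theory_mvu = var_n / N                             # 0.125
theory_mmse = var_h * var_n / (var_h * N + var_n)  # 1/9, about 0.111
```

The MMSE estimator shrinks the MVU estimate toward the prior mean of zero, which lowers the MSE at the cost of a bias; whether that bias helps or hurts symbol decisions is the kind of question the paper studies.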
-
Training Sequence Design for MIMO Channels: An Application-Oriented Approach
Authors:
Dimitrios Katselis,
Cristian R. Rojas,
Mats Bengtsson,
Emil Björnson,
Xavier Bombois,
Nafiseh Shariati,
Magnus Jansson,
Håkan Hjalmarsson
Abstract:
In this paper, the problem of training optimization for estimating a multiple-input multiple-output (MIMO) flat fading channel in the presence of spatially and temporally correlated Gaussian noise is studied in an application-oriented setup. So far, the problem of MIMO channel estimation has mostly been treated within the context of minimizing the mean square error (MSE) of the channel estimate subject to various constraints, such as an upper bound on the available training energy. We introduce a more general framework for training sequence design in MIMO systems, which can treat not only the minimization of the channel estimator's MSE but also the optimization of a final performance metric of interest related to the use of the channel estimate in the communication system. First, we show that the proposed framework can be used to minimize the training energy budget subject to a quality constraint on the MSE of the channel estimator. A deterministic version of the "dual" problem is also provided. We then focus on four specific applications, in which the training sequence is optimized with respect to the classical channel estimation MSE, a weighted channel estimation MSE, and the MSE of the equalization error due to the use of either an equalizer at the receiver or an appropriate linear precoder at the transmitter. In this way, the intended use of the channel estimate is explicitly accounted for. The superiority of the proposed designs over existing methods is demonstrated via numerical simulations.
Submitted 16 January, 2013;
originally announced January 2013.
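The abstract above mentions minimizing the training energy subject to an MSE constraint. In the much simpler special case of spatially and temporally white noise with the classical (unweighted) MSE, which is an assumption here and not the correlated-noise setting the paper treats, the least-squares estimate $\hat H = Y P^H (P P^H)^{-1}$ from $Y = HP + N$ has MSE $\sigma^2 n_r \operatorname{tr}\big((P P^H)^{-1}\big)$. For a fixed energy $\operatorname{tr}(P P^H) = E$, this is minimized by orthogonal training with $P P^H = (E/n_t) I$, giving $\sigma^2 n_r n_t^2 / E$, so the smallest energy meeting a target MSE $\gamma$ is $\sigma^2 n_r n_t^2/\gamma$. The Python sketch below checks this by Monte Carlo; the DFT-based training matrix, the helper names, and all sizes are hypothetical choices.

```python
import numpy as np

def orthogonal_training(nt, Np, energy):
    """nt x Np training matrix with P @ P^H = (energy / nt) * I (needs Np >= nt).

    Uses the first nt rows of an Np-point DFT matrix, whose rows are orthogonal.
    """
    F = np.exp(-2j * np.pi * np.outer(np.arange(nt), np.arange(Np)) / Np)
    return np.sqrt(energy / (nt * Np)) * F

def ls_mse(P, nr, var_n):
    """Closed-form LS channel-estimate MSE under white noise: var_n * nr * tr((P P^H)^{-1})."""
    return var_n * nr * np.trace(np.linalg.inv(P @ P.conj().T)).real

def min_training_energy(nt, nr, var_n, mse_target):
    """Smallest training energy meeting MSE <= mse_target (white noise, orthogonal training)."""
    return var_n * nr * nt**2 / mse_target

nt, nr, Np, var_n = 4, 2, 8, 0.1
E = min_training_energy(nt, nr, var_n, mse_target=0.05)  # 64.0
P = orthogonal_training(nt, Np, E)

# Monte Carlo check of the closed-form MSE with i.i.d. Rayleigh channels.
rng = np.random.default_rng(0)
trials = 20_000
H = (rng.standard_normal((trials, nr, nt)) + 1j * rng.standard_normal((trials, nr, nt))) / np.sqrt(2)
Nz = np.sqrt(var_n / 2) * (rng.standard_normal((trials, nr, Np)) + 1j * rng.standard_normal((trials, nr, Np)))
Y = H @ P + Nz
H_ls = Y @ P.conj().T @ np.linalg.inv(P @ P.conj().T)
mc_mse = np.mean(np.sum(np.abs(H_ls - H) ** 2, axis=(1, 2)))
```

With correlated noise or an application-oriented cost, the optimal $P$ is generally no longer orthogonal, which is where the framework in the paper goes beyond this baseline.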