Showing 1–47 of 47 results for author: Moreno, P

  1. arXiv:2507.05029

    cs.CV

    Estimating Object Physical Properties from RGB-D Vision and Depth Robot Sensors Using Deep Learning

    Authors: Ricardo Cardoso, Plinio Moreno

    Abstract: Inertial mass plays a crucial role in robotic applications such as object grasping, manipulation, and simulation, providing a strong prior for planning and control. Accurately estimating an object's mass before interaction can significantly enhance the performance of various robotic tasks. However, mass estimation using only vision sensors is a relatively underexplored area. This paper proposes a…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2504.16183

    cs.RO

    Measuring Uncertainty in Shape Completion to Improve Grasp Quality

    Authors: Nuno Ferreira Duarte, Seyed S. Mohammadi, Plinio Moreno, Alessio Del Bue, Jose Santos-Victor

    Abstract: Shape completion networks have been used recently in real-world robotic experiments to complete the missing/hidden information in environments where objects are only observed in one or few instances where self-occlusions are bound to occur. Nowadays, most approaches rely on deep neural networks that handle rich 3D point cloud data that lead to more precise and realistic object geometries. However,…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 7 pages, 5 figures

  3. arXiv:2503.23050

    cs.LG cs.CV

    Prediction of 30-day hospital readmission with clinical notes and EHR information

    Authors: Tiago Almeida, Plinio Moreno, Catarina Barata

    Abstract: High hospital readmission rates are associated with significant costs and health risks for patients. Therefore, it is critical to develop predictive models that can support clinicians to determine whether or not a patient will return to the hospital in a relatively short period of time (e.g., 30 days). Nowadays, it is possible to collect both structured (electronic health records - EHR) and unstruc…

    Submitted 29 March, 2025; originally announced March 2025.

  4. arXiv:2503.00193

    cs.RO cs.LG eess.SY

    ProDapt: Proprioceptive Adaptation using Long-term Memory Diffusion

    Authors: Federico Pizarro Bejarano, Bryson Jones, Daniel Pastor Moreno, Joseph Bowkett, Paul G. Backes, Angela P. Schoellig

    Abstract: Diffusion models have revolutionized imitation learning, allowing robots to replicate complex behaviours. However, diffusion often relies on cameras and other exteroceptive sensors to observe the environment and lacks long-term memory. In space, military, and underwater applications, robots must be highly robust to failures in exteroceptive sensors, operating using only proprioceptive information.…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: 7 pages, 8 figures. Accepted to IEEE ICRA 2025. Code is publicly available at https://github.com/Federico-PizarroBejarano/prodapt

  5. arXiv:2411.01734

    cs.CV

    Next Best View For Point-Cloud Model Acquisition: Bayesian Approximation and Uncertainty Analysis

    Authors: Madalena Caldeira, Plinio Moreno

    Abstract: The Next Best View problem is a computer vision problem widely studied in robotics. To solve it, several methodologies have been proposed over the years. Some, more recently, propose the use of deep learning models. Predictions obtained with the help of deep learning models naturally have some uncertainty associated with them. Despite this, the standard models do not allow for their quantification…

    Submitted 3 November, 2024; originally announced November 2024.

  6. arXiv:2410.06572

    cs.SD cs.CR cs.LG

    Can DeepFake Speech be Reliably Detected?

    Authors: Hongbin Liu, Youzheng Chen, Arun Narayanan, Athula Balachandran, Pedro J. Moreno, Lun Wang

    Abstract: Recent advances in text-to-speech (TTS) systems, particularly those with voice cloning capabilities, have made voice impersonation readily accessible, raising ethical and legal concerns due to potential misuse for malicious activities like misinformation campaigns and fraud. While synthetic speech detectors (SSDs) exist to combat this, they are vulnerable to "test domain shift", exhibiting decrea…

    Submitted 9 October, 2024; originally announced October 2024.

  7. arXiv:2404.10836

    cs.CV eess.IV

    Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors

    Authors: João Luzio, Alexandre Bernardino, Plinio Moreno

    Abstract: The aim of this work is to establish how accurately a recent semantic-based foveal active perception model is able to complete visual tasks that are regularly performed by humans, namely, scene exploration and visual search. This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across mu…

    Submitted 16 April, 2024; originally announced April 2024.

  8. arXiv:2403.19840

    cs.RO

    Pose-free object classification from surface contact features in sequences of Robotic grasps

    Authors: Teresa Alves, Alexandre Bernardino, Plinio Moreno

    Abstract: In this work, we propose two cost efficient methods for object identification, using a multi-fingered robotic hand equipped with proprioceptive sensing. Both methods are trained on known objects and rely on a limited set of features, obtained during a few grasps on an object. Contrary to most methods in the literature, our methods do not rely on the knowledge of the relative pose between object an…

    Submitted 28 March, 2024; originally announced March 2024.

  9. arXiv:2402.17184

    cs.CL cs.SD eess.AS

    Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models

    Authors: Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno

    Abstract: The accuracy of end-to-end (E2E) automatic speech recognition (ASR) models continues to improve as they are scaled to larger sizes, with some now reaching billions of parameters. Widespread deployment and adoption of these models, however, requires computationally efficient strategies for decoding. In the present work, we study one such strategy: applying multiple frame reduction layers in the enc…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  10. arXiv:2310.10553

    cs.LG cs.MA stat.ML

    TacticAI: an AI assistant for football tactics

    Authors: Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis, Karl Tuyls

    Abstract: Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing co…

    Submitted 17 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 32 pages, 10 figures

  11. arXiv:2310.09916

    cs.RO

    Socially reactive navigation models for mobile robots in dynamic environments

    Authors: Ricarte Ribeiro, Plinio Moreno

    Abstract: The objective of this work is to expand upon previous works, considering socially acceptable behaviours within robot navigation and interaction, and allow a robot to closely approach static and dynamic individuals or groups. The space models developed in this dissertation are adaptive, that is, capable of changing over time to accommodate the changing circumstances often existent within a social e…

    Submitted 15 October, 2023; originally announced October 2023.

  12. arXiv:2306.08133

    eess.AS cs.CL

    Large-scale Language Model Rescoring on Long-form Data

    Authors: Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno, Michael Riley

    Abstract: In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets and a reduction of up to 30% relative on Salient Term Error Rate (STER)…

    Submitted 5 September, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted in ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  13. arXiv:2304.05741

    cs.CV

    Learning to search for and detect objects in foveal images using deep learning

    Authors: Beatriz Paula, Plinio Moreno

    Abstract: The human visual system processes images with varied degrees of resolution, with the fovea, a small portion of the retina, capturing the highest acuity region, which gradually declines toward the field of view's periphery. However, the majority of existing object localization methods rely on images acquired by image sensors with space-invariant resolution, ignoring biological attention mechanisms.…

    Submitted 12 April, 2023; originally announced April 2023.

  14. Force Feedback Control For Dexterous Robotic Hands Using Conditional Postural Synergies

    Authors: Dimitrios Dimou, Jose Santos-Victor, Plinio Moreno

    Abstract: We present a force feedback controller for a dexterous robotic hand equipped with force sensors on its fingertips. Our controller uses the conditional postural synergies framework to generate the grasp postures, i.e. the finger configuration of the robot, at each time step based on forces measured on the robot's fingertips. Using this framework we are able to control the hand during different gras…

    Submitted 10 March, 2023; originally announced March 2023.

  15. arXiv:2303.01037

    cs.CL cs.SD eess.AS

    Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

    Authors: Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, et al. (2 additional authors not shown)

    Abstract: We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quant…

    Submitted 24 September, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 20 pages, 7 figures, 8 tables

  16. arXiv:2302.06520

    cs.DC

    Releasing Memory with Optimistic Access: A Hybrid Approach to Memory Reclamation and Allocation in Lock-Free Programs

    Authors: Pedro Moreno, Ricardo Rocha

    Abstract: Lock-free data structures are an important tool for the development of concurrent programs as they provide scalability, low latency and avoid deadlocks, livelocks and priority inversion. However, they require some sort of additional support to guarantee memory reclamation. The Optimistic Access (OA) method has most of the desired properties for memory reclamation, but since it allows memory to be…

    Submitted 13 February, 2023; originally announced February 2023.

  17. arXiv:2301.05747

    cs.CV cs.AI

    Laser: Latent Set Representations for 3D Generative Modeling

    Authors: Pol Moreno, Adam R. Kosiorek, Heiko Strathmann, Daniel Zoran, Rosalia G. Schneider, Björn Winckler, Larisa Markeeva, Théophane Weber, Danilo J. Rezende

    Abstract: NeRF provides unparalleled fidelity of novel view synthesis: rendering a 3D scene from an arbitrary viewpoint. NeRF requires training on a large number of views that fully cover a scene, which limits its applicability. While these issues can be addressed by learning a prior over scenes in various forms, previous approaches have been either applied to overly simple scenes or struggling to render un…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: See https://laser-nv-paper.github.io/ for video results

  18. arXiv:2301.00866

    cs.RO cs.AI

    3DSGrasp: 3D Shape-Completion for Robotic Grasp

    Authors: Seyed S. Mohammadi, Nuno F. Duarte, Dimitris Dimou, Yiming Wang, Matteo Taiana, Pietro Morerio, Atabak Dehban, Plinio Moreno, Alexandre Bernardino, Alessio Del Bue, Jose Santos-Victor

    Abstract: Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry fr…

    Submitted 2 January, 2023; originally announced January 2023.

  19. arXiv:2210.17049

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Modular Hybrid Autoregressive Transducer

    Authors: Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno

    Abstract: Text-only adaptation of a transducer model remains challenging for end-to-end speech recognition since the transducer has no clearly separated acoustic model (AM), language model (LM) or blank model. In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a…

    Submitted 16 February, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

    Comments: 8 pages, 1 figure, in SLT 2022

    Journal ref: 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar

  20. arXiv:2210.10879

    cs.LG cs.CL cs.SD eess.AS

    G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR

    Authors: Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park

    Abstract: Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as so much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (what augmentation functions to use, and how to apply them) remains hand-crafted. We present Graph-Augment, a technique to define the augmentation space as…

    Submitted 24 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: 6 pages, accepted at SLT 2022. Updated with copyright

  21. arXiv:2210.10027

    cs.CL cs.SD eess.AS

    Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

    Authors: Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro Moreno, Nanxin Chen

    Abstract: Training state-of-the-art Automated Speech Recognition (ASR) models typically requires a substantial amount of transcribed speech. In this work, we demonstrate that a modality-matched joint speech and text model can be leveraged to train a massively multilingual ASR model without any supervised (manually transcribed) speech for some languages. This paper explores the use of jointly learnt speech a…

    Submitted 21 October, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted by SLT 2022

    MSC Class: 68T10 ACM Class: I.2.7

  22. arXiv:2209.06096

    cs.CL cs.SD eess.AS

    Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition

    Authors: Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno

    Abstract: Attention layers are an integral part of modern end-to-end automatic speech recognition systems, for instance as part of the Transformer or Conformer architecture. Attention is typically multi-headed, where each head has an independent set of learned parameters and operates on the same input feature sequence. The output of multi-headed attention is a fusion of the outputs from the individual heads…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: Accepted for publication in Interspeech 2022

  23. arXiv:2208.11594

    cs.CV eess.SY

    Active Gaze Control for Foveal Scene Exploration

    Authors: Alexandre M. F. Dias, Luís Simões, Plinio Moreno, Alexandre Bernardino

    Abstract: Active perception and foveal vision are the foundations of the human visual system. While foveal vision reduces the amount of information to process during a gaze fixation, active perception will change the gaze direction to the most promising parts of the visual field. We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene, identifying the objects pres…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: 6 pages, 8 figures, ICDL 2022 (International Conference on Development and Learning, formerly ICDL-EpiRob)

  24. arXiv:2204.03409

    cs.CL cs.SD eess.AS

    MAESTRO: Matched Speech Text Representations through Modality Matching

    Authors: Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Moreno, Ankur Bapna, Heiga Zen

    Abstract: We present Maestro, a self-supervised training method to unify representations learnt from speech and text modalities. Self-supervised learning from speech signals aims to learn the latent structure inherent in the signal, while self-supervised learning from text attempts to capture lexical information. Learning aligned representations from unpaired speech and text sequences is a challenging task.…

    Submitted 1 July, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted by Interspeech 2022

    MSC Class: 68T10 ACM Class: I.2.7

  25. A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization

    Authors: Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro J. Moreno

    Abstract: Model fine-tuning and adaptation have become a common approach for model specialization for downstream tasks or domains. Fine-tuning the entire model or a subset of the parameters using light-weight adaptation has shown considerable success across different specialization tasks. Fine-tuning a model for a large number of domains typically requires starting a new training job for every domain posing…

    Submitted 13 September, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Submitted to INTERSPEECH

  26. arXiv:2202.12719

    cs.SD cs.CL eess.AS

    Ask2Mask: Guided Data Selection for Masked Speech Modeling

    Authors: Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang, Pedro Moreno

    Abstract: Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn representations over speech frames which are randomly masked within an utterance. While these methods improve performance of Automatic Speech Recognition (ASR) systems, they have one major limitation. They treat all unsupervised speech samples with equal weight, which hinders learning as not all samples have relevant informati…

    Submitted 24 February, 2022; originally announced February 2022.

  27. arXiv:2108.12226

    cs.CL cs.SD eess.AS

    Injecting Text in Self-Supervised Speech Pretraining

    Authors: Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Gary Wang, Pedro Moreno

    Abstract: Self-supervised pretraining for Automated Speech Recognition (ASR) has shown varied degrees of success. In this paper, we propose to jointly learn representations during pretraining from two different modalities: speech and text. The proposed method, tts4pretrain complements the power of contrastive learning in self-supervision with linguistic/lexical representations derived from synthesized speec…

    Submitted 27 August, 2021; originally announced August 2021.

    Comments: submit to ASRU 2021

    MSC Class: 68T10 ACM Class: I.2.7

  28. arXiv:2104.00587

    stat.ML cs.LG

    NeRF-VAE: A Geometry Aware 3D Scene Generative Model

    Authors: Adam R. Kosiorek, Heiko Strathmann, Daniel Zoran, Pol Moreno, Rosalia Schneider, Soňa Mokrá, Danilo J. Rezende

    Abstract: We propose NeRF-VAE, a 3D scene generative model that incorporates geometric structure via NeRF and differentiable volume rendering. In contrast to NeRF, our model takes into account shared structure across scenes, and is able to infer the structure of a novel scene -- without the need to re-train -- using amortized inference. NeRF-VAE's explicit 3D rendering process further contrasts previous gen…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: 17 pages, 15 figures, under review

  29. arXiv:2103.01439

    stat.ML cs.LG

    Fast Adaptation with Linearized Neural Networks

    Authors: Wesley J. Maddox, Shuai Tang, Pablo Garcia Moreno, Andrew Gordon Wilson, Andreas Damianou

    Abstract: The inductive biases of trained neural networks are difficult to understand and, consequently, to adapt to new settings. We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions. Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel desig…

    Submitted 28 April, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: AISTATS 2021

  30. arXiv:2102.02274

    cs.LG cs.AI cs.MA

    Neural Recursive Belief States in Multi-Agent Reinforcement Learning

    Authors: Pol Moreno, Edward Hughes, Kevin R. McKee, Bernardo Avila Pires, Théophane Weber

    Abstract: In multi-agent reinforcement learning, the problem of learning to act is particularly difficult because the policies of co-players may be heavily conditioned on information only observed by them. On the other hand, humans readily form beliefs about the knowledge possessed by their peers and leverage beliefs to inform decision-making. Such abilities underlie individual success in a wide range of Ma…

    Submitted 3 February, 2021; originally announced February 2021.

  31. arXiv:2101.10892

    cs.RO cs.AI cs.CV cs.LG

    Online Body Schema Adaptation through Cost-Sensitive Active Learning

    Authors: Gonçalo Cunha, Pedro Vicente, Alexandre Bernardino, Ricardo Ribeiro, Plínio Moreno

    Abstract: Humanoid robots have complex bodies and kinematic chains with several Degrees-of-Freedom (DoF) which are difficult to model. Learning the parameters of a kinematic model can be achieved by observing the position of the robot links during prospective motions and minimising the prediction errors. This work proposes a movement efficient approach for estimating online the body-schema of a humanoid rob…

    Submitted 10 February, 2022; v1 submitted 26 January, 2021; originally announced January 2021.

    Comments: 6 pages, 7 figures

  32. arXiv:2011.09192

    cs.AI cs.GT cs.MA

    Game Plan: What AI can do for Football, and What Football can do for AI

    Authors: Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, et al. (11 additional authors not shown)

    Abstract: The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with t…

    Submitted 18 November, 2020; originally announced November 2020.

  33. arXiv:2008.01505

    cs.LG stat.ML

    Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams

    Authors: Charlie Dickens, Eric Meissner, Pablo G. Moreno, Tom Diethe

    Abstract: Anomaly detection at scale is an extremely challenging problem of great practicality. When data is large and high-dimensional, it can be difficult to detect which observations do not fit the expected behaviour. Recent work has coalesced on variations of (random) kd-trees to summarise data for anomaly detection. However, these methods rely on ad-hoc score functions that are not easy to int…

    Submitted 4 August, 2020; originally announced August 2020.

  34. arXiv:2006.16938

    cs.LG stat.ML

    Tomographic Auto-Encoder: Unsupervised Bayesian Recovery of Corrupted Data

    Authors: Francesco Tonolini, Pablo G. Moreno, Andreas Damianou, Roderick Murray-Smith

    Abstract: We propose a new probabilistic method for unsupervised recovery of corrupted data. Given a large ensemble of degraded samples, our method recovers accurate posteriors of clean values, allowing the exploration of the manifold of possible reconstructed data and hence characterising the underlying uncertainty. In this setting, direct application of classical variational methods often gives rise to co…

    Submitted 30 June, 2020; originally announced June 2020.

    Comments: 8+12 pages

  35. arXiv:2004.12696

    cs.LG stat.ML

    Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

    Authors: Shell Xu Hu, Pablo G. Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil D. Lawrence, Andreas Damianou

    Abstract: We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a su…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: ICLR 2020

  36. arXiv:2003.11435

    cs.LG stat.ML

    Preferential Batch Bayesian Optimization

    Authors: Eero Siivola, Akash Kumar Dhaka, Michael Riis Andersen, Javier Gonzalez, Pablo Garcia Moreno, Aki Vehtari

    Abstract: Most research in Bayesian optimization (BO) has focused on direct feedback scenarios, where one has access to exact values of some expensive-to-evaluate objective. This direction has been mainly driven by the use of BO in machine learning hyper-parameter configuration problems. However, in domains such as modelling human preferences, A/B tests, or recommender systems, there is a need for me…

    Submitted 31 August, 2021; v1 submitted 25 March, 2020; originally announced March 2020.

    Comments: 6 pages + 7 pages in supplementary material

  37. arXiv:1910.02564

    cs.CV cs.RO eess.IV

    Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study

    Authors: Manuel Serra Nunes, Atabak Dehban, Plinio Moreno, José Santos-Victor

    Abstract: A defining characteristic of intelligent systems is the ability to make action decisions based on the anticipated outcomes. Video prediction systems have been demonstrated as a solution for predicting how the future will unfold visually, and thus, many models have been proposed that are capable of predicting future frames based on a history of observed frames (and sometimes robot actions). However…

    Submitted 6 October, 2019; originally announced October 2019.

  38. arXiv:1910.00714

    cs.RO cs.AI cs.CV

    Action Anticipation for Collaborative Environments: The Impact of Contextual Information and Uncertainty-Based Prediction

    Authors: Clebeson Canuto, Plinio Moreno, Jorge Samatelo, Raquel Vassallo, José Santos-Victor

    Abstract: To interact with humans in collaborative environments, machines need to be able to predict (i.e., anticipate) future events, and execute actions in a timely manner. However, the observation of the human limb movements may not be sufficient to anticipate their actions unambiguously. In this work, we consider two additional sources of information (i.e., context) over time, gaze, movement and object…

    Submitted 18 June, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: 27 pages, 16 figures, Neurocomputing

  39. arXiv:1909.11699

    cs.CL cs.SD eess.AS

    Speech Recognition with Augmented Synthesized Speech

    Authors: Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro Moreno, Yonghui Wu, Zelin Wu

    Abstract: Recent success of the Tacotron speech synthesis architecture and its variants in producing natural sounding multi-speaker synthesized speech has raised the exciting possibility of replacing expensive, manually transcribed, domain-specific, human speech that is used to train speech recognizers. The multi-speaker speech synthesis architecture can learn latent embedding spaces of prosody, speaker and…

    Submitted 25 September, 2019; originally announced September 2019.

    Comments: Accepted for publication at ASRU 2020

  40. arXiv:1908.07613

    cs.ET cs.AI cs.CY

    Implications of Quantum Computing for Artificial Intelligence alignment research

    Authors: Jaime Sevilla, Pablo Moreno

    Abstract: We explain some key features of quantum computing via three heuristics and apply them to argue that a deep understanding of quantum computing is unlikely to be helpful to address current bottlenecks in Artificial Intelligence Alignment. Our argument relies on the claims that Quantum Computing leads to compute overhang instead of algorithmic overhang, and that the difficulties associated with the m…

    Submitted 24 August, 2019; v1 submitted 19 August, 2019; originally announced August 2019.

    Comments: 10 pages

  41. arXiv:1904.04169  [pdf, other]

    eess.AS cs.SD

    Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation

    Authors: Fadi Biadsy, Ron J. Weiss, Pedro J. Moreno, Dimitri Kanevsky, Ye Jia

    Abstract: We describe Parrotron, an end-to-end-trained speech-to-speech conversion model that maps an input spectrogram directly to another spectrogram, without utilizing any intermediate discrete representation. The network is composed of an encoder, spectrogram and phoneme decoders, followed by a vocoder to synthesize a time-domain waveform. We demonstrate that this model can be trained to normalize speec…

    Submitted 29 October, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: 5 pages, submitted to Interspeech 2019

  42. arXiv:1904.02288  [pdf]

    cs.DC

    Metabolomics in the Cloud: Scaling Computational Tools to Big Data

    Authors: Jianliang Gao, Noureddin Sadawi, Ibrahim Karaman, Jake T M Pearce, Pablo Moreno, Anders Larsson, Marco Capuccini, Paul Elliott, Jeremy K Nicholson, Timothy M D Ebbels, Robert Glen

    Abstract: Background: Metabolomics datasets are becoming increasingly large and complex, with multiple types of algorithms and workflows needed to process and analyse the data. A cloud infrastructure with portable software tools can provide much needed resources enabling faster processing of much larger datasets than would be possible at any individual lab. The PhenoMeNal project has developed such an infra…

    Submitted 9 April, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

    Comments: 25 pages, 5 figures

  43. arXiv:1812.01054  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Transferring Knowledge across Learning Processes

    Authors: Sebastian Flennerhag, Pablo G. Moreno, Neil D. Lawrence, Andreas Damianou

    Abstract: In complex transfer learning scenarios new tasks might not be tightly linked to previous tasks. Approaches that transfer information contained only in the final parameters of a source model will therefore struggle. Instead, transfer learning at a higher level of abstraction is needed. We propose Leap, a framework that achieves this by transferring knowledge across learning processes. We associate…

    Submitted 22 March, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

    Comments: Published as a conference paper at ICLR 2019; 23 pages, 8 figures, 6 tables

  44. arXiv:1809.09190  [pdf, other]

    eess.AS cs.CL cs.SD

    From Audio to Semantics: Approaches to end-to-end spoken language understanding

    Authors: Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters

    Abstract: Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments. These modules are typically optimized independently. In this paper, we formulate audio to sem…

    Submitted 24 September, 2018; originally announced September 2018.

  45. arXiv:1807.09834  [pdf, other]

    cs.CV cs.LG stat.ML

    Applying Domain Randomization to Synthetic Data for Object Category Detection

    Authors: João Borrego, Atabak Dehban, Rui Figueiredo, Plinio Moreno, Alexandre Bernardino, José Santos-Victor

    Abstract: Recent advances in deep learning-based object detection techniques have revolutionized their applicability in several fields. However, since these methods rely on unwieldy and large amounts of data, a common practice is to download models pre-trained on standard datasets and fine-tune them for specific application domains with a small set of domain relevant images. In this work, we show that using…

    Submitted 16 July, 2018; originally announced July 2018.

    Comments: 17 pages, 9 figures. Under review for ACCV 2018

  46. arXiv:1711.01694  [pdf, other]

    eess.AS cs.AI cs.CL

    Multilingual Speech Recognition With A Single End-To-End Model

    Authors: Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

    Abstract: Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we presen…

    Submitted 15 February, 2018; v1 submitted 5 November, 2017; originally announced November 2017.

    Comments: Accepted in ICASSP 2018

  47. arXiv:1411.1108  [pdf, other]

    cs.RO

    High-level Reasoning and Low-level Learning for Grasping: A Probabilistic Logic Pipeline

    Authors: Laura Antanas, Plinio Moreno, Marion Neumann, Rui Pimentel de Figueiredo, Kristian Kersting, José Santos-Victor, Luc De Raedt

    Abstract: While grasps must satisfy the grasping stability criteria, good grasps depend on the specific manipulation scenario: the object, its properties and functionalities, as well as the task and grasp constraints. In this paper, we consider such information for robot grasping by leveraging manifolds and symbolic object parts. Specifically, we introduce a new probabilistic logic module to first semantica…

    Submitted 4 November, 2014; originally announced November 2014.