Showing 1–50 of 84 results for author: García, P

  1. arXiv:2506.14432  [pdf, ps, other]

    eess.IV cs.CV

    A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning

    Authors: Asbjørn Munk, Stefano Cerri, Jakob Ambsdorf, Julia Machnio, Sebastian Nørgaard Llambias, Vardan Nersesjan, Christian Hedeager Krag, Peirong Liu, Pablo Rocamora García, Mostafa Mehdipour Ghazi, Mikael Boesen, Michael Eriksen Benros, Juan Eugenio Iglesias, Mads Nielsen

    Abstract: We present FOMO60K, a large-scale, heterogeneous dataset of 60,529 brain Magnetic Resonance Imaging (MRI) scans from 13,900 sessions and 11,187 subjects, aggregated from 16 publicly available sources. The dataset includes both clinical- and research-grade images, multiple MRI sequences, and a wide range of anatomical and pathological variability, including scans with large brain anomalies. Minimal…

    Submitted 17 June, 2025; originally announced June 2025.

  2. arXiv:2506.02157  [pdf, other]

    cs.CL eess.AS

    HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation

    Authors: Amir Hussein, Cihan Xiao, Matthew Wiesner, Dan Povey, Leibny Paola Garcia, Sanjeev Khudanpur

    Abstract: Neural transducers (NT) provide an effective framework for speech streaming, demonstrating strong performance in automatic speech recognition (ASR). However, the application of NT to speech translation (ST) remains challenging, as existing approaches struggle with word reordering and performance degradation when jointly modeling ASR and ST, resulting in a gap with attention-based encoder-decoder (…

    Submitted 2 June, 2025; originally announced June 2025.

  3. arXiv:2505.17076  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English

    Authors: Haoyang Zhang, Hexin Liu, Xiangyu Zhang, Qiquan Zhang, Yuchen Hu, Junqi Zhao, Fei Tian, Xuerui Yang, Leibny Paola Garcia, Eng Siong Chng

    Abstract: The speech tokenizer plays a crucial role in recent speech tasks, generally serving as a bridge between speech signals and language models. While low-frame-rate codecs are widely employed as speech tokenizers, the impact of frame rates on speech tokens remains underexplored. In this study, we investigate how varying frame rates affect speech tokenization by examining Mandarin and English, two typo…

    Submitted 13 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: 6 pages, 5 figures

    MSC Class: 68T10 ACM Class: I.2.7

  4. arXiv:2504.14755  [pdf, ps, other]

    cs.RO eess.SY

    Safe Autonomous Environmental Contact for Soft Robots using Control Barrier Functions

    Authors: Akua K. Dickson, Juan C. Pacheco Garcia, Meredith L. Anderson, Ran Jing, Sarah Alizadeh-Shabdiz, Audrey X. Wang, Charles DeLorey, Zach J. Patterson, Andrew P. Sabelhaus

    Abstract: Robots built from soft materials will inherently apply lower environmental forces than their rigid counterparts, and therefore may be more suitable in sensitive settings with unintended contact. However, these robots' applied forces result from both their design and their control system in closed-loop, and therefore, ensuring bounds on these forces requires controller synthesis for safety as well.…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: 10 pages, 10 figures

  5. arXiv:2503.07196  [pdf, other]

    cs.CR

    QKD-KEM: Hybrid QKD Integration into TLS with OpenSSL Providers

    Authors: Javier Blanco-Romero, Pedro Otero García, Daniel Sobral-Blanco, Florina Almenares Mendoza, Ana Fernández Vilas, Rebeca P. Díaz-Redondo

    Abstract: Quantum Key Distribution (QKD) promises information-theoretic security, yet integrating QKD into existing protocols like TLS remains challenging due to its fundamentally different operational model. In this paper, we propose a hybrid QKD-KEM protocol with two distinct integration approaches: a client-initiated flow compatible with both ETSI 004 and 014 specifications, and a server-initiated flow s…

    Submitted 10 March, 2025; originally announced March 2025.

  6. arXiv:2502.20965  [pdf, other]

    cs.AR

    Understanding intra-node communication in HPC systems and Datacenters

    Authors: Joaquin Tarraga-Moreno, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles

    Abstract: Over the past decade, specialized computing and storage devices, such as GPUs, TPUs, and high-speed storage, have been increasingly integrated into server nodes within Supercomputers and Data Centers. The advent of high-bandwidth memory (HBM) has facilitated a more compact design for these components, enabling multiple units to be interconnected within a single server node through intra-node netwo…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: 15 pages

  7. arXiv:2502.07288  [pdf, other]

    cs.CV cs.AI

    KPIs 2024 Challenge: Advancing Glomerular Segmentation from Patch- to Slide-Level

    Authors: Ruining Deng, Tianyuan Yao, Yucheng Tang, Junlin Guo, Siqi Lu, Juming Xiong, Lining Yu, Quan Huu Cap, Pengzhou Cai, Libin Lan, Ze Zhao, Adrian Galdran, Amit Kumar, Gunjan Deotale, Dev Kumar Das, Inyoung Paik, Joonho Lee, Geongyu Lee, Yujia Chen, Wangkai Li, Zhaoyang Li, Xuege Hou, Zeyuan Wu, Shengjin Wang, Maximilian Fischer, et al. (22 additional authors not shown)

    Abstract: Chronic kidney disease (CKD) is a major global health issue, affecting over 10% of the population and causing significant mortality. While kidney biopsy remains the gold standard for CKD diagnosis and treatment, the lack of comprehensive benchmarks for kidney pathology segmentation hinders progress in the field. To address this, we organized the Kidney Pathology Image Segmentation (KPIs) Challenge…

    Submitted 11 February, 2025; originally announced February 2025.

  8. Leveraging InfiniBand Controller to Configure Deadlock-Free Routing Engines for Dragonflies

    Authors: German Maglione-Mathey, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles, Eitan Zahavi

    Abstract: The Dragonfly topology is currently one of the most popular network topologies in high-performance parallel systems. The interconnection networks of many of these systems are built from components based on the InfiniBand specification. However, due to some constraints in this specification, the available versions of the InfiniBand network controller (OpenSM) do not include routing engines based on…

    Submitted 3 February, 2025; originally announced February 2025.

    Journal ref: Journal of Parallel and Distributed Computing (2021)

  9. Congestion Management in High-Performance Interconnection Networks Using Adaptive Routing Notifications

    Authors: Jose Rocher-Gonzalez, Jesus Escudero-Sahuquillo, Pedro J. Garcia, Francisco J. Quiles

    Abstract: The interconnection network is a crucial subsystem in High-Performance Computing clusters and Data-centers, guaranteeing high bandwidth and low latency to the applications' communication operations. Unfortunately, congestion situations may spoil network performance unless the network design applies specific countermeasures. Adaptive routing algorithms are a traditional approach to dealing with con…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: 34 pages

    Journal ref: Journal Of Supercomputing 2023

  10. Towards an Efficient Combination of Adaptive Routing and Queuing Schemes in Fat-Tree Topologies

    Authors: Jose Rocher-Gonzalez, Jesus Escudero-Sahuquillo, Pedro J. Garcia, Francisco J. Quiles, Gaspar Mora

    Abstract: The interconnection network is a key element in High-Performance Computing (HPC) and Datacenter (DC) systems whose performance depends on several design parameters, such as the topology, the switch architecture, and the routing algorithm. Among the most common topologies in HPC systems, the Fat-Tree offers several shortest-path routes between any pair of end-nodes, which allows multi-path routing…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: 53 pages

    Journal ref: Journal of Parallel and Distributed Computing 2021

  11. A* Based Algorithm for Reduced Complexity ML Decoding of Tailbiting Codes

    Authors: Jorge Ortin, Paloma Garcia, Fernando Gutierrez, Antonio Valdovinos

    Abstract: The A* algorithm is a graph search algorithm which has shown good results in terms of computational complexity for Maximum Likelihood (ML) decoding of tailbiting convolutional codes. The decoding of tailbiting codes with this algorithm is performed in two phases. In the first phase, a typical Viterbi decoding is employed to collect information regarding the trellis. The A* algorithm is then applie…

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: 3 pages

    Journal ref: IEEE Communications Letters, volume: 14, issue: 9, September 2010

  12. Channel Independent Precoder for OFDM-based Systems over Fading Channels

    Authors: Jorge Ortin, Paloma Garcia, Fernando Gutierrez, Antonio Valdovinos

    Abstract: In this paper we propose an independent channel precoder for orthogonal frequency division multiplexing (OFDM) systems over fading channels. The design of the precoder is based on the information redistribution of the input modulated symbols amongst the output precoded symbols. The proposed precoder decreases the variance of the instantaneous noise power at the receiver produced by the channel var…

    Submitted 24 January, 2025; originally announced January 2025.

    Comments: 8 pages, 7 figures

    Journal ref: IEEE Transactions on Broadcasting, volume: 55, issue: 4, December 2009

  13. Performance Analysis of Turbo Decoding Algorithms in Wireless OFDM Systems

    Authors: Jorge Ortin, Paloma Garcia, Fernando Gutierrez, Antonio Valdovinos

    Abstract: Turbo codes are well known to be one of the error correction techniques that come closest to the Shannon limit. Nevertheless, the specific performance of the code highly depends on the particular decoding algorithm used at the receiver. In this sense, the choice of the decoding algorithm involves a trade-off between the gain introduced by the code and the complexity of the decoding pr…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: 7 pages, 11 figures

    Journal ref: IEEE Transactions on Consumer Electronics, volume: 55, issue: 3, August 2009

  14. Two Step SOVA-Based Decoding Algorithm for Tailbiting Codes

    Authors: Jorge Ortin, Paloma Garcia, Fernando Gutierrez, Antonio Valdovinos

    Abstract: In this work we propose a novel decoding algorithm for tailbiting convolutional codes and evaluate its performance over different channels. The proposed method consists of a fixed two-step Viterbi decoding of the received data. In the first step, an estimation of the most likely state is performed based on a SOVA decoding. The second step consists of a conventional Viterbi decoding that employs th…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: 3 pages, 3 figures

    Journal ref: IEEE Communications Letters, volume: 13, issue: 7, July 2009

  15. arXiv:2501.12054  [pdf, other]

    cs.CV physics.ao-ph

    ORCAst: Operational High-Resolution Current Forecasts

    Authors: Pierre Garcia, Inès Larroche, Amélie Pesnec, Hannah Bull, Théo Archambault, Evangelos Moschos, Alexandre Stegner, Anastase Charantonis, Dominique Béréziat

    Abstract: We present ORCAst, a multi-stage, multi-arm network for Operational high-Resolution Current forecAsts over one week. Producing real-time nowcasts and forecasts of ocean surface currents is a challenging problem due to indirect or incomplete information from satellite remote sensing data. Entirely trained on real satellite data and in situ measurements from drifters, our model learns to forecast gl…

    Submitted 21 January, 2025; originally announced January 2025.

  16. arXiv:2412.08568  [pdf, other]

    cs.RO eess.SY

    Real-Time Trajectory Generation for Soft Robot Manipulators Using Differential Flatness

    Authors: Akua Dickson, Juan C. Pacheco Garcia, Ran Jing, Meredith L. Anderson, Andrew P. Sabelhaus

    Abstract: Soft robots have the potential to interact with sensitive environments and perform complex tasks effectively. However, motion plans and trajectories for soft manipulators are challenging to calculate due to their deformable nature and nonlinear dynamics. This article introduces a fast real-time trajectory generation approach for soft robot manipulators, which creates dynamically-feasible motions f…

    Submitted 11 December, 2024; originally announced December 2024.

  17. arXiv:2411.07751  [pdf, other]

    cs.SD cs.AI cs.CV cs.MM eess.AS

    SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model

    Authors: Xinyuan Qian, Jiaran Gao, Yaodan Zhang, Qiquan Zhang, Hexin Liu, Leibny Paola Garcia, Haizhou Li

    Abstract: Speech enhancement plays an essential role in various applications, and the integration of visual information has been demonstrated to bring substantial advantages. However, the majority of current research concentrates on the examination of facial and lip movements, which can be compromised or entirely inaccessible in scenarios where occlusions occur or when the camera view is distant. Whereas co…

    Submitted 2 April, 2025; v1 submitted 12 November, 2024; originally announced November 2024.

    Comments: accepted by IEEE Journal of Selected Topics in Signal Processing

  18. arXiv:2411.06741  [pdf, other]

    stat.AP cs.LG stat.ML

    Methane projections from Canada's oil sands tailings using scientific deep learning reveal significant underestimation

    Authors: Esha Saha, Oscar Wang, Amit K. Chakraborty, Pablo Venegas Garcia, Russell Milne, Hao Wang

    Abstract: Bitumen extraction for the production of synthetic crude oil in Canada's Athabasca Oil Sands industry has recently come under the spotlight for being a significant source of greenhouse gas emissions. A major cause of concern is methane, a greenhouse gas produced by the anaerobic biodegradation of hydrocarbons in oil sands residues, or tailings, stored in settle basins commonly known as oil sands tailin…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: 19 pages, 8 figures, 2 tables

  19. arXiv:2410.11110  [pdf, other]

    cs.RO cs.HC

    HoloSpot: Intuitive Object Manipulation via Mixed Reality Drag-and-Drop

    Authors: Pablo Soler Garcia, Petar Lukovic, Lucie Reynaud, Andrea Sgobbi, Federica Bruni, Martin Brun, Marc Zünd, Riccardo Bollati, Marc Pollefeys, Hermann Blum, Zuria Bauer

    Abstract: Human-robot interaction through mixed reality (MR) technologies enables novel, intuitive interfaces to control robots in remote operations. Such interfaces facilitate operations in hazardous environments, where human presence is risky, yet human oversight remains crucial. Potential environments include disaster response scenarios and areas with high radiation or toxic chemicals. In this paper we p…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 6 pages, 8 figures, submitted to ICRA 2025

    ACM Class: I.2.9; H.5.2

  20. arXiv:2409.17111  [pdf, other]

    cs.RO

    Self-Sensing for Proprioception and Contact Detection in Soft Robots Using Shape Memory Alloy Artificial Muscles

    Authors: Ran Jing, Meredith L. Anderson, Juan C. Pacheco Garcia, Andrew P. Sabelhaus

    Abstract: Estimating a soft robot's pose and applied forces, also called proprioception, is crucial for safe interaction of the robot with its environment. However, most solutions for soft robot proprioception use dedicated sensors, particularly for external forces, which introduce design trade-offs, rigidity, and risk of failure. This work presents an approach for pose estimation and contact detection for…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: 6 pages, 7 figures

  21. arXiv:2407.16447  [pdf, ps, other]

    eess.AS cs.SD

    The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization

    Authors: Samuele Cornell, Taejin Park, Steve Huang, Christoph Boeddeker, Xuankai Chang, Matthew Maciejewski, Matthew Wiesner, Paola Garcia, Shinji Watanabe

    Abstract: This paper presents the CHiME-8 DASR challenge which carries on from the previous edition CHiME-7 DASR (C7DASR) and the past CHiME-6 challenge. It focuses on joint multi-channel distant speech recognition (DASR) and diarization with one or more, possibly heterogeneous, devices. The main goal is to spur research towards meeting transcription approaches that can generalize across arbitrary number of…

    Submitted 23 July, 2024; originally announced July 2024.

  22. arXiv:2407.05817  [pdf, other]

    cs.DC cs.AI

    Cyber Physical Games

    Authors: Warisa Sritriratanarak, Paulo Garcia

    Abstract: We describe a formulation of multi-agents operating within a Cyber-Physical System, resulting in collaborative or adversarial games. We show that the non-determinism inherent in the communication medium between agents and the underlying physical environment gives rise to environment evolution that is a probabilistic function of agents' strategies. We name these emergent properties Cyber Physical G…

    Submitted 8 July, 2024; originally announced July 2024.

  23. An Open and Reconfigurable User Interface to Manage Complex ROS-based Robotic Systems

    Authors: Pablo Malvido Fresnillo, Saigopal Vasudevan, Jose A. Perez Garcia, Jose L. Martinez Lastra

    Abstract: The Robot Operating System (ROS) has significantly gained popularity among robotic engineers and researchers over the past five years, primarily due to its powerful infrastructure for node communication, which enables developers to build modular and large robotic applications. However, ROS presents a steep learning curve and lacks the intuitive usability of vendor-specific robotic Graphical User I…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 14 pages, 12 figures, 3 tables

    Report number: vol. 12, pp. 114601-114617

    Journal ref: IEEE Access 2024

  24. Perfect codes over non-prime power alphabets: an approach based on Diophantine equations

    Authors: Pedro-José Cazorla García

    Abstract: Perfect error correcting codes allow for an optimal transmission of information while guaranteeing error correction. For this reason, proving their existence has been a classical problem in both pure mathematics and information theory. Indeed, the classification of the parameters of $e$-error correcting perfect codes over $q$-ary alphabets was a very active topic of research in the late 20th centu…

    Submitted 24 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 12 pages, 2 tables. The new version includes the comments by the anonymous referees

    MSC Class: 94B65; 11D61 (Primary); 11G05; 11G50; 14G05 (Secondary)

    Journal ref: Mathematics 2024, 12(11), 1642

  25. Runtime Verification and Field-based Testing for ROS-based Robotic Systems

    Authors: Ricardo Caldas, Juan Antonio Pinera Garcia, Matei Schiopu, Patrizio Pelliccione, Genaina Rodrigues, Thorsten Berger

    Abstract: Robotic systems are becoming pervasive and adopted in increasingly many domains, such as manufacturing, healthcare, and space exploration. To this end, engineering software has emerged as a crucial discipline for building maintainable and reusable robotic systems. The robotics software engineering research field has received increasing attention, fostering autonomy as a fundamental goal. However,…

    Submitted 21 August, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  26. Enhancing Students' Learning Process Through Self-Generated Tests

    Authors: Marcos Sánchez-Élez, Inmaculada Pardines, Pablo García, Guadalupe Miñana, Sara Román, Margarita Sánchez, José L. Risco-Martín

    Abstract: The use of new technologies in higher education has surprisingly emphasized students' tendency to adopt a passive behavior in class. Participation and interaction of students are essential to improve academic results. This paper describes an educational experiment aimed at the promotion of students' autonomous learning by requiring them to generate test type questions related to the contents of th…

    Submitted 21 March, 2024; originally announced March 2024.

    Journal ref: Journal of Science Education and Technology, 23(1), pp. 15-25, 2014

  27. arXiv:2402.10642  [pdf, other]

    eess.AS cs.AI

    Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

    Authors: Xiangyu Zhang, Daijiao Liu, Hexin Liu, Qiquan Zhang, Hanyu Meng, Leibny Paola Garcia, Eng Siong Chng, Lina Yao

    Abstract: Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches…

    Submitted 23 September, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  28. arXiv:2402.09734  [pdf, other]

    cs.AI

    Agents Need Not Know Their Purpose

    Authors: Paulo Garcia

    Abstract: Ensuring artificial intelligence behaves in such a way that is aligned with human values is commonly referred to as the alignment challenge. Prior work has shown that rational agents, behaving in such a way that maximizes a utility function, will inevitably behave in such a way that is not aligned with human values, especially as their level of intelligence goes up. Prior work has also shown that…

    Submitted 15 February, 2024; originally announced February 2024.

  29. arXiv:2402.06201  [pdf, other]

    cs.RO eess.SY

    Maximizing Consistent Force Output for Shape Memory Alloy Artificial Muscles in Soft Robots

    Authors: Meredith L. Anderson, Ran Jing, Juan C. Pacheco Garcia, Ilyoung Yang, Sarah Alizadeh-Shabdiz, Charles DeLorey, Andrew P. Sabelhaus

    Abstract: Soft robots have immense potential given their inherent safety and adaptability, but challenges in soft actuator forces and design constraints have limited scaling up soft robots to larger sizes. Electrothermal shape memory alloy (SMA) artificial muscles have the potential to create these large forces and high displacements, but consistently using these muscles under a well-defined model, in-situ…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 8 pages, 8 figures, accepted by 2024 IEEE International Conference on Soft Robotics (RoboSoft)

  30. arXiv:2401.07726  [pdf, other]

    cs.PL cs.RO

    Preserving Power Optimizations Across the High Level Synthesis of Distinct Application-Specific Circuits

    Authors: Paulo Garcia

    Abstract: We evaluate the use of software interpretation to push High Level Synthesis of application-specific accelerators toward a higher level of abstraction. Our methodology is supported by a formal power consumption model that computes the power consumption of accelerator components, accurately predicting the power consumption on new designs from prior optimization estimations. We demonstrate how our ap…

    Submitted 9 July, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted at IEEE 10th International Conference on Communications and Electronics (ICCE) 2024

  31. arXiv:2401.07429  [pdf, other]

    cs.AR

    Accelerating Boolean Constraint Propagation for Efficient SAT-Solving on FPGAs

    Authors: Hariprasadh Govindasamy, Babak Esfandiari, Paulo Garcia

    Abstract: We present a hardware-accelerated SAT solver targeting processor/Field Programmable Gate Arrays (FPGA) SoCs. Our solution accelerates the most expensive subroutine of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm, Boolean Constraint Propagation (BCP), through fine-grained FPGA parallelism. Unlike prior state-of-the-art solutions, our solver eliminates costly clause look-up operations by assig…

    Submitted 13 April, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: Accepted at ACM GLSVLSI 2024

  32. arXiv:2312.11279  [pdf, other]

    cs.AR

    FPGAs (Can Get Some) SATisfaction

    Authors: Hariprasadh Godindasamy, Babak Esfandiari, Paulo Garcia

    Abstract: We present a hardware-accelerated SAT solver suitable for processor/Field Programmable Gate Arrays (FPGA) hybrid platforms, which have become the norm in the embedded domain. Our solution addresses a known bottleneck in SAT solving acceleration: unlike prior state-of-the-art solutions that have addressed the same bottleneck by limiting the amount of exploited parallelism, our solver takes advantag…

    Submitted 18 December, 2023; originally announced December 2023.

  33. arXiv:2312.09546  [pdf, other]

    cs.AI

    On a Functional Definition of Intelligence

    Authors: Warisa Sritriratanarak, Paulo Garcia

    Abstract: Without an agreed-upon definition of intelligence, asking "is this system intelligent?" is an untestable question. This lack of consensus hinders research, and public perception, on Artificial Intelligence (AI), particularly since the rise of generative- and large-language models. Most work on precisely capturing what we mean by "intelligence" has come from the fields of philosophy, psychology, a…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: submitted; under review at "Journal of Intelligent Computing, SPJ"

  34. arXiv:2312.08650  [pdf, other]

    cs.CV eess.SP

    PhyOT: Physics-informed object tracking in surveillance cameras

    Authors: Kawisorn Kamtue, Jose M. F. Moura, Orathai Sangpetch, Paulo Garcia

    Abstract: While deep learning has been very successful in computer vision, real world operating conditions such as lighting variation, background clutter, or occlusion hinder its accuracy across several tasks. Prior work has shown that hybrid models -- combining neural networks and heuristics/algorithms -- can outperform vanilla deep learning for several computer vision tasks, such as classification or trac…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted at IEEE ICASSP 2024 on December 13, 2023

  35. arXiv:2311.15954  [pdf, other]

    cs.CL eess.AS

    A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

    Authors: Shuyue Stella Li, Beining Xu, Xiangyu Zhang, Hexin Liu, Wenhan Chao, Leibny Paola Garcia

    Abstract: In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic speech recognition (ASR) as a downstream task, we analyze the effect of model size, training objectives, and model architecture on the models' performance as a feature extractor for a set…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures, 4 tables

  36. arXiv:2310.01719  [pdf]

    cs.SE

    Software Testing and Code Refactoring: A Survey with Practitioners

    Authors: Danilo Leandro Lima, Ronnie de Souza Santos, Guilherme Pires Garcia, Sildemir S. da Silva, Cesar Franca, Luiz Fernando Capretz

    Abstract: Nowadays, software testing professionals are commonly required to develop coding skills to work on test automation. One essential skill required from those who code is the ability to implement code refactoring, a valued quality aspect of software development; however, software developers usually encounter obstacles in successfully applying this practice. In this scenario, the present study aims to…

    Submitted 2 October, 2023; originally announced October 2023.

  37. arXiv:2309.16953  [pdf, other]

    eess.AS cs.SD

    Enhancing Code-switching Speech Recognition with Interactive Language Biases

    Authors: Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur

    Abstract: Languages usually switch within a multilingual speech signal, especially in a bilingual society. This phenomenon is referred to as code-switching (CS), making automatic speech recognition (ASR) challenging under a multilingual scenario. We propose to improve CS-ASR by biasing the hybrid CTC/attention ASR model with multi-level language information comprising frame- and token-level language posteri…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  38. arXiv:2309.15018  [pdf, other]

    cs.CV cs.AI cs.HC q-bio.NC

    Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex

    Authors: Ruixing Liang, Xiangyu Zhang, Qiong Li, Lai Wei, Hexin Liu, Avisha Kumar, Kelley M. Kempski Leadingham, Joshua Punnoose, Leibny Paola Garcia, Amir Manbachi

    Abstract: While significant advancements in artificial intelligence (AI) have catalyzed progress across various domains, its full potential in understanding visual perception remains underexplored. We propose an artificial neural network dubbed VISION, an acronym for "Visual Interface System for Imaging Output of Neural activity," to mimic the human brain and show how it can foster neuroscientific inquiries…

    Submitted 26 September, 2023; originally announced September 2023.

  39. arXiv:2306.13734  [pdf, other]

    eess.AS cs.CL cs.SD

    The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

    Authors: Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur

    Abstract: The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices. Different from previous challenges, we evaluate…

    Submitted 14 July, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

  40. arXiv:2306.01031  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

    Authors: Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

    Abstract: This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) cr…

    Submitted 1 June, 2023; originally announced June 2023.

  41. arXiv:2212.10249  [pdf, other]

    q-bio.NC cs.LG cs.NE

    Learning efficient backprojections across cortical hierarchies in real time

    Authors: Kevin Max, Laura Kriener, Garibaldi Pineda García, Thomas Nowotny, Ismael Jaras, Walter Senn, Mihai A. Petrovici

    Abstract: Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which however requires biologically implausible weight transport from feed-forward to feedback paths. We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered…

    Submitted 2 February, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Updated with streamlined main part, CIFAR-10 simulations, including DFA and minor fixes

  42. arXiv:2212.06039  [pdf]

    cs.CL cs.AI

    Technological taxonomies for hypernym and hyponym retrieval in patent texts

    Authors: You Zuo, Yixuan Li, Alma Parias García, Kim Gerdes

    Abstract: This paper presents an automatic approach to creating taxonomies of technical terms based on the Cooperative Patent Classification (CPC). The resulting taxonomy contains about 170k nodes in 9 separate technological branches and is freely available. We also show that a Text-to-Text Transfer Transformer (T5) model can be fine-tuned to generate hypernyms and hyponyms with relatively high precision, c…

    Submitted 13 December, 2022; v1 submitted 14 November, 2022; originally announced December 2022.

    Comments: ToTh 2022 - Terminology & Ontology: Theories and applications, Jun 2022, Chambéry, France

  43. arXiv:2211.17196  [pdf, other]

    cs.CL cs.SD eess.AS

    EURO: ESPnet Unsupervised ASR Open-source Toolkit

    Authors: Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extend…

    Submitted 20 May, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

  44. arXiv:2211.13101  [pdf, other]

    cs.DC cs.NI

    High-Quality Fault Resiliency in Fat Trees

    Authors: John Gliksberg, Antoine Capra, Alexandre Louvet, Pedro Javier Garcia, Devan Sohier

    Abstract: Coupling regular topologies with optimised routing algorithms is key in pushing the performance of interconnection networks of supercomputers. In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalised Fat-Trees (PGFTs) which minimises congestion risk even under massive network degradation caused by equipment failure. Dmodc computes forwarding tables with a close…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2211.11817

    Journal ref: IEEE Micro, 2020, 40 (1), pp.44-49. ⟨10.1109/MM.2019.2949978⟩

  45. Node-Type-Based Load-Balancing Routing for Parallel Generalized Fat-Trees

    Authors: John Gliksberg, Jean-Noel Quintin, Pedro Javier Garcia

    Abstract: High-Performance Computing (HPC) clusters are made up of a variety of node types (usually compute, I/O, service, and GPGPU nodes) and applications don't use nodes of a different type the same way. Resulting communication patterns reflect organization of groups of nodes, and current optimal routing algorithms for all-to-all patterns will not always maximize performance for group-specific communicat…

    Submitted 21 November, 2022; originally announced November 2022.

    Journal ref: 2018 IEEE 4th International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB), Feb 2018, Vienna, Austria. pp.9-15

  46. High-Quality Fault-Resiliency in Fat-Tree Networks (Extended Abstract)

    Authors: John Gliksberg, Antoine Capra, Alexandre Louvet, Pedro Javier Garcia, Devan Sohier

    Abstract: Coupling regular topologies with optimized routing algorithms is key in pushing the performance of interconnection networks of HPC systems. In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalized Fat-Trees (PGFTs) which minimizes congestion risk even under massive topology degradation caused by equipment failure. It applies a modulo-based computation of forw…

    Submitted 21 November, 2022; originally announced November 2022.

    Journal ref: 2019 IEEE Symposium on High-Performance Interconnects (HOTI), Aug 2019, Santa Clara, United States. pp.9-12

  47. arXiv:2211.03025  [pdf, other]

    cs.CL cs.SD eess.AS

    Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

    Authors: Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-yi Lee

    Abstract: Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements to various SLU tasks. However, because of the mismatched modalities between speech signals and text tokens, previous methods usually need comple…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023 submission

  48. arXiv:2211.00482  [pdf, other]

    eess.AS cs.SD

    Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

    Authors: Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

    Abstract: Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications. However, these models often have degraded performance for multi-talker scenarios -- possibly due to the domain mismatch -- which severely limits their use for such applications. In this paper, we inve…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: submitted to ICASSP 2023

  49. arXiv:2210.14567  [pdf, other]

    eess.AS cs.SD

    Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization

    Authors: Hexin Liu, Haihua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur

    Abstract: Code-switching (CS) refers to the phenomenon that languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). This paper aims to address language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information in the CS-ASR model by dynamically biasing the model with…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  50. arXiv:2210.07189  [pdf, other]

    cs.CL cs.SD eess.AS

    On Compressing Sequences for Self-Supervised Speech Models

    Authors: Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee, Hao Tang

    Abstract: Compressing self-supervised models has become increasingly necessary, as self-supervised models become larger. While previous approaches have primarily focused on compressing the model size, shortening sequences is also effective in reducing the computational cost. In this work, we study fixed-length and variable-length subsampling along the time axis in self-supervised learning. We explore how in…

    Submitted 25 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022