-
DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration
Authors:
Sanberk Serbest,
Tijana Stojkovic,
Milos Cernak,
Andrew Harper
Abstract:
In this work, we propose a full-band real-time speech enhancement system with GAN-based stochastic regeneration. Predictive models focus on estimating the mean of the target distribution, whereas generative models aim to learn the full distribution. This behavior of predictive models may lead to over-suppression, i.e., the removal of speech content. In the literature, it was shown that combining a predictive model with a generative one within the stochastic regeneration framework can reduce the distortion in the output. We use this framework to obtain a real-time speech enhancement system. With 3.58M parameters and low latency, our system is designed for real-time streaming with a lightweight architecture. Experiments show that our system improves over the first stage in terms of the NISQA-MOS metric. Finally, through an ablation study, we show the importance of noisy conditioning in our system. We participated in the 2025 Urgent Challenge with our model and later made further improvements.
Submitted 29 May, 2025;
originally announced May 2025.
-
Model as Loss: A Self-Consistent Training Paradigm
Authors:
Saisamarth Rajesh Phaye,
Milos Cernak,
Andrew Harper
Abstract:
Conventional methods for speech enhancement rely on handcrafted loss functions (e.g., time or frequency domain losses) or deep feature losses (e.g., using WavLM or wav2vec), which often fail to capture subtle signal properties essential for optimal performance. To address this, we propose Model as Loss, a novel training paradigm that utilizes the encoder from the same model as a loss function to guide the training.
The Model as Loss paradigm leverages the encoder's task-specific feature space, optimizing the decoder to produce output consistent with perceptual and task-relevant characteristics of the clean signal. By using the encoder's learned features as a loss function, this framework enforces self-consistency between the clean reference speech and the enhanced model output. Our approach outperforms pre-trained deep feature losses on standard speech enhancement benchmarks, offering better perceptual quality and robust generalization to both in-domain and out-of-domain datasets.
Submitted 27 May, 2025;
originally announced May 2025.
-
Unlocking the Potential of Past Research: Using Generative AI to Reconstruct Healthcare Simulation Models
Authors:
Thomas Monks,
Alison Harper,
Amy Heather
Abstract:
Discrete-event simulation (DES) is widely used in healthcare Operations Research, but the models themselves are rarely shared. This limits their potential for reuse and long-term impact in the modelling and healthcare communities. This study explores the feasibility of using generative artificial intelligence (AI) to recreate published models using Free and Open Source Software (FOSS), based on the descriptions provided in an academic journal. Using a structured methodology, we successfully generated, tested and internally reproduced two DES models, including user interfaces. The reported results were replicated for one model, but not the other, likely due to missing information on distributions. These models are substantially more complex than AI-generated DES models published to date. Given the challenges we faced in prompt engineering, code generation, and model testing, we conclude that our iterative approach to model development, systematic comparison and testing, and the expertise of our team were necessary to the success of our recreated simulation models.
Submitted 27 March, 2025;
originally announced March 2025.
-
Time-Series Forecasting in Smart Manufacturing Systems: An Experimental Evaluation of the State-of-the-art Algorithms
Authors:
Mojtaba A. Farahani,
Fadi El Kalach,
Austin Harper,
M. R. McCormick,
Ramy Harik,
Thorsten Wuest
Abstract:
Time-series forecasting (TSF) is growing in various domains, including manufacturing. Although numerous TSF algorithms have been developed recently, validation and evaluation of these algorithms, which hold substantial value for researchers and practitioners, are missing. This study aims to fill this gap by evaluating state-of-the-art (SoTA) TSF algorithms on thirteen manufacturing datasets, focusing on their applicability in manufacturing. Each algorithm was selected based on its TSF category to ensure a representative set of algorithms. The evaluation covers different scenarios, using two problem categories and two forecasting horizons. To evaluate performance, the weighted absolute percentage error (WAPE) was calculated, and additional post hoc analyses were conducted to assess the significance of observed differences. Only algorithms with code from open-source libraries were utilized, and no hyperparameter tuning was done. This allowed us to evaluate the algorithms as "out-of-the-box" solutions that can be easily implemented, ensuring their usability in manufacturing by practitioners with limited technical knowledge. This aligns with the goal of facilitating the adoption of these techniques in smart manufacturing systems. Based on the results, transformer and MLP-based architectures demonstrated the best performance, with MLP-based architectures winning the most scenarios. For univariate TSF, PatchTST emerged as the most robust, particularly for long-term horizons, while for multivariate problems, MLP-based architectures like N-HITS and TiDE showed superior results. The study revealed that simpler algorithms like XGBoost could outperform complex algorithms in certain tasks. These findings challenge the assumption that more sophisticated models produce better results. Additionally, the research highlighted the importance of computational resource considerations, showing variations in runtime and memory usage across different algorithms.
Submitted 26 November, 2024;
originally announced November 2024.
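The abstract above reports forecast accuracy as WAPE. As a minimal sketch, assuming the common definition (sum of absolute errors divided by the sum of absolute actual values), the metric could be computed as below. The function name `wape` and the sample numbers are illustrative and are not taken from the paper.

```python
def wape(actual, forecast):
    """Weighted absolute percentage error, assuming the common definition:
    sum(|actual - forecast|) / sum(|actual|)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    denominator = sum(abs(a) for a in actual)
    if denominator == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    # Total absolute error, normalized by the total magnitude of the actuals.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / denominator


# Illustrative values only (hypothetical sensor readings and forecasts).
example = wape([100, 200, 300], [110, 190, 330])
```

Because the errors are normalized by the total magnitude of the series rather than point by point, this form stays defined when individual actual values are zero, which is one reason it is often preferred over MAPE.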
-
A Data-Efficient Sequential Learning Framework for Melt Pool Defect Classification in Laser Powder Bed Fusion
Authors:
Ahmed Shoyeb Raihan,
Austin Harper,
Israt Zarin Era,
Omar Al-Shebeeb,
Thorsten Wuest,
Srinjoy Das,
Imtiaz Ahmed
Abstract:
Ensuring the quality and reliability of Metal Additive Manufacturing (MAM) components is crucial, especially in the Laser Powder Bed Fusion (L-PBF) process, where melt pool defects such as keyhole, balling, and lack of fusion can significantly compromise structural integrity. This study presents SL-RF+ (Sequentially Learned Random Forest with Enhanced Sampling), a novel Sequential Learning (SL) framework for melt pool defect classification designed to maximize data efficiency and model accuracy in data-scarce environments. SL-RF+ utilizes a Random Forest (RF) classifier combined with Least Confidence Sampling (LCS) and Sobol sequence-based synthetic sampling to iteratively select the most informative samples to learn from, thereby refining the model's decision boundaries with minimal labeled data. Results show that SL-RF+ outperformed traditional machine learning models across key performance metrics, including accuracy, precision, recall, and F1 score, demonstrating significant robustness in identifying melt pool defects with limited data. This framework efficiently captures complex defect patterns by focusing on high-uncertainty regions in the process parameter space, ultimately achieving superior classification performance without the need for extensive labeled datasets. While this study utilizes pre-existing experimental data, SL-RF+ shows strong potential for real-world applications in pure sequential learning settings, where data is acquired and labeled incrementally, mitigating the high costs and time constraints of sample acquisition.
Submitted 16 November, 2024;
originally announced November 2024.
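The abstract above names Least Confidence Sampling as the rule for choosing which samples to label next. The sketch below illustrates only that generic scoring step, under the usual definition (uncertainty = 1 minus the highest predicted class probability). It is not the authors' SL-RF+ implementation: the Random Forest and Sobol components are omitted, and the function names and probability values are hypothetical.

```python
def least_confidence_scores(probabilities):
    """Uncertainty per sample: 1 minus the top predicted class probability."""
    return [1.0 - max(row) for row in probabilities]


def select_least_confident(probabilities, k):
    """Return indices of the k samples the classifier is least sure about."""
    scores = least_confidence_scores(probabilities)
    # Rank sample indices from most uncertain to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical class probabilities for three unlabeled candidates,
# e.g. as returned by a classifier's probability output.
candidate_probs = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # most uncertain
    [0.60, 0.30, 0.10],  # moderately uncertain
]
chosen = select_least_confident(candidate_probs, k=2)
```

In a sequential learning loop, the chosen indices would be labeled, added to the training set, and the classifier refit before the next selection round.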
-
Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule
Authors:
Siyi Wang,
Siyi Liu,
Andrew Harper,
Paul Kendrick,
Mathieu Salzmann,
Milos Cernak
Abstract:
Recently, diffusion-based generative models have demonstrated remarkable performance in speech enhancement tasks. However, these methods still encounter challenges, including the lack of structural information and poor performance in low Signal-to-Noise Ratio (SNR) scenarios. To overcome these challenges, we propose the Schrödinger Bridge-based Speech Enhancement (SBSE) method, which learns the diffusion processes directly between the noisy input and the clean distribution, unlike conventional diffusion-based speech enhancement systems that learn data to Gaussian distributions. To enhance performance in extremely noisy conditions, we introduce a two-stage system incorporating ratio mask information into the diffusion-based generative model. Our experimental results show that our proposed SBSE method outperforms all the baseline models and achieves state-of-the-art performance, especially in low SNR conditions. Importantly, only a few inference steps are required to achieve the best result.
Submitted 13 September, 2024; v1 submitted 8 September, 2024;
originally announced September 2024.
-
A Photonic Physically Unclonable Function's Resilience to Multiple-Valued Machine Learning Attacks
Authors:
Jessie M. Henderson,
Elena R. Henderson,
Clayton A. Harper,
Hiva Shahoei,
William V. Oxford,
Eric C. Larson,
Duncan L. MacFarlane,
Mitchell A. Thornton
Abstract:
Physically unclonable functions (PUFs) identify integrated circuits using nonlinearly-related challenge-response pairs (CRPs). Ideally, the relationship between challenges and corresponding responses is unpredictable, even if a subset of CRPs is known. Previous work developed a photonic PUF offering improved security compared to non-optical counterparts. Here, we investigate this PUF's susceptibility to Multiple-Valued-Logic-based machine learning attacks. We find that approximately 1,000 CRPs are necessary to train models that predict response bits better than random chance. Given the significant challenge of acquiring a vast number of CRPs from a photonic PUF, our results demonstrate photonic PUF resilience against such attacks.
Submitted 2 March, 2024;
originally announced March 2024.
-
Multi-Channel MOSRA: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and a Teacher Model
Authors:
Jozef Coldenhoff,
Andrew Harper,
Paul Kendrick,
Tijana Stojkovic,
Milos Cernak
Abstract:
Previous methods for predicting room acoustic parameters and speech quality metrics have focused on the single-channel case, where room acoustics and Mean Opinion Score (MOS) are predicted for a single recording device. However, quality-based device selection for rooms with multiple recording devices may benefit from a multi-channel approach where the descriptive metrics are predicted for multiple devices in parallel. Following our hypothesis that a model may benefit from multi-channel training, we develop a multi-channel model for joint MOS and room acoustics prediction (MOSRA) for five channels in parallel. The lack of multi-channel audio data with ground truth labels necessitated the creation of simulated data using an acoustic simulator with room acoustic labels extracted from the generated impulse responses and labels for MOS generated in a student-teacher setup using a wav2vec2-based MOS prediction model. Our experiments show that the multi-channel model improves the prediction of the direct-to-reverberation ratio, clarity, and speech transmission index over the single-channel model with roughly 5$\times$ less computation while suffering minimal losses in the performance of the other metrics.
Submitted 13 March, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Automated Fidelity Assessment for Strategy Training in Inpatient Rehabilitation using Natural Language Processing
Authors:
Hunter Osterhoudt,
Courtney E. Schneider,
Haneef A Mohammad,
Minmei Shih,
Alexandra E. Harper,
Leming Zhou,
Elizabeth R Skidmore,
Yanshan Wang
Abstract:
Strategy training is a multidisciplinary rehabilitation approach that teaches skills to reduce disability among those with cognitive impairments following a stroke. Strategy training has been shown in randomized, controlled clinical trials to be a more feasible and efficacious intervention for promoting independence than traditional rehabilitation approaches. A standardized fidelity assessment is used to measure adherence to treatment principles by examining guided and directed verbal cues in video recordings of rehabilitation sessions. Although the fidelity assessment for detecting guided and directed verbal cues is valid and feasible for single-site studies, it can become labor-intensive, time-consuming, and expensive in large, multi-site pragmatic trials. To address this challenge to widespread strategy training implementation, we leveraged natural language processing (NLP) techniques to automate the strategy training fidelity assessment, i.e., to automatically identify guided and directed verbal cues from video recordings of rehabilitation sessions. We developed a rule-based NLP algorithm, a long short-term memory (LSTM) model, and a bidirectional encoder representations from transformers (BERT) model for this task. The best performance was achieved by the BERT model, with an F1 score of 0.8075. This BERT model was verified on an external validation dataset collected from a separate major regional health system and achieved an F1 score of 0.8259, which shows that the BERT model generalizes well. The findings from this study hold widespread promise in psychology and rehabilitation intervention research and practice.
Submitted 24 January, 2023; v1 submitted 14 September, 2022;
originally announced September 2022.
-
Simulation-based Algorithm for Determining Best Package Delivery Alternatives under Three Criteria: Time, Cost and Sustainability
Authors:
Suchithra Rajendran,
Aidan Harper
Abstract:
With the significant rise in demand for same-day instant deliveries, several courier services are exploring alternatives to transport packages in a cost-effective, time-effective, and sustainable manner. Motivated by a real-life case study, this paper focuses on developing a simulation algorithm that assists same-day package delivery companies in serving customers instantly. The proposed recommender system provides the best solution with respect to three criteria: cost, time, and sustainability, considering the variation in travel time and cost parameters. The decision support tool provides recommendations on the best alternative for transporting products based on factors such as source and destination locations, time of day, package weight, and volume. Besides considering existing new technologies like electric-assisted cargo bikes, we also analyze the impact of emerging delivery methods, such as robots and air taxis. Finally, this paper also considers the best delivery alternative during a pandemic, such as COVID-19. To illustrate our approach, we consider the delivery options in New York City. We believe that the proposed tool is the first to provide solutions to courier companies considering evolving modes of transportation and logistics disruptions due to a pandemic.
Keywords: Instant package delivery; Courier services; Simulation algorithm; Recommender system; Emerging technologies; COVID-19 pandemic.
Submitted 5 June, 2021;
originally announced June 2021.
-
New lower bounds for matrix multiplication and the 3x3 determinant
Authors:
Austin Conner,
Alicia Harper,
J. M. Landsberg
Abstract:
Let $M_{\langle u,v,w\rangle}\in C^{uv}\otimes C^{vw}\otimes C^{wu}$ denote the matrix multiplication tensor (and write $M_n=M_{\langle n,n,n\rangle}$) and let $det_3\in ( C^9)^{\otimes 3}$ denote the determinant polynomial considered as a tensor. For a tensor $T$, let $\underline R(T)$ denote its border rank. We (i) give the first hand-checkable algebraic proof that $\underline R(M_2)=7$, (ii) prove $\underline R(M_{\langle 223\rangle})=10$ and $\underline R(M_{\langle 233\rangle})=14$, where previously the only nontrivial matrix multiplication tensor whose border rank had been determined was $M_2$, (iii) prove $\underline R( M_3)\geq 17$, (iv) prove $\underline R( det_3)=17$, improving the previous lower bound of $12$, (v) prove $\underline R(M_{\langle 2nn\rangle})\geq n^2+1.32n$ for all $n\geq 25$ (previously only $\underline R(M_{\langle 2nn\rangle})\geq n^2+1$ was known) as well as lower bounds for $4\leq n\leq 25$, and (vi) prove $\underline R(M_{\langle 3nn\rangle})\geq n^2+2 n+1$ for all $ n\geq 21$, where previously only $\underline R(M_{\langle 3nn\rangle})\geq n^2+2$ was known, as well as lower bounds for $4\leq n\leq 21$.
Our results utilize a new technique initiated by Buczyńska and Buczyński, called border apolarity. The two key ingredients are: (i) the use of a multi-graded ideal associated to a border rank $r$ decomposition of any tensor, and (ii) the exploitation of the large symmetry group of $T$ to restrict to $B_T$-invariant ideals, where $B_T$ is a maximal solvable subgroup of the symmetry group of $T$.
Submitted 18 November, 2019;
originally announced November 2019.