-
Adaptive Physics-Guided Neural Network
Authors:
David Shulman,
Itai Dattner
Abstract:
This paper introduces an adaptive physics-guided neural network (APGNN) framework for predicting quality attributes from image data by integrating physical laws into deep learning models. The APGNN adaptively balances data-driven and physics-informed predictions, enhancing model accuracy and robustness across different environments. Our approach is evaluated on both synthetic and real-world datasets, with comparisons to conventional data-driven models such as ResNet. For the synthetic data, 2D domains were generated using three distinct governing equations: the diffusion equation, the advection-diffusion equation, and the Poisson equation. Non-linear transformations were applied to these domains to emulate complex physical processes in image form.
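As a rough illustration of how a synthetic 2D domain governed by the diffusion equation might be generated, the sketch below applies an explicit finite-difference update to a point source. The grid size, step count, the coefficient `d`, and the function names `diffusion_step` and `make_domain` are all assumptions for illustration; the abstract does not describe the authors' numerical scheme, and the non-linear transformations it mentions are not reproduced here.

```python
def diffusion_step(u, d=0.1):
    """One explicit step of u_t = D * laplacian(u) on a 2D grid.

    d stands for D * dt / dx**2 (hypothetical value); the explicit scheme
    is stable for d <= 0.25. Boundary cells are held fixed.
    """
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Five-point stencil approximation of the Laplacian.
            lap = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            new[i][j] = u[i][j] + d * lap
    return new


def make_domain(size=16, steps=20, d=0.1):
    """Diffuse a unit point source placed at the grid centre."""
    u = [[0.0] * size for _ in range(size)]
    u[size // 2][size // 2] = 1.0
    for _ in range(steps):
        u = diffusion_step(u, d)
    return u


domain = make_domain()
```

The resulting grid can be treated as a single-channel image; the advection-diffusion and Poisson cases would need different update rules.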
In real-world experiments, the APGNN consistently demonstrated superior performance on the diverse thermal image dataset. On the cucumber dataset, characterized by low material diversity and controlled conditions, the APGNN and the physics-guided neural network (PGNN) showed similar performance, both outperforming the data-driven ResNet. However, on the more complex thermal dataset, particularly for outdoor materials with higher environmental variability, the APGNN outperformed both the PGNN and ResNet by dynamically adjusting its reliance on physics-based versus data-driven insights. This adaptability allowed the APGNN to maintain robust performance across structured, low-variability settings and more heterogeneous scenarios. These findings underscore the potential of adaptive physics-guided learning to integrate physical constraints effectively, even in challenging real-world contexts with diverse environmental conditions.
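One simple way to picture "dynamically adjusting reliance" on physics-based versus data-driven predictions is a gated convex combination of the two. The sketch below is only that: the sigmoid gate, the scalar inputs, and the name `adaptive_blend` are assumptions for illustration, not the mechanism used in the paper, which the abstract does not specify.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def adaptive_blend(y_data: float, y_physics: float, gate_logit: float) -> float:
    """Blend two predictions with a weight alpha in (0, 1).

    gate_logit is a hypothetical learned quantity; a large positive value
    favours the physics-based prediction, a large negative value favours
    the data-driven one.
    """
    alpha = sigmoid(gate_logit)
    return alpha * y_physics + (1.0 - alpha) * y_data


balanced = adaptive_blend(y_data=1.0, y_physics=3.0, gate_logit=0.0)
physics_heavy = adaptive_blend(y_data=1.0, y_physics=3.0, gate_logit=10.0)
```

With a gate logit of zero the two predictions are weighted equally; in a trained model the gate would presumably depend on the input.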
Submitted 15 November, 2024;
originally announced November 2024.
-
Optimization Methods in Deep Learning: A Comprehensive Overview
Authors:
David Shulman
Abstract:
In recent years, deep learning has achieved remarkable success in various fields such as image recognition, natural language processing, and speech recognition. The effectiveness of deep learning largely depends on the optimization methods used to train deep neural networks. In this paper, we provide an overview of first-order optimization methods such as Stochastic Gradient Descent, Adagrad, Adadelta, and RMSprop, as well as recent momentum-based and adaptive gradient methods such as Nesterov accelerated gradient, Adam, Nadam, AdaMax, and AMSGrad. We also discuss the challenges associated with optimization in deep learning and explore techniques for addressing these challenges, including weight initialization, batch normalization, and layer normalization. Finally, we provide recommendations for selecting optimization methods for different deep learning tasks and datasets. This paper serves as a comprehensive guide to optimization methods in deep learning and can be used as a reference for researchers and practitioners in the field.
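To make one of the listed optimizers concrete, here is a minimal sketch of the standard Adam update applied to a one-dimensional quadratic. The hyperparameter values, the test function, and the name `adam_minimize` are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def adam_minimize(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a scalar function given its gradient, using Adam."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Gradient of f(x) = (x - 3)^2, whose minimum is at x = 3.
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Dropping the second-moment term `v` and the bias corrections reduces this to plain gradient descent with momentum, which is one way to see how the adaptive methods in the list relate to the simpler ones.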
Submitted 24 April, 2023; v1 submitted 19 February, 2023;
originally announced February 2023.
-
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
Authors:
Patrick K. O'Neill,
Vitaly Lavrukhin,
Somshubra Majumdar,
Vahid Noroozi,
Yuekai Zhang,
Oleksii Kuchaiev,
Jagadeesh Balam,
Yuliya Dovzhenko,
Keenan Freyberg,
Michael D. Shulman,
Boris Ginsburg,
Shinji Watanabe,
Georg Kucsko
Abstract:
In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models. This adds complexity and limits performance, as many formatting tasks benefit from semantic information present in the acoustic signal but absent in transcription. Here we propose a new STT task: end-to-end neural transcription with fully formatted text for target labels. We present baseline Conformer-based models trained on a corpus of 5,000 hours of professionally transcribed earnings calls, achieving a CER of 1.7. As a contribution to the STT research community, we release the corpus free for non-commercial use at https://datasets.kensho.com/datasets/scribe.
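Character error rate (CER) is commonly computed as the character-level edit distance between reference and hypothesis, divided by the reference length. The sketch below follows that common definition; whether the paper's reported figure uses exactly this normalization is an assumption, and the function names and example strings are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Edit distance divided by reference length (reference must be non-empty)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# With fully formatted targets, casing and punctuation errors count too.
formatted_cer = cer("Q3 revenue rose 5%.", "q3 revenue rose 5%")
```

Because the targets are fully formatted, a hypothesis that gets every word right but misses capitalization or punctuation still incurs a nonzero CER under this definition.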
Submitted 6 April, 2021; v1 submitted 5 April, 2021;
originally announced April 2021.