-
Preserving Plasticity in Continual Learning with Adaptive Linearity Injection
Authors:
Seyed Roozbeh Razavi Rohani,
Khashayar Khajavi,
Wesley Chung,
Mo Chen,
Sharan Vaswani
Abstract:
Loss of plasticity in deep neural networks is the gradual reduction in a model's capacity to incrementally learn and has been identified as a key obstacle to learning in non-stationary problem settings. Recent work has shown that deep linear networks tend to be resilient towards loss of plasticity. Motivated by this observation, we propose Adaptive Linearization (AdaLin), a general approach that d…
▽ More
Loss of plasticity in deep neural networks is the gradual reduction in a model's capacity to incrementally learn and has been identified as a key obstacle to learning in non-stationary problem settings. Recent work has shown that deep linear networks tend to be resilient towards loss of plasticity. Motivated by this observation, we propose Adaptive Linearization (AdaLin), a general approach that dynamically adapts each neuron's activation function to mitigate plasticity loss. Unlike prior methods that rely on regularization or periodic resets, AdaLin equips every neuron with a learnable parameter and a gating mechanism that injects linearity into the activation function based on its gradient flow. This adaptive modulation ensures sufficient gradient signal and sustains continual learning without introducing additional hyperparameters or requiring explicit task boundaries. When used with conventional activation functions like ReLU, Tanh, and GeLU, we demonstrate that AdaLin can significantly improve performance on standard benchmarks, including Random Label and Permuted MNIST, Random Label and Shuffled CIFAR-10, and Class-Split CIFAR-100. Furthermore, its efficacy is shown in more complex scenarios, such as class-incremental learning on CIFAR-100 with a ResNet-18 backbone, and in mitigating plasticity loss in off-policy reinforcement learning agents. We perform a systematic set of ablations that show that neuron-level adaptation is crucial for good performance and analyze a number of metrics in the network that might be correlated to loss of plasticity.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.
-
MDF: Multi-Modal Data Fusion with CNN-Based Object Detection for Enhanced Indoor Localization Using LiDAR-SLAM
Authors:
Saqi Hussain Kalan,
Boon Giin Lee,
Wan-Young Chung
Abstract:
Indoor localization faces persistent challenges in achieving high accuracy, particularly in GPS-deprived environments. This study unveils a cutting-edge handheld indoor localization system that integrates 2D LiDAR and IMU sensors, delivering enhanced high-velocity precision mapping, computational efficiency, and real-time adaptability. Unlike 3D LiDAR systems, it excels with rapid processing, low-…
▽ More
Indoor localization faces persistent challenges in achieving high accuracy, particularly in GPS-deprived environments. This study unveils a cutting-edge handheld indoor localization system that integrates 2D LiDAR and IMU sensors, delivering enhanced high-velocity precision mapping, computational efficiency, and real-time adaptability. Unlike 3D LiDAR systems, it excels with rapid processing, low-cost scalability, and robust performance, setting new standards for emergency response, autonomous navigation, and industrial automation. Enhanced with a CNN-driven object detection framework and optimized through Cartographer SLAM (simultaneous localization and mapping ) in ROS, the system significantly reduces Absolute Trajectory Error (ATE) by 21.03%, achieving exceptional precision compared to state-of-the-art approaches like SC-ALOAM, with a mean x-position error of -0.884 meters (1.976 meters). The integration of CNN-based object detection ensures robustness in mapping and localization, even in cluttered or dynamic environments, outperforming existing methods by 26.09%. These advancements establish the system as a reliable, scalable solution for high-precision localization in challenging indoor scenarios
△ Less
Submitted 13 May, 2025;
originally announced May 2025.
-
On the Robustness of Reward Models for Language Model Alignment
Authors:
Jiwoo Hong,
Noah Lee,
Eunki Kim,
Guijin Son,
Woojin Chung,
Aman Gupta,
Shao Tang,
James Thorne
Abstract:
The Bradley-Terry (BT) model is widely practiced in reward modeling for reinforcement learning with human feedback (RLHF). Despite its effectiveness, reward models (RMs) trained with BT model loss are prone to over-optimization, losing generalizability to unseen input distributions. In this paper, we study the cause of over-optimization in RM training and its downstream effects on the RLHF procedu…
▽ More
The Bradley-Terry (BT) model is widely practiced in reward modeling for reinforcement learning with human feedback (RLHF). Despite its effectiveness, reward models (RMs) trained with BT model loss are prone to over-optimization, losing generalizability to unseen input distributions. In this paper, we study the cause of over-optimization in RM training and its downstream effects on the RLHF procedure, accentuating the importance of distributional robustness of RMs in unseen data. First, we show that the excessive dispersion of hidden state norms is the main source of over-optimization. Then, we propose batch-wise sum-to-zero regularization (BSR) to enforce zero-centered reward sum per batch, constraining the rewards with extreme magnitudes. We assess the impact of BSR in improving robustness in RMs through four scenarios of over-optimization, where BSR consistently manifests better robustness. Subsequently, we compare the plain BT model and BSR on RLHF training and empirically show that robust RMs better align the policy to the gold preference model. Finally, we apply BSR to high-quality data and models, which surpasses state-of-the-art RMs in the 8B scale by adding more than 5% in complex preference prediction tasks. By conducting RLOO training with 8B RMs, AlpacaEval 2.0 reduces generation length by 40% while adding a 7% increase in win rate, further highlighting that robustness in RMs induces robustness in RLHF training. We release the code, data, and models: https://github.com/LinkedIn-XFACT/RM-Robustness.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
InfoVids: Reimagining the Viewer Experience with Alternative Visualization-Presenter Relationships
Authors:
Ji Won Chung,
Tongyu Zhou,
Ivy Chen,
Kevin Hsu,
Ryan A. Rossi,
Alexa Siu,
Shunan Guo,
Franck Dernoncourt,
James Tompkin,
Jeff Huang
Abstract:
Traditional data presentations typically separate the presenter and visualization into two separate spaces--the 3D world and a 2D screen--enforcing visualization-centric stories. To create a more human-centric viewing experience, we establish a more equitable relationship between the visualization and the presenter through our InfoVids. These infographics-inspired informational videos are crafted…
▽ More
Traditional data presentations typically separate the presenter and visualization into two separate spaces--the 3D world and a 2D screen--enforcing visualization-centric stories. To create a more human-centric viewing experience, we establish a more equitable relationship between the visualization and the presenter through our InfoVids. These infographics-inspired informational videos are crafted to redefine relationships between the presenter and visualizations. As we design InfoVids, we explore how the use of layout, form, and interactions affects the viewer experience. We compare InfoVids against their baseline 2D `slides' equivalents across 9 metrics with 30 participants and provide practical, long-term insights from an autobiographical perspective. Our mixed methods analyses reveal that this paradigm reduced viewer attention splitting, shifted the focus from the visualization to the presenter, and led to more interactive, natural, and engaging full-body data performances for viewers. Ultimately, InfoVids helped viewers re-imagine traditional dynamics between the presenter and visualizations.
△ Less
Submitted 6 May, 2025;
originally announced May 2025.
-
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Authors:
Yeona Hong,
Hyewon Han,
Woo-jin Chung,
Hong-Goo Kang
Abstract:
In this paper, we propose StableQuant, a novel adaptive post-training quantization (PTQ) algorithm for widely used speech foundation models (SFMs). While PTQ has been successfully employed for compressing large language models (LLMs) due to its ability to bypass additional fine-tuning, directly applying these techniques to SFMs may not yield optimal results, as SFMs utilize distinct network archit…
▽ More
In this paper, we propose StableQuant, a novel adaptive post-training quantization (PTQ) algorithm for widely used speech foundation models (SFMs). While PTQ has been successfully employed for compressing large language models (LLMs) due to its ability to bypass additional fine-tuning, directly applying these techniques to SFMs may not yield optimal results, as SFMs utilize distinct network architecture for feature extraction. StableQuant demonstrates optimal quantization performance regardless of the network architecture type, as it adaptively determines the quantization range for each layer by analyzing both the scale distributions and overall performance. We evaluate our algorithm on two SFMs, HuBERT and wav2vec2.0, for an automatic speech recognition (ASR) task, and achieve superior performance compared to traditional PTQ methods. StableQuant successfully reduces the sizes of SFM models to a quarter and doubles the inference speed while limiting the word error rate (WER) performance drop to less than 0.3% with 8-bit quantization.
△ Less
Submitted 21 April, 2025;
originally announced April 2025.
-
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
Authors:
Jason Wei,
Zhiqing Sun,
Spencer Papay,
Scott McKinney,
Jeffrey Han,
Isa Fulford,
Hyung Won Chung,
Alex Tachard Passos,
William Fedus,
Amelia Glaese
Abstract:
We present BrowseComp, a simple yet challenging benchmark for measuring the ability for agents to browse the web. BrowseComp comprises 1,266 questions that require persistently navigating the internet in search of hard-to-find, entangled information. Despite the difficulty of the questions, BrowseComp is simple and easy-to-use, as predicted answers are short and easily verifiable against reference…
▽ More
We present BrowseComp, a simple yet challenging benchmark for measuring the ability for agents to browse the web. BrowseComp comprises 1,266 questions that require persistently navigating the internet in search of hard-to-find, entangled information. Despite the difficulty of the questions, BrowseComp is simple and easy-to-use, as predicted answers are short and easily verifiable against reference answers. BrowseComp for browsing agents can be seen as analogous to how programming competitions are an incomplete but useful benchmark for coding agents. While BrowseComp sidesteps challenges of a true user query distribution, like generating long answers or resolving ambiguity, it measures the important core capability of exercising persistence and creativity in finding information. BrowseComp can be found at https://github.com/openai/simple-evals.
△ Less
Submitted 16 April, 2025;
originally announced April 2025.
-
NeuroAMP: A Novel End-to-end General Purpose Deep Neural Amplifier for Personalized Hearing Aids
Authors:
Shafique Ahmed,
Ryandhimas E. Zezario,
Hui-Guan Yuan,
Amir Hussain,
Hsin-Min Wang,
Wei-Ho Chung,
Yu Tsao
Abstract:
The prevalence of hearing aids is increasing. However, optimizing the amplification processes of hearing aids remains challenging due to the complexity of integrating multiple modular components in traditional methods. To address this challenge, we present NeuroAMP, a novel deep neural network designed for end-to-end, personalized amplification in hearing aids. NeuroAMP leverages both spectral fea…
▽ More
The prevalence of hearing aids is increasing. However, optimizing the amplification processes of hearing aids remains challenging due to the complexity of integrating multiple modular components in traditional methods. To address this challenge, we present NeuroAMP, a novel deep neural network designed for end-to-end, personalized amplification in hearing aids. NeuroAMP leverages both spectral features and the listener's audiogram as inputs, and we investigate four architectures: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Convolutional Recurrent Neural Network (CRNN), and Transformer. We also introduce Denoising NeuroAMP, an extension that integrates noise reduction along with amplification capabilities for improved performance in real-world scenarios. To enhance generalization, a comprehensive data augmentation strategy was employed during training on diverse speech (TIMIT and TMHINT) and music (Cadenza Challenge MUSIC) datasets. Evaluation using the Hearing Aid Speech Perception Index (HASPI), Hearing Aid Speech Quality Index (HASQI), and Hearing Aid Audio Quality Index (HAAQI) demonstrates that the Transformer architecture within NeuroAMP achieves the best performance, with SRCC scores of 0.9927 (HASQI) and 0.9905 (HASPI) on TIMIT, and 0.9738 (HAAQI) on the Cadenza Challenge MUSIC dataset. Notably, our data augmentation strategy maintains high performance on unseen datasets (e.g., VCTK, MUSDB18-HQ). Furthermore, Denoising NeuroAMP outperforms both the conventional NAL-R+WDRC approach and a two-stage baseline on the VoiceBank+DEMAND dataset, achieving a 10% improvement in both HASPI (0.90) and HASQI (0.59) scores. These results highlight the potential of NeuroAMP and Denoising NeuroAMP to deliver notable improvements in personalized hearing aid amplification.
△ Less
Submitted 15 February, 2025;
originally announced February 2025.
-
Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data
Authors:
Huawei Lin,
Jun Woo Chung,
Yingjie Lao,
Weijie Zhao
Abstract:
Gradient Boosting Decision Tree (GBDT) is one of the most popular machine learning models in various applications. However, in the traditional settings, all data should be simultaneously accessed in the training procedure: it does not allow to add or delete any data instances after training. In this paper, we propose an efficient online learning framework for GBDT supporting both incremental and d…
▽ More
Gradient Boosting Decision Tree (GBDT) is one of the most popular machine learning models in various applications. However, in the traditional settings, all data should be simultaneously accessed in the training procedure: it does not allow to add or delete any data instances after training. In this paper, we propose an efficient online learning framework for GBDT supporting both incremental and decremental learning. To the best of our knowledge, this is the first work that considers an in-place unified incremental and decremental learning on GBDT. To reduce the learning cost, we present a collection of optimizations for our framework, so that it can add or delete a small fraction of data on the fly. We theoretically show the relationship between the hyper-parameters of the proposed optimizations, which enables trading off accuracy and cost on incremental and decremental learning. The backdoor attack results show that our framework can successfully inject and remove backdoor in a well-trained model using incremental and decremental learning, and the empirical results on public datasets confirm the effectiveness and efficiency of our proposed online learning framework and optimizations.
△ Less
Submitted 3 February, 2025;
originally announced February 2025.
-
Exact Matching in Correlated Networks with Node Attributes for Improved Community Recovery
Authors:
Joonhyuk Yang,
Hye Won Chung
Abstract:
We study community detection in multiple networks whose nodes and edges are jointly correlated. This setting arises naturally in applications such as social platforms, where a shared set of users may exhibit both correlated friendship patterns and correlated attributes across different platforms. Extending the classical Stochastic Block Model (SBM) and its contextual counterpart (CSBM), we introdu…
▽ More
We study community detection in multiple networks whose nodes and edges are jointly correlated. This setting arises naturally in applications such as social platforms, where a shared set of users may exhibit both correlated friendship patterns and correlated attributes across different platforms. Extending the classical Stochastic Block Model (SBM) and its contextual counterpart (CSBM), we introduce the correlated CSBM, which incorporates structural and attribute correlations across graphs. To build intuition, we first analyze correlated Gaussian Mixture Models, wherein only correlated node attributes are available without edges, and identify the conditions under which an estimator minimizing the distance between attributes achieves exact matching of nodes across the two databases. For correlated CSBMs, we develop a two-step procedure that first applies $k$-core matching to most nodes using edge information, then refines the matching for the remaining unmatched nodes by leveraging their attributes with a distance-based estimator. We identify the conditions under which the algorithm recovers the exact node correspondence, enabling us to merge the correlated edges and average the correlated attributes for enhanced community detection. Crucially, by aligning and combining graphs, we identify regimes in which community detection is impossible in a single graph but becomes feasible when side information from correlated graphs is incorporated. Our results illustrate how the interplay between graph matching and community recovery can boost performance, broadening the scope of multi-graph, attribute-based community detection.
△ Less
Submitted 6 January, 2025;
originally announced January 2025.
-
OpenAI o1 System Card
Authors:
OpenAI,
:,
Aaron Jaech,
Adam Kalai,
Adam Lerer,
Adam Richardson,
Ahmed El-Kishky,
Aiden Low,
Alec Helyar,
Aleksander Madry,
Alex Beutel,
Alex Carney,
Alex Iftimie,
Alex Karpenko,
Alex Tachard Passos,
Alexander Neitz,
Alexander Prokofiev,
Alexander Wei,
Allison Tam,
Ally Bennett,
Ananya Kumar,
Andre Saraiva,
Andrea Vallone,
Andrew Duberstein,
Andrew Kondrich
, et al. (238 additional authors not shown)
Abstract:
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-ar…
▽ More
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
△ Less
Submitted 21 December, 2024;
originally announced December 2024.
-
Deliberative Alignment: Reasoning Enables Safer Language Models
Authors:
Melody Y. Guan,
Manas Joglekar,
Eric Wallace,
Saachi Jain,
Boaz Barak,
Alec Helyar,
Rachel Dias,
Andrea Vallone,
Hongyu Ren,
Jason Wei,
Hyung Won Chung,
Sam Toyer,
Johannes Heidecke,
Alex Beutel,
Amelia Glaese
Abstract:
As large-scale language models increasingly impact safety-critical domains, ensuring their reliable adherence to well-defined principles remains a fundamental challenge. We introduce Deliberative Alignment, a new paradigm that directly teaches the model safety specifications and trains it to explicitly recall and accurately reason over the specifications before answering. We used this approach to…
▽ More
As large-scale language models increasingly impact safety-critical domains, ensuring their reliable adherence to well-defined principles remains a fundamental challenge. We introduce Deliberative Alignment, a new paradigm that directly teaches the model safety specifications and trains it to explicitly recall and accurately reason over the specifications before answering. We used this approach to align OpenAI's o-series models, and achieved highly precise adherence to OpenAI's safety policies, without requiring human-written chain-of-thoughts or answers. Deliberative Alignment pushes the Pareto frontier by simultaneously increasing robustness to jailbreaks while decreasing overrefusal rates, and also improves out-of-distribution generalization. We demonstrate that reasoning over explicitly specified policies enables more scalable, trustworthy, and interpretable alignment.
△ Less
Submitted 8 January, 2025; v1 submitted 20 December, 2024;
originally announced December 2024.
-
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
Authors:
Sangmin Lee,
Woo-Jin Chung,
Hong-Goo Kang
Abstract:
Building a universal multilingual automatic speech recognition (ASR) model that performs equitably across languages has long been a challenge due to its inherent difficulties. To address this task we introduce a Language-Agnostic Multilingual ASR pipeline through orthography Unification and language-specific Transliteration (LAMA-UT). LAMA-UT operates without any language-specific modules while ma…
▽ More
Building a universal multilingual automatic speech recognition (ASR) model that performs equitably across languages has long been a challenge due to its inherent difficulties. To address this task we introduce a Language-Agnostic Multilingual ASR pipeline through orthography Unification and language-specific Transliteration (LAMA-UT). LAMA-UT operates without any language-specific modules while matching the performance of state-of-the-art models trained on a minimal amount of data. Our pipeline consists of two key steps. First, we utilize a universal transcription generator to unify orthographic features into Romanized form and capture common phonetic characteristics across diverse languages. Second, we utilize a universal converter to transform these universal transcriptions into language-specific ones. In experiments, we demonstrate the effectiveness of our proposed method leveraging universal transcriptions for massively multilingual ASR. Our pipeline achieves a relative error reduction rate of 45% when compared to Whisper and performs comparably to MMS, despite being trained on only 0.1% of Whisper's training data. Furthermore, our pipeline does not rely on any language-specific modules. However, it performs on par with zero-shot ASR approaches which utilize additional language-specific lexicons and language models. We expect this framework to serve as a cornerstone for flexible multilingual ASR systems that are generalizable even to unseen languages.
△ Less
Submitted 22 December, 2024; v1 submitted 19 December, 2024;
originally announced December 2024.
-
A Staged Deep Learning Approach to Spatial Refinement in 3D Temporal Atmospheric Transport
Authors:
M. Giselle Fernández-Godino,
Wai Tong Chung,
Akshay A. Gowardhan,
Matthias Ihme,
Qingkai Kong,
Donald D. Lucas,
Stephen C. Myers
Abstract:
High-resolution spatiotemporal simulations effectively capture the complexities of atmospheric plume dispersion in complex terrain. However, their high computational cost makes them impractical for applications requiring rapid responses or iterative processes, such as optimization, uncertainty quantification, or inverse modeling. To address this challenge, this work introduces the Dual-Stage Tempo…
▽ More
High-resolution spatiotemporal simulations effectively capture the complexities of atmospheric plume dispersion in complex terrain. However, their high computational cost makes them impractical for applications requiring rapid responses or iterative processes, such as optimization, uncertainty quantification, or inverse modeling. To address this challenge, this work introduces the Dual-Stage Temporal Three-dimensional UNet Super-resolution (DST3D-UNet-SR) model, a highly efficient deep learning model for plume dispersion prediction. DST3D-UNet-SR is composed of two sequential modules: the temporal module (TM), which predicts the transient evolution of a plume in complex terrain from low-resolution temporal data, and the spatial refinement module (SRM), which subsequently enhances the spatial resolution of the TM predictions. We train DST3DUNet- SR using a comprehensive dataset derived from high-resolution large eddy simulations (LES) of plume transport. We propose the DST3D-UNet-SR model to significantly accelerate LES simulations of three-dimensional plume dispersion by three orders of magnitude. Additionally, the model demonstrates the ability to dynamically adapt to evolving conditions through the incorporation of new observational data, substantially improving prediction accuracy in high-concentration regions near the source.
Keywords: Atmospheric sciences, Geosciences, Plume transport,3D temporal sequences, Artificial intelligence, CNN, LSTM, Autoencoder, Autoregressive model, U-Net, Super-resolution, Spatial Refinement.
△ Less
Submitted 18 December, 2024; v1 submitted 14 December, 2024;
originally announced December 2024.
-
Parseval Regularization for Continual Reinforcement Learning
Authors:
Wesley Chung,
Lynn Cherif,
David Meger,
Doina Precup
Abstract:
Loss of plasticity, trainability loss, and primacy bias have been identified as issues arising when training deep neural networks on sequences of tasks -- all referring to the increased difficulty in training on new tasks. We propose to use Parseval regularization, which maintains orthogonality of weight matrices, to preserve useful optimization properties and improve training in a continual reinf…
▽ More
Loss of plasticity, trainability loss, and primacy bias have been identified as issues arising when training deep neural networks on sequences of tasks -- all referring to the increased difficulty in training on new tasks. We propose to use Parseval regularization, which maintains orthogonality of weight matrices, to preserve useful optimization properties and improve training in a continual reinforcement learning setting. We show that it provides significant benefits to RL agents on a suite of gridworld, CARL and MetaWorld tasks. We conduct comprehensive ablations to identify the source of its benefits and investigate the effect of certain metrics associated to network trainability including weight matrix rank, weight norms and policy entropy.
△ Less
Submitted 10 December, 2024;
originally announced December 2024.
-
Label Distribution Shift-Aware Prediction Refinement for Test-Time Adaptation
Authors:
Minguk Jang,
Hye Won Chung
Abstract:
Test-time adaptation (TTA) is an effective approach to mitigate performance degradation of trained models when encountering input distribution shifts at test time. However, existing TTA methods often suffer significant performance drops when facing additional class distribution shifts. We first analyze TTA methods under label distribution shifts and identify the presence of class-wise confusion pa…
▽ More
Test-time adaptation (TTA) is an effective approach to mitigate performance degradation of trained models when encountering input distribution shifts at test time. However, existing TTA methods often suffer significant performance drops when facing additional class distribution shifts. We first analyze TTA methods under label distribution shifts and identify the presence of class-wise confusion patterns commonly observed across different covariate shifts. Based on this observation, we introduce label Distribution shift-Aware prediction Refinement for Test-time adaptation (DART), a novel TTA method that refines the predictions by focusing on class-wise confusion patterns. DART trains a prediction refinement module during an intermediate time by exposing it to several batches with diverse class distributions using the training dataset. This module is then used during test time to detect and correct class distribution shifts, significantly improving pseudo-label accuracy for test data. Our method exhibits 5-18% gains in accuracy under label distribution shifts on CIFAR-10C, without any performance degradation when there is no label distribution shift. Extensive experiments on CIFAR, PACS, OfficeHome, and ImageNet benchmarks demonstrate DART's ability to correct inaccurate predictions caused by test-time distribution shifts. This improvement leads to enhanced performance in existing TTA methods, making DART a valuable plug-in tool.
△ Less
Submitted 4 February, 2025; v1 submitted 20 November, 2024;
originally announced November 2024.
-
OneNet: A Channel-Wise 1D Convolutional U-Net
Authors:
Sanghyun Byun,
Kayvan Shah,
Ayushi Gang,
Christopher Apton,
Jacob Song,
Woo Seong Chung
Abstract:
Many state-of-the-art computer vision architectures leverage U-Net for its adaptability and efficient feature extraction. However, the multi-resolution convolutional design often leads to significant computational demands, limiting deployment on edge devices. We present a streamlined alternative: a 1D convolutional encoder that retains accuracy while enhancing its suitability for edge applications…
▽ More
Many state-of-the-art computer vision architectures leverage U-Net for its adaptability and efficient feature extraction. However, the multi-resolution convolutional design often leads to significant computational demands, limiting deployment on edge devices. We present a streamlined alternative: a 1D convolutional encoder that retains accuracy while enhancing its suitability for edge applications. Our novel encoder architecture achieves semantic segmentation through channel-wise 1D convolutions combined with pixel-unshuffle operations. By incorporating PixelShuffle, known for improving accuracy in super-resolution tasks while reducing computational load, OneNet captures spatial relationships without requiring 2D convolutions, reducing parameters by up to 47%. Additionally, we explore a fully 1D encoder-decoder that achieves a 71% reduction in size, albeit with some accuracy loss. We benchmark our approach against U-Net variants across diverse mask-generation tasks, demonstrating that it preserves accuracy effectively. Although focused on image segmentation, this architecture is adaptable to other convolutional applications. Code for the project is available at https://github.com/shbyun080/OneNet .
△ Less
Submitted 14 November, 2024;
originally announced November 2024.
-
Continuous GNN-based Anomaly Detection on Edge using Efficient Adaptive Knowledge Graph Learning
Authors:
Sanggeon Yun,
Ryozo Masukawa,
William Youngwoo Chung,
Minhyoung Na,
Nathaniel Bastian,
Mohsen Imani
Abstract:
The increasing demand for robust security solutions across various industries has made Video Anomaly Detection (VAD) a critical task in applications such as intelligent surveillance, evidence investigation, and violence detection. Traditional approaches to VAD often rely on finetuning large pre-trained models, which can be computationally expensive and impractical for real-time or resource-constra…
▽ More
The increasing demand for robust security solutions across various industries has made Video Anomaly Detection (VAD) a critical task in applications such as intelligent surveillance, evidence investigation, and violence detection. Traditional approaches to VAD often rely on finetuning large pre-trained models, which can be computationally expensive and impractical for real-time or resource-constrained environments. To address this, MissionGNN introduced a more efficient method by training a graph neural network (GNN) using a fixed knowledge graph (KG) derived from large language models (LLMs) like GPT-4. While this approach demonstrated significant efficiency in computational power and memory, it faces limitations in dynamic environments where frequent updates to the KG are necessary due to evolving behavior trends and shifting data patterns. These updates typically require cloud-based computation, posing challenges for edge computing applications. In this paper, we propose a novel framework that facilitates continuous KG adaptation directly on edge devices, overcoming the limitations of cloud dependency. Our method dynamically modifies the KG through a three-phase process: pruning, alternating, and creating nodes, enabling real-time adaptation to changing data trends. This continuous learning approach enhances the robustness of anomaly detection models, making them more suitable for deployment in dynamic and resource-constrained environments.
△ Less
Submitted 13 January, 2025; v1 submitted 13 November, 2024;
originally announced November 2024.
-
Measuring short-form factuality in large language models
Authors:
Jason Wei,
Nguyen Karina,
Hyung Won Chung,
Yunxin Joy Jiao,
Spencer Papay,
Amelia Glaese,
John Schulman,
William Fedus
Abstract:
We present SimpleQA, a benchmark that evaluates the ability of language models to answer short, fact-seeking questions. We prioritized two properties in designing this eval. First, SimpleQA is challenging, as it is adversarially collected against GPT-4 responses. Second, responses are easy to grade, because questions are created such that there exists only a single, indisputable answer. Each answe…
▽ More
We present SimpleQA, a benchmark that evaluates the ability of language models to answer short, fact-seeking questions. We prioritized two properties in designing this eval. First, SimpleQA is challenging, as it is adversarially collected against GPT-4 responses. Second, responses are easy to grade, because questions are created such that there exists only a single, indisputable answer. Each answer in SimpleQA is graded as either correct, incorrect, or not attempted. A model with ideal behavior would get as many questions correct as possible while not attempting the questions for which it is not confident it knows the correct answer. SimpleQA is a simple, targeted evaluation for whether models "know what they know," and our hope is that this benchmark will remain relevant for the next few generations of frontier models. SimpleQA can be found at https://github.com/openai/simple-evals.
△ Less
Submitted 6 November, 2024;
originally announced November 2024.
-
MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes
Authors:
Sanghyun Byun,
Jacob Song,
Woo Seong Chung
Abstract:
Monocular metric depth estimation (MMDE) is a crucial task to solve for indoor scene reconstruction on edge devices. Despite this importance, existing models are sensitive to factors such as boundary frequency of objects in the scene and scene complexity, failing to fully capture many indoor scenes. In this work, we propose to close this gap through the task of monocular metric depth refinement (M…
▽ More
Monocular metric depth estimation (MMDE) is a crucial task to solve for indoor scene reconstruction on edge devices. Despite this importance, existing models are sensitive to factors such as boundary frequency of objects in the scene and scene complexity, failing to fully capture many indoor scenes. In this work, we propose to close this gap through the task of monocular metric depth refinement (MMDR) by leveraging state-of-the-art MMDE models. MultiDepth proposes a solution by taking samples of the image along with the initial depth map prediction made by a pre-trained MMDE model. Compared to existing iterative depth refinement techniques, MultiDepth does not employ normal map prediction as part of its architecture, effectively lowering the model size and computation overhead while outputting impactful changes from refining iterations. MultiDepth implements a lightweight encoder-decoder architecture for the refinement network, processing multiple samples from the given image, including segmentation masking. We evaluate MultiDepth on four datasets and compare them to state-of-the-art methods to demonstrate its effective refinement with minimal overhead, displaying accuracy improvement upward of 45%.
△ Less
Submitted 1 November, 2024;
originally announced November 2024.
-
GPT-4o System Card
Authors:
OpenAI,
:,
Aaron Hurst,
Adam Lerer,
Adam P. Goucher,
Adam Perelman,
Aditya Ramesh,
Aidan Clark,
AJ Ostrow,
Akila Welihinda,
Alan Hayes,
Alec Radford,
Aleksander Mądry,
Alex Baker-Whitcomb,
Alex Beutel,
Alex Borzunov,
Alex Carney,
Alex Chow,
Alex Kirillov,
Alex Nichol,
Alex Paino,
Alex Renzin,
Alex Tachard Passos,
Alexander Kirillov,
Alexi Christakis
, et al. (395 additional authors not shown)
Abstract:
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…
▽ More
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50\% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.
△ Less
Submitted 25 October, 2024;
originally announced October 2024.
-
Satellite Streaming Video QoE Prediction: A Real-World Subjective Database and Network-Level Prediction Models
Authors:
Bowen Chen,
Zaixi Shang,
Jae Won Chung,
David Lerner,
Werner Robitza,
Rakesh Rao Ramachandra Rao,
Alexander Raake,
Alan C. Bovik
Abstract:
Demand for streaming services, including satellite, continues to exhibit unprecedented growth. Internet Service Providers find themselves at the crossroads of technological advancements and rising customer expectations. To stay relevant and competitive, these ISPs must ensure their networks deliver optimal video streaming quality, a key determinant of user satisfaction. Towards this end, it is imp…
▽ More
Demand for streaming services, including satellite, continues to exhibit unprecedented growth. Internet Service Providers find themselves at the crossroads of technological advancements and rising customer expectations. To stay relevant and competitive, these ISPs must ensure their networks deliver optimal video streaming quality, a key determinant of user satisfaction. Towards this end, it is important to have accurate Quality of Experience prediction models in place. However, achieving robust performance by these models requires extensive data sets labeled by subjective opinion scores on videos impaired by diverse playback disruptions. To bridge this data gap, we introduce the LIVE-Viasat Real-World Satellite QoE Database. This database consists of 179 videos recorded from real-world streaming services affected by various authentic distortion patterns. We also conducted a comprehensive subjective study involving 54 participants, who contributed both continuous-time opinion scores and endpoint (retrospective) QoE scores. Our analysis sheds light on various determinants influencing subjective QoE, such as stall events, spatial resolutions, bitrate, and certain network parameters. We demonstrate the usefulness of this unique new resource by evaluating the efficacy of prevalent QoE-prediction models on it. We also created a new model that maps the network parameters to predicted human perception scores, which can be used by ISPs to optimize the video streaming quality of their networks. Our proposed model, which we call SatQA, is able to accurately predict QoE using only network parameters, without any access to pixel data or video-specific metadata, estimated by Spearman's Rank Order Correlation Coefficient (SROCC), Pearson Linear Correlation Coefficient (PLCC), and Root Mean Squared Error (RMSE), indicating high accuracy and reliability.
△ Less
Submitted 17 October, 2024;
originally announced October 2024.
-
Improved deep learning of chaotic dynamical systems with multistep penalty losses
Authors:
Dibyajyoti Chakraborty,
Seung Whan Chung,
Ashesh Chattopadhyay,
Romit Maulik
Abstract:
Predicting the long-term behavior of chaotic systems remains a formidable challenge due to their extreme sensitivity to initial conditions and the inherent limitations of traditional data-driven modeling approaches. This paper introduces a novel framework that addresses these challenges by leveraging the recently proposed multi-step penalty (MP) optimization technique. Our approach extends the app…
▽ More
Predicting the long-term behavior of chaotic systems remains a formidable challenge due to their extreme sensitivity to initial conditions and the inherent limitations of traditional data-driven modeling approaches. This paper introduces a novel framework that addresses these challenges by leveraging the recently proposed multi-step penalty (MP) optimization technique. Our approach extends the applicability of MP optimization to a wide range of deep learning architectures, including Fourier Neural Operators and UNETs. By introducing penalized local discontinuities in the forecast trajectory, we effectively handle the non-convexity of loss landscapes commonly encountered in training neural networks for chaotic systems. We demonstrate the effectiveness of our method through its application to two challenging use-cases: the prediction of flow velocity evolution in two-dimensional turbulence and ocean dynamics using reanalysis data. Our results highlight the potential of this approach for accurate and stable long-term prediction of chaotic dynamics, paving the way for new advancements in data-driven modeling of complex natural phenomena.
△ Less
Submitted 7 October, 2024;
originally announced October 2024.
-
SoccerNet 2024 Challenges Results
Authors:
Anthony Cioppa,
Silvio Giancola,
Vladimir Somers,
Victor Joos,
Floriane Magera,
Jan Held,
Seyed Abolfazl Ghasemzadeh,
Xin Zhou,
Karolina Seweryn,
Mateusz Kowalczyk,
Zuzanna Mróz,
Szymon Łukasik,
Michał Hałoń,
Hassan Mkhallati,
Adrien Deliège,
Carlos Hinojosa,
Karen Sanchez,
Amir M. Mansourian,
Pierre Miralles,
Olivier Barnich,
Christophe De Vleeschouwer,
Alexandre Alahi,
Bernard Ghanem,
Marc Van Droogenbroeck,
Adam Gorski
, et al. (59 additional authors not shown)
Abstract:
The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks. (1) Ball Action Spotting, focusing on precisely loca…
▽ More
The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks. (1) Ball Action Spotting, focusing on precisely localizing when and which soccer actions related to the ball occur, (2) Dense Video Captioning, focusing on describing the broadcast with natural language and anchored timestamps, (3) Multi-View Foul Recognition, a novel task focusing on analyzing multiple viewpoints of a potential foul incident to classify whether a foul occurred and assess its severity, (4) Game State Reconstruction, another novel task focusing on reconstructing the game state from broadcast videos onto a 2D top-view map of the field. Detailed information about the tasks, challenges, and leaderboards can be found at https://www.soccer-net.org, with baselines and development kits available at https://github.com/SoccerNet.
△ Less
Submitted 16 September, 2024;
originally announced September 2024.
-
Stable Language Model Pre-training by Reducing Embedding Variability
Authors:
Woojin Chung,
Jiwoo Hong,
Na Min An,
James Thorne,
Se-Young Yun
Abstract:
Stable pre-training is essential for achieving better-performing language models. However, tracking pre-training stability by calculating gradient variance at every step is impractical due to the significant computational costs. We explore Token Embedding Variability (TEV) as a simple and efficient proxy for assessing pre-training stability in language models with pre-layer normalization, given th…
▽ More
Stable pre-training is essential for achieving better-performing language models. However, tracking pre-training stability by calculating gradient variance at every step is impractical due to the significant computational costs. We explore Token Embedding Variability (TEV) as a simple and efficient proxy for assessing pre-training stability in language models with pre-layer normalization, given that shallower layers are more prone to gradient explosion (section 2.2). Moreover, we propose Multi-head Low-Rank Attention (MLRA) as an architecture to alleviate such instability by limiting the exponential growth of output embedding variance, thereby preventing the gradient explosion (section 3.2). Empirical results on GPT-2 with MLRA demonstrate increased stability and lower perplexity, particularly in deeper models.
△ Less
Submitted 12 September, 2024;
originally announced September 2024.
-
Representation Norm Amplification for Out-of-Distribution Detection in Long-Tail Learning
Authors:
Dong Geun Shin,
Hye Won Chung
Abstract:
Detecting out-of-distribution (OOD) samples is a critical task for reliable machine learning. However, it becomes particularly challenging when the models are trained on long-tailed datasets, as the models often struggle to distinguish tail-class in-distribution samples from OOD samples. We examine the main challenges in this problem by identifying the trade-offs between OOD detection and in-distr…
▽ More
Detecting out-of-distribution (OOD) samples is a critical task for reliable machine learning. However, it becomes particularly challenging when the models are trained on long-tailed datasets, as the models often struggle to distinguish tail-class in-distribution samples from OOD samples. We examine the main challenges in this problem by identifying the trade-offs between OOD detection and in-distribution (ID) classification, faced by existing methods. We then introduce our method, called \textit{Representation Norm Amplification} (RNA), which solves this challenge by decoupling the two problems. The main idea is to use the norm of the representation as a new dimension for OOD detection, and to develop a training method that generates a noticeable discrepancy in the representation norm between ID and OOD data, while not perturbing the feature learning for ID classification. Our experiments show that RNA achieves superior performance in both OOD detection and classification compared to the state-of-the-art methods, by 1.70\% and 9.46\% in FPR95 and 2.43\% and 6.87\% in classification accuracy on CIFAR10-LT and ImageNet-LT, respectively. The code for this work is available at https://github.com/dgshin21/RNA.
△ Less
Submitted 20 August, 2024;
originally announced August 2024.
-
Exact Graph Matching in Correlated Gaussian-Attributed Erdős-Rényi Model
Authors:
Joonhyuk Yang,
Hye Won Chung
Abstract:
Graph matching problem aims to identify node correspondence between two or more correlated graphs. Previous studies have primarily focused on models where only edge information is provided. However, in many social networks, not only the relationships between users, represented by edges, but also their personal information, represented by features, are present. In this paper, we address the challen…
▽ More
Graph matching problem aims to identify node correspondence between two or more correlated graphs. Previous studies have primarily focused on models where only edge information is provided. However, in many social networks, not only the relationships between users, represented by edges, but also their personal information, represented by features, are present. In this paper, we address the challenge of identifying node correspondence in correlated graphs, where additional node features exist, as in many real-world settings. We propose a two-step procedure, where we initially match a subset of nodes only using edge information, and then match the remaining nodes using node features. We derive information-theoretic limits for exact graph matching on this model. Our approach provides a comprehensive solution to the real-world graph matching problem by providing systematic ways to utilize both edge and node information for exact matching of the graphs.
△ Less
Submitted 17 July, 2024;
originally announced July 2024.
-
Optimization of DNN-based speaker verification model through efficient quantization technique
Authors:
Yeona Hong,
Woo-Jin Chung,
Hong-Goo Kang
Abstract:
As Deep Neural Networks (DNNs) rapidly advance in various fields, including speech verification, they typically involve high computational costs and substantial memory consumption, which can be challenging to manage on mobile systems. Quantization of deep models offers a means to reduce both computational and memory expenses. Our research proposes an optimization framework for the quantization of…
▽ More
As Deep Neural Networks (DNNs) rapidly advance in various fields, including speech verification, they typically involve high computational costs and substantial memory consumption, which can be challenging to manage on mobile systems. Quantization of deep models offers a means to reduce both computational and memory expenses. Our research proposes an optimization framework for the quantization of the speaker verification model. By analyzing performance changes and model size reductions in each layer of a pre-trained speaker verification model, we have effectively minimized performance degradation while significantly reducing the model size. Our quantization algorithm is the first attempt to maintain the performance of the state-of-the-art pre-trained speaker verification model, ECAPATDNN, while significantly compressing its model size. Overall, our quantization approach resulted in reducing the model size by half, with an increase in EER limited to 0.07%.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
Divide And Conquer: Learning Chaotic Dynamical Systems With Multistep Penalty Neural Ordinary Differential Equations
Authors:
Dibyajyoti Chakraborty,
Seung Whan Chung,
Troy Arcomano,
Romit Maulik
Abstract:
Forecasting high-dimensional dynamical systems is a fundamental challenge in various fields, such as geosciences and engineering. Neural Ordinary Differential Equations (NODEs), which combine the power of neural networks and numerical solvers, have emerged as a promising algorithm for forecasting complex nonlinear dynamical systems. However, classical techniques used for NODE training are ineffect…
▽ More
Forecasting high-dimensional dynamical systems is a fundamental challenge in various fields, such as geosciences and engineering. Neural Ordinary Differential Equations (NODEs), which combine the power of neural networks and numerical solvers, have emerged as a promising algorithm for forecasting complex nonlinear dynamical systems. However, classical techniques used for NODE training are ineffective for learning chaotic dynamical systems. In this work, we propose a novel NODE-training approach that allows for robust learning of chaotic dynamical systems. Our method addresses the challenges of non-convexity and exploding gradients associated with underlying chaotic dynamics. Training data trajectories from such systems are split into multiple, non-overlapping time windows. In addition to the deviation from the training data, the optimization loss term further penalizes the discontinuities of the predicted trajectory between the time windows. The window size is selected based on the fastest Lyapunov time scale of the system. Multi-step penalty(MP) method is first demonstrated on Lorenz equation, to illustrate how it improves the loss landscape and thereby accelerates the optimization convergence. MP method can optimize chaotic systems in a manner similar to least-squares shadowing with significantly lower computational costs. Our proposed algorithm, denoted the Multistep Penalty NODE, is applied to chaotic systems such as the Kuramoto-Sivashinsky equation, the two-dimensional Kolmogorov flow, and ERA5 reanalysis data for the atmosphere. It is observed that MP-NODE provide viable performance for such chaotic systems, not only for short-term trajectory predictions but also for invariant statistics that are hallmarks of the chaotic nature of these dynamics.
△ Less
Submitted 15 October, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching
Authors:
Yongmin Lee,
Hye Won Chung
Abstract:
Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset to approximate full dataset training with minimal performance loss. While effective in very small IPC ranges, many distillation methods become less effective, even underperforming random sample selection, as IPC increases. Our examination of state-of-the-art trajectory-matching based distillation…
▽ More
Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset to approximate full dataset training with minimal performance loss. While effective in very small IPC ranges, many distillation methods become less effective, even underperforming random sample selection, as IPC increases. Our examination of state-of-the-art trajectory-matching based distillation methods across various IPC scales reveals that these methods struggle to incorporate the complex, rare features of harder samples into the synthetic dataset even with the increased IPC, resulting in a persistent coverage gap between easy and hard test samples. Motivated by such observations, we introduce SelMatch, a novel distillation method that effectively scales with IPC. SelMatch uses selection-based initialization and partial updates through trajectory matching to manage the synthetic dataset's desired difficulty level tailored to IPC scales. When tested on CIFAR-10/100 and TinyImageNet, SelMatch consistently outperforms leading selection-only and distillation-only methods across subset ratios from 5% to 30%.
△ Less
Submitted 28 May, 2024;
originally announced June 2024.
-
Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator
Authors:
Woo-Jin Chung,
Hong-Goo Kang
Abstract:
We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern in…
▽ More
We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern information in Electromagnetic Articulography (EMA) signals during the AAI process. We train our model using an adversarial approach and introduce an attention-based Multi-duration phoneme discriminator (MDPD) designed to fully capture the intricate relationship among multi-channel articulatory signals. Our method achieves a Pearson correlation coefficient of 0.847, marking state-of-the-art performance in speaker-independent AAI models. The implementation details and code can be found online.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges
Authors:
Hoyong Choi,
Nohyun Ki,
Hye Won Chung
Abstract:
Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training, addressing challenges associated with training neural networks on large-scale datasets. However, existing methods tend to specialize in either high or low selection ratio regimes, lacking a universal approach that consistently achieves competitive performance acros…
▽ More
Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training, addressing challenges associated with training neural networks on large-scale datasets. However, existing methods tend to specialize in either high or low selection ratio regimes, lacking a universal approach that consistently achieves competitive performance across a broad range of selection ratios. We introduce a universal and efficient data subset selection method, Best Window Selection (BWS), by proposing a method to choose the best window subset from samples ordered based on their difficulty scores. This approach offers flexibility by allowing the choice of window intervals that span from easy to difficult samples. Furthermore, we provide an efficient mechanism for selecting the best window subset by evaluating its quality using kernel ridge regression. Our experimental results demonstrate the superior performance of BWS compared to other baselines across a broad range of selection ratios over datasets, including CIFAR-10/100 and ImageNet, and the scenarios involving training from random initialization or fine-tuning of pre-trained models.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
A Robust UWOC-assisted Multi-hop Topology for Underwater Sensor Network Nodes
Authors:
Maaz Salman,
Javad Bolboli,
Wan-Young Chung
Abstract:
Underwater environment is substantially less explored territory as compared to earth surface due to lack of robust underwater communication infrastructure. For Internet of Underwater things connectivity, underwater wireless optical communication can play a vital role, compared to conventional radio frequency communication, due to longer range, high data rate, low latency, and unregulated bandwidth…
▽ More
Underwater environment is substantially less explored territory as compared to earth surface due to lack of robust underwater communication infrastructure. For Internet of Underwater things connectivity, underwater wireless optical communication can play a vital role, compared to conventional radio frequency communication, due to longer range, high data rate, low latency, and unregulated bandwidth. This study proposes underwater wireless optical communication driven local area network UWOC LAN, comprised of multiple network nodes with optical transceivers. Moreover, the temperature sensor data is encapsulated with individual authentication identity to enhance the security of the framework at the user end. The proposed system is evaluated in a specially designed water tank of 4 meters. The proposed system evaluation analysis shows that the system can transmit underwater temperature data reliably in real time. The proposed secure UWOC LAN is tested within a communication range of 16 meters by incorporating multi hop connectivity to monitor the underwater environment.
△ Less
Submitted 31 March, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
An Interpretable Generalization Mechanism for Accurately Detecting Anomaly and Identifying Networking Intrusion Techniques
Authors:
Hao-Ting Pai,
Yu-Hsuan Kang,
Wen-Cheng Chung
Abstract:
Recent advancements in Intrusion Detection Systems (IDS), integrating Explainable AI (XAI) methodologies, have led to notable improvements in system performance via precise feature selection. However, a thorough understanding of cyber-attacks requires inherently explainable decision-making processes within IDS. In this paper, we present the Interpretable Generalization Mechanism (IG), poised to re…
▽ More
Recent advancements in Intrusion Detection Systems (IDS), integrating Explainable AI (XAI) methodologies, have led to notable improvements in system performance via precise feature selection. However, a thorough understanding of cyber-attacks requires inherently explainable decision-making processes within IDS. In this paper, we present the Interpretable Generalization Mechanism (IG), poised to revolutionize IDS capabilities. IG discerns coherent patterns, making it interpretable in distinguishing between normal and anomalous network traffic. Further, the synthesis of coherent patterns sheds light on intricate intrusion pathways, providing essential insights for cybersecurity forensics. By experiments with real-world datasets NSL-KDD, UNSW-NB15, and UKM-IDS20, IG is accurate even at a low ratio of training-to-test. With 10%-to-90%, IG achieves Precision (PRE)=0.93, Recall (REC)=0.94, and Area Under Curve (AUC)=0.94 in NSL-KDD; PRE=0.98, REC=0.99, and AUC=0.99 in UNSW-NB15; and PRE=0.98, REC=0.98, and AUC=0.99 in UKM-IDS20. Notably, in UNSW-NB15, IG achieves REC=1.0 and at least PRE=0.98 since 40%-to-60%; in UKM-IDS20, IG achieves REC=1.0 and at least PRE=0.88 since 20%-to-80%. Importantly, in UKM-IDS20, IG successfully identifies all three anomalous instances without prior exposure, demonstrating its generalization capabilities. These results and inferences are reproducible. In sum, IG showcases superior generalization by consistently performing well across diverse datasets and training-to-test ratios (from 10%-to-90% to 90%-to-10%), and excels in identifying novel anomalies without prior exposure. Its interpretability is enhanced by coherent evidence that accurately distinguishes both normal and anomalous activities, significantly improving detection accuracy and reducing false alarms, thereby strengthening IDS reliability and trustworthiness.
△ Less
Submitted 5 November, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
HEAL: Brain-inspired Hyperdimensional Efficient Active Learning
Authors:
Yang Ni,
Zhuowen Zou,
Wenjun Huang,
Hanning Chen,
William Youngwoo Chung,
Samuel Cho,
Ranganath Krishnan,
Pietro Mercati,
Mohsen Imani
Abstract:
Drawing inspiration from the outstanding learning capability of our human brains, Hyperdimensional Computing (HDC) emerges as a novel computing paradigm, and it leverages high-dimensional vector presentation and operations for brain-like lightweight Machine Learning (ML). Practical deployments of HDC have significantly enhanced the learning efficiency compared to current deep ML methods on a broad…
▽ More
Drawing inspiration from the outstanding learning capability of our human brains, Hyperdimensional Computing (HDC) emerges as a novel computing paradigm, and it leverages high-dimensional vector presentation and operations for brain-like lightweight Machine Learning (ML). Practical deployments of HDC have significantly enhanced the learning efficiency compared to current deep ML methods on a broad spectrum of applications. However, boosting the data efficiency of HDC classifiers in supervised learning remains an open question. In this paper, we introduce Hyperdimensional Efficient Active Learning (HEAL), a novel Active Learning (AL) framework tailored for HDC classification. HEAL proactively annotates unlabeled data points via uncertainty and diversity-guided acquisition, leading to a more efficient dataset annotation and lowering labor costs. Unlike conventional AL methods that only support classifiers built upon deep neural networks (DNN), HEAL operates without the need for gradient or probabilistic computations. This allows it to be effortlessly integrated with any existing HDC classifier architecture. The key design of HEAL is a novel approach for uncertainty estimation in HDC classifiers through a lightweight HDC ensemble with prior hypervectors. Additionally, by exploiting hypervectors as prototypes (i.e., compact representations), we develop an extra metric for HEAL to select diverse samples within each batch for annotation. Our evaluation shows that HEAL surpasses a diverse set of baselines in AL quality and achieves notably faster acquisition than many BNN-powered or diversity-guided AL methods, recording 11 times to 40,000 times speedup in acquisition runtime per batch.
△ Less
Submitted 17 February, 2024;
originally announced February 2024.
-
Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels
Authors:
Hyeonsu Jeong,
Hye Won Chung
Abstract:
We investigate the mechanisms of self-distillation in multi-class classification, particularly in the context of linear probing with fixed feature extractors where traditional feature learning explanations do not apply. Our theoretical analysis reveals that multi-round self-distillation effectively performs label averaging among instances with high feature correlations, governed by the eigenvector…
▽ More
We investigate the mechanisms of self-distillation in multi-class classification, particularly in the context of linear probing with fixed feature extractors where traditional feature learning explanations do not apply. Our theoretical analysis reveals that multi-round self-distillation effectively performs label averaging among instances with high feature correlations, governed by the eigenvectors of the Gram matrix derived from input features. This process leads to clustered predictions and improved generalization, mitigating the impact of label noise by reducing the model's reliance on potentially corrupted labels. We establish conditions under which multi-round self-distillation achieves 100% population accuracy despite label noise. Furthermore, we introduce a novel, efficient single-round self-distillation method using refined partial labels from the teacher's top two softmax outputs, referred to as the PLL student model. This approach replicates the benefits of multi-round distillation in a single round, achieving comparable or superior performance--especially in high-noise scenarios--while significantly reducing computational cost.
△ Less
Submitted 19 February, 2025; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Train Small, Model Big: Scalable Physics Simulators via Reduced Order Modeling and Domain Decomposition
Authors:
Seung Whan Chung,
Youngsoo Choi,
Pratanu Roy,
Thomas Moore,
Thomas Roy,
Tiras Y. Lin,
Du Y. Nguyen,
Christopher Hahn,
Eric B. Duoss,
Sarah E. Baker
Abstract:
Numerous cutting-edge scientific technologies originate at the laboratory scale, but transitioning them to practical industry applications is a formidable challenge. Traditional pilot projects at intermediate scales are costly and time-consuming. An alternative, the E-pilot, relies on high-fidelity numerical simulations, but even these simulations can be computationally prohibitive at larger scale…
▽ More
Numerous cutting-edge scientific technologies originate at the laboratory scale, but transitioning them to practical industry applications is a formidable challenge. Traditional pilot projects at intermediate scales are costly and time-consuming. An alternative, the E-pilot, relies on high-fidelity numerical simulations, but even these simulations can be computationally prohibitive at larger scales. To overcome these limitations, we propose a scalable, physics-constrained reduced order model (ROM) method. ROM identifies critical physics modes from small-scale unit components, projecting governing equations onto these modes to create a reduced model that retains essential physics details. We also employ Discontinuous Galerkin Domain Decomposition (DG-DD) to apply ROM to unit components and interfaces, enabling the construction of large-scale global systems without data at such large scales. This method is demonstrated on the Poisson and Stokes flow equations, showing that it can solve equations about $15 - 40$ times faster with only $\sim$ $1\%$ relative error. Furthermore, ROM takes one order of magnitude less memory than the full order model, enabling larger scale predictions at a given memory limitation.
△ Less
Submitted 5 December, 2023;
originally announced January 2024.
-
GestaltMML: Enhancing Rare Genetic Disease Diagnosis through Multimodal Machine Learning Combining Facial Images and Clinical Texts
Authors:
Da Wu,
Jingye Yang,
Cong Liu,
Tzung-Chien Hsieh,
Elaine Marchi,
Justin Blair,
Peter Krawitz,
Chunhua Weng,
Wendy Chung,
Gholson J. Lyon,
Ian D. Krantz,
Jennifer M. Kalish,
Kai Wang
Abstract:
Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests and genetic tests, to find a possible answer over a prolonged period of time. Addressing this "diagnostic odyssey" thus has substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which can be used by artifi…
▽ More
Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests and genetic tests, to find a possible answer over a prolonged period of time. Addressing this "diagnostic odyssey" thus has substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which can be used by artificial intelligence algorithms to facilitate clinical diagnosis, in prioritizing candidate diseases to be further examined by lab tests or genetic assays, or in helping the phenotype-driven reinterpretation of genome/exome sequencing data. Existing methods using frontal facial photos were built on conventional Convolutional Neural Networks (CNNs), rely exclusively on facial images, and cannot capture non-facial phenotypic traits and demographic information essential for guiding accurate diagnoses. Here we introduce GestaltMML, a multimodal machine learning (MML) approach solely based on the Transformer architecture. It integrates facial images, demographic information (age, sex, ethnicity), and clinical notes (optionally, a list of Human Phenotype Ontology terms) to improve prediction accuracy. Furthermore, we also evaluated GestaltMML on a diverse range of datasets, including 528 diseases from the GestaltMatcher Database, several in-house datasets of Beckwith-Wiedemann syndrome (BWS, over-growth syndrome with distinct facial features), Sotos syndrome (overgrowth syndrome with overlapping features with BWS), NAA10-related neurodevelopmental syndrome, Cornelia de Lange syndrome (multiple malformation syndrome), and KBG syndrome (multiple malformation syndrome). Our results suggest that GestaltMML effectively incorporates multiple modalities of data, greatly narrowing candidate genetic diagnoses of rare diseases and may facilitate the reinterpretation of genome/exome sequencing data.
△ Less
Submitted 21 April, 2024; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Dimensionality Reduction of Dynamics on Lie Manifolds via Structure-Aware Canonical Correlation Analysis
Authors:
Wooyoung Chung,
Daniel Polani,
Stas Tiomkin
Abstract:
Incorporating prior knowledge into a data-driven modeling problem can drastically improve performance, reliability, and generalization outside of the training sample. The stronger the structural properties, the more effective these improvements become. Manifolds are a powerful nonlinear generalization of Euclidean space for modeling finite dimensions. Structural impositions in constrained systems…
▽ More
Incorporating prior knowledge into a data-driven modeling problem can drastically improve performance, reliability, and generalization outside of the training sample. The stronger the structural properties, the more effective these improvements become. Manifolds are a powerful nonlinear generalization of Euclidean space for modeling finite dimensions. Structural impositions in constrained systems increase when applying group structure, converting them into Lie manifolds. The range of their applications is very wide and includes the important case of robotic tasks. Canonical Correlation Analysis (CCA) can construct a hierarchical sequence of maximal correlations of up to two paired data sets in these Euclidean spaces. We present a method to generalize this concept to Lie Manifolds and demonstrate its efficacy through the substantial improvements it achieves in making structure-consistent predictions about changes in the state of a robotic hand.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
C2C: Cough to COVID-19 Detection in BHI 2023 Data Challenge
Authors:
Woo-Jin Chung,
Miseul Kim,
Hong-Goo Kang
Abstract:
This report describes our submission to BHI 2023 Data Competition: Sensor challenge. Our Audio Alchemists team designed an acoustic-based COVID-19 diagnosis system, Cough to COVID-19 (C2C), and won the 1st place in the challenge. C2C involves three key contributions: pre-processing of input signals, cough-related representation extraction leveraging Wav2vec2.0, and data augmentation. Through exper…
▽ More
This report describes our submission to BHI 2023 Data Competition: Sensor challenge. Our Audio Alchemists team designed an acoustic-based COVID-19 diagnosis system, Cough to COVID-19 (C2C), and won the 1st place in the challenge. C2C involves three key contributions: pre-processing of input signals, cough-related representation extraction leveraging Wav2vec2.0, and data augmentation. Through experimental findings, we demonstrate C2C's promising potential to enhance the diagnostic accuracy of COVID-19 via cough signals. Our proposed model achieves a ROC-AUC value of 0.7810 in the context of COVID-19 diagnosis. The implementation details and the python code can be found in the following link: https://github.com/Woo-jin-Chung/BHI_2023_challenge_Audio_Alchemists
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
Contrastive Learning for Inference in Dialogue
Authors:
Etsuko Ishii,
Yan Xu,
Bryan Wilie,
Ziwei Ji,
Holy Lovenia,
Willy Chung,
Pascale Fung
Abstract:
Inference, especially those derived from inductive processes, is a crucial component in our conversation to complement the information implicitly or explicitly conveyed by a speaker. While recent large language models show remarkable advances in inference tasks, their performance in inductive reasoning, where not all information is present in the context, is far behind deductive reasoning. In this…
▽ More
Inference, especially those derived from inductive processes, is a crucial component in our conversation to complement the information implicitly or explicitly conveyed by a speaker. While recent large language models show remarkable advances in inference tasks, their performance in inductive reasoning, where not all information is present in the context, is far behind deductive reasoning. In this paper, we analyze the behavior of the models based on the task difficulty defined by the semantic information gap -- which distinguishes inductive and deductive reasoning (Johnson-Laird, 1988, 1993). Our analysis reveals that the disparity in information between dialogue contexts and desired inferences poses a significant challenge to the inductive inference process. To mitigate this information gap, we investigate a contrastive learning approach by feeding negative samples. Our experiments suggest negative samples help models understand what is wrong and improve their inference generations.
△ Less
Submitted 12 November, 2023; v1 submitted 19 October, 2023;
originally announced October 2023.
-
InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems
Authors:
Willy Chung,
Samuel Cahyawijaya,
Bryan Wilie,
Holy Lovenia,
Pascale Fung
Abstract:
Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTOD…
▽ More
Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries for efficient interaction with any KB. Our extensive experiments demonstrate that InstructTODS achieves comparable performance to fully fine-tuned TODS in guiding dialogues to successful completion without prior knowledge or task-specific data. Furthermore, a rigorous human evaluation of end-to-end TODS shows that InstructTODS produces dialogue responses that notably outperform both the gold responses and the state-of-the-art TODS in terms of helpfulness, informativeness, and humanness. Moreover, the effectiveness of LLMs in TODS is further supported by our comprehensive evaluations on TODS subtasks: dialogue state tracking, intent classification, and response generation. Code and implementations could be found here https://github.com/WillyHC22/InstructTODS/
△ Less
Submitted 13 October, 2023;
originally announced October 2023.
-
Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data
Authors:
Wai Tong Chung,
Bassem Akoush,
Pushan Sharma,
Alex Tamkin,
Ki Sung Jung,
Jacqueline H. Chen,
Jack Guo,
Davy Brouzet,
Mohsen Talei,
Bruno Savard,
Alexei Y. Poludnenko,
Matthias Ihme
Abstract:
Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D high-fidelity reacting and non-reacting compressible turbulent f…
▽ More
Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D high-fidelity reacting and non-reacting compressible turbulent flow simulation data. With this data, we benchmark a total of 49 variations of five deep learning approaches for 3D super-resolution - which can be applied for improving scientific imaging, simulations, turbulence models, as well as in computer vision applications. We perform neural scaling analysis on these models to examine the performance of different machine learning (ML) approaches, including two scientific ML techniques. We demonstrate that (i) predictive performance can scale with model size and cost, (ii) architecture matters significantly, especially for smaller models, and (iii) the benefits of physics-based losses can persist with increasing model size. The outcomes of this benchmark study are anticipated to offer insights that can aid the design of 3D super-resolution models, especially for turbulence models, while this data is expected to foster ML methods for a broad range of flow physics applications. This data is publicly available with download links and browsing tools consolidated at https://blastnet.github.io.
△ Less
Submitted 27 October, 2023; v1 submitted 23 September, 2023;
originally announced September 2023.
-
PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems
Authors:
Bryan Wilie,
Yan Xu,
Willy Chung,
Samuel Cahyawijaya,
Holy Lovenia,
Pascale Fung
Abstract:
Grounding dialogue response generation on external knowledge is proposed to produce informative and engaging responses. However, current knowledge-grounded dialogue (KGD) systems often fail to align the generated responses with human-preferred qualities due to several issues like hallucination and the lack of coherence. Upon analyzing multiple language model generations, we observe the presence of…
▽ More
Grounding dialogue response generation on external knowledge is proposed to produce informative and engaging responses. However, current knowledge-grounded dialogue (KGD) systems often fail to align the generated responses with human-preferred qualities due to several issues like hallucination and the lack of coherence. Upon analyzing multiple language model generations, we observe the presence of alternative generated responses within a single decoding process. These alternative responses are more faithful and exhibit a comparable or higher level of relevance to prior conversational turns compared to the optimal responses prioritized by the decoding processes. To address these challenges and driven by these observations, we propose Polished \& Informed Candidate Scoring (PICK), a generation re-scoring framework that empowers models to generate faithful and relevant responses without requiring additional labeled data or model tuning. Through comprehensive automatic and human evaluations, we demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history. Furthermore, PICK consistently improves the system's performance with both oracle and retrieved knowledge in all decoding strategies. We provide the detailed implementation in https://github.com/bryanwilie/pick .
△ Less
Submitted 19 September, 2023;
originally announced September 2023.
-
Graph Matching in Correlated Stochastic Block Models for Improved Graph Clustering
Authors:
Joonhyuk Yang,
Hye Won Chung
Abstract:
We consider community detection from multiple correlated graphs sharing the same community structure. The correlated graphs are generated by independent subsampling of a parent graph sampled from the stochastic block model. The vertex correspondence between the correlated graphs is assumed to be unknown. We consider the two-step procedure where the vertex correspondence between the correlated grap…
▽ More
We consider community detection from multiple correlated graphs sharing the same community structure. The correlated graphs are generated by independent subsampling of a parent graph sampled from the stochastic block model. The vertex correspondence between the correlated graphs is assumed to be unknown. We consider the two-step procedure where the vertex correspondence between the correlated graphs is first revealed, and the communities are recovered from the union of the correlated graphs, which becomes denser than each single graph. We derive the information-theoretic limits for exact graph matching in general density regimes and the number of communities, and then analyze the regime of graph parameters, where one can benefit from the matching of the correlated graphs in recovering the latent community structure of the graphs.
△ Less
Submitted 10 September, 2023;
originally announced September 2023.
-
Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition
Authors:
Samuel Cahyawijaya,
Holy Lovenia,
Willy Chung,
Rita Frieske,
Zihan Liu,
Pascale Fung
Abstract:
Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese,…
▽ More
Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese, and Cantonese; and 2 different age groups--adults and the elderly. To conduct the experiment, we develop an English-Mandarin speech emotion benchmark for adults and the elderly, BiMotion, and a Cantonese speech emotion dataset, YueMotion. This study concludes that different language and age groups require specific speech features, thus making cross-lingual inference an unsuitable method. However, cross-group data augmentation is still beneficial to regularize the model, with linguistic distance being a significant influence on cross-lingual transferability. We release publicly release our code at https://github.com/HLTCHKUST/elderly_ser.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
Spatially Resolved Gene Expression Prediction from H&E Histology Images via Bi-modal Contrastive Learning
Authors:
Ronald Xie,
Kuan Pang,
Sai W. Chung,
Catia T. Perciani,
Sonya A. MacParland,
Bo Wang,
Gary D. Bader
Abstract:
Histology imaging is an important tool in medical diagnosis and research, enabling the examination of tissue structure and composition at the microscopic level. Understanding the underlying molecular mechanisms of tissue architecture is critical in uncovering disease mechanisms and developing effective treatments. Gene expression profiling provides insight into the molecular processes underlying t…
▽ More
Histology imaging is an important tool in medical diagnosis and research, enabling the examination of tissue structure and composition at the microscopic level. Understanding the underlying molecular mechanisms of tissue architecture is critical in uncovering disease mechanisms and developing effective treatments. Gene expression profiling provides insight into the molecular processes underlying tissue architecture, but the process can be time-consuming and expensive. We present BLEEP (Bi-modaL Embedding for Expression Prediction), a bi-modal embedding framework capable of generating spatially resolved gene expression profiles of whole-slide Hematoxylin and eosin (H&E) stained histology images. BLEEP uses contrastive learning to construct a low-dimensional joint embedding space from a reference dataset using paired image and expression profiles at micrometer resolution. With this approach, the gene expression of any query image patch can be imputed using the expression profiles from the reference dataset. We demonstrate BLEEP's effectiveness in gene expression prediction by benchmarking its performance on a human liver tissue dataset captured using the 10x Visium platform, where it achieves significant improvements over existing methods. Our results demonstrate the potential of BLEEP to provide insights into the molecular mechanisms underlying tissue architecture, with important implications in diagnosis and research of various diseases. The proposed approach can significantly reduce the time and cost associated with gene expression profiling, opening up new avenues for high-throughput analysis of histology images for both research and clinical applications.
△ Less
Submitted 27 October, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation
Authors:
Joonhyuk Yang,
Dongpil Shin,
Hye Won Chung
Abstract:
We consider the problem of graph matching, or learning vertex correspondence, between two correlated stochastic block models (SBMs). The graph matching problem arises in various fields, including computer vision, natural language processing and bioinformatics, and in particular, matching graphs with inherent community structure has significance related to de-anonymization of correlated social netw…
▽ More
We consider the problem of graph matching, or learning vertex correspondence, between two correlated stochastic block models (SBMs). The graph matching problem arises in various fields, including computer vision, natural language processing and bioinformatics, and in particular, matching graphs with inherent community structure has significance related to de-anonymization of correlated social networks. Compared to the correlated Erdos-Renyi (ER) model, where various efficient algorithms have been developed, among which a few algorithms have been proven to achieve the exact matching with constant edge correlation, no low-order polynomial algorithm has been known to achieve exact matching for the correlated SBMs with constant correlation. In this work, we propose an efficient algorithm for matching graphs with community structure, based on the comparison between partition trees rooted from each vertex, by extending the idea of Mao et al. (2021) to graphs with communities. The partition tree divides the large neighborhoods of each vertex into disjoint subsets using their edge statistics to different communities. Our algorithm is the first low-order polynomial-time algorithm achieving exact matching between two correlated SBMs with high probability in dense graphs.
△ Less
Submitted 2 June, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models
Authors:
Sheng Shen,
Le Hou,
Yanqi Zhou,
Nan Du,
Shayne Longpre,
Jason Wei,
Hyung Won Chung,
Barret Zoph,
William Fedus,
Xinyun Chen,
Tu Vu,
Yuexin Wu,
Wuyang Chen,
Albert Webson,
Yunxuan Li,
Vincent Zhao,
Hongkun Yu,
Kurt Keutzer,
Trevor Darrell,
Denny Zhou
Abstract:
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnable parameters to Large Language Models (LLMs) without increasing inference cost. Instruction tuning is a technique for training LLMs to follow instructions. We advocate combining these two approaches, as we find that MoE models benefit more from instruction tuning than dense models. In particular, we…
▽ More
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnable parameters to Large Language Models (LLMs) without increasing inference cost. Instruction tuning is a technique for training LLMs to follow instructions. We advocate combining these two approaches, as we find that MoE models benefit more from instruction tuning than dense models. In particular, we conduct empirical studies across three experimental setups: (i) Direct finetuning on individual downstream tasks devoid of instruction tuning; (ii) Instructiontuning followed by in-context few-shot or zero-shot generalization on downstream tasks; and (iii) Instruction tuning supplemented by further finetuning on individual downstream tasks. In the first scenario, MoE models overall underperform dense models of identical computational capacity. This narrative, however, dramatically changes with the introduction of instruction tuning (second and third scenario), used independently or in conjunction with task-specific finetuning. Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks, while using only a third of the FLOPs. The advancements embodied byFLAN-MOE inspire a reevaluation of the design principles of large-scale, high-performance language models in the framework of task-agnostic learning.
△ Less
Submitted 5 July, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning
Authors:
Samuel Cahyawijaya,
Holy Lovenia,
Tiezheng Yu,
Willy Chung,
Pascale Fung
Abstract:
Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages. However, their ability to generalize to underrepresented languages is limited due to the scarcity of available data. Additionally, directly adapting new languages to instruction-tuned LLMs can result in catastrophic forgetting, which leads to the loss of multitask…
▽ More
Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages. However, their ability to generalize to underrepresented languages is limited due to the scarcity of available data. Additionally, directly adapting new languages to instruction-tuned LLMs can result in catastrophic forgetting, which leads to the loss of multitasking ability. To address this issue, we propose InstructAlign which uses continual crosslingual instruction tuning to enable LLMs to align new unseen languages with previously learned high-resource languages. Our results demonstrate the effectiveness of InstructAlign in enabling the model to understand low-resource languages with limited parallel data while preventing catastrophic forgetting. Our work contributes to the advancement of language adaptation methods, particularly for adapting instruction-tuned LLMs to underrepresented languages. Our code is released on https://github.com/HLTCHKUST/InstructAlign
△ Less
Submitted 24 October, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Learn What NOT to Learn: Towards Generative Safety in Chatbots
Authors:
Leila Khalatbari,
Yejin Bang,
Dan Su,
Willy Chung,
Saeed Ghadimi,
Hossein Sameti,
Pascale Fung
Abstract:
Conversational models that are generative and open-domain are particularly susceptible to generating unsafe content since they are trained on web-based social data. Prior approaches to mitigating this issue have drawbacks, such as disrupting the flow of conversation, limited generalization to unseen toxic input contexts, and sacrificing the quality of the dialogue for the sake of safety. In this p…
▽ More
Conversational models that are generative and open-domain are particularly susceptible to generating unsafe content since they are trained on web-based social data. Prior approaches to mitigating this issue have drawbacks, such as disrupting the flow of conversation, limited generalization to unseen toxic input contexts, and sacrificing the quality of the dialogue for the sake of safety. In this paper, we present a novel framework, named "LOT" (Learn NOT to), that employs a contrastive loss to enhance generalization by learning from both positive and negative training signals. Our approach differs from the standard contrastive learning framework in that it automatically obtains positive and negative signals from the safe and unsafe language distributions that have been learned beforehand. The LOT framework utilizes divergence to steer the generations away from the unsafe subspace and towards the safe subspace while sustaining the flow of conversation. Our approach is memory and time-efficient during decoding and effectively reduces toxicity while preserving engagingness and fluency. Empirical results indicate that LOT reduces toxicity by up to four-fold while achieving four to six-fold higher rates of engagingness and fluency compared to baseline models. Our findings are further corroborated by human evaluation.
△ Less
Submitted 25 April, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.