-
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
Authors:
Chenchen Zhang,
Yuhang Li,
Can Xu,
Jiaheng Liu,
Ao Liu,
Shihui Hu,
Dengpeng Wu,
Guanhua Huang,
Kejiao Li,
Qi Yi,
Ruibin Xiong,
Haotian Zhu,
Yuanxing Zhang,
Yuhao Jiang,
Yue Zhang,
Zenan Xu,
Bohui Zhai,
Guoxiang He,
Hebin Li,
Jie Zhao,
Le Zhang,
Lingyun Tan,
Pengyu Guo,
Xianshu Pang,
Yang Ruan
, et al. (7 additional authors not shown)
Abstract:
The generative capabilities of Large Language Models (LLMs) are rapidly expanding from static code to dynamic, interactive visual artifacts. This progress is bottlenecked by a critical evaluation gap: established benchmarks focus on algorithmic correctness and are blind to the visual fidelity and interactive integrity that define modern user experiences. To bridge this gap, we introduce ArtifactsB…
▽ More
The generative capabilities of Large Language Models (LLMs) are rapidly expanding from static code to dynamic, interactive visual artifacts. This progress is bottlenecked by a critical evaluation gap: established benchmarks focus on algorithmic correctness and are blind to the visual fidelity and interactive integrity that define modern user experiences. To bridge this gap, we introduce ArtifactsBench, a new benchmark and paradigm for the automated, multimodal evaluation of visual code generation. Our framework programmatically renders each generated artifact and captures its dynamic behavior through temporal screenshots. This visual evidence, alongside the source code, is then assessed by a Multimodal LLM (MLLM)-as-Judge, which is rigorously guided by a fine-grained, per-task checklist to ensure holistic and reproducible scoring. We construct a new benchmark of 1,825 diverse tasks and evaluate over 30 leading LLMs. Our automated evaluation achieves a striking 94.4% ranking consistency with WebDev Arena, the gold-standard for human preference in web development, and over 90% pairwise agreement with human experts. This establishes ArtifactsBench as the first framework to reliably automate the assessment of human-perceived quality at scale. Our analysis provides a high-resolution map of the current SOTA, revealing that generalist models often outperform domain-specific ones. We open-source ArtifactsBench, including the benchmark, evaluation harness, and baseline results at https://artifactsbenchmark.github.io/, to provide the community with a scalable and accurate tool to accelerate the development of user-centric generative models.
△ Less
Submitted 7 July, 2025;
originally announced July 2025.
-
StackCLIP: Clustering-Driven Stacked Prompt in Zero-Shot Industrial Anomaly Detection
Authors:
Yanning Hou,
Yanran Ruan,
Junfa Li,
Shanshan Wang,
Jianfeng Qiu,
Ke Xu
Abstract:
Enhancing the alignment between text and image features in the CLIP model is a critical challenge in zero-shot industrial anomaly detection tasks. Recent studies predominantly utilize specific category prompts during pretraining, which can cause overfitting to the training categories and limit model generalization. To address this, we propose a method that transforms category names through multica…
▽ More
Enhancing the alignment between text and image features in the CLIP model is a critical challenge in zero-shot industrial anomaly detection tasks. Recent studies predominantly utilize specific category prompts during pretraining, which can cause overfitting to the training categories and limit model generalization. To address this, we propose a method that transforms category names through multicategory name stacking to create stacked prompts, forming the basis of our StackCLIP model. Our approach introduces two key components. The Clustering-Driven Stacked Prompts (CSP) module constructs generic prompts by stacking semantically analogous categories, while utilizing multi-object textual feature fusion to amplify discriminative anomalies among similar objects. The Ensemble Feature Alignment (EFA) module trains knowledge-specific linear layers tailored for each stack cluster and adaptively integrates them based on the attributes of test categories. These modules work together to deliver superior training speed, stability, and convergence, significantly boosting anomaly segmentation performance. Additionally, our stacked prompt framework offers robust generalization across classification tasks. To further improve performance, we introduce the Regulating Prompt Learning (RPL) module, which leverages the generalization power of stacked prompts to refine prompt learning, elevating results in anomaly detection classification tasks. Extensive testing on seven industrial anomaly detection datasets demonstrates that our method achieves state-of-the-art performance in both zero-shot anomaly detection and segmentation tasks.
△ Less
Submitted 5 July, 2025; v1 submitted 30 June, 2025;
originally announced June 2025.
-
SFNet: Fusion of Spatial and Frequency-Domain Features for Remote Sensing Image Forgery Detection
Authors:
Ji Qi,
Xinchang Zhang,
Dingqi Ye,
Yongjia Ruan,
Xin Guo,
Shaowen Wang,
Haifeng Li
Abstract:
The rapid advancement of generative artificial intelligence is producing fake remote sensing imagery (RSI) that is increasingly difficult to detect, potentially leading to erroneous intelligence, fake news, and even conspiracy theories. Existing forgery detection methods typically rely on single visual features to capture predefined artifacts, such as spatial-domain cues to detect forged objects l…
▽ More
The rapid advancement of generative artificial intelligence is producing fake remote sensing imagery (RSI) that is increasingly difficult to detect, potentially leading to erroneous intelligence, fake news, and even conspiracy theories. Existing forgery detection methods typically rely on single visual features to capture predefined artifacts, such as spatial-domain cues to detect forged objects like roads or buildings in RSI, or frequency-domain features to identify artifacts from up-sampling operations in adversarial generative networks (GANs). However, the nature of artifacts can significantly differ depending on geographic terrain, land cover types, or specific features within the RSI. Moreover, these complex artifacts evolve as generative models become more sophisticated. In short, over-reliance on a single visual cue makes existing forgery detectors struggle to generalize across diverse remote sensing data. This paper proposed a novel forgery detection framework called SFNet, designed to identify fake images in diverse remote sensing data by leveraging spatial and frequency domain features. Specifically, to obtain rich and comprehensive visual information, SFNet employs two independent feature extractors to capture spatial and frequency domain features from input RSIs. To fully utilize the complementary domain features, the domain feature mapping module and the hybrid domain feature refinement module(CBAM attention) of SFNet are designed to successively align and fuse the multi-domain features while suppressing redundant information. Experiments on three datasets show that SFNet achieves an accuracy improvement of 4%-15.18% over the state-of-the-art RS forgery detection methods and exhibits robust generalization capabilities. The code is available at https://github.com/GeoX-Lab/RSTI/tree/main/SFNet.
△ Less
Submitted 25 June, 2025;
originally announced June 2025.
-
Neural Graduated Assignment for Maximum Common Edge Subgraphs
Authors:
Chaolong Ying,
Yingqi Ruan,
Xuemin Chen,
Yaomin Wang,
Tianshu Yu
Abstract:
The Maximum Common Edge Subgraph (MCES) problem is a crucial challenge with significant implications in domains such as biology and chemistry. Traditional approaches, which include transformations into max-clique and search-based algorithms, suffer from scalability issues when dealing with larger instances. This paper introduces ``Neural Graduated Assignment'' (NGA), a simple, scalable, unsupervis…
▽ More
The Maximum Common Edge Subgraph (MCES) problem is a crucial challenge with significant implications in domains such as biology and chemistry. Traditional approaches, which include transformations into max-clique and search-based algorithms, suffer from scalability issues when dealing with larger instances. This paper introduces ``Neural Graduated Assignment'' (NGA), a simple, scalable, unsupervised-training-based method that addresses these limitations by drawing inspiration from the classical Graduated Assignment (GA) technique. Central to NGA is stacking of neural components that closely resemble the GA process, but with the reparameterization of learnable temperature into higher dimension. We further theoretically analyze the learning dynamics of NGA, showing its design leads to fast convergence, better exploration-exploitation tradeoff, and ability to escape local optima. Extensive experiments across MCES computation, graph similarity estimation, and graph retrieval tasks reveal that NGA not only significantly improves computation time and scalability on large instances but also enhances performance compared to existing methodologies. The introduction of NGA marks a significant advancement in the computation of MCES and offers insights into other assignment problems.
△ Less
Submitted 18 May, 2025;
originally announced May 2025.
-
Putting It All into Context: Simplifying Agents with LCLMs
Authors:
Mingjian Jiang,
Yangjun Ruan,
Luis Lastras,
Pavan Kapanipathi,
Tatsunori Hashimoto
Abstract:
Recent advances in language model (LM) agents have demonstrated significant potential for automating complex real-world tasks. To make progress on these difficult tasks, LM agent architectures have become increasingly complex, often incorporating multi-step retrieval tools, multiple agents, and scaffolding adapted to the underlying LM. In this work, we investigate whether all of this complexity is…
▽ More
Recent advances in language model (LM) agents have demonstrated significant potential for automating complex real-world tasks. To make progress on these difficult tasks, LM agent architectures have become increasingly complex, often incorporating multi-step retrieval tools, multiple agents, and scaffolding adapted to the underlying LM. In this work, we investigate whether all of this complexity is necessary, or if parts of these scaffolds can be removed on challenging tasks like SWE-bench. We show that in the case of SWE-bench, simply putting the entire environment into the context of a long context language model (LCLM) and properly prompting the model makes it competitive with carefully tuned, complex agent scaffolds. We show that a Gemini-1.5-Pro model without any scaffolding or tools achieves 38% on SWE-Bench-Verified, comparable with approaches using carefully tuned agent scaffolds (32%). While the unscaffolded approach with Gemini-1.5-Pro falls short of the strongest agentic architectures, we demonstrate that the more capable Gemini-2.5-Pro using the same unscaffolded approach directly attains a 50.8% solve rate. Additionally, a two-stage approach combining Gemini-1.5-Pro with Claude-3.7 achieves a competitive 48.6% solve rate.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
Reasoning to Learn from Latent Thoughts
Authors:
Yangjun Ruan,
Neil Band,
Chris J. Maddison,
Tatsunori Hashimoto
Abstract:
Compute scaling for language model (LM) pretraining has outpaced the growth of human-written texts, leading to concerns that data will become the bottleneck to LM scaling. To continue scaling pretraining in this data-constrained regime, we propose that explicitly modeling and inferring the latent thoughts that underlie the text generation process can significantly improve pretraining data efficien…
▽ More
Compute scaling for language model (LM) pretraining has outpaced the growth of human-written texts, leading to concerns that data will become the bottleneck to LM scaling. To continue scaling pretraining in this data-constrained regime, we propose that explicitly modeling and inferring the latent thoughts that underlie the text generation process can significantly improve pretraining data efficiency. Intuitively, our approach views web text as the compressed final outcome of a verbose human thought process and that the latent thoughts contain important contextual knowledge and reasoning steps that are critical to data-efficient learning. We empirically demonstrate the effectiveness of our approach through data-constrained continued pretraining for math. We first show that synthetic data approaches to inferring latent thoughts significantly improve data efficiency, outperforming training on the same amount of raw data (5.7\% $\rightarrow$ 25.4\% on MATH). Furthermore, we demonstrate latent thought inference without a strong teacher, where an LM bootstraps its own performance by using an EM algorithm to iteratively improve the capability of the trained LM and the quality of thought-augmented pretraining data. We show that a 1B LM can bootstrap its performance across at least three iterations and significantly outperform baselines trained on raw data, with increasing gains from additional inference compute when performing the E-step. The gains from inference scaling and EM iterations suggest new opportunities for scaling data-constrained pretraining.
△ Less
Submitted 24 March, 2025;
originally announced March 2025.
-
Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework
Authors:
Yusong Ke,
Hongru Lin,
Yuting Ruan,
Junya Tang,
Li Li
Abstract:
Large language models (LLMs) are increasingly adopted in medical question-answering (QA) scenarios. However, LLMs can generate hallucinations and nonfactual information, undermining their trustworthiness in high-stakes medical tasks. Conformal Prediction (CP) provides a statistically rigorous framework for marginal (average) coverage guarantees but has limited exploration in medical QA. This paper…
▽ More
Large language models (LLMs) are increasingly adopted in medical question-answering (QA) scenarios. However, LLMs can generate hallucinations and nonfactual information, undermining their trustworthiness in high-stakes medical tasks. Conformal Prediction (CP) provides a statistically rigorous framework for marginal (average) coverage guarantees but has limited exploration in medical QA. This paper proposes an enhanced CP framework for medical multiple-choice question-answering (MCQA) tasks. By associating the non-conformance score with the frequency score of correct options and leveraging self-consistency, the framework addresses internal model opacity and incorporates a risk control strategy with a monotonic loss function. Evaluated on MedMCQA, MedQA, and MMLU datasets using four off-the-shelf LLMs, the proposed method meets specified error rate guarantees while reducing average prediction set size with increased risk level, offering a promising uncertainty evaluation metric for LLMs.
△ Less
Submitted 8 May, 2025; v1 submitted 7 March, 2025;
originally announced March 2025.
-
MixMin: Finding Data Mixtures via Convex Minimization
Authors:
Anvith Thudi,
Evianne Rovers,
Yangjun Ruan,
Tristan Thrush,
Chris J. Maddison
Abstract:
Modern machine learning pipelines are increasingly combining and mixing data from diverse and disparate sources, e.g., pre-training large language models. Yet, finding the optimal data mixture is a challenging and open problem. We formalize this data mixing problem as a bi-level objective: the best mixture is the one that would lead to the best model for a downstream objective. Unfortunately, this…
▽ More
Modern machine learning pipelines are increasingly combining and mixing data from diverse and disparate sources, e.g., pre-training large language models. Yet, finding the optimal data mixture is a challenging and open problem. We formalize this data mixing problem as a bi-level objective: the best mixture is the one that would lead to the best model for a downstream objective. Unfortunately, this objective is generally intractable. In this paper, we make the observation that the bi-level data mixing objective becomes convex as our model class becomes larger. We develop and study a gradient-based approach for optimizing this convex objective, which we call MixMin, and test it on language modeling and chemistry tasks. MixMin was the only method that uniformly improved the data mixture in all our experiments. With MixMin, we improved the data mixture using less than 0.2% additional compute for a pythia-410M model trained on 8.2B tokens, resulting between 1-5% relative improvement to negative log likelihood on PIQA, ARC Easy, SciQ, and OpenWebMath. Crucially, we found that MixMin mixtures for smaller models improved training of larger models, suggesting that MixMin mixtures may be scale-invariant. When mixing bioassay data to train an XGBoost model, we saw improvements to average precision scores of 0.03-0.15.
△ Less
Submitted 22 May, 2025; v1 submitted 14 February, 2025;
originally announced February 2025.
-
CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter
Authors:
Zihang Li,
Yangdong Ruan,
Wenjun Liu,
Zhengyang Wang,
Tong Yang
Abstract:
Although retrieval-augmented generation(RAG) significantly improves generation quality by retrieving external knowledge bases and integrating generated content, it faces computational efficiency bottlenecks, particularly in knowledge retrieval tasks involving hierarchical structures for Tree-RAG. This paper proposes a Tree-RAG acceleration method based on the improved Cuckoo Filter, which optimize…
▽ More
Although retrieval-augmented generation(RAG) significantly improves generation quality by retrieving external knowledge bases and integrating generated content, it faces computational efficiency bottlenecks, particularly in knowledge retrieval tasks involving hierarchical structures for Tree-RAG. This paper proposes a Tree-RAG acceleration method based on the improved Cuckoo Filter, which optimizes entity localization during the retrieval process to achieve significant performance improvements. Tree-RAG effectively organizes entities through the introduction of a hierarchical tree structure, while the Cuckoo Filter serves as an efficient data structure that supports rapid membership queries and dynamic updates. The experiment results demonstrate that our method is much faster than naive Tree-RAG while maintaining high levels of generative quality. When the number of trees is large, our method is hundreds of times faster than naive Tree-RAG. Our work is available at https://github.com/TUPYP7180/CFT-RAG-2025.
△ Less
Submitted 25 January, 2025;
originally announced January 2025.
-
Towards accurate and reliable ICU outcome prediction: a multimodal learning framework based on belief function theory using structured EHRs and free-text notes
Authors:
Yucheng Ruan,
Daniel J. Tan,
See Kiong Ng,
Ling Huang,
Mengling Feng
Abstract:
Accurate Intensive Care Unit (ICU) outcome prediction is critical for improving patient treatment quality and ICU resource allocation. Existing research mainly focuses on structured data, e.g. demographics and vital signs, and lacks effective frameworks to integrate clinical notes from heterogeneous electronic health records (EHRs). This study aims to explore a multimodal framework based on belief…
▽ More
Accurate Intensive Care Unit (ICU) outcome prediction is critical for improving patient treatment quality and ICU resource allocation. Existing research mainly focuses on structured data, e.g. demographics and vital signs, and lacks effective frameworks to integrate clinical notes from heterogeneous electronic health records (EHRs). This study aims to explore a multimodal framework based on belief function theory that can effectively fuse heterogeneous structured EHRs and free-text notes for accurate and reliable ICU outcome prediction. The fusion strategy accounts for prediction uncertainty within each modality and conflicts between multimodal data. The experiments on MIMIC-III dataset show that our framework provides more accurate and reliable predictions than existing approaches. Specifically, it outperformed the best baseline by 1.05%/1.02% in BACC, 9.74%/6.04% in F1 score, 1.28%/0.9% in AUROC, and 6.21%/2.68% in AUPRC for predicting mortality and PLOS, respectively. Additionally, it improved the reliability of the predictions with a 26.8%/15.1% reduction in the Brier score and a 25.0%/13.3% reduction in negative log-likelihood. By effectively reducing false positives, the model can aid in better allocation of medical resources in the ICU. Furthermore, the proposed method is very versatile and can be extended to analyzing multimodal EHRs for other clinical tasks. The code implementation is available on https://github.com/yuchengruan/evid_multimodal_ehr.
△ Less
Submitted 25 February, 2025; v1 submitted 8 January, 2025;
originally announced January 2025.
-
Foundation Models for Wearable Movement Data in Mental Health Research
Authors:
Franklin Y. Ruan,
Aiwei Zhang,
Jenny Y. Oh,
SouYoung Jin,
Nicholas C. Jacobson
Abstract:
Pretrained foundation models and transformer architectures have driven the success of large language models (LLMs) and other modern AI breakthroughs. However, similar advancements in health data modeling remain limited due to the need for innovative adaptations. Wearable movement data offers a valuable avenue for exploration, as it's a core feature in nearly all commercial smartwatches, well estab…
▽ More
Pretrained foundation models and transformer architectures have driven the success of large language models (LLMs) and other modern AI breakthroughs. However, similar advancements in health data modeling remain limited due to the need for innovative adaptations. Wearable movement data offers a valuable avenue for exploration, as it's a core feature in nearly all commercial smartwatches, well established in clinical and mental health research, and the sequential nature of the data shares similarities to language. We introduce the Pretrained Actigraphy Transformer (PAT), the first open source foundation model designed for time-series wearable movement data. Leveraging transformer-based architectures and novel techniques, such as patch embeddings, and pretraining on data from 29,307 participants in a national U.S. sample, PAT achieves state-of-the-art performance in several mental health prediction tasks. PAT is also lightweight and easily interpretable, making it a robust tool for mental health research.
GitHub: https://github.com/njacobsonlab/Pretrained-Actigraphy-Transformer/
△ Less
Submitted 28 June, 2025; v1 submitted 21 November, 2024;
originally announced November 2024.
-
InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
Authors:
Zhichen Zeng,
Xiaolong Liu,
Mengyue Hang,
Xiaoyi Liu,
Qinghai Zhou,
Chaofei Yang,
Yiqun Liu,
Yichen Ruan,
Laming Chen,
Yuxin Chen,
Yujia Hao,
Jiaqi Xu,
Jade Nie,
Xi Liu,
Buyun Zhang,
Wei Wen,
Siyang Yuan,
Hang Yin,
Xin Zhang,
Kai Wang,
Wen-Yen Chen,
Yiping Han,
Huayu Li,
Chunzhi Yang,
Bo Long
, et al. (3 additional authors not shown)
Abstract:
Click-through rate (CTR) prediction, which predicts the probability of a user clicking an ad, is a fundamental task in recommender systems. The emergence of heterogeneous information, such as user profile and behavior sequences, depicts user interests from different aspects. A mutually beneficial integration of heterogeneous information is the cornerstone towards the success of CTR prediction. How…
▽ More
Click-through rate (CTR) prediction, which predicts the probability of a user clicking an ad, is a fundamental task in recommender systems. The emergence of heterogeneous information, such as user profile and behavior sequences, depicts user interests from different aspects. A mutually beneficial integration of heterogeneous information is the cornerstone towards the success of CTR prediction. However, most of the existing methods suffer from two fundamental limitations, including (1) insufficient inter-mode interaction due to the unidirectional information flow between modes, and (2) aggressive information aggregation caused by early summarization, resulting in excessive information loss. To address the above limitations, we propose a novel module named InterFormer to learn heterogeneous information interaction in an interleaving style. To achieve better interaction learning, InterFormer enables bidirectional information flow for mutually beneficial learning across different modes. To avoid aggressive information aggregation, we retain complete information in each data mode and use a separate bridging arch for effective information selection and summarization. Our proposed InterFormer achieves state-of-the-art performance on three public datasets and a large-scale industrial dataset.
△ Less
Submitted 25 June, 2025; v1 submitted 14 November, 2024;
originally announced November 2024.
-
Graph-based Uncertainty Metrics for Long-form Language Model Outputs
Authors:
Mingjian Jiang,
Yangjun Ruan,
Prasanna Sattigeri,
Salim Roukos,
Tatsunori Hashimoto
Abstract:
Recent advancements in Large Language Models (LLMs) have significantly improved text generation capabilities, but these systems are still known to hallucinate, and granular uncertainty estimation for long-form LLM generations remains challenging. In this work, we propose Graph Uncertainty -- which represents the relationship between LLM generations and claims within them as a bipartite graph and e…
▽ More
Recent advancements in Large Language Models (LLMs) have significantly improved text generation capabilities, but these systems are still known to hallucinate, and granular uncertainty estimation for long-form LLM generations remains challenging. In this work, we propose Graph Uncertainty -- which represents the relationship between LLM generations and claims within them as a bipartite graph and estimates the claim-level uncertainty with a family of graph centrality metrics. Under this view, existing uncertainty estimation methods based on the concept of self-consistency can be viewed as using degree centrality as an uncertainty measure, and we show that more sophisticated alternatives such as closeness centrality provide consistent gains at claim-level uncertainty estimation. Moreover, we present uncertainty-aware decoding techniques that leverage both the graph structure and uncertainty estimates to improve the factuality of LLM generations by preserving only the most reliable claims. Compared to existing methods, our graph-based uncertainty metrics lead to an average of 6.8% relative gains on AUPRC across various long-form generation settings, and our end-to-end system provides consistent 2-4% gains in factuality over existing decoding techniques while significantly improving the informativeness of generated responses.
△ Less
Submitted 28 October, 2024;
originally announced October 2024.
-
Power Control and Random Serving Mode Allocation for CJT-NCJT Hybrid Mode Enabled Cell-Free Massive MIMO With Limited Fronthauls
Authors:
Hangyu Zhang,
Rui Zhang,
Yongzhao Li,
Yuhan Ruan,
Tao Li,
Dong Yang
Abstract:
With a great potential of improving the service fairness and quality for user equipments (UEs), cell-free massive multiple-input multiple-output (mMIMO) has been regarded as an emerging candidate for 6G network architectures. Under ideal assumptions, the coherent joint transmission (CJT) serving mode has been considered as an optimal option for cell-free mMIMO systems, since it can achieve coheren…
▽ More
With a great potential of improving the service fairness and quality for user equipments (UEs), cell-free massive multiple-input multiple-output (mMIMO) has been regarded as an emerging candidate for 6G network architectures. Under ideal assumptions, the coherent joint transmission (CJT) serving mode has been considered as an optimal option for cell-free mMIMO systems, since it can achieve coherent cooperation gain among the access points. However, when considering the limited fronthaul constraint in practice, the non-coherent joint transmission (NCJT) serving mode is likely to outperform CJT, since the former requires much lower fronthaul resources. In other words, the performance excellence and worseness of single serving mode (CJT or NCJT) depends on the fronthaul capacity, and any single transmission mode cannot perfectly adapt the capacity limited fronthaul. To explore the performance potential of the cell-free mMIMO system with limited fronthauls by harnessing the merits of CJT and NCJT, we propose a CJT-NCJT hybrid serving mode framework, in which UEs are allocated to operate on CJT or NCJT serving mode. To improve the sum-rate of the system with low complexity, we first propose a probability-based random serving mode allocation scheme. With a given serving mode, a successive convex approximation-based power allocation algorithm is proposed to maximize the system's sum-rate. Simulation results demonstrate the superiority of the proposed scheme.
△ Less
Submitted 3 September, 2024;
originally announced September 2024.
-
3D Priors-Guided Diffusion for Blind Face Restoration
Authors:
Xiaobin Lu,
Xiaobin Hu,
Jun Luo,
Ben Zhu,
Yaping Ruan,
Wenqi Ren
Abstract:
Blind face restoration endeavors to restore a clear face image from a degraded counterpart. Recent approaches employing Generative Adversarial Networks (GANs) as priors have demonstrated remarkable success in this field. However, these methods encounter challenges in achieving a balance between realism and fidelity, particularly in complex degradation scenarios. To inherit the exceptional realism…
▽ More
Blind face restoration endeavors to restore a clear face image from a degraded counterpart. Recent approaches employing Generative Adversarial Networks (GANs) as priors have demonstrated remarkable success in this field. However, these methods encounter challenges in achieving a balance between realism and fidelity, particularly in complex degradation scenarios. To inherit the exceptional realism generative ability of the diffusion model and also constrained by the identity-aware fidelity, we propose a novel diffusion-based framework by embedding the 3D facial priors as structure and identity constraints into a denoising diffusion process. Specifically, in order to obtain more accurate 3D prior representations, the 3D facial image is reconstructed by a 3D Morphable Model (3DMM) using an initial restored face image that has been processed by a pretrained restoration network. A customized multi-level feature extraction method is employed to exploit both structural and identity information of 3D facial images, which are then mapped into the noise estimation process. In order to enhance the fusion of identity information into the noise estimation, we propose a Time-Aware Fusion Block (TAFB). This module offers a more efficient and adaptive fusion of weights for denoising, considering the dynamic nature of the denoising process in the diffusion model, which involves initial structure refinement followed by texture detail enhancement. Extensive experiments demonstrate that our network performs favorably against state-of-the-art algorithms on synthetic and real-world datasets for blind face restoration. The Code is released on our project page at https://github.com/838143396/3Diffusion.
△ Less
Submitted 12 September, 2024; v1 submitted 2 September, 2024;
originally announced September 2024.
-
Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution
Authors:
Yucheng Ruan,
Xiang Lan,
Jingying Ma,
Yizhi Dong,
Kai He,
Mengling Feng
Abstract:
Tabular data, a prevalent data type across various domains, presents unique challenges due to its heterogeneous nature and complex structural relationships. Achieving high predictive performance and robustness in tabular data analysis holds significant promise for numerous applications. Influenced by recent advancements in natural language processing, particularly transformer architectures, new me…
▽ More
Tabular data, a prevalent data type across various domains, presents unique challenges due to its heterogeneous nature and complex structural relationships. Achieving high predictive performance and robustness in tabular data analysis holds significant promise for numerous applications. Influenced by recent advancements in natural language processing, particularly transformer architectures, new methods for tabular data modeling have emerged. Early techniques concentrated on pre-training transformers from scratch, often encountering scalability issues. Subsequently, methods leveraging pre-trained language models like BERT have been developed, which require less data and yield enhanced performance. The recent advent of large language models, such as GPT and LLaMA, has further revolutionized the field, facilitating more advanced and diverse applications with minimal fine-tuning. Despite the growing interest, a comprehensive survey of language modeling techniques for tabular data remains absent. This paper fills this gap by providing a systematic review of the development of language modeling for tabular data, encompassing: (1) a categorization of different tabular data structures and data types; (2) a review of key datasets used in model training and tasks used for evaluation; (3) a summary of modeling techniques including widely-adopted data processing methods, popular architectures, and training objectives; (4) the evolution from adapting traditional Pre-training/Pre-trained language models to the utilization of large language models; (5) an identification of persistent challenges and potential future research directions in language modeling for tabular data analysis. GitHub page associated with this survey is available at: https://github.com/lanxiang1017/Language-Modeling-on-Tabular-Data-Survey.git.
△ Less
Submitted 20 August, 2024;
originally announced August 2024.
-
ECAFormer: Low-light Image Enhancement using Cross Attention
Authors:
Yudi Ruan,
Hao Ma,
Weikai Li,
Xiao Wang
Abstract:
Low-light image enhancement (LLIE) is critical in computer vision. Existing LLIE methods often fail to discover the underlying relationships between different sub-components, causing the loss of complementary information between multiple modules and network layers, ultimately resulting in the loss of image details. To beat this shortage, we design a hierarchical mutual Enhancement via a Cross Atte…
▽ More
Low-light image enhancement (LLIE) is critical in computer vision. Existing LLIE methods often fail to discover the underlying relationships between different sub-components, causing the loss of complementary information between multiple modules and network layers, ultimately resulting in the loss of image details. To beat this shortage, we design a hierarchical mutual Enhancement via a Cross Attention transformer (ECAFormer), which introduces an architecture that enables concurrent propagation and interaction of multiple features. The model preserves detailed information by introducing a Dual Multi-head self-attention (DMSA), which leverages visual and semantic features across different scales, allowing them to guide and complement each other. Besides, a Cross-Scale DMSA block is introduced to capture the residual connection, integrating cross-layer information to further enhance image detail. Experimental results show that ECAFormer reaches competitive performance across multiple benchmarks, yielding nearly a 3% improvement in PSNR over the suboptimal method, demonstrating the effectiveness of information interaction in LLIE.
△ Less
Submitted 22 December, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts
Authors:
Honghua Dong,
Qidong Su,
Yubo Gao,
Zhaoyu Li,
Yangjun Ruan,
Gennady Pekhimenko,
Chris J. Maddison,
Xujie Si
Abstract:
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging to implement and maintain. To address this challenge, we propose APPL, A Prompt Programming Language that acts as a bridge between computer pr…
▽ More
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging to implement and maintain. To address this challenge, we propose APPL, A Prompt Programming Language that acts as a bridge between computer programs and LLMs, allowing seamless embedding of prompts into Python functions, and vice versa. APPL provides an intuitive and Python-native syntax, an efficient parallelized runtime with asynchronous semantics, and a tracing module supporting effective failure diagnosis and replaying without extra costs. We demonstrate that APPL programs are intuitive, concise, and efficient through three representative scenarios: Chain-of-Thought with self-consistency (CoT-SC), ReAct tool use agent, and multi-agent chat. Experiments on three parallelizable workflows further show that APPL can effectively parallelize independent LLM calls, with a significant speedup ratio that almost matches the estimation.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Quantum Computing in Wireless Communications and Networking: A Tutorial-cum-Survey
Authors:
Wei Zhao,
Tangjie Weng,
Yue Ruan,
Zhi Liu,
Xuangou Wu,
Xiao Zheng,
Nei Kato
Abstract:
Owing to its outstanding parallel computing capabilities, quantum computing (QC) has been a subject of continuous attention. With the gradual maturation of QC platforms, it has increasingly played a significant role in various fields such as transportation, pharmaceuticals, and industrial manufacturing,achieving unprecedented milestones. In modern society, wireless communication stands as an indis…
▽ More
Owing to its outstanding parallel computing capabilities, quantum computing (QC) has been a subject of continuous attention. With the gradual maturation of QC platforms, it has increasingly played a significant role in various fields such as transportation, pharmaceuticals, and industrial manufacturing,achieving unprecedented milestones. In modern society, wireless communication stands as an indispensable infrastructure, with its essence lying in optimization. Although artificial intelligence (AI) algorithms such as Reinforcement Learning (RL) and mathematical optimization have greatly enhanced the performance of wireless communication, the rapid attainment of optimal solutions for wireless communication problems remains an unresolved challenge. QC, however, presents a new alternative. This paper aims to elucidate the fundamentals of QC and explore its applications in wireless communications and networking. First, we will provide a tutorial on QC, covering its basics, characteristics, and popular QC algorithms. Next, we will introduce the applications of QC in communication and networking, followed by its applications in miscellaneous areas such as security and privacy, localization and tracking, and video streaming. Finally,we will discuss remaining open issues before concluding.
△ Less
Submitted 19 November, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Observational Scaling Laws and the Predictability of Language Model Performance
Authors:
Yangjun Ruan,
Chris J. Maddison,
Tatsunori Hashimoto
Abstract:
Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~100 publically…
▽ More
Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~100 publically available models. Building a single scaling law from multiple model families is challenging due to large variations in their training compute efficiencies and capabilities. However, we show that these variations are consistent with a simple, generalized scaling law where language model performance is a function of a low-dimensional capability space, and model families only vary in their efficiency in converting training compute to capabilities. Using this approach, we show the surprising predictability of complex scaling phenomena: we show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models; we show that the agent performance of models such as GPT-4 can be precisely predicted from simpler non-agentic benchmarks; and we show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
△ Less
Submitted 1 October, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Fair Concurrent Training of Multiple Models in Federated Learning
Authors:
Marie Siew,
Haoran Zhang,
Jong-Ik Park,
Yuezhou Liu,
Yichen Ruan,
Lili Su,
Stratis Ioannidis,
Edmund Yeh,
Carlee Joe-Wong
Abstract:
Federated learning (FL) enables collaborative learning across multiple clients. In most FL work, all clients train a single learning task. However, the recent proliferation of FL applications may increasingly require multiple FL tasks to be trained simultaneously, sharing clients' computing and communication resources, which we call Multiple-Model Federated Learning (MMFL). Current MMFL algorithms…
▽ More
Federated learning (FL) enables collaborative learning across multiple clients. In most FL work, all clients train a single learning task. However, the recent proliferation of FL applications may increasingly require multiple FL tasks to be trained simultaneously, sharing clients' computing and communication resources, which we call Multiple-Model Federated Learning (MMFL). Current MMFL algorithms use naive average-based client-task allocation schemes that can lead to unfair performance when FL tasks have heterogeneous difficulty levels, e.g., tasks with larger models may need more rounds and data to train. Just as naively allocating resources to generic computing jobs with heterogeneous resource needs can lead to unfair outcomes, naive allocation of clients to FL tasks can lead to unfairness, with some tasks having excessively long training times, or lower converged accuracies. Furthermore, in the FL setting, since clients are typically not paid for their training effort, we face a further challenge that some clients may not even be willing to train some tasks, e.g., due to high computational costs, which may exacerbate unfairness in training outcomes across tasks. We address both challenges by firstly designing FedFairMMFL, a difficulty-aware algorithm that dynamically allocates clients to tasks in each training round. We provide guarantees on airness and FedFairMMFL's convergence rate. We then propose a novel auction design that incentivizes clients to train multiple tasks, so as to fairly distribute clients' training efforts across the tasks. We show how our fairness-based learning and incentive mechanisms impact training convergence and finally evaluate our algorithm with multiple sets of learning tasks on real world datasets.
△ Less
Submitted 19 May, 2025; v1 submitted 21 April, 2024;
originally announced April 2024.
-
Deep Prompt Multi-task Network for Abuse Language Detection
Authors:
Jian Zhu,
Yuping Ruan,
Jingfei Chang,
Wenhui Sun,
Hui Wan,
Jian Long,
Cheng Luo
Abstract:
The detection of abusive language remains a long-standing challenge with the extensive use of social networks. The detection task of abusive language suffers from limited accuracy. We argue that the existing detection methods utilize the fine-tuning technique of the pre-trained language models (PLMs) to handle downstream tasks. Hence, these methods fail to stimulate the general knowledge of the PL…
▽ More
The detection of abusive language remains a long-standing challenge with the extensive use of social networks. The detection task of abusive language suffers from limited accuracy. We argue that the existing detection methods utilize the fine-tuning technique of the pre-trained language models (PLMs) to handle downstream tasks. Hence, these methods fail to stimulate the general knowledge of the PLMs. To address the problem, we propose a novel Deep Prompt Multi-task Network (DPMN) for abuse language detection. Specifically, DPMN first attempts to design two forms of deep prompt tuning and light prompt tuning for the PLMs. The effects of different prompt lengths, tuning strategies, and prompt initialization methods on detecting abusive language are studied. In addition, we propose a Task Head based on Bi-LSTM and FFN, which can be used as a short text classifier. Eventually, DPMN utilizes multi-task learning to improve detection metrics further. The multi-task network has the function of transferring effective knowledge. The proposed DPMN is evaluated against eight typical methods on three public datasets: OLID, SOLID, and AbuseAnalyzer. The experimental results show that our DPMN outperforms the state-of-the-art methods.
△ Less
Submitted 24 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Efficient CNN-LSTM based Parameter Estimation of Levy Driven Stochastic Differential Equations
Authors:
Shuaiyu Li,
Yang Ruan,
Changzhou Long,
Yuzhong Cheng
Abstract:
This study addresses the challenges in parameter estimation of stochastic differential equations driven by non-Gaussian noises, which are critical in understanding dynamic phenomena such as price fluctuations and the spread of infectious diseases. Previous research highlighted the potential of LSTM networks in estimating parameters of alpha stable Levy driven SDEs but faced limitations including h…
▽ More
This study addresses the challenges in parameter estimation of stochastic differential equations driven by non-Gaussian noises, which are critical in understanding dynamic phenomena such as price fluctuations and the spread of infectious diseases. Previous research highlighted the potential of LSTM networks in estimating parameters of alpha stable Levy driven SDEs but faced limitations including high time complexity and constraints of the LSTM chaining property. To mitigate these issues, we introduce the PEnet, a novel CNN-LSTM-based three-stage model that offers an end to end approach with superior accuracy and adaptability to varying data structures, enhanced inference speed for long sequence observations through initial data feature condensation by CNN, and high generalization capability, allowing its application to various complex SDE scenarios. Experiments on synthetic datasets confirm PEnet significant advantage in estimating SDE parameters associated with noise characteristics, establishing it as a competitive method for SDE parameter estimation in the presence of Levy noise.
△ Less
Submitted 7 March, 2024;
originally announced March 2024.
-
RedCore: Relative Advantage Aware Cross-modal Representation Learning for Missing Modalities with Imbalanced Missing Rates
Authors:
Jun Sun,
Xinxin Zhang,
Shoukang Han,
Yu-ping Ruan,
Taihao Li
Abstract:
Multimodal learning is susceptible to modality missing, which poses a major obstacle for its practical applications and, thus, invigorates increasing research interest. In this paper, we investigate two challenging problems: 1) when modality missing exists in the training data, how to exploit the incomplete samples while guaranteeing that they are properly supervised? 2) when the missing rates of…
▽ More
Multimodal learning is susceptible to modality missing, which poses a major obstacle for its practical applications and, thus, invigorates increasing research interest. In this paper, we investigate two challenging problems: 1) when modality missing exists in the training data, how to exploit the incomplete samples while guaranteeing that they are properly supervised? 2) when the missing rates of different modalities vary, causing or exacerbating the imbalance among modalities, how to address the imbalance and ensure all modalities are well-trained? To tackle these two challenges, we first introduce the variational information bottleneck (VIB) method for the cross-modal representation learning of missing modalities, which capitalizes on the available modalities and the labels as supervision. Then, accounting for the imbalanced missing rates, we define relative advantage to quantify the advantage of each modality over others. Accordingly, a bi-level optimization problem is formulated to adaptively regulate the supervision of all modalities during training. As a whole, the proposed approach features \textbf{Re}lative a\textbf{d}vantage aware \textbf{C}ross-m\textbf{o}dal \textbf{r}epresentation l\textbf{e}arning (abbreviated as \textbf{RedCore}) for missing modalities with imbalanced missing rates. Extensive empirical results demonstrate that RedCore outperforms competing models in that it exhibits superior robustness against either large or imbalanced missing rates.
△ Less
Submitted 16 December, 2023;
originally announced December 2023.
-
Edge Wasserstein Distance Loss for Oriented Object Detection
Authors:
Yuke Zhu,
Yumeng Ruan,
Zihua Xiong,
Sheng Guo
Abstract:
Regression loss design is an essential topic for oriented object detection. Due to the periodicity of the angle and the ambiguity of width and height definition, traditional L1-distance loss and its variants have been suffered from the metric discontinuity and the square-like problem. As a solution, the distribution based methods show significant advantages by representing oriented boxes as distri…
▽ More
Regression loss design is an essential topic for oriented object detection. Due to the periodicity of the angle and the ambiguity of width and height definition, traditional L1-distance loss and its variants have been suffered from the metric discontinuity and the square-like problem. As a solution, the distribution based methods show significant advantages by representing oriented boxes as distributions. Differing from exploited the Gaussian distribution to get analytical form of distance measure, we propose a novel oriented regression loss, Wasserstein Distance(EWD) loss, to alleviate the square-like problem. Specifically, for the oriented box(OBox) representation, we choose a specially-designed distribution whose probability density function is only nonzero over the edges. On this basis, we develop Wasserstein distance as the measure. Besides, based on the edge representation of OBox, the EWD loss can be generalized to quadrilateral and polynomial regression scenarios. Experiments on multiple popular datasets and different detectors show the effectiveness of the proposed method.
△ Less
Submitted 12 December, 2023;
originally announced December 2023.
-
TIDE: Test Time Few Shot Object Detection
Authors:
Weikai Li,
Hongfeng Wei,
Yanlai Wu,
Jie Yang,
Yudi Ruan,
Yuan Li,
Ying Tang
Abstract:
Few-shot object detection (FSOD) aims to extract semantic knowledge from limited object instances of novel categories within a target domain. Recent advances in FSOD focus on fine-tuning the base model based on a few objects via meta-learning or data augmentation. Despite their success, the majority of them are grounded with parametric readjustment to generalize on novel objects, which face consid…
▽ More
Few-shot object detection (FSOD) aims to extract semantic knowledge from limited object instances of novel categories within a target domain. Recent advances in FSOD focus on fine-tuning the base model based on a few objects via meta-learning or data augmentation. Despite their success, the majority of them are grounded with parametric readjustment to generalize on novel objects, which face considerable challenges in Industry 5.0, such as (i) a certain amount of fine-tuning time is required, and (ii) the parameters of the constructed model being unavailable due to the privilege protection, making the fine-tuning fail. Such constraints naturally limit its application in scenarios with real-time configuration requirements or within black-box settings. To tackle the challenges mentioned above, we formalize a novel FSOD task, referred to as Test TIme Few Shot DEtection (TIDE), where the model is un-tuned in the configuration procedure. To that end, we introduce an asymmetric architecture for learning a support-instance-guided dynamic category classifier. Further, a cross-attention module and a multi-scale resizer are provided to enhance the model performance. Experimental results on multiple few-shot object detection platforms reveal that the proposed TIDE significantly outperforms existing contemporary methods. The implementation codes are available at https://github.com/deku-0621/TIDE
△ Less
Submitted 30 November, 2023;
originally announced November 2023.
-
Towards Hyperparameter-Agnostic DNN Training via Dynamical System Insights
Authors:
Carmel Fiscko,
Aayushya Agarwal,
Yihan Ruan,
Soummya Kar,
Larry Pileggi,
Bruno Sinopoli
Abstract:
We present a stochastic first-order optimization method specialized for deep neural networks (DNNs), ECCO-DNN. This method models the optimization variable trajectory as a dynamical system and develops a discretization algorithm that adaptively selects step sizes based on the trajectory's shape. This provides two key insights: designing the dynamical system for fast continuous-time convergence and…
▽ More
We present a stochastic first-order optimization method specialized for deep neural networks (DNNs), ECCO-DNN. This method models the optimization variable trajectory as a dynamical system and develops a discretization algorithm that adaptively selects step sizes based on the trajectory's shape. This provides two key insights: designing the dynamical system for fast continuous-time convergence and developing a time-stepping algorithm to adaptively select step sizes based on principles of numerical integration and neural network structure. The result is an optimizer with performance that is insensitive to hyperparameter variations and that achieves comparable performance to state-of-the-art optimizers including ADAM, SGD, RMSProp, and AdaGrad. We demonstrate this in training DNN models and datasets, including CIFAR-10 and CIFAR-100 using ECCO-DNN and find that ECCO-DNN's single hyperparameter can be changed by three orders of magnitude without affecting the trained models' accuracies. ECCO-DNN's insensitivity reduces the data and computation needed for hyperparameter tuning, making it advantageous for rapid prototyping and for applications with new datasets. To validate the efficacy of our proposed optimizer, we train an LSTM architecture on a household power consumption dataset with ECCO-DNN and achieve an optimal mean-square-error without tuning hyperparameters.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
On Synthetic Data for Back Translation
Authors:
Jiahao Xu,
Yubin Ruan,
Wei Bi,
Guoping Huang,
Shuming Shi,
Lihui Chen,
Lemao Liu
Abstract:
Back translation (BT) is one of the most significant technologies in NMT research fields. Existing attempts on BT share a common characteristic: they employ either beam search or random sampling to generate synthetic data with a backward model but seldom work studies the role of synthetic data in the performance of BT. This motivates us to ask a fundamental question: {\em what kind of synthetic da…
▽ More
Back translation (BT) is one of the most significant technologies in NMT research fields. Existing attempts on BT share a common characteristic: they employ either beam search or random sampling to generate synthetic data with a backward model but seldom work studies the role of synthetic data in the performance of BT. This motivates us to ask a fundamental question: {\em what kind of synthetic data contributes to BT performance?} Through both theoretical and empirical studies, we identify two key factors on synthetic data controlling the back-translation NMT performance, which are quality and importance. Furthermore, based on our findings, we propose a simple yet effective method to generate synthetic data to better trade off both factors so as to yield a better performance for BT. We run extensive experiments on WMT14 DE-EN, EN-DE, and RU-EN benchmark tasks. By employing our proposed method to generate synthetic data, our BT model significantly outperforms the standard BT baselines (i.e., beam and sampling based methods for data generation), which proves the effectiveness of our proposed methods.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Authors:
Kai He,
Rui Mao,
Qika Lin,
Yucheng Ruan,
Xiang Lan,
Mengling Feng,
Erik Cambria
Abstract:
The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern due to their ability to effectively respond to freetext queries with certain professional knowledge. This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process, with the aim of providing an overview of the development…
▽ More
The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern due to their ability to effectively respond to freetext queries with certain professional knowledge. This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process, with the aim of providing an overview of the development roadmap from traditional Pretrained Language Models (PLMs) to LLMs. Specifically, we first explore the potential of LLMs to enhance the efficiency and effectiveness of various Healthcare applications highlighting both the strengths and limitations. Secondly, we conduct a comparison between the previous PLMs and the latest LLMs, as well as comparing various LLMs with each other. Then we summarize related Healthcare training data, training methods, optimization strategies, and usage. Finally, the unique concerns associated with deploying LLMs in Healthcare settings are investigated, particularly regarding fairness, accountability, transparency and ethics. Our survey provide a comprehensive investigation from perspectives of both computer science and Healthcare specialty. Besides the discussion about Healthcare concerns, we supports the computer science community by compiling a collection of open source resources, such as accessible datasets, the latest methodologies, code implementations, and evaluation benchmarks in the Github. Summarily, we contend that a significant paradigm shift is underway, transitioning from PLMs to LLMs. This shift encompasses a move from discriminative AI approaches to generative AI approaches, as well as a shift from model-centered methodologies to data-centered methodologies. Also, we determine that the biggest obstacle of using LLMs in Healthcare are fairness, accountability, transparency and ethics.
△ Less
Submitted 26 January, 2025; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Identifying the Risks of LM Agents with an LM-Emulated Sandbox
Authors:
Yangjun Ruan,
Honghua Dong,
Andrew Wang,
Silviu Pitis,
Yongchao Zhou,
Jimmy Ba,
Yann Dubois,
Chris J. Maddison,
Tatsunori Hashimoto
Abstract:
Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, setting up the environment for each test scenario manually, and finding risky cas…
▽ More
Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, setting up the environment for each test scenario manually, and finding risky cases. As tools and agents become more complex, the high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks. To address these challenges, we introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios, without manual instantiation. Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks. We test both the tool emulator and evaluator through human evaluation and find that 68.8% of failures identified with ToolEmu would be valid real-world agent failures. Using our curated initial benchmark consisting of 36 high-stakes tools and 144 test cases, we provide a quantitative risk analysis of current LM agents and identify numerous failures with potentially severe outcomes. Notably, even the safest LM agent exhibits such failures 23.9% of the time according to our evaluator, underscoring the need to develop safer LM agents for real-world deployment.
△ Less
Submitted 17 May, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching
Authors:
Jiazheng Xing,
Mengmeng Wang,
Yudi Ruan,
Bofan Chen,
Yaowei Guo,
Boyu Mu,
Guang Dai,
Jingdong Wang,
Yong Liu
Abstract:
Class prototype construction and matching are core aspects of few-shot action recognition. Previous methods mainly focus on designing spatiotemporal relation modeling modules or complex temporal alignment algorithms. Despite the promising results, they ignored the value of class prototype construction and matching, leading to unsatisfactory performance in recognizing similar categories in every ta…
▽ More
Class prototype construction and matching are core aspects of few-shot action recognition. Previous methods mainly focus on designing spatiotemporal relation modeling modules or complex temporal alignment algorithms. Despite the promising results, they ignored the value of class prototype construction and matching, leading to unsatisfactory performance in recognizing similar categories in every task. In this paper, we propose GgHM, a new framework with Graph-guided Hybrid Matching. Concretely, we learn task-oriented features by the guidance of a graph neural network during class prototype construction, optimizing the intra- and inter-class feature correlation explicitly. Next, we design a hybrid matching strategy, combining frame-level and tuple-level matching to classify videos with multivariate styles. We additionally propose a learnable dense temporal modeling module to enhance the video feature temporal representation to build a more solid foundation for the matching process. GgHM shows consistent improvements over other challenging baselines on several few-shot datasets, demonstrating the effectiveness of our method. The code will be publicly available at https://github.com/jiazheng-xing/GgHM.
△ Less
Submitted 18 August, 2023;
originally announced August 2023.
-
pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting
Authors:
Yunyi Zhou,
Zhixuan Chu,
Yijia Ruan,
Ge Jin,
Yuchen Huang,
Sheng Li
Abstract:
Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model highly relies on the characteristics of the input time series and the fixed distribution that the model is based on. Due to the fact that the probability distributions cannot be averaged over different models straightforwardly, the current time series model ensemb…
▽ More
Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model highly relies on the characteristics of the input time series and the fixed distribution that the model is based on. Due to the fact that the probability distributions cannot be averaged over different models straightforwardly, the current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. Besides, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE overall member models and competitive ensemble methods.
△ Less
Submitted 30 August, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
-
PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba
Authors:
Jianying Wang,
Tongliang Li,
Haoze Song,
Xinjun Yang,
Wenchao Zhou,
Feifei Li,
Baoyue Yan,
Qianqian Wu,
Yukun Liang,
Chengjun Ying,
Yujie Wang,
Baokai Chen,
Chang Cai,
Yubin Ruan,
Xiaoyi Weng,
Shibin Chen,
Liang Yin,
Chengzhong Yang,
Xin Cai,
Hongyan Xing,
Nanlong Yu,
Xiaofei Chen,
Dapeng Huang,
Jianling Sun
Abstract:
Cloud-native databases have become the de-facto choice for mission-critical applications on the cloud due to the need for high availability, resource elasticity, and cost efficiency. Meanwhile, driven by the increasing connectivity between data generation and analysis, users prefer a single database to efficiently process both OLTP and OLAP workloads, which enhances data freshness and reduces the…
▽ More
Cloud-native databases have become the de-facto choice for mission-critical applications on the cloud due to the need for high availability, resource elasticity, and cost efficiency. Meanwhile, driven by the increasing connectivity between data generation and analysis, users prefer a single database to efficiently process both OLTP and OLAP workloads, which enhances data freshness and reduces the complexity of data synchronization and the overall business cost.
In this paper, we summarize five crucial design goals for a cloud-native HTAP database based on our experience and customers' feedback, i.e., transparency, competitive OLAP performance, minimal perturbation on OLTP workloads, high data freshness, and excellent resource elasticity. As our solution to realize these goals, we present PolarDB-IMCI, a cloud-native HTAP database system designed and deployed at Alibaba Cloud. Our evaluation results show that PolarDB-IMCI is able to handle HTAP efficiently on both experimental and production workloads; notably, it speeds up analytical queries up to $\times149$ on TPC-H (100 $GB$). PolarDB-IMCI introduces low visibility delay and little performance perturbation on OLTP workloads (< 5%), and resource elasticity can be achieved by scaling out in tens of seconds.
△ Less
Submitted 15 May, 2023;
originally announced May 2023.
-
Reference-based OCT Angiogram Super-resolution with Learnable Texture Generation
Authors:
Yuyan Ruan,
Dawei Yang,
Ziqi Tang,
An Ran Ran,
Carol Y. Cheung,
Hao Chen
Abstract:
Optical coherence tomography angiography (OCTA) is a new imaging modality to visualize retinal microvasculature and has been readily adopted in clinics. High-resolution OCT angiograms are important to qualitatively and quantitatively identify potential biomarkers for different retinal diseases accurately. However, one significant problem of OCTA is the inevitable decrease in resolution when increa…
▽ More
Optical coherence tomography angiography (OCTA) is a new imaging modality to visualize retinal microvasculature and has been readily adopted in clinics. High-resolution OCT angiograms are important to qualitatively and quantitatively identify potential biomarkers for different retinal diseases accurately. However, one significant problem of OCTA is the inevitable decrease in resolution when increasing the field-of-view given a fixed acquisition time. To address this issue, we propose a novel reference-based super-resolution (RefSR) framework to preserve the resolution of the OCT angiograms while increasing the scanning area. Specifically, textures from the normal RefSR pipeline are used to train a learnable texture generator (LTG), which is designed to generate textures according to the input. The key difference between the proposed method and traditional RefSR models is that the textures used during inference are generated by the LTG instead of being searched from a single reference image. Since the LTG is optimized throughout the whole training process, the available texture space is significantly enlarged and no longer limited to a single reference image, but extends to all textures contained in the training samples. Moreover, our proposed LTGNet does not require a reference image at the inference phase, thereby becoming invulnerable to the selection of the reference image. Both experimental and visual results show that LTGNet has superior performance and robustness over state-of-the-art methods, indicating good reliability and promise in real-life deployment. The source code will be made available upon acceptance.
△ Less
Submitted 9 May, 2023;
originally announced May 2023.
-
P-Transformer: A Prompt-based Multimodal Transformer Architecture For Medical Tabular Data
Authors:
Yucheng Ruan,
Xiang Lan,
Daniel J. Tan,
Hairil Rizal Abdullah,
Mengling Feng
Abstract:
Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, there are still problems remaining for existing work to be effectively adapted into medical domain, such as ignoring unstruct…
▽ More
Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, there are still problems remaining for existing work to be effectively adapted into medical domain, such as ignoring unstructured free-texts and underutilizing the textual information in structured data. To address these issues, we propose PTransformer, a \underline{P}rompt-based multimodal \underline{Transformer} architecture designed specifically for medical tabular data. This framework consists of two critical components: a tabular cell embedding generator and a tabular transformer. The former efficiently encodes diverse modalities from both structured and unstructured tabular data into a harmonized language semantic space with the help of pre-trained sentence encoder and medical prompts. The latter integrates cell representations to generate patient embeddings for various medical tasks. In comprehensive experiments on two real-world datasets for three medical tasks, PTransformer demonstrated the improvements with 10.9%/11.0% on RMSE/MAE, 0.5%/2.2% on RMSE/MAE, and 1.6%/0.8% on BACC/AUROC compared to state-of-the-art (SOTA) baselines in predictability.
△ Less
Submitted 10 April, 2025; v1 submitted 30 March, 2023;
originally announced March 2023.
-
QS-ADN: Quasi-Supervised Artifact Disentanglement Network for Low-Dose CT Image Denoising by Local Similarity Among Unpaired Data
Authors:
Yuhui Ruan,
Qiao Yuan,
Chuang Niu,
Chen Li,
Yudong Yao,
Ge Wang,
Yueyang Teng
Abstract:
Deep learning has been successfully applied to low-dose CT (LDCT) image denoising for reducing potential radiation risk. However, the widely reported supervised LDCT denoising networks require a training set of paired images, which is expensive to obtain and cannot be perfectly simulated. Unsupervised learning utilizes unpaired data and is highly desirable for LDCT denoising. As an example, an art…
▽ More
Deep learning has been successfully applied to low-dose CT (LDCT) image denoising for reducing potential radiation risk. However, the widely reported supervised LDCT denoising networks require a training set of paired images, which is expensive to obtain and cannot be perfectly simulated. Unsupervised learning utilizes unpaired data and is highly desirable for LDCT denoising. As an example, an artifact disentanglement network (ADN) relies on unparied images and obviates the need for supervision but the results of artifact reduction are not as good as those through supervised learning.An important observation is that there is often hidden similarity among unpaired data that can be utilized. This paper introduces a new learning mode, called quasi-supervised learning, to empower the ADN for LDCT image denoising.For every LDCT image, the best matched image is first found from an unpaired normal-dose CT (NDCT) dataset. Then, the matched pairs and the corresponding matching degree as prior information are used to construct and train our ADN-type network for LDCT denoising.The proposed method is different from (but compatible with) supervised and semi-supervised learning modes and can be easily implemented by modifying existing networks. The experimental results show that the method is competitive with state-of-the-art methods in terms of noise suppression and contextual fidelity. The code and working dataset are publicly available at https://github.com/ruanyuhui/ADN-QSDL.git.
△ Less
Submitted 8 February, 2023;
originally announced February 2023.
-
Weighted Ensemble Self-Supervised Learning
Authors:
Yangjun Ruan,
Saurabh Singh,
Warren Morningstar,
Alexander A. Alemi,
Sergey Ioffe,
Ian Fischer,
Joshua V. Dillon
Abstract:
Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framewo…
▽ More
Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs a small training cost and requires no architectural changes or computational overhead to downstream evaluation. The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both in multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior art baselines which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
△ Less
Submitted 9 April, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
Parameter-Efficient Tuning on Layer Normalization for Pre-trained Language Models
Authors:
Wang Qi,
Yu-Ping Ruan,
Yuan Zuo,
Taihao Li
Abstract:
Conventional fine-tuning encounters increasing difficulties given the size of current Pre-trained Language Models, which makes parameter-efficient tuning become the focal point of frontier research. Previous methods in this field add tunable adapters into MHA or/and FFN of Transformer blocks to enable PLMs achieve transferability. However, as an important part of Transformer architecture, the powe…
▽ More
Conventional fine-tuning encounters increasing difficulties given the size of current Pre-trained Language Models, which makes parameter-efficient tuning become the focal point of frontier research. Previous methods in this field add tunable adapters into MHA or/and FFN of Transformer blocks to enable PLMs achieve transferability. However, as an important part of Transformer architecture, the power of layer normalization for parameter-efficent tuning is ignored. In this paper, we first propose LN-tuning, by tuning the gain and bias term of Layer Normalization module with only 0.03\% parameters, which is of high time-efficency and significantly superior to baselines which are less than 0.1\% tunable parameters. Further, we study the unified framework of combining LN-tuning with previous ones and we find that: (1) the unified framework of combining prefix-tuning, the adapter-based method working on MHA, and LN-tuning achieves SOTA performance. (2) unified framework which tunes MHA and LayerNorm simultaneously can get performance improvement but those which tune FFN and LayerNorm simultaneous will cause performance decrease. Ablation study validates LN-tuning is of no abundant parameters and gives a further understanding of it.
△ Less
Submitted 9 December, 2022; v1 submitted 16 November, 2022;
originally announced November 2022.
-
MFFN: Multi-view Feature Fusion Network for Camouflaged Object Detection
Authors:
Dehua Zheng,
Xiaochen Zheng,
Laurence T. Yang,
Yuan Gao,
Chenlu Zhu,
Yiheng Ruan
Abstract:
Recent research about camouflaged object detection (COD) aims to segment highly concealed objects hidden in complex surroundings. The tiny, fuzzy camouflaged objects result in visually indistinguishable properties. However, current single-view COD detectors are sensitive to background distractors. Therefore, blurred boundaries and variable shapes of the camouflaged objects are challenging to be fu…
▽ More
Recent research about camouflaged object detection (COD) aims to segment highly concealed objects hidden in complex surroundings. The tiny, fuzzy camouflaged objects result in visually indistinguishable properties. However, current single-view COD detectors are sensitive to background distractors. Therefore, blurred boundaries and variable shapes of the camouflaged objects are challenging to be fully captured with a single-view detector. To overcome these obstacles, we propose a behavior-inspired framework, called Multi-view Feature Fusion Network (MFFN), which mimics the human behaviors of finding indistinct objects in images, i.e., observing from multiple angles, distances, perspectives. Specifically, the key idea behind it is to generate multiple ways of observation (multi-view) by data augmentation and apply them as inputs. MFFN captures critical boundary and semantic information by comparing and fusing extracted multi-view features. In addition, our MFFN exploits the dependence and interaction between views and channels. Specifically, our methods leverage the complementary information between different views through a two-stage attention module called Co-attention of Multi-view (CAMV). And we design a local-overall module called Channel Fusion Unit (CFU) to explore the channel-wise contextual clues of diverse feature maps in an iterative manner. The experiment results show that our method performs favorably against existing state-of-the-art methods via training with the same data. The code will be available at https://github.com/dwardzheng/MFFN_COD.
△ Less
Submitted 19 October, 2022; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Unsupervised Kinematic Motion Detection for Part-segmented 3D Shape Collections
Authors:
Xianghao Xu,
Yifan Ruan,
Srinath Sridhar,
Daniel Ritchie
Abstract:
3D models of manufactured objects are important for populating virtual worlds and for synthetic data generation for vision and robotics. To be most useful, such objects should be articulated: their parts should move when interacted with. While articulated object datasets exist, creating them is labor-intensive. Learning-based prediction of part motions can help, but all existing methods require an…
▽ More
3D models of manufactured objects are important for populating virtual worlds and for synthetic data generation for vision and robotics. To be most useful, such objects should be articulated: their parts should move when interacted with. While articulated object datasets exist, creating them is labor-intensive. Learning-based prediction of part motions can help, but all existing methods require annotated training data. In this paper, we present an unsupervised approach for discovering articulated motions in a part-segmented 3D shape collection. Our approach is based on a concept we call category closure: any valid articulation of an object's parts should keep the object in the same semantic category (e.g. a chair stays a chair). We operationalize this concept with an algorithm that optimizes a shape's part motion parameters such that it can transform into other shapes in the collection. We evaluate our approach by using it to re-discover part motions from the PartNet-Mobility dataset. For almost all shape categories, our method's predicted motion parameters have low error with respect to ground truth annotations, outperforming two supervised motion prediction methods.
△ Less
Submitted 16 June, 2022;
originally announced June 2022.
-
UFRC: A Unified Framework for Reliable COVID-19 Detection on Crowdsourced Cough Audio
Authors:
Jiangeng Chang,
Yucheng Ruan,
Cui Shaoze,
John Soong Tshon Yit,
Mengling Feng
Abstract:
We suggested a unified system with core components of data augmentation, ImageNet-pretrained ResNet-50, cost-sensitive loss, deep ensemble learning, and uncertainty estimation to quickly and consistently detect COVID-19 using acoustic evidence. To increase the model's capacity to identify a minority class, data augmentation and cost-sensitive loss are incorporated (infected samples). In the COVID-…
▽ More
We suggested a unified system with core components of data augmentation, ImageNet-pretrained ResNet-50, cost-sensitive loss, deep ensemble learning, and uncertainty estimation to quickly and consistently detect COVID-19 using acoustic evidence. To increase the model's capacity to identify a minority class, data augmentation and cost-sensitive loss are incorporated (infected samples). In the COVID-19 detection challenge, ImageNet-pretrained ResNet-50 has been found to be effective. The unified framework also integrates deep ensemble learning and uncertainty estimation to integrate predictions from various base classifiers for generalisation and reliability. We ran a series of tests using the DiCOVA2021 challenge dataset to assess the efficacy of our proposed method, and the results show that our method has an AUC-ROC of 85.43 percent, making it a promising method for COVID-19 detection. The unified framework also demonstrates that audio may be used to quickly diagnose different respiratory disorders.
△ Less
Submitted 30 June, 2022; v1 submitted 16 April, 2022;
originally announced April 2022.
-
Augment with Care: Contrastive Learning for Combinatorial Problems
Authors:
Haonan Duan,
Pashootan Vaezipoor,
Max B. Paulus,
Yangjun Ruan,
Chris J. Maddison
Abstract:
Supervised learning can improve the design of state-of-the-art solvers for combinatorial problems, but labelling large numbers of combinatorial instances is often impractical due to exponential worst-case complexity. Inspired by the recent success of contrastive pre-training for images, we conduct a scientific study of the effect of augmentation design on contrastive pre-training for the Boolean s…
▽ More
Supervised learning can improve the design of state-of-the-art solvers for combinatorial problems, but labelling large numbers of combinatorial instances is often impractical due to exponential worst-case complexity. Inspired by the recent success of contrastive pre-training for images, we conduct a scientific study of the effect of augmentation design on contrastive pre-training for the Boolean satisfiability problem. While typical graph contrastive pre-training uses label-agnostic augmentations, our key insight is that many combinatorial problems have well-studied invariances, which allow for the design of label-preserving augmentations. We find that label-preserving augmentations are critical for the success of contrastive pre-training. We show that our representations are able to achieve comparable test accuracy to fully-supervised learning while using only 1% of the labels. We also demonstrate that our representations are more transferable to larger problems from unseen domains. Our code is available at https://github.com/h4duan/contrastive-sat.
△ Less
Submitted 20 June, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
-
TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval
Authors:
Yue Ruan,
Han-Hung Lee,
Yiming Zhang,
Ke Zhang,
Angel X. Chang
Abstract:
Text-to-shape retrieval is an increasingly relevant problem with the growth of 3D shape data. Recent work on contrastive losses for learning joint embeddings over multimodal data has been successful at tasks such as retrieval and classification. Thus far, work on joint representation learning for 3D shapes and text has focused on improving embeddings through modeling of complex attention between r…
▽ More
Text-to-shape retrieval is an increasingly relevant problem with the growth of 3D shape data. Recent work on contrastive losses for learning joint embeddings over multimodal data has been successful at tasks such as retrieval and classification. Thus far, work on joint representation learning for 3D shapes and text has focused on improving embeddings through modeling of complex attention between representations, or multi-task learning. We propose a trimodal learning scheme over text, multi-view images and 3D shape voxels, and show that with large batch contrastive learning we achieve good performance on text-to-shape retrieval without complex attention mechanisms or losses. Our experiments serve as a foundation for follow-up work on building trimodal embeddings for text-image-shape.
△ Less
Submitted 27 December, 2023; v1 submitted 18 January, 2022;
originally announced January 2022.
-
Optimal Representations for Covariate Shift
Authors:
Yangjun Ruan,
Yann Dubois,
Chris J. Maddison
Abstract:
Machine learning systems often experience a distribution shift between training and testing. In this paper, we introduce a simple variational objective whose optima are exactly the set of all representations on which risk minimizers are guaranteed to be robust to any distribution shift that preserves the Bayes predictor, e.g., covariate shifts. Our objective has two components. First, a representa…
▽ More
Machine learning systems often experience a distribution shift between training and testing. In this paper, we introduce a simple variational objective whose optima are exactly the set of all representations on which risk minimizers are guaranteed to be robust to any distribution shift that preserves the Bayes predictor, e.g., covariate shifts. Our objective has two components. First, a representation must remain discriminative for the task, i.e., some predictor must be able to simultaneously minimize the source and target risk. Second, the representation's marginal support needs to be the same across source and target. We make this practical by designing self-supervised objectives that only use unlabelled data and augmentations to train robust representations. Our objectives give insights into the robustness of CLIP, and further improve CLIP's representations to achieve SOTA results on DomainBed.
△ Less
Submitted 14 March, 2022; v1 submitted 31 December, 2021;
originally announced January 2022.
-
FedSoft: Soft Clustered Federated Learning with Proximal Local Updating
Authors:
Yichen Ruan,
Carlee Joe-Wong
Abstract:
Traditionally, clustered federated learning groups clients with the same data distribution into a cluster, so that every client is uniquely associated with one data distribution and helps train a model for this distribution. We relax this hard association assumption to soft clustered federated learning, which allows every local dataset to follow a mixture of multiple source distributions. We propo…
▽ More
Traditionally, clustered federated learning groups clients with the same data distribution into a cluster, so that every client is uniquely associated with one data distribution and helps train a model for this distribution. We relax this hard association assumption to soft clustered federated learning, which allows every local dataset to follow a mixture of multiple source distributions. We propose FedSoft, which trains both locally personalized models and high-quality cluster models in this setting. FedSoft limits client workload by using proximal updates to require the completion of only one optimization task from a subset of clients in every communication round. We show, analytically and empirically, that FedSoft effectively exploits similarities between the source distributions to learn personalized and cluster models that perform well.
△ Less
Submitted 22 March, 2022; v1 submitted 11 December, 2021;
originally announced December 2021.
-
SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning
Authors:
Boyuan Zheng,
Xiaoyu Yang,
Yu-Ping Ruan,
Zhenhua Ling,
Quan Liu,
Si Wei,
Xiaodan Zhu
Abstract:
This paper introduces the SemEval-2021 shared task 4: Reading Comprehension of Abstract Meaning (ReCAM). This shared task is designed to help evaluate the ability of machines in representing and understanding abstract concepts. Given a passage and the corresponding question, a participating system is expected to choose the correct answer from five candidates of abstract concepts in a cloze-style m…
▽ More
This paper introduces the SemEval-2021 shared task 4: Reading Comprehension of Abstract Meaning (ReCAM). This shared task is designed to help evaluate the ability of machines in representing and understanding abstract concepts. Given a passage and the corresponding question, a participating system is expected to choose the correct answer from five candidates of abstract concepts in a cloze-style machine reading comprehension setup. Based on two typical definitions of abstractness, i.e., the imperceptibility and nonspecificity, our task provides three subtasks to evaluate the participating models. Specifically, Subtask 1 aims to evaluate how well a system can model concepts that cannot be directly perceived in the physical world. Subtask 2 focuses on models' ability in comprehending nonspecific concepts located high in a hypernym hierarchy given the context of a passage. Subtask 3 aims to provide some insights into models' generalizability over the two types of abstractness. During the SemEval-2021 official evaluation period, we received 23 submissions to Subtask 1 and 28 to Subtask 2. The participating teams additionally made 29 submissions to Subtask 3. The leaderboard and competition website can be found at https://competitions.codalab.org/competitions/26153. The data and baseline code are available at https://github.com/boyuanzheng010/SemEval2021-Reading-Comprehension-of-Abstract-Meaning.
△ Less
Submitted 1 June, 2021; v1 submitted 31 May, 2021;
originally announced May 2021.
-
Emotion-Regularized Conditional Variational Autoencoder for Emotional Response Generation
Authors:
Yu-Ping Ruan,
Zhen-Hua Ling
Abstract:
This paper presents an emotion-regularized conditional variational autoencoder (Emo-CVAE) model for generating emotional conversation responses. In conventional CVAE-based emotional response generation, emotion labels are simply used as additional conditions in prior, posterior and decoder networks. Considering that emotion styles are naturally entangled with semantic contents in the language spac…
▽ More
This paper presents an emotion-regularized conditional variational autoencoder (Emo-CVAE) model for generating emotional conversation responses. In conventional CVAE-based emotional response generation, emotion labels are simply used as additional conditions in prior, posterior and decoder networks. Considering that emotion styles are naturally entangled with semantic contents in the language space, the Emo-CVAE model utilizes emotion labels to regularize the CVAE latent space by introducing an extra emotion prediction network. In the training stage, the estimated latent variables are required to predict the emotion labels and token sequences of the input responses simultaneously. Experimental results show that our Emo-CVAE model can learn a more informative and structured latent space than a conventional CVAE model and output responses with better content and emotion performance than baseline CVAE and sequence-to-sequence (Seq2Seq) models.
△ Less
Submitted 18 April, 2021;
originally announced April 2021.
-
Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
Authors:
Yangjun Ruan,
Karen Ullrich,
Daniel Severo,
James Townsend,
Ashish Khisti,
Arnaud Doucet,
Alireza Makhzani,
Chris J. Maddison
Abstract:
Latent variable models have been successfully applied in lossless compression with the bits-back coding algorithm. However, bits-back suffers from an increase in the bitrate equal to the KL divergence between the approximate posterior and the true posterior. In this paper, we show how to remove this gap asymptotically by deriving bits-back coding algorithms from tighter variational bounds. The key…
▽ More
Latent variable models have been successfully applied in lossless compression with the bits-back coding algorithm. However, bits-back suffers from an increase in the bitrate equal to the KL divergence between the approximate posterior and the true posterior. In this paper, we show how to remove this gap asymptotically by deriving bits-back coding algorithms from tighter variational bounds. The key idea is to exploit extended space representations of Monte Carlo estimators of the marginal likelihood. Naively applied, our schemes would require more initial bits than the standard bits-back coder, but we show how to drastically reduce this additional cost with couplings in the latent space. When parallel architectures can be exploited, our coders can achieve better rates than bits-back with little additional cost. We demonstrate improved lossless compression rates in a variety of settings, especially in out-of-distribution or sequential data compression.
△ Less
Submitted 14 June, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design
Authors:
Yufei Ruan,
Jiaqi Yang,
Yuan Zhou
Abstract:
Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints to linear contextual bandits, a central problem in online active learning. We consider two popular limited adaptivity models in literature: batch learning and rare policy switches. We show that, when the context vectors are adversarially chosen in $d$-dimensional linear contextual bandits, the…
▽ More
Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints to linear contextual bandits, a central problem in online active learning. We consider two popular limited adaptivity models in literature: batch learning and rare policy switches. We show that, when the context vectors are adversarially chosen in $d$-dimensional linear contextual bandits, the learner needs $O(d \log d \log T)$ policy switches to achieve the minimax-optimal regret, and this is optimal up to $\mathrm{poly}(\log d, \log \log T)$ factors; for stochastic context vectors, even in the more restricted batch learning model, only $O(\log \log T)$ batches are needed to achieve the optimal regret. Together with the known results in literature, our results present a complete picture about the adaptivity constraints in linear contextual bandits. Along the way, we propose the distributional optimal design, a natural extension of the optimal experiment design, and provide a both statistically and computationally efficient learning algorithm for the problem, which may be of independent interest.
△ Less
Submitted 23 April, 2021; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Towards Flexible Device Participation in Federated Learning
Authors:
Yichen Ruan,
Xiaoxi Zhang,
Shu-Che Liang,
Carlee Joe-Wong
Abstract:
Traditional federated learning algorithms impose strict requirements on the participation rates of devices, which limit the potential reach of federated learning. This paper extends the current learning paradigm to include devices that may become inactive, compute incomplete updates, and depart or arrive in the middle of training. We derive analytical results to illustrate how allowing more flexib…
▽ More
Traditional federated learning algorithms impose strict requirements on the participation rates of devices, which limit the potential reach of federated learning. This paper extends the current learning paradigm to include devices that may become inactive, compute incomplete updates, and depart or arrive in the middle of training. We derive analytical results to illustrate how allowing more flexible device participation can affect the learning convergence when data is not independently and identically distributed (non-IID). We then propose a new federated aggregation scheme that converges even when devices may be inactive or return incomplete updates. We also study how the learning process can adapt to early departures or late arrivals, and analyze their impacts on the convergence.
△ Less
Submitted 25 February, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.