-
What Makes a Good Natural Language Prompt?
Authors:
Do Xuan Long,
Duy Dinh,
Ngoc-Hai Nguyen,
Kenji Kawaguchi,
Nancy F. Chen,
Shafiq Joty,
Min-Yen Kan
Abstract:
As large language models (LLMs) have progressed towards more human-like and human--AI communications have become prevalent, prompting has emerged as a decisive component. However, there is limited conceptual consensus on what exactly quantifies natural language prompts. We attempt to address this question by conducting a meta-analysis surveying more than 150 prompting-related papers from leading N…
▽ More
As large language models (LLMs) have progressed towards more human-like and human--AI communications have become prevalent, prompting has emerged as a decisive component. However, there is limited conceptual consensus on what exactly quantifies natural language prompts. We attempt to address this question by conducting a meta-analysis surveying more than 150 prompting-related papers from leading NLP and AI conferences from 2022 to 2025 and blogs. We propose a property- and human-centric framework for evaluating prompt quality, encompassing 21 properties categorized into six dimensions. We then examine how existing studies assess their impact on LLMs, revealing their imbalanced support across models and tasks, and substantial research gaps. Further, we analyze correlations among properties in high-quality natural language prompts, deriving prompting recommendations. We then empirically explore multi-property prompt enhancements in reasoning tasks, observing that single-property enhancements often have the greatest impact. Finally, we discover that instruction-tuning on property-enhanced prompts can result in better reasoning models. Our findings establish a foundation for property-centric prompt evaluation and optimization, bridging the gaps between human--AI communication and opening new prompting research directions.
△ Less
Submitted 7 June, 2025;
originally announced June 2025.
-
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Authors:
Yanzhao Zhang,
Mingxin Li,
Dingkun Long,
Xin Zhang,
Huan Lin,
Baosong Yang,
Pengjun Xie,
An Yang,
Dayiheng Liu,
Junyang Lin,
Fei Huang,
Jingren Zhou
Abstract:
In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training…
▽ More
In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training with supervised fine-tuning on high-quality datasets. Effective model merging strategies further ensure the robustness and adaptability of the Qwen3 Embedding series. During the training process, the Qwen3 LLMs serve not only as backbone models but also play a crucial role in synthesizing high-quality, rich, and diverse training data across multiple domains and languages, thus enhancing the training pipeline. The Qwen3 Embedding series offers a spectrum of model sizes (0.6B, 4B, 8B) for both embedding and reranking tasks, addressing diverse deployment scenarios where users can optimize for either efficiency or effectiveness. Empirical evaluations demonstrate that the Qwen3 Embedding series achieves state-of-the-art results across diverse benchmarks. Notably, it excels on the multilingual evaluation benchmark MTEB for text embedding, as well as in various retrieval tasks, including code retrieval, cross-lingual retrieval and multilingual retrieval. To facilitate reproducibility and promote community-driven research and development, the Qwen3 Embedding models are publicly available under the Apache 2.0 license.
△ Less
Submitted 10 June, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
-
A Graph-Retrieval-Augmented Generation Framework Enhances Decision-Making in the Circular Economy
Authors:
Yang Zhao,
Chengxiao Dai,
Dusit Niyato,
Chuan Fu Tan,
Keyi Xiang,
Yueyang Wang,
Zhiquan Yeo,
Daren Tan Zong Loong,
Jonathan Low Zhaozhi,
Eugene H. Z. HO
Abstract:
Large language models (LLMs) hold promise for sustainable manufacturing, but often hallucinate industrial codes and emission factors, undermining regulatory and investment decisions. We introduce CircuGraphRAG, a retrieval-augmented generation (RAG) framework that grounds LLMs outputs in a domain-specific knowledge graph for the circular economy. This graph connects 117,380 industrial and waste en…
▽ More
Large language models (LLMs) hold promise for sustainable manufacturing, but often hallucinate industrial codes and emission factors, undermining regulatory and investment decisions. We introduce CircuGraphRAG, a retrieval-augmented generation (RAG) framework that grounds LLMs outputs in a domain-specific knowledge graph for the circular economy. This graph connects 117,380 industrial and waste entities with classification codes and GWP100 emission data, enabling structured multi-hop reasoning. Natural language queries are translated into SPARQL and verified subgraphs are retrieved to ensure accuracy and traceability. Compared with Standalone LLMs and Naive RAG, CircuGraphRAG achieves superior performance in single-hop and multi-hop question answering, with ROUGE-L F1 scores up to 1.0, while baseline scores below 0.08. It also improves efficiency, halving the response time and reducing token usage by 16% in representative tasks. CircuGraphRAG provides fact-checked, regulatory-ready support for circular economy planning, advancing reliable, low-carbon resource decision making.
△ Less
Submitted 1 June, 2025;
originally announced June 2025.
-
Beyond In-Context Learning: Aligning Long-form Generation of Large Language Models via Task-Inherent Attribute Guidelines
Authors:
Do Xuan Long,
Duong Ngoc Yen,
Do Xuan Trong,
Luu Anh Tuan,
Kenji Kawaguchi,
Shafiq Joty,
Min-Yen Kan,
Nancy F. Chen
Abstract:
In-context learning (ICL) is an important yet not fully understood ability of pre-trained large language models (LLMs). It can greatly enhance task performance using a few examples, termed demonstrations, without fine-tuning. Although effective in question answering, ICL often underperforms in long-form generation tasks such as summarization. Under appropriately realistic assumptions, we empirical…
▽ More
In-context learning (ICL) is an important yet not fully understood ability of pre-trained large language models (LLMs). It can greatly enhance task performance using a few examples, termed demonstrations, without fine-tuning. Although effective in question answering, ICL often underperforms in long-form generation tasks such as summarization. Under appropriately realistic assumptions, we empirically and theoretically show that ICL demonstrations alone are insufficient to teach LLMs the task language and format distributions for generation. We argue for explicit exposure to the task distributions and hypothesize that defining them by prompting enhances model performance. To this end, we present LongGuide, which efficiently generates two parallel streams of guidelines capturing task language and format properties: (i) Metric Guidelines (MGs) that instruct models to optimize self-evaluated metrics; and (ii) Output Constraint Guidelines (OCGs) that constrain generation at both token and sentence levels. LongGuide automatically selects the best combination of guidelines, improving both strong open- and closed-source LLMs by over 5% in both zero- and few-shot settings. We show that LongGuide is generalizable, learnable by weak models to enhance strong ones, and integrates synergistically with automatic prompt optimizers.
△ Less
Submitted 1 June, 2025;
originally announced June 2025.
-
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Authors:
Tiancheng Gu,
Kaicheng Yang,
Ziyong Feng,
Xingjun Wang,
Yanzhao Zhang,
Dingkun Long,
Yingda Chen,
Weidong Cai,
Jiankang Deng
Abstract:
The Contrastive Language-Image Pre-training (CLIP) framework has become a widely used approach for multimodal representation learning, particularly in image-text retrieval and clustering. However, its efficacy is constrained by three key limitations: (1) text token truncation, (2) isolated image-text encoding, and (3) deficient compositionality due to bag-of-words behavior. While recent Multimodal…
▽ More
The Contrastive Language-Image Pre-training (CLIP) framework has become a widely used approach for multimodal representation learning, particularly in image-text retrieval and clustering. However, its efficacy is constrained by three key limitations: (1) text token truncation, (2) isolated image-text encoding, and (3) deficient compositionality due to bag-of-words behavior. While recent Multimodal Large Language Models (MLLMs) have demonstrated significant advances in generalized vision-language understanding, their potential for learning transferable multimodal representations remains underexplored.In this work, we present UniME (Universal Multimodal Embedding), a novel two-stage framework that leverages MLLMs to learn discriminative representations for diverse downstream tasks. In the first stage, we perform textual discriminative knowledge distillation from a powerful LLM-based teacher model to enhance the embedding capability of the MLLMś language component. In the second stage, we introduce hard negative enhanced instruction tuning to further advance discriminative representation learning. Specifically, we initially mitigate false negative contamination and then sample multiple hard negatives per instance within each batch, forcing the model to focus on challenging samples. This approach not only improves discriminative power but also enhances instruction-following ability in downstream tasks. We conduct extensive experiments on the MMEB benchmark and multiple retrieval tasks, including short and long caption retrieval and compositional retrieval. Results demonstrate that UniME achieves consistent performance improvement across all tasks, exhibiting superior discriminative and compositional capabilities.
△ Less
Submitted 24 April, 2025;
originally announced April 2025.
-
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
Authors:
Zixuan Ke,
Fangkai Jiao,
Yifei Ming,
Xuan-Phi Nguyen,
Austin Xu,
Do Xuan Long,
Minzhi Li,
Chengwei Qin,
Peifeng Wang,
Silvio Savarese,
Caiming Xiong,
Shafiq Joty
Abstract:
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems from conventional models that empower chatbots. In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, whi…
▽ More
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems from conventional models that empower chatbots. In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, which define the stage at which reasoning is achieved (either at inference time or through dedicated training); and (2) Architectures, which determine the components involved in the reasoning process, distinguishing between standalone LLMs and agentic compound systems that incorporate external tools, and multi-agent collaborations. Within each dimension, we analyze two key perspectives: (1) Input level, which focuses on techniques that construct high-quality prompts that the LLM condition on; and (2) Output level, which methods that refine multiple sampled candidates to enhance reasoning quality. This categorization provides a systematic understanding of the evolving landscape of LLM reasoning, highlighting emerging trends such as the shift from inference-scaling to learning-to-reason (e.g., DeepSeek-R1), and the transition to agentic workflows (e.g., OpenAI Deep Research, Manus Agent). Additionally, we cover a broad spectrum of learning algorithms, from supervised fine-tuning to reinforcement learning such as PPO and GRPO, and the training of reasoners and verifiers. We also examine key designs of agentic workflows, from established patterns like generator-evaluator and LLM debate to recent innovations. ...
△ Less
Submitted 11 April, 2025;
originally announced April 2025.
-
Making the unmodulated pyramid wavefront sensor smart II. First on-sky demonstration of extreme adaptive optics with deep learning
Authors:
R. Landman,
S. Y. Haffert,
J. D. Long,
J. R. Males,
L. M. Close,
W. B. Foster,
K. Van Gorkom,
O. Guyon,
A. D. Hedglen,
P. T. Johnson,
M. Y. Kautz,
J. K. Kueny,
J. Li,
J. Liberman,
J. Lumbres,
E. A. McEwen,
A. McLeod,
L. Schatz,
E. Tonucci,
K. Twitchell
Abstract:
Pyramid wavefront sensors (PWFSs) are the preferred choice for current and future extreme adaptive optics (XAO) systems. Almost all instruments use the PWFS in its modulated form to mitigate its limited linearity range. However, this modulation comes at the cost of a reduction in sensitivity, a blindness to petal-piston modes, and a limit to the sensor's ability to operate at high speeds. Therefor…
▽ More
Pyramid wavefront sensors (PWFSs) are the preferred choice for current and future extreme adaptive optics (XAO) systems. Almost all instruments use the PWFS in its modulated form to mitigate its limited linearity range. However, this modulation comes at the cost of a reduction in sensitivity, a blindness to petal-piston modes, and a limit to the sensor's ability to operate at high speeds. Therefore, there is strong interest to use the PWFS without modulation, which can be enabled with nonlinear reconstructors. Here, we present the first on-sky demonstration of XAO with an unmodulated PWFS using a nonlinear reconstructor based on convolutional neural networks. We discuss the real-time implementation on the Magellan Adaptive Optics eXtreme (MagAO-X) instrument using the optimized TensorRT framework and show that inference is fast enough to run the control loop at >2 kHz frequencies. Our on-sky results demonstrate a successful closed-loop operation using a model calibrated with internal source data that delivers stable and robust correction under varying conditions. Performance analysis reveals that our smart PWFS achieves nearly the same Strehl ratio as the highly optimized modulated PWFS under favorable conditions on bright stars. Notably, we observe an improvement in performance on a fainter star under the influence of strong winds. These findings confirm the feasibility of using the PWFS in its unmodulated form and highlight its potential for next-generation instruments. Future efforts will focus on achieving even higher control loop frequencies (>3 kHz), optimizing the calibration procedures, and testing its performance on fainter stars, where more gain is expected for the unmodulated PWFS compared to its modulated counterpart.
△ Less
Submitted 20 March, 2025;
originally announced March 2025.
-
Spatio-temporal Fourier Transformer (StFT) for Long-term Dynamics Prediction
Authors:
Da Long,
Shandian Zhe,
Samuel Williams,
Leonid Oliker,
Zhe Bai
Abstract:
Simulating the long-term dynamics of multi-scale and multi-physics systems poses a significant challenge in understanding complex phenomena across science and engineering. The complexity arises from the intricate interactions between scales and the interplay of diverse physical processes. Neural operators have emerged as promising models for predicting such dynamics due to their flexibility and co…
▽ More
Simulating the long-term dynamics of multi-scale and multi-physics systems poses a significant challenge in understanding complex phenomena across science and engineering. The complexity arises from the intricate interactions between scales and the interplay of diverse physical processes. Neural operators have emerged as promising models for predicting such dynamics due to their flexibility and computational efficiency. However, they often fail to effectively capture multi-scale interactions or quantify the uncertainties inherent in the predictions. These limitations lead to rapid error accumulation, particularly in long-term forecasting of systems characterized by complex and coupled dynamics. To address these challenges, we propose a spatio-temporal Fourier transformer (StFT), in which each transformer block is designed to learn dynamics at a specific scale. By leveraging a structured hierarchy of StFT blocks, the model explicitly captures dynamics across both macro- and micro- spatial scales. Furthermore, a generative residual correction mechanism is integrated to estimate and mitigate predictive uncertainties, enhancing both the accuracy and reliability of long-term forecasts. Evaluations conducted on three benchmark datasets (plasma, fluid, and atmospheric dynamics) demonstrate the advantages of our approach over state-of-the-art ML methods.
△ Less
Submitted 14 March, 2025;
originally announced March 2025.
-
Fragility-aware Classification for Understanding Risk and Improving Generalization
Authors:
Chen Yang,
Zheng Cui,
Daniel Zhuoyu Long,
Jin Qi,
Ruohan Zhan
Abstract:
Classification models play a critical role in data-driven decision-making applications such as medical diagnosis, user profiling, recommendation systems, and default detection. Traditional performance metrics, such as accuracy, focus on overall error rates but fail to account for the confidence of incorrect predictions, thereby overlooking the risk of confident misjudgments. This risk is particula…
▽ More
Classification models play a critical role in data-driven decision-making applications such as medical diagnosis, user profiling, recommendation systems, and default detection. Traditional performance metrics, such as accuracy, focus on overall error rates but fail to account for the confidence of incorrect predictions, thereby overlooking the risk of confident misjudgments. This risk is particularly significant in cost-sensitive and safety-critical domains like medical diagnosis and autonomous driving, where overconfident false predictions may cause severe consequences. To address this issue, we introduce the Fragility Index (FI), a novel metric that evaluates classification performance from a risk-averse perspective by explicitly capturing the tail risk of confident misjudgments. To enhance generalizability, we define FI within the robust satisficing (RS) framework, incorporating data uncertainty. We further develop a model training approach that optimizes FI while maintaining tractability for common loss functions. Specifically, we derive exact reformulations for cross-entropy loss, hinge-type loss, and Lipschitz loss, and extend the approach to deep learning models. Through synthetic experiments and real-world medical diagnosis tasks, we demonstrate that FI effectively identifies misjudgment risk and FI-based training improves model robustness and generalizability. Finally, we extend our framework to deep neural network training, further validating its effectiveness in enhancing deep learning models.
△ Less
Submitted 18 February, 2025;
originally announced February 2025.
-
Towards Text-Image Interleaved Retrieval
Authors:
Xin Zhang,
Ziqi Dai,
Yongqi Li,
Yanzhao Zhang,
Dingkun Long,
Pengjun Xie,
Meishan Zhang,
Jun Yu,
Wenjie Li,
Min Zhang
Abstract:
Current multimodal information retrieval studies mainly focus on single-image inputs, which limits real-world applications involving multiple images and text-image interleaved content. In this work, we introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences, and the model is required to understand the semantics from the interlea…
▽ More
Current multimodal information retrieval studies mainly focus on single-image inputs, which limits real-world applications involving multiple images and text-image interleaved content. In this work, we introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences, and the model is required to understand the semantics from the interleaved context for effective retrieval. We construct a TIIR benchmark based on naturally interleaved wikiHow tutorials, where a specific pipeline is designed to generate interleaved queries. To explore the task, we adapt several off-the-shelf retrievers and build a dense baseline by interleaved multimodal large language model (MLLM). We then propose a novel Matryoshka Multimodal Embedder (MME), which compresses the number of visual tokens at different granularity, to address the challenge of excessive visual tokens in MLLM-based TIIR models. Experiments demonstrate that simple adaption of existing models does not consistently yield effective results. Our MME achieves significant improvements over the baseline by substantially fewer visual tokens. We provide extensive analysis and will release the dataset and code to facilitate future research.
△ Less
Submitted 18 February, 2025;
originally announced February 2025.
-
Pseudo-Physics-Informed Neural Operators: Enhancing Operator Learning from Limited Data
Authors:
Keyan Chen,
Yile Li,
Da Long,
Zhitong Xu,
Wei Xing,
Jacob Hochhalter,
Shandian Zhe
Abstract:
Neural operators have shown great potential in surrogate modeling. However, training a well-performing neural operator typically requires a substantial amount of data, which can pose a major challenge in complex applications. In such scenarios, detailed physical knowledge can be unavailable or difficult to obtain, and collecting extensive data is often prohibitively expensive. To mitigate this cha…
▽ More
Neural operators have shown great potential in surrogate modeling. However, training a well-performing neural operator typically requires a substantial amount of data, which can pose a major challenge in complex applications. In such scenarios, detailed physical knowledge can be unavailable or difficult to obtain, and collecting extensive data is often prohibitively expensive. To mitigate this challenge, we propose the Pseudo Physics-Informed Neural Operator (PPI-NO) framework. PPI-NO constructs a surrogate physics system for the target system using partial differential equations (PDEs) derived from simple, rudimentary physics principles, such as basic differential operators. This surrogate system is coupled with a neural operator model, using an alternating update and learning process to iteratively enhance the model's predictive power. While the physics derived via PPI-NO may not mirror the ground-truth underlying physical laws -- hence the term ``pseudo physics'' -- this approach significantly improves the accuracy of standard operator learning models in data-scarce scenarios, which is evidenced by extensive evaluations across five benchmark tasks and a fatigue modeling application.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Host-guided data placement: whose job is it anyway?
Authors:
Devashish R. Purandare,
Peter Alvaro,
Avani Wildani,
Darrell D. E. Long,
Ethan L. Miller
Abstract:
The increasing demand for SSDs coupled with scaling difficulties have left manufacturers scrambling for newer SSD interfaces which promise better performance and durability. While these interfaces reduce the rigidity of traditional abstractions, they require application or system-level changes that can impact the stability, security, and portability of systems. To make matters worse, such changes…
▽ More
The increasing demand for SSDs coupled with scaling difficulties have left manufacturers scrambling for newer SSD interfaces which promise better performance and durability. While these interfaces reduce the rigidity of traditional abstractions, they require application or system-level changes that can impact the stability, security, and portability of systems. To make matters worse, such changes are rendered futile with introduction of next-generation interfaces. Further, there is little guidance on data placement and hardware specifics are often abstracted from the application layer. It is no surprise therefore that such interfaces have seen limited adoption, leaving behind a graveyard of experimental interfaces ranging from open-channel SSDs to zoned namespaces.
In this paper, we show how shim layers can to shield systems from changing hardware interfaces while benefiting from them. We present Reshim, an all-userspace shim layer that performs affinity and lifetime based data placement with no change to the operating system or the application. We demonstrate Reshim's ease of adoption with host-device coordination for three widely-used data-intensive systems: RocksDB, MongoDB, and CacheLib. With Reshim, these systems see 2-6 times highe write throughput, up to 6 times lower latency, and reduced write amplification compared to filesystems like F2FS. Reshim performs on par with application-specific backends like ZenFS while offering more generality, lower latency, and richer data placement. With Reshim we demonstrate the value of isolating the complexity of the placement logic, allowing easy deployment of dynamic placement rules across several applications and storage interfaces.
△ Less
Submitted 1 January, 2025;
originally announced January 2025.
-
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Authors:
Xin Zhang,
Yanzhao Zhang,
Wen Xie,
Mingxin Li,
Ziqi Dai,
Dingkun Long,
Pengjun Xie,
Meishan Zhang,
Wenjie Li,
Min Zhang
Abstract:
Universal Multimodal Retrieval (UMR) aims to enable search across various modalities using a unified model, where queries and candidates can consist of pure text, images, or a combination of both. Previous work has attempted to adopt multimodal large language models (MLLMs) to realize UMR using only text data. However, our preliminary experiments demonstrate that more diverse multimodal training d…
▽ More
Universal Multimodal Retrieval (UMR) aims to enable search across various modalities using a unified model, where queries and candidates can consist of pure text, images, or a combination of both. Previous work has attempted to adopt multimodal large language models (MLLMs) to realize UMR using only text data. However, our preliminary experiments demonstrate that more diverse multimodal training data can further unlock the potential of MLLMs. Despite its effectiveness, the existing multimodal training data is highly imbalanced in terms of modality, which motivates us to develop a training data synthesis pipeline and construct a large-scale, high-quality fused-modal training dataset. Based on the synthetic training data, we develop the General Multimodal Embedder (GME), an MLLM-based dense retriever designed for UMR. Furthermore, we construct a comprehensive UMR Benchmark (UMRB) to evaluate the effectiveness of our approach. Experimental results show that our method achieves state-of-the-art performance among existing UMR methods. Last, we provide in-depth analyses of model scaling and training strategies, and perform ablation studies on both the model and synthetic data.
△ Less
Submitted 1 April, 2025; v1 submitted 21 December, 2024;
originally announced December 2024.
-
When Text Embedding Meets Large Language Model: A Comprehensive Survey
Authors:
Zhijie Nie,
Zhangchi Feng,
Mingxin Li,
Cunwang Zhang,
Yanzhao Zhang,
Dingkun Long,
Richong Zhang
Abstract:
Text embedding has become a foundational technology in natural language processing (NLP) during the deep learning era, driving advancements across a wide array of downstream tasks. While many natural language understanding challenges can now be modeled using generative paradigms and leverage the robust generative and comprehension capabilities of large language models (LLMs), numerous practical ap…
▽ More
Text embedding has become a foundational technology in natural language processing (NLP) during the deep learning era, driving advancements across a wide array of downstream tasks. While many natural language understanding challenges can now be modeled using generative paradigms and leverage the robust generative and comprehension capabilities of large language models (LLMs), numerous practical applications - such as semantic matching, clustering, and information retrieval - continue to rely on text embeddings for their efficiency and effectiveness. Therefore, integrating LLMs with text embeddings has become a major research focus in recent years. In this survey, we categorize the interplay between LLMs and text embeddings into three overarching themes: (1) LLM-augmented text embedding, enhancing traditional embedding methods with LLMs; (2) LLMs as text embedders, adapting their innate capabilities for high-quality embedding; and (3) Text embedding understanding with LLMs, leveraging LLMs to analyze and interpret embeddings. By organizing recent works based on interaction patterns rather than specific downstream applications, we offer a novel and systematic overview of contributions from various research and application domains in the era of LLMs. Furthermore, we highlight the unresolved challenges that persisted in the pre-LLM era with pre-trained language models (PLMs) and explore the emerging obstacles brought forth by LLMs. Building on this analysis, we outline prospective directions for the evolution of text embedding, addressing both theoretical and practical opportunities in the rapidly advancing landscape of NLP.
△ Less
Submitted 20 March, 2025; v1 submitted 12 December, 2024;
originally announced December 2024.
-
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
Authors:
Do Xuan Long,
Duong Ngoc Yen,
Anh Tuan Luu,
Kenji Kawaguchi,
Min-Yen Kan,
Nancy F. Chen
Abstract:
We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve the large language model (LLM) generation. Specifically, it guides an LLM to fulfill an input instruction by simulating multiple experts, aggregating their responses, and selecting the best among individual and aggregated responses. This process is performed in a single chain of thought…
▽ More
We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve the large language model (LLM) generation. Specifically, it guides an LLM to fulfill an input instruction by simulating multiple experts, aggregating their responses, and selecting the best among individual and aggregated responses. This process is performed in a single chain of thoughts through our seven carefully designed subtasks derived from the Nominal Group Technique (Ven and Delbecq, 1974), a well-established decision-making framework. Our evaluations demonstrate that Multi-expert Prompting significantly outperforms ExpertPrompting and comparable baselines in enhancing the truthfulness, factuality, informativeness, and usefulness of responses while reducing toxicity and hurtfulness. It further achieves state-of-the-art truthfulness by outperforming the best baseline by 8.69% with ChatGPT. Multi-expert Prompting is efficient, explainable, and highly adaptable to diverse scenarios, eliminating the need for manual prompt construction.
△ Less
Submitted 1 November, 2024;
originally announced November 2024.
-
Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging
Authors:
Mingxin Li,
Zhijie Nie,
Yanzhao Zhang,
Dingkun Long,
Richong Zhang,
Pengjun Xie
Abstract:
Text embeddings are vital for tasks such as text retrieval and semantic textual similarity (STS). Recently, the advent of pretrained language models, along with unified benchmarks like the Massive Text Embedding Benchmark (MTEB), has facilitated the development of versatile general-purpose text embedding models. Advanced embedding models are typically developed using large-scale multi-task data an…
▽ More
Text embeddings are vital for tasks such as text retrieval and semantic textual similarity (STS). Recently, the advent of pretrained language models, along with unified benchmarks like the Massive Text Embedding Benchmark (MTEB), has facilitated the development of versatile general-purpose text embedding models. Advanced embedding models are typically developed using large-scale multi-task data and joint training across multiple tasks. However, our experimental analysis reveals two significant drawbacks of joint training: 1) Task Conflict: Gradients from different tasks interfere with each other, leading to negative transfer. 2) Data Imbalance: Disproportionate data distribution introduces biases that negatively impact performance across tasks. To overcome these challenges, we explore model merging-a technique that combines independently trained models to mitigate gradient conflicts and balance data distribution. We introduce a novel method, Self Positioning, which efficiently searches for optimal model combinations within the interpolation space of task vectors using stochastic gradient descent. Our experiments demonstrate that Self Positioning significantly enhances multi-task performance on the MTEB dataset, achieving an absolute improvement of 0.7 points. It outperforms traditional resampling methods while reducing computational costs. This work offers a robust approach to building generalized text embedding models with superior performance across diverse embedding-related tasks.
△ Less
Submitted 19 October, 2024;
originally announced October 2024.
-
Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation
Authors:
Da Long,
Zhitong Xu,
Guang Yang,
Akil Narayan,
Shandian Zhe
Abstract:
Modern physics simulation often involves multiple functions of interests, and traditional numerical approaches are known to be complex and computationally costly. While machine learning-based surrogate models can offer significant cost reductions, most focus on a single task, such as forward prediction, and typically lack uncertainty quantification -- an essential component in many applications. T…
▽ More
Modern physics simulation often involves multiple functions of interests, and traditional numerical approaches are known to be complex and computationally costly. While machine learning-based surrogate models can offer significant cost reductions, most focus on a single task, such as forward prediction, and typically lack uncertainty quantification -- an essential component in many applications. To overcome these limitations, we propose Arbitrarily-Conditioned Multi-Functional Diffusion (ACM-FD), a versatile probabilistic surrogate model for multi-physics emulation. ACM-FD can perform a wide range of tasks within a single framework, including forward prediction, various inverse problems, and simulating data for entire systems or subsets of quantities conditioned on others. Specifically, we extend the standard Denoising Diffusion Probabilistic Model (DDPM) for multi-functional generation by modeling noise as Gaussian processes (GP). We propose a random-mask based, zero-regularized denoising loss to achieve flexible and robust conditional generation. We induce a Kronecker product structure in the GP covariance matrix, substantially reducing the computational cost and enabling efficient training and sampling. We demonstrate the effectiveness of ACM-FD across several fundamental multi-physics systems.
△ Less
Submitted 6 June, 2025; v1 submitted 17 October, 2024;
originally announced October 2024.
-
Toward Efficient Kernel-Based Solvers for Nonlinear PDEs
Authors:
Zhitong Xu,
Da Long,
Yiming Xu,
Guang Yang,
Shandian Zhe,
Houman Owhadi
Abstract:
We introduce a novel kernel learning framework toward efficiently solving nonlinear partial differential equations (PDEs). In contrast to the state-of-the-art kernel solver that embeds differential operators within kernels, posing challenges with a large number of collocation points, our approach eliminates these operators from the kernel. We model the solution using a standard kernel interpolatio…
▽ More
We introduce a novel kernel learning framework toward efficiently solving nonlinear partial differential equations (PDEs). In contrast to the state-of-the-art kernel solver that embeds differential operators within kernels, posing challenges with a large number of collocation points, our approach eliminates these operators from the kernel. We model the solution using a standard kernel interpolation form and differentiate the interpolant to compute the derivatives. Our framework obviates the need for complex Gram matrix construction between solutions and their derivatives, allowing for a straightforward implementation and scalable computation. As an instance, we allocate the collocation points on a grid and adopt a product kernel, which yields a Kronecker product structure in the interpolation. This structure enables us to avoid computing the full Gram matrix, reducing costs and scaling efficiently to a large number of collocation points. We provide a proof of the convergence and rate analysis of our method under appropriate regularity assumptions. In numerical experiments, we demonstrate the advantages of our method in solving several benchmark PDEs.
△ Less
Submitted 5 June, 2025; v1 submitted 14 October, 2024;
originally announced October 2024.
-
Streamlined shape of cyborg cockroach promotes traversability in confined environments by gap negotiation
Authors:
Kazuki Kai,
Le Duc Long,
Hirotaka Sato
Abstract:
The centimeter-scale cyborg insects have a potential advantage for application in narrow environments where humans cannot operate. To realize such tasks, researchers have developed a small printed-circuit-board (PCB) which an insect can carry and control it. The electronic components usually remain bare on the board and the whole board is mounted on platform animals, resulting in uneven morphology…
▽ More
The centimeter-scale cyborg insects have a potential advantage for application in narrow environments where humans cannot operate. To realize such tasks, researchers have developed a small printed-circuit-board (PCB) which an insect can carry and control it. The electronic components usually remain bare on the board and the whole board is mounted on platform animals, resulting in uneven morphology of whole cyborg with sharp edges. It is well known that streamlined body shape in artificial vehicles or robots contributes to effective locomotion by reducing drag force in media. However, little is known how the entire body shape impacts on locomotor performance of cyborg insect. Here, we developed a 10 mm by 10 mm board which provided electrical stimulation via Sub-GHz communication and investigated the impact of physical arrangement of the board using Madagascar hissing cockroach. We compared the success rate of gap negotiation between the cyborg with mounted board and implanted board and found the latter outperformed the former. We demonstrated our cyborg cockroach with implanted board could follow faithfully to the locomotion command via antennal or cercal stimulation and traverse a narrow gap like air vent cover. In contrast to the conventional arrangement, our cyborg insects are suitable for application in a concealed environment.
△ Less
Submitted 9 October, 2024;
originally announced October 2024.
-
Artistic Portrait Drawing with Vector Strokes
Authors:
Yiqi Liang,
Ying Liu,
Dandan Long,
Ruihui Li
Abstract:
In this paper, we present a method, VectorPD, for converting a given human face image into a vector portrait sketch. VectorPD supports different levels of abstraction by simply controlling the number of strokes. Since vector graphics are composed of different shape primitives, it is challenging for rendering complex faces to accurately express facial details and structure. To address this, VectorP…
▽ More
In this paper, we present a method, VectorPD, for converting a given human face image into a vector portrait sketch. VectorPD supports different levels of abstraction by simply controlling the number of strokes. Since vector graphics are composed of different shape primitives, it is challenging for rendering complex faces to accurately express facial details and structure. To address this, VectorPD employs a novel two-round optimization mechanism. We first initialize the strokes with facial keypoints, and generate a basic portrait sketch by a CLIP-based Semantic Loss. Then we complete the face structure through VGG-based Structure Loss, and propose a novel Crop-based Shadow Loss to enrich the shadow details of the sketch, achieving a visually pleasing portrait sketch. Quantitative and qualitative evaluations both demonstrate that the portrait sketches generated by VectorPD can produce better visual effects than existing state-of-the-art methods, maintaining as much fidelity as possible at different levels of abstraction.
△ Less
Submitted 5 October, 2024;
originally announced October 2024.
-
MROSS: Multi-Round Region-based Optimization for Scene Sketching
Authors:
Yiqi Liang,
Ying Liu,
Dandan Long,
Ruihui Li
Abstract:
Scene sketching is to convert a scene into a simplified, abstract representation that captures the essential elements and composition of the original scene. It requires a semantic understanding of the scene and consideration of different regions within the scene. Since scenes often contain diverse visual information across various regions, such as foreground objects, background elements, and spati…
▽ More
Scene sketching is to convert a scene into a simplified, abstract representation that captures the essential elements and composition of the original scene. It requires a semantic understanding of the scene and consideration of different regions within the scene. Since scenes often contain diverse visual information across various regions, such as foreground objects, background elements, and spatial divisions, dealing with these different regions poses unique difficulties. In this paper, we define a sketch as some sets of Bézier curves because of their smooth and versatile characteristics. We optimize different regions of input scene in multiple rounds. In each optimization round, the strokes sampled from the next region can seamlessly be integrated into the sketch generated in the previous optimization round. We propose an additional stroke initialization method to ensure the integrity of the scene and the convergence of optimization. A novel CLIP-based Semantic Loss and a VGG-based Feature Loss are utilized to guide our multi-round optimization. Extensive experimental results on the quality and quantity of the generated sketches confirm the effectiveness of our method.
△ Less
Submitted 15 April, 2025; v1 submitted 5 October, 2024;
originally announced October 2024.
-
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs
Authors:
Do Xuan Long,
Hai Nguyen Ngoc,
Tiviatis Sim,
Hieu Dao,
Shafiq Joty,
Kenji Kawaguchi,
Nancy F. Chen,
Min-Yen Kan
Abstract:
We present the first systematic evaluation examining format bias in performance of large language models (LLMs). Our approach distinguishes between two categories of an evaluation metric under format constraints to reliably and accurately assess performance: one measures performance when format constraints are adhered to, while the other evaluates performance regardless of constraint adherence. We…
▽ More
We present the first systematic evaluation examining format bias in performance of large language models (LLMs). Our approach distinguishes between two categories of an evaluation metric under format constraints to reliably and accurately assess performance: one measures performance when format constraints are adhered to, while the other evaluates performance regardless of constraint adherence. We then define a metric for measuring the format bias of LLMs and establish effective strategies to reduce it. Subsequently, we present our empirical format bias evaluation spanning four commonly used categories -- multiple-choice question-answer, wrapping, list, and mapping -- covering 15 widely-used formats. Our evaluation on eight generation tasks uncovers significant format bias across state-of-the-art LLMs. We further discover that improving the format-instruction following capabilities of LLMs across formats potentially reduces format bias. Based on our evaluation findings, we study prompting and fine-tuning with synthesized format data techniques to mitigate format bias. Our methods successfully reduce the variance in ChatGPT's performance among wrapping formats from 235.33 to 0.71 (%$^2$).
△ Less
Submitted 22 February, 2025; v1 submitted 16 August, 2024;
originally announced August 2024.
-
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Authors:
Peiming Guo,
Sinuo Liu,
Yanzhao Zhang,
Dingkun Long,
Pengjun Xie,
Meishan Zhang,
Min Zhang
Abstract:
Photo-Sharing Multi-modal dialogue generation requires a dialogue agent not only to generate text responses but also to share photos at the proper moment. Using image text caption as the bridge, a pipeline model integrates an image caption model, a text generation model, and an image generation model to handle this complex multi-modal task. However, representing the images with text captions may l…
▽ More
Photo-Sharing Multi-modal dialogue generation requires a dialogue agent not only to generate text responses but also to share photos at the proper moment. Using image text caption as the bridge, a pipeline model integrates an image caption model, a text generation model, and an image generation model to handle this complex multi-modal task. However, representing the images with text captions may loss important visual details and information and cause error propagation in the complex dialogue system. Besides, the pipeline model isolates the three models separately because discrete image text captions hinder end-to-end gradient propagation. We propose the first end-to-end model for photo-sharing multi-modal dialogue generation, which integrates an image perceptron and an image generator with a large language model. The large language model employs the Q-Former to perceive visual images in the input end. For image generation in the output end, we propose a dynamic vocabulary transformation matrix and use straight-through and gumbel-softmax techniques to align the large language model and stable diffusion model and achieve end-to-end gradient propagation. We perform experiments on PhotoChat and DialogCC datasets to evaluate our end-to-end model. Compared with pipeline models, the end-to-end model gains state-of-the-art performances on various metrics of text and image generation. More analysis experiments also verify the effectiveness of the end-to-end model for photo-sharing multi-modal dialogue generation.
△ Less
Submitted 29 March, 2025; v1 submitted 16 August, 2024;
originally announced August 2024.
-
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
Authors:
Xin Zhang,
Yanzhao Zhang,
Dingkun Long,
Wen Xie,
Ziqi Dai,
Jialong Tang,
Huan Lin,
Baosong Yang,
Pengjun Xie,
Fei Huang,
Meishan Zhang,
Wenjie Li,
Min Zhang
Abstract:
We present systematic efforts in building long-context multilingual text representation model (TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than 512 of previous multilingual encoders). Then we construct a hybrid TRM and a cross-encoder reranker by contrastive lea…
▽ More
We present systematic efforts in building long-context multilingual text representation model (TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than 512 of previous multilingual encoders). Then we construct a hybrid TRM and a cross-encoder reranker by contrastive learning. Evaluations show that our text encoder outperforms the same-sized previous state-of-the-art XLM-R. Meanwhile, our TRM and reranker match the performance of large-sized state-of-the-art BGE-M3 models and achieve better results on long-context retrieval benchmarks. Further analysis demonstrate that our proposed models exhibit higher efficiency during both training and inference. We believe their efficiency and effectiveness could benefit various researches and industrial applications.
△ Less
Submitted 14 October, 2024; v1 submitted 28 July, 2024;
originally announced July 2024.
-
Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training
Authors:
Longhui Zhang,
Dingkun Long,
Meishan Zhang,
Yanzhao Zhang,
Pengjun Xie,
Min Zhang
Abstract:
Chinese sequence labeling tasks are heavily reliant on accurate word boundary demarcation. Although current pre-trained language models (PLMs) have achieved substantial gains on these tasks, they rarely explicitly incorporate boundary information into the modeling process. An exception to this is BABERT, which incorporates unsupervised statistical boundary information into Chinese BERT's pre-train…
▽ More
Chinese sequence labeling tasks are heavily reliant on accurate word boundary demarcation. Although current pre-trained language models (PLMs) have achieved substantial gains on these tasks, they rarely explicitly incorporate boundary information into the modeling process. An exception to this is BABERT, which incorporates unsupervised statistical boundary information into Chinese BERT's pre-training objectives. Building upon this approach, we input supervised high-quality boundary information to enhance BABERT's learning, developing a semi-supervised boundary-aware PLM. To assess PLMs' ability to encode boundaries, we introduce a novel ``Boundary Information Metric'' that is both simple and effective. This metric allows comparison of different PLMs without task-specific fine-tuning. Experimental results on Chinese sequence labeling datasets demonstrate that the improved BABERT variant outperforms the vanilla version, not only on these tasks but also more broadly across a range of Chinese natural language understanding tasks. Additionally, our proposed metric offers a convenient and accurate means of evaluating PLMs' boundary awareness.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Invertible Fourier Neural Operators for Tackling Both Forward and Inverse Problems
Authors:
Da Long,
Zhitong Xu,
Qiwei Yuan,
Yin Yang,
Shandian Zhe
Abstract:
Fourier Neural Operator (FNO) is a powerful and popular operator learning method. However, FNO is mainly used in forward prediction, yet a great many applications rely on solving inverse problems. In this paper, we propose an invertible Fourier Neural Operator (iFNO) for jointly tackling the forward and inverse problems. We developed a series of invertible Fourier blocks in the latent channel spac…
▽ More
Fourier Neural Operator (FNO) is a powerful and popular operator learning method. However, FNO is mainly used in forward prediction, yet a great many applications rely on solving inverse problems. In this paper, we propose an invertible Fourier Neural Operator (iFNO) for jointly tackling the forward and inverse problems. We developed a series of invertible Fourier blocks in the latent channel space to share the model parameters, exchange the information, and mutually regularize the learning for the bi-directional tasks. We integrated a variational auto-encoder to capture the intrinsic structures within the input space and to enable posterior inference so as to mitigate challenges of illposedness, data shortage, noises that are common in inverse problems. We proposed a three-step process to combine the invertible blocks and the VAE component for effective training. The evaluations on seven benchmark forward and inverse tasks have demonstrated the advantages of our approach.
△ Less
Submitted 5 May, 2025; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Deep peak property learning for efficient chiral molecules ECD spectra prediction
Authors:
Hao Li,
Da Long,
Li Yuan,
Yonghong Tian,
Xinchang Wang,
Fanyang Mo
Abstract:
Chiral molecule assignation is crucial for asymmetric catalysis, functional materials, and the drug industry. The conventional approach requires theoretical calculations of electronic circular dichroism (ECD) spectra, which is time-consuming and costly. To speed up this process, we have incorporated deep learning techniques for the ECD prediction. We first set up a large-scale dataset of Chiral Mo…
▽ More
Chiral molecule assignation is crucial for asymmetric catalysis, functional materials, and the drug industry. The conventional approach requires theoretical calculations of electronic circular dichroism (ECD) spectra, which is time-consuming and costly. To speed up this process, we have incorporated deep learning techniques for the ECD prediction. We first set up a large-scale dataset of Chiral Molecular ECD spectra (CMCDS) with calculated ECD spectra. We further develop the ECDFormer model, a Transformer-based model to learn the chiral molecular representations and predict corresponding ECD spectra with improved efficiency and accuracy. Unlike other models for spectrum prediction, our ECDFormer creatively focused on peak properties rather than the whole spectrum sequence for prediction, inspired by the scenario of chiral molecule assignation. Specifically, ECDFormer predicts the peak properties, including number, position, and symbol, then renders the ECD spectra from these peak properties, which significantly outperforms other models in ECD prediction, Our ECDFormer reduces the time of acquiring ECD spectra from 1-100 hours per molecule to 1.5s.
△ Less
Submitted 7 January, 2024;
originally announced January 2024.
-
A Two-Stage Adaptation of Large Language Models for Text Ranking
Authors:
Longhui Zhang,
Yanzhao Zhang,
Dingkun Long,
Pengjun Xie,
Meishan Zhang,
Min Zhang
Abstract:
Text ranking is a critical task in information retrieval. Recent advances in pre-trained language models (PLMs), especially large language models (LLMs), present new opportunities for applying them to text ranking. While supervised fine-tuning (SFT) with ranking data has been widely explored to better align PLMs with text ranking goals, previous studies have focused primarily on encoder-only and e…
▽ More
Text ranking is a critical task in information retrieval. Recent advances in pre-trained language models (PLMs), especially large language models (LLMs), present new opportunities for applying them to text ranking. While supervised fine-tuning (SFT) with ranking data has been widely explored to better align PLMs with text ranking goals, previous studies have focused primarily on encoder-only and encoder-decoder PLMs. Research on leveraging decoder-only LLMs for text ranking remains scarce. An exception to this is RankLLaMA, which uses direct SFT to explore LLaMA's potential for text ranking. In this work, we propose a two-stage progressive paradigm to better adapt LLMs to text ranking. First, we conduct continual pre-training (CPT) of LLMs on a large weakly-supervised corpus. Second, we perform SFT, and propose an improved optimization strategy building upon RankLLaMA. Our experimental results on multiple benchmarks show that our approach outperforms previous methods in both in-domain and out-domain scenarios.
△ Less
Submitted 1 June, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Aligning Large Language Models with Human Opinions through Persona Selection and Value--Belief--Norm Reasoning
Authors:
Do Xuan Long,
Kenji Kawaguchi,
Min-Yen Kan,
Nancy F. Chen
Abstract:
Reasoning and predicting human opinions with large language models (LLMs) is essential yet challenging. Current methods employ role-playing with personae but face two major issues: LLMs are sensitive to even a single irrelevant persona, skewing predictions by up to 30%, and LLMs fail to reason strategically over personae. We propose Chain-of-Opinion (COO), a simple four-step solution modeling whic…
▽ More
Reasoning and predicting human opinions with large language models (LLMs) is essential yet challenging. Current methods employ role-playing with personae but face two major issues: LLMs are sensitive to even a single irrelevant persona, skewing predictions by up to 30%, and LLMs fail to reason strategically over personae. We propose Chain-of-Opinion (COO), a simple four-step solution modeling which and how to reason with personae, inspired by the Value--Belief--Norm (VBN) theory. COO differentiates between explicit personae (demographics and ideology) and implicit personae (historical opinions), involves: (1) filtering irrelevant attributes from explicit personae, (2) ranking implicit personae into a preferential list for selecting top-k, (3) applying novel VBN reasoning to extract user environmental and personal value, belief, and norm variables for accurate and reliable predictions, and (4) iterating VBN reasoning with progressively larger lists of implicit personae to handle potential persona insufficiency. COO efficiently achieves new state-of-the-art opinion prediction via prompting with only 5 inference calls, improving prior techniques by up to 4%. Notably, fine-tuning LMs with COO data results in significantly better opinion-aligned models, by up to 23%.
△ Less
Submitted 14 December, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Text Representation Distillation via Information Bottleneck Principle
Authors:
Yanzhao Zhang,
Dingkun Long,
Zehan Li,
Pengjun Xie
Abstract:
Pre-trained language models (PLMs) have recently shown great success in text representation field. However, the high computational cost and high-dimensional representation of PLMs pose significant challenges for practical applications. To make models more accessible, an effective method is to distill large models into smaller representation models. In order to relieve the issue of performance degr…
▽ More
Pre-trained language models (PLMs) have recently shown great success in text representation field. However, the high computational cost and high-dimensional representation of PLMs pose significant challenges for practical applications. To make models more accessible, an effective method is to distill large models into smaller representation models. In order to relieve the issue of performance degradation after distillation, we propose a novel Knowledge Distillation method called IBKD. This approach is motivated by the Information Bottleneck principle and aims to maximize the mutual information between the final representation of the teacher and student model, while simultaneously reducing the mutual information between the student model's representation and the input data. This enables the student model to preserve important learned information while avoiding unnecessary information, thus reducing the risk of over-fitting. Empirical studies on two main downstream applications of text representation (Semantic Textual Similarity and Dense Retrieval tasks) demonstrate the effectiveness of our proposed approach.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Solving High Frequency and Multi-Scale PDEs with Gaussian Processes
Authors:
Shikai Fang,
Madison Cooley,
Da Long,
Shibo Li,
Robert Kirby,
Shandian Zhe
Abstract:
Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with a prominent example, physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To fle…
▽ More
Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with a prominent example, physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To flexibly capture the dominant frequencies, we model the power spectrum of the PDE solution with a student $t$ mixture or Gaussian mixture. We apply the inverse Fourier transform to obtain the covariance function (by Wiener-Khinchin theorem). The covariance derived from the Gaussian mixture spectrum corresponds to the known spectral mixture kernel. Next, we estimate the mixture weights in the log domain, which we show is equivalent to placing a Jeffreys prior. It automatically induces sparsity, prunes excessive frequencies, and adjusts the remaining toward the ground truth. Third, to enable efficient and scalable computation on massive collocation points, which are critical to capture high frequencies, we place the collocation points on a grid, and multiply our covariance function at each input dimension. We use the GP conditional mean to predict the solution and its derivatives so as to fit the boundary condition and the equation itself. As a result, we can derive a Kronecker product structure in the covariance matrix. We use Kronecker product properties and multilinear algebra to promote computational efficiency and scalability, without low-rank approximations. We show the advantage of our method in systematic experiments. The code is released at \url{https://github.com/xuangu-fang/Gaussian-Process-Slover-for-High-Freq-PDE}.
△ Less
Submitted 18 March, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Language Models are Universal Embedders
Authors:
Xin Zhang,
Zehan Li,
Yanzhao Zhang,
Dingkun Long,
Pengjun Xie,
Meishan Zhang,
Min Zhang
Abstract:
In the large language model (LLM) revolution, embedding is a key component of various systems, such as retrieving knowledge or memories for LLMs or building content moderation filters. As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is advantageous to build a unified embedding model rather than dedicated ones for each scena…
▽ More
In the large language model (LLM) revolution, embedding is a key component of various systems, such as retrieving knowledge or memories for LLMs or building content moderation filters. As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is advantageous to build a unified embedding model rather than dedicated ones for each scenario. In this context, the pre-trained multilingual decoder-only large language models, e.g., BLOOM, emerge as a viable backbone option. To assess their potential, we propose straightforward strategies for constructing embedders and introduce a universal evaluation benchmark. Experimental results show that our trained model is proficient at generating good embeddings across languages and tasks, even extending to languages and tasks for which no finetuning/pretraining data is available. We also present detailed analyses and additional evaluations. We hope that this work could encourage the development of more robust open-source universal embedders.
△ Less
Submitted 22 May, 2025; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels
Authors:
Da Long,
Wei W. Xing,
Aditi S. Krishnapriyan,
Robert M. Kirby,
Shandian Zhe,
Michael W. Mahoney
Abstract:
Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity and noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equa…
▽ More
Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity and noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We combine it with a Bayesian spike-and-slab prior -- an ideal Bayesian sparse distribution -- for effective operator selection and uncertainty quantification. We develop an expectation-propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra to enable efficient computation and optimization. We show the advantages of KBASS on a list of benchmark ODE and PDE discovery tasks.
△ Less
Submitted 21 April, 2024; v1 submitted 8 October, 2023;
originally announced October 2023.
-
Hybrid Retrieval and Multi-stage Text Ranking Solution at TREC 2022 Deep Learning Track
Authors:
Guangwei Xu,
Yangzhao Zhang,
Longhui Zhang,
Dingkun Long,
Pengjun Xie,
Ruijie Guo
Abstract:
Large-scale text retrieval technology has been widely used in various practical business scenarios. This paper presents our systems for the TREC 2022 Deep Learning Track. We explain the hybrid text retrieval and multi-stage text ranking method adopted in our solution. The retrieval stage combined the two structures of traditional sparse retrieval and neural dense retrieval. In the ranking stage, i…
▽ More
Large-scale text retrieval technology has been widely used in various practical business scenarios. This paper presents our systems for the TREC 2022 Deep Learning Track. We explain the hybrid text retrieval and multi-stage text ranking method adopted in our solution. The retrieval stage combined the two structures of traditional sparse retrieval and neural dense retrieval. In the ranking stage, in addition to the full interaction-based ranking model built on large pre-trained language model, we also proposes a lightweight sub-ranking module to further enhance the final text ranking performance. Evaluation results demonstrate the effectiveness of our proposed approach. Our models achieve the 1st and 4th rank on the test set of passage ranking and document ranking respectively.
△ Less
Submitted 23 August, 2023;
originally announced August 2023.
-
Towards General Text Embeddings with Multi-stage Contrastive Learning
Authors:
Zehan Li,
Xin Zhang,
Yanzhao Zhang,
Dingkun Long,
Pengjun Xie,
Meishan Zhang
Abstract:
We present GTE, a general-purpose text embedding model trained with multi-stage contrastive learning. In line with recent advancements in unifying various NLP tasks into a single format, we train a unified text embedding model by employing contrastive learning over a diverse mixture of datasets from multiple sources. By significantly increasing the number of training data during both unsupervised…
▽ More
We present GTE, a general-purpose text embedding model trained with multi-stage contrastive learning. In line with recent advancements in unifying various NLP tasks into a single format, we train a unified text embedding model by employing contrastive learning over a diverse mixture of datasets from multiple sources. By significantly increasing the number of training data during both unsupervised pre-training and supervised fine-tuning stages, we achieve substantial performance gains over existing embedding models. Notably, even with a relatively modest parameter count of 110M, GTE$_\text{base}$ outperforms the black-box embedding API provided by OpenAI and even surpasses 10x larger text embedding models on the massive text embedding benchmark. Furthermore, without additional fine-tuning on each programming language individually, our model outperforms previous best code retrievers of similar size by treating code as text. In summary, our model achieves impressive results by effectively harnessing multi-stage contrastive learning, offering a powerful and efficient text embedding model with broad applicability across various NLP and code-related tasks.
△ Less
Submitted 6 August, 2023;
originally announced August 2023.
-
Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval
Authors:
Zehan Li,
Yanzhao Zhang,
Dingkun Long,
Pengjun Xie
Abstract:
Recently, various studies have been directed towards exploring dense passage retrieval techniques employing pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising. The conventional MAE framework relies on leveraging the passage reconstruction of decoder to bolster the text representation ability of encoder, thereby enhanci…
▽ More
Recently, various studies have been directed towards exploring dense passage retrieval techniques employing pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising. The conventional MAE framework relies on leveraging the passage reconstruction of decoder to bolster the text representation ability of encoder, thereby enhancing the performance of resulting dense retrieval systems. Within the context of building the representation ability of the encoder through passage reconstruction of decoder, it is reasonable to postulate that a ``more demanding'' decoder will necessitate a corresponding increase in the encoder's ability. To this end, we propose a novel token importance aware masking strategy based on pointwise mutual information to intensify the challenge of the decoder. Importantly, our approach can be implemented in an unsupervised manner, without adding additional expenses to the pre-training phase. Our experiments verify that the proposed method is both effective and robust on large-scale supervised passage retrieval datasets and out-of-domain zero-shot retrieval benchmarks.
△ Less
Submitted 22 May, 2023;
originally announced May 2023.
-
Making Sense of Machine Learning: Integrating Youth's Conceptual, Creative, and Critical Understandings of AI
Authors:
Luis Morales-Navarro,
Yasmin B. Kafai,
Francisco Castro,
William Payne,
Kayla DesPortes,
Daniella DiPaola,
Randi Williams,
Safinah Ali,
Cynthia Breazeal,
Clifford Lee,
Elisabeth Soep,
Duri Long,
Brian Magerko,
Jaemarie Solyst,
Amy Ogan,
Cansu Tatar,
Shiyan Jiang,
Jie Chao,
Carolyn P. Rosé,
Sepehr Vakil
Abstract:
Understanding how youth make sense of machine learning and how learning about machine learning can be supported in and out of school is more relevant than ever before as young people interact with machine learning powered applications everyday; while connecting with friends, listening to music, playing games, or attending school. In this symposium, we present different perspectives on understandin…
▽ More
Understanding how youth make sense of machine learning and how learning about machine learning can be supported in and out of school is more relevant than ever before as young people interact with machine learning powered applications everyday; while connecting with friends, listening to music, playing games, or attending school. In this symposium, we present different perspectives on understanding how learners make sense of machine learning in their everyday lives, how sensemaking of machine learning can be supported in and out of school through the construction of applications, and how youth critically evaluate machine learning powered systems. We discuss how sensemaking of machine learning applications involves the development and integration of conceptual, creative, and critical understandings that are increasingly important to prepare youth to participate in the world.
△ Less
Submitted 4 May, 2023;
originally announced May 2023.
-
Unsupervised Boundary-Aware Language Model Pretraining for Chinese Sequence Labeling
Authors:
Peijie Jiang,
Dingkun Long,
Yanzhao Zhang,
Pengjun Xie,
Meishan Zhang,
Min Zhang
Abstract:
Boundary information is critical for various Chinese language processing tasks, such as word segmentation, part-of-speech tagging, and named entity recognition. Previous studies usually resorted to the use of a high-quality external lexicon, where lexicon items can offer explicit boundary information. However, to ensure the quality of the lexicon, great human effort is always necessary, which has…
▽ More
Boundary information is critical for various Chinese language processing tasks, such as word segmentation, part-of-speech tagging, and named entity recognition. Previous studies usually resorted to the use of a high-quality external lexicon, where lexicon items can offer explicit boundary information. However, to ensure the quality of the lexicon, great human effort is always necessary, which has been generally ignored. In this work, we suggest unsupervised statistical boundary information instead, and propose an architecture to encode the information directly into pre-trained language models, resulting in Boundary-Aware BERT (BABERT). We apply BABERT for feature induction of Chinese sequence labeling tasks. Experimental results on ten benchmarks of Chinese sequence labeling demonstrate that BABERT can provide consistent improvements on all datasets. In addition, our method can complement previous supervised lexicon exploration, where further improvements can be achieved when integrated with external lexicon information.
△ Less
Submitted 27 October, 2022;
originally announced October 2022.
-
Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval
Authors:
Dingkun Long,
Yanzhao Zhang,
Guangwei Xu,
Pengjun Xie
Abstract:
Pre-trained language model (PTM) has been shown to yield powerful text representations for dense passage retrieval task. The Masked Language Modeling (MLM) is a major sub-task of the pre-training process. However, we found that the conventional random masking strategy tend to select a large number of tokens that have limited effect on the passage retrieval task (e,g. stop-words and punctuation). B…
▽ More
Pre-trained language model (PTM) has been shown to yield powerful text representations for dense passage retrieval task. The Masked Language Modeling (MLM) is a major sub-task of the pre-training process. However, we found that the conventional random masking strategy tend to select a large number of tokens that have limited effect on the passage retrieval task (e,g. stop-words and punctuation). By noticing the term importance weight can provide valuable information for passage retrieval, we hereby propose alternative retrieval oriented masking (dubbed as ROM) strategy where more important tokens will have a higher probability of being masked out, to capture this straightforward yet essential information to facilitate the language model pre-training process. Notably, the proposed new token masking method will not change the architecture and learning objective of original PTM. Our experiments verify that the proposed ROM enables term importance information to help language model pre-training thus achieving better performance on multiple passage retrieval benchmarks.
△ Less
Submitted 26 October, 2022;
originally announced October 2022.
-
A Kernel Approach for PDE Discovery and Operator Learning
Authors:
Da Long,
Nicole Mrvaljevic,
Shandian Zhe,
Bamdad Hosseini
Abstract:
This article presents a three-step framework for learning and solving partial differential equations (PDEs) using kernel methods. Given a training set consisting of pairs of noisy PDE solutions and source/boundary terms on a mesh, kernel smoothing is utilized to denoise the data and approximate derivatives of the solution. This information is then used in a kernel regression model to learn the alg…
▽ More
This article presents a three-step framework for learning and solving partial differential equations (PDEs) using kernel methods. Given a training set consisting of pairs of noisy PDE solutions and source/boundary terms on a mesh, kernel smoothing is utilized to denoise the data and approximate derivatives of the solution. This information is then used in a kernel regression model to learn the algebraic form of the PDE. The learned PDE is then used within a kernel based solver to approximate the solution of the PDE with a new source/boundary term, thereby constituting an operator learning framework. Numerical experiments compare the method to state-of-the-art algorithms and demonstrate its competitive performance.
△ Less
Submitted 30 March, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Autonomous Passage Planning for a Polar Vessel
Authors:
Jonathan D. Smith,
Samuel Hall,
George Coombs,
James Byrne,
Michael A. S. Thorne,
J. Alexander Brearley,
Derek Long,
Michael Meredith,
Maria Fox
Abstract:
We introduce a method for long-distance maritime route planning in polar regions, taking into account complex changing environmental conditions. The method allows the construction of optimised routes, describing the three main stages of the process: discrete modelling of the environmental conditions using a non-uniform mesh, the construction of mesh-optimal paths, and path smoothing. In order to a…
▽ More
We introduce a method for long-distance maritime route planning in polar regions, taking into account complex changing environmental conditions. The method allows the construction of optimised routes, describing the three main stages of the process: discrete modelling of the environmental conditions using a non-uniform mesh, the construction of mesh-optimal paths, and path smoothing. In order to account for different vehicle properties we construct a series of data driven functions that can be applied to the environmental mesh to determine the speed limitations and fuel requirements for a given vessel and mesh cell, representing these quantities graphically and geospatially. In describing our results, we demonstrate an example use case for route planning for the polar research ship the RRS Sir David Attenborough (SDA), accounting for ice-performance characteristics and validating the spatial-temporal route construction in the region of the Weddell Sea, Antarctica. We demonstrate the versatility of this route construction method by demonstrating that routes change depending on the seasonal sea ice variability, differences in the route-planning objective functions used, and the presence of other environmental conditions such as currents. To demonstrate the generality of our approach, we present examples in the Arctic Ocean and the Baltic Sea. The techniques outlined in this manuscript are generic and can therefore be applied to vessels with different characteristics. Our approach can have considerable utility beyond just a single vessel planning procedure, and we outline how this workflow is applicable to a wider community, e.g. commercial and passenger shipping.
△ Less
Submitted 13 September, 2022; v1 submitted 17 August, 2022;
originally announced September 2022.
-
Understanding a Robot's Guiding Ethical Principles via Automatically Generated Explanations
Authors:
Benjamin Krarup,
Felix Lindner,
Senka Krivic,
Derek Long
Abstract:
The continued development of robots has enabled their wider usage in human surroundings. Robots are more trusted to make increasingly important decisions with potentially critical outcomes. Therefore, it is essential to consider the ethical principles under which robots operate. In this paper we examine how contrastive and non-contrastive explanations can be used in understanding the ethics of rob…
▽ More
The continued development of robots has enabled their wider usage in human surroundings. Robots are more trusted to make increasingly important decisions with potentially critical outcomes. Therefore, it is essential to consider the ethical principles under which robots operate. In this paper we examine how contrastive and non-contrastive explanations can be used in understanding the ethics of robot action plans. We build upon an existing ethical framework to allow users to make suggestions about plans and receive automatically generated contrastive explanations. Results of a user study indicate that the generated explanations help humans to understand the ethical principles that underlie a robot's plan.
△ Less
Submitted 20 June, 2022;
originally announced June 2022.
-
HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking
Authors:
Yanzhao Zhang,
Dingkun Long,
Guangwei Xu,
Pengjun Xie
Abstract:
Deep pre-trained language models (e,g. BERT) are effective at large-scale text retrieval task. Existing text retrieval systems with state-of-the-art performance usually adopt a retrieve-then-reranking architecture due to the high computational cost of pre-trained language models and the large corpus size. Under such a multi-stage architecture, previous studies mainly focused on optimizing single s…
▽ More
Deep pre-trained language models (e,g. BERT) are effective at large-scale text retrieval task. Existing text retrieval systems with state-of-the-art performance usually adopt a retrieve-then-reranking architecture due to the high computational cost of pre-trained language models and the large corpus size. Under such a multi-stage architecture, previous studies mainly focused on optimizing single stage of the framework thus improving the overall retrieval performance. However, how to directly couple multi-stage features for optimization has not been well studied. In this paper, we design Hybrid List Aware Transformer Reranking (HLATR) as a subsequent reranking module to incorporate both retrieval and reranking stage features. HLATR is lightweight and can be easily parallelized with existing text retrieval systems so that the reranking process can be performed in a single yet efficient processing. Empirical experiments on two large-scale text retrieval datasets show that HLATR can efficiently improve the ranking performance of existing multi-stage text retrieval methods.
△ Less
Submitted 21 May, 2022;
originally announced May 2022.
-
Towards on-sky adaptive optics control using reinforcement learning
Authors:
J. Nousiainen,
C. Rajani,
M. Kasper,
T. Helin,
S. Y. Haffert,
C. Vérinaud,
J. R. Males,
K. Van Gorkom,
L. M. Close,
J. D. Long,
A. D. Hedglen,
O. Guyon,
L. Schatz,
M. Kautz,
J. Lumbres,
A. Rodack,
J. M. Knight,
K. Miller
Abstract:
The direct imaging of potentially habitable Exoplanets is one prime science case for the next generation of high contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems which will control thousands of actuators at a framerate of kilohertz to several kilohertz. Most of the…
▽ More
The direct imaging of potentially habitable Exoplanets is one prime science case for the next generation of high contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems which will control thousands of actuators at a framerate of kilohertz to several kilohertz. Most of the habitable exoplanets are located at small angular separations from their host stars, where the current XAO systems' control laws leave strong residuals.Current AO control strategies like static matrix-based wavefront reconstruction and integrator control suffer from temporal delay error and are sensitive to mis-registration, i.e., to dynamic variations of the control system geometry. We aim to produce control methods that cope with these limitations, provide a significantly improved AO correction and, therefore, reduce the residual flux in the coronagraphic point spread function.
We extend previous work in Reinforcement Learning for AO. The improved method, called PO4AO, learns a dynamics model and optimizes a control neural network, called a policy. We introduce the method and study it through numerical simulations of XAO with Pyramid wavefront sensing for the 8-m and 40-m telescope aperture cases. We further implemented PO4AO and carried out experiments in a laboratory environment using MagAO-X at the Steward laboratory. PO4AO provides the desired performance by improving the coronagraphic contrast in numerical simulations by factors 3-5 within the control region of DM and Pyramid WFS, in simulation and in the laboratory. The presented method is also quick to train, i.e., on timescales of typically 5-10 seconds, and the inference time is sufficiently small (< ms) to be used in real-time control for XAO with currently available hardware even for extremely large telescopes.
△ Less
Submitted 16 May, 2022;
originally announced May 2022.
-
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
Authors:
Ahmed Masry,
Do Xuan Long,
Jia Qing Tan,
Shafiq Joty,
Enamul Hoque
Abstract:
Charts are very popular for analyzing data. When exploring charts, people often ask a variety of complex reasoning questions that involve several logical and arithmetic operations. They also commonly refer to visual features of a chart in their questions. However, most existing datasets do not focus on such complex reasoning questions as their questions are template-based and answers come from a f…
▽ More
Charts are very popular for analyzing data. When exploring charts, people often ask a variety of complex reasoning questions that involve several logical and arithmetic operations. They also commonly refer to visual features of a chart in their questions. However, most existing datasets do not focus on such complex reasoning questions as their questions are template-based and answers come from a fixed-vocabulary. In this work, we present a large-scale benchmark covering 9.6K human-written questions as well as 23.1K questions generated from human-written chart summaries. To address the unique challenges in our benchmark involving visual and logical reasoning over charts, we present two transformer-based models that combine visual features and the data table of the chart in a unified way to answer questions. While our models achieve the state-of-the-art results on the previous datasets as well as on our benchmark, the evaluation also reveals several challenges in answering complex reasoning questions.
△ Less
Submitted 19 March, 2022;
originally announced March 2022.
-
Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval
Authors:
Dingkun Long,
Qiong Gao,
Kuan Zou,
Guangwei Xu,
Pengjun Xie,
Ruijie Guo,
Jian Xu,
Guanjun Jiang,
Luxi Xing,
Ping Yang
Abstract:
Passage retrieval is a fundamental task in information retrieval (IR) research, which has drawn much attention recently. In the English field, the availability of large-scale annotated dataset (e.g, MS MARCO) and the emergence of deep pre-trained language models (e.g, BERT) has resulted in a substantial improvement of existing passage retrieval systems. However, in the Chinese field, especially fo…
▽ More
Passage retrieval is a fundamental task in information retrieval (IR) research, which has drawn much attention recently. In the English field, the availability of large-scale annotated dataset (e.g, MS MARCO) and the emergence of deep pre-trained language models (e.g, BERT) has resulted in a substantial improvement of existing passage retrieval systems. However, in the Chinese field, especially for specific domains, passage retrieval systems are still immature due to quality-annotated dataset being limited by scale. Therefore, in this paper, we present a novel multi-domain Chinese dataset for passage retrieval (Multi-CPR). The dataset is collected from three different domains, including E-commerce, Entertainment video and Medical. Each dataset contains millions of passages and a certain amount of human annotated query-passage related pairs. We implement various representative passage retrieval methods as baselines. We find that the performance of retrieval models trained on dataset from general domain will inevitably decrease on specific domain. Nevertheless, a passage retrieval system built on in-domain annotated dataset can achieve significant improvement, which indeed demonstrates the necessity of domain labeled data for further optimization. We hope the release of the Multi-CPR dataset could benchmark Chinese passage retrieval task in specific domain and also make advances for future studies.
△ Less
Submitted 24 April, 2022; v1 submitted 7 March, 2022;
originally announced March 2022.
-
BlazeNeo: Blazing fast polyp segmentation and neoplasm detection
Authors:
Nguyen Sy An,
Phan Ngoc Lan,
Dao Viet Hang,
Dao Van Long,
Tran Quang Trung,
Nguyen Thi Thuy,
Dinh Viet Sang
Abstract:
In recent years, computer-aided automatic polyp segmentation and neoplasm detection have been an emerging topic in medical image analysis, providing valuable support to colonoscopy procedures. Attentions have been paid to improving the accuracy of polyp detection and segmentation. However, not much focus has been given to latency and throughput for performing these tasks on dedicated devices, whic…
▽ More
In recent years, computer-aided automatic polyp segmentation and neoplasm detection have been an emerging topic in medical image analysis, providing valuable support to colonoscopy procedures. Attentions have been paid to improving the accuracy of polyp detection and segmentation. However, not much focus has been given to latency and throughput for performing these tasks on dedicated devices, which can be crucial for practical applications. This paper introduces a novel deep neural network architecture called BlazeNeo, for the task of polyp segmentation and neoplasm detection with an emphasis on compactness and speed while maintaining high accuracy. The model leverages the highly efficient HarDNet backbone alongside lightweight Receptive Field Blocks for computational efficiency, and an auxiliary training mechanism to take full advantage of the training data for the segmentation quality. Our experiments on a challenging dataset show that BlazeNeo achieves improvements in latency and model size while maintaining comparable accuracy against state-of-the-art methods. When deploying on the Jetson AGX Xavier edge device in INT8 precision, our BlazeNeo achieves over 155 fps while yielding the best accuracy among all compared methods.
△ Less
Submitted 28 February, 2022;
originally announced March 2022.
-
AutoIP: A United Framework to Integrate Physics into Gaussian Processes
Authors:
Da Long,
Zheng Wang,
Aditi Krishnapriyan,
Robert Kirby,
Shandian Zhe,
Michael Mahoney
Abstract:
Physical modeling is critical for many modern science and engineering applications. From a data science or machine learning perspective, where more domain-agnostic, data-driven models are pervasive, physical knowledge -- often expressed as differential equations -- is valuable in that it is complementary to data, and it can potentially help overcome issues such as data sparsity, noise, and inaccur…
▽ More
Physical modeling is critical for many modern science and engineering applications. From a data science or machine learning perspective, where more domain-agnostic, data-driven models are pervasive, physical knowledge -- often expressed as differential equations -- is valuable in that it is complementary to data, and it can potentially help overcome issues such as data sparsity, noise, and inaccuracy. In this work, we propose a simple, yet powerful and general framework -- AutoIP, for Automatically Incorporating Physics -- that can integrate all kinds of differential equations into Gaussian Processes (GPs) to enhance prediction accuracy and uncertainty quantification. These equations can be linear or nonlinear, spatial, temporal, or spatio-temporal, complete or incomplete with unknown source terms, and so on. Based on kernel differentiation, we construct a GP prior to sample the values of the target function, equation-related derivatives, and latent source functions, which are all jointly from a multivariate Gaussian distribution. The sampled values are fed to two likelihoods: one to fit the observations, and the other to conform to the equation. We use the whitening method to evade the strong dependency between the sampled function values and kernel parameters, and we develop a stochastic variational learning algorithm. AutoIP shows improvement upon vanilla GPs in both simulation and several real-world applications, even using rough, incomplete equations.
△ Less
Submitted 20 July, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
-
Non-invasive hemodynamic analysis for aortic regurgitation using computational fluid dynamics and deep learning
Authors:
Derek Long,
Cameron McMurdo,
Edward Ferdian,
Charlene A. Mauger,
David Marlevi,
Alistair A. Young,
Martyn P. Nash
Abstract:
Changes in cardiovascular hemodynamics are closely related to the development of aortic regurgitation, a type of valvular heart disease. Metrics derived from blood flows are used to indicate aortic regurgitation onset and evaluate its severity. These metrics can be non-invasively obtained using four-dimensional (4D) flow magnetic resonance imaging (MRI), where accuracy is primarily dependent on sp…
▽ More
Changes in cardiovascular hemodynamics are closely related to the development of aortic regurgitation, a type of valvular heart disease. Metrics derived from blood flows are used to indicate aortic regurgitation onset and evaluate its severity. These metrics can be non-invasively obtained using four-dimensional (4D) flow magnetic resonance imaging (MRI), where accuracy is primarily dependent on spatial resolution. However, insufficient resolution often results from limitations in 4D flow MRI and complex aortic regurgitation hemodynamics. To address this, computational fluid dynamics simulations were transformed into synthetic 4D flow MRI data and used to train a variety of neural networks. These networks generated super resolution, full-field phase images with an upsample factor of 4. Results showed decreased velocity error, high structural similarity scores, and improved learning capabilities from previous work. Further validation was performed on two sets of in-vivo 4D flow MRI data and demonstrated success in de-noising flow images. This approach presents an opportunity to comprehensively analyse aortic regurgitation hemodynamics in a non-invasive manner.
△ Less
Submitted 5 April, 2022; v1 submitted 23 November, 2021;
originally announced November 2021.
-
Long-Range Route-planning for Autonomous Vehicles in the Polar Oceans
Authors:
Maria Fox,
Michael Meredith,
J. Alexander Brearley,
Dan Jones,
Derek Long
Abstract:
There is an increasing demand for piloted autonomous underwater vehicles (AUVs) to operate in polar ice conditions. At present, AUVs are deployed from ships and directly human-piloted in these regions, entailing a high carbon cost and limiting the scope of operations. A key requirement for long-term autonomous missions is a long-range route planning capability that is aware of the changing ice con…
▽ More
There is an increasing demand for piloted autonomous underwater vehicles (AUVs) to operate in polar ice conditions. At present, AUVs are deployed from ships and directly human-piloted in these regions, entailing a high carbon cost and limiting the scope of operations. A key requirement for long-term autonomous missions is a long-range route planning capability that is aware of the changing ice conditions. In this paper we address the problem of automating long-range route-planning for AUVs operating in the Southern Ocean. We present the route-planning method and results showing that efficient, ice-avoiding, long-distance traverses can be planned.
△ Less
Submitted 20 November, 2021; v1 submitted 30 October, 2021;
originally announced November 2021.