-
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Authors:
M-A-P Team,
Xinrun Du,
Yifan Yao,
Kaijing Ma,
Bingli Wang,
Tianyu Zheng,
King Zhu,
Minghao Liu,
Yiming Liang,
Xiaolong Jin,
Zhenlin Wei,
Chujie Zheng,
Kaixin Deng,
Shawn Gavin,
Shian Jia,
Sichao Jiang,
Yiyan Liao,
Rui Li,
Qinrui Li,
Sirun Li,
Yizhi Li,
Yunwen Li,
David Ma,
Yuansheng Ni,
Haoran Que
, et al. (72 additional authors not shown)
Abstract:
Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.
Submitted 28 March, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
Rumor Detection by Multi-task Suffix Learning based on Time-series Dual Sentiments
Authors:
Zhiwei Liu,
Kailai Yang,
Eduard Hovy,
Sophia Ananiadou
Abstract:
The widespread dissemination of rumors on social media has a significant impact on people's lives, potentially leading to public panic and fear. Rumors often evoke specific sentiments, resonating with readers and prompting sharing. To effectively detect and track rumors, it is essential to observe the fine-grained sentiments of both source and response message pairs as the rumor evolves over time. However, current rumor detection methods fail to account for this aspect. In this paper, we propose MSuf, the first multi-task suffix learning framework for rumor detection and tracking using time series dual (coupled) sentiments. MSuf includes three modules: (1) an LLM to extract sentiment intensity features and sort them chronologically; (2) a module that fuses the sorted sentiment features with their source text word embeddings to obtain an aligned embedding; (3) two hard prompts are combined with the aligned vector to perform rumor detection and sentiment analysis using one frozen LLM. MSuf effectively enhances the performance of LLMs for rumor detection with only minimal parameter fine-tuning. Evaluating MSuf on four rumor detection benchmarks, we find significant improvements compared to other emotion-based methods.
Submitted 20 February, 2025;
originally announced February 2025.
-
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
Authors:
Zenan Li,
Zhaoyu Li,
Wen Tang,
Xian Zhang,
Yuan Yao,
Xujie Si,
Fan Yang,
Kaiyu Yang,
Xiaoxing Ma
Abstract:
Large language models (LLMs) can prove mathematical theorems formally by generating proof steps (a.k.a. tactics) within a proof system. However, the space of possible tactics is vast and complex, while the available training data for formal proofs is limited, posing a significant challenge to LLM-based tactic generation. To address this, we introduce a neuro-symbolic tactic generator that synergizes the mathematical intuition learned by LLMs with domain-specific insights encoded by symbolic methods. The key aspect of this integration is identifying which parts of mathematical reasoning are best suited to LLMs and which to symbolic methods. While the high-level idea of neuro-symbolic integration is broadly applicable to various mathematical problems, in this paper, we focus specifically on Olympiad inequalities (Figure 1). We analyze how humans solve these problems and distill the techniques into two types of tactics: (1) scaling, handled by symbolic methods, and (2) rewriting, handled by LLMs. In addition, we combine symbolic tools with LLMs to prune and rank the proof goals for efficient proof search. We evaluate our framework on 161 challenging inequalities from multiple mathematics competitions, achieving state-of-the-art performance and significantly outperforming existing LLM and symbolic approaches without requiring additional training data.
Submitted 26 February, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems
Authors:
Borui Liao,
Yulong Xu,
Jiao Ou,
Kaiyuan Yang,
Weihua Jian,
Pengfei Wan,
Di Zhang
Abstract:
Full-Duplex Speech Dialogue Systems (Full-Duplex SDS) have significantly enhanced the naturalness of human-machine interaction by enabling real-time bidirectional communication. However, existing approaches face challenges such as difficulties in independent module optimization and contextual noise interference due to highly coupled architectural designs and oversimplified binary state modeling. This paper proposes FlexDuo, a flexible full-duplex control module that decouples duplex control from spoken dialogue systems through a plug-and-play architectural design. Furthermore, inspired by human information-filtering mechanisms in conversations, we introduce an explicit Idle state. On one hand, the Idle state filters redundant noise and irrelevant audio to enhance dialogue quality. On the other hand, it establishes a semantic integrity-based buffering mechanism, reducing the risk of mutual interruptions while ensuring accurate response transitions. Experimental results on the Fisher corpus demonstrate that FlexDuo reduces the false interruption rate by 24.9% and improves response accuracy by 7.6% compared to integrated full-duplex dialogue system baselines. It also outperforms voice activity detection (VAD) controlled baseline systems in both Chinese and English dialogue quality. The proposed modular architecture and state-based dialogue model provide a novel technical pathway for building flexible and efficient duplex dialogue systems.
Submitted 19 February, 2025;
originally announced February 2025.
-
An Algorithm Board in Neural Decoding
Authors:
Jingyi Feng,
Kai Yang
Abstract:
Understanding the mechanisms of neural encoding and decoding has always been a highly interesting research topic in fields such as neuroscience and cognitive intelligence. In prior studies, some researchers identified a symmetry in neural data decoded by unsupervised methods in motor scenarios and constructed a cognitive learning system based on this pattern (i.e., symmetry). Nevertheless, the distribution state of the data flow that significantly influences neural decoding positions still remains a mystery within the system, which further restricts the enhancement of the system's interpretability. Based on this, this paper mainly explores changes in the distribution state within the system from the machine learning and mathematical statistics perspectives. In the experiment, we assessed the correctness of this symmetry using various tools and indicators commonly utilized in mathematics and statistics. According to the experimental results, the normal distribution (or Gaussian distribution) plays a crucial role in the decoding of prediction positions within the system. Eventually, an algorithm board similar to the Galton board was built to serve as the mathematical foundation of the discovered symmetry.
Submitted 17 February, 2025;
originally announced February 2025.
-
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
Authors:
Tiancheng Gu,
Kaicheng Yang,
Chaoyi Zhang,
Yin Xie,
Xiang An,
Ziyong Feng,
Dongnan Liu,
Weidong Cai,
Jiankang Deng
Abstract:
After pre-training on extensive image-text pairs, Contrastive Language-Image Pre-training (CLIP) demonstrates promising performance on a wide variety of benchmarks. However, a substantial volume of multimodal interleaved documents remains underutilized for contrastive vision-language representation learning. To fully leverage these unpaired documents, we initially establish a Real-World Data Extraction pipeline to extract high-quality images and texts. Then we design a hierarchical retrieval method to efficiently associate each image with multiple semantically relevant realistic texts. To further enhance fine-grained visual information, we propose an image semantic augmented generation module for synthetic text production. Furthermore, we employ a semantic balance sampling strategy to improve dataset diversity, enabling better learning of long-tail concepts. Based on these innovations, we construct RealSyn, a dataset combining realistic and synthetic texts, available in three scales: 15M, 30M, and 100M. We compare our dataset with other widely used datasets of equivalent scale for CLIP training. Models pre-trained on RealSyn consistently achieve state-of-the-art performance across various downstream tasks, including linear probe, zero-shot transfer, zero-shot robustness, and zero-shot retrieval. Furthermore, extensive experiments confirm that RealSyn significantly enhances contrastive vision-language representation learning and demonstrates robust scalability. To facilitate future research, the RealSyn dataset and pretrained model weights are released at https://github.com/deepglint/RealSyn.
Submitted 16 April, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
MotifBench: A standardized protein design benchmark for motif-scaffolding problems
Authors:
Zhuoqi Zheng,
Bo Zhang,
Kieran Didi,
Kevin K. Yang,
Jason Yim,
Joseph L. Watson,
Hai-Feng Chen,
Brian L. Trippe
Abstract:
The motif-scaffolding problem is a central task in computational protein design: Given the coordinates of atoms in a geometry chosen to confer a desired biochemical function (a motif), the task is to identify diverse protein structures (scaffolds) that include the motif and maintain its geometry. Significant recent progress on motif-scaffolding has been made due to computational evaluation with reliable protein structure prediction and fixed-backbone sequence design methods. However, significant variability in evaluation strategies across publications has hindered comparability of results, challenged reproducibility, and impeded robust progress. In response we introduce MotifBench, comprising (1) a precisely specified pipeline and evaluation metrics, (2) a collection of 30 benchmark problems, and (3) an implementation of this benchmark and leaderboard at github.com/blt2114/MotifBench. The MotifBench test cases are more difficult compared to earlier benchmarks, and include protein design problems for which solutions are known but on which, to the best of our knowledge, state-of-the-art methods fail to identify any solution.
Submitted 19 February, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Roadmap to fault tolerant quantum computation using topological qubit arrays
Authors:
David Aasen,
Morteza Aghaee,
Zulfi Alam,
Mariusz Andrzejczuk,
Andrey Antipov,
Mikhail Astafev,
Lukas Avilovas,
Amin Barzegar,
Bela Bauer,
Jonathan Becker,
Juan M. Bello-Rivas,
Umesh Bhaskar,
Alex Bocharov,
Srini Boddapati,
David Bohn,
Jouri Bommer,
Parsa Bonderson,
Jan Borovsky,
Leo Bourdet,
Samuel Boutin,
Tom Brown,
Gary Campbell,
Lucas Casparis,
Srivatsa Chakravarthi,
Rui Chao
, et al. (157 additional authors not shown)
Abstract:
We describe a concrete device roadmap towards a fault-tolerant quantum computing architecture based on noise-resilient, topologically protected Majorana-based qubits. Our roadmap encompasses four generations of devices: a single-qubit device that enables a measurement-based qubit benchmarking protocol; a two-qubit device that uses measurement-based braiding to perform single-qubit Clifford operations; an eight-qubit device that can be used to show an improvement of a two-qubit operation when performed on logical qubits rather than directly on physical qubits; and a topological qubit array supporting lattice surgery demonstrations on two logical qubits. Devices that enable this path require a superconductor-semiconductor heterostructure that supports a topological phase, quantum dots and coupling between those quantum dots that can create the appropriate loops for interferometric measurements, and a microwave readout system that can perform fast, low-error single-shot measurements. We describe the key design components of these qubit devices, along with the associated protocols for demonstrations of single-qubit benchmarking, Clifford gate execution, quantum error detection, and quantum error correction, which differ greatly from those in more conventional qubits. Finally, we comment on implications and advantages of this architecture for utility-scale quantum computation.
Submitted 7 April, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Authors:
Congkai Xie,
Shuo Cai,
Wenjun Wang,
Pengxiang Li,
Zhijie Sang,
Kejing Yang,
Yiming Zhang,
Zhen Li,
Guanghao Zhu,
Zeyu Liu,
Yang Yu,
Yuhang Liu,
Su Lu,
Baoyi He,
Qi Zhou,
Xiaotian Han,
Jianbo Yuan,
Shengyu Zhang,
Fei Wu,
Hongxia Yang
Abstract:
Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have made significant advancements in reasoning capabilities. However, they still face challenges such as high computational demands and privacy concerns. This paper focuses on developing efficient Small Language Models (SLMs) and Multimodal Small Language Models (MSLMs) that retain competitive reasoning abilities. We introduce a novel training pipeline that enhances reasoning capabilities and facilitates deployment on edge devices, achieving state-of-the-art performance while minimizing development costs. InfiR aims to advance AI systems by improving reasoning, reducing adoption barriers, and addressing privacy concerns through smaller model sizes. Resources are available at https://github.com/Reallm-Labs/InfiR.
Submitted 17 February, 2025;
originally announced February 2025.
-
ResiComp: Loss-Resilient Image Compression via Dual-Functional Masked Visual Token Modeling
Authors:
Sixian Wang,
Jincheng Dai,
Xiaoqi Qin,
Ke Yang,
Kai Niu,
Ping Zhang
Abstract:
Recent neural image codecs (NICs) achieve significant compression performance, but limited attention has been paid to their error resilience.
These resulting NICs tend to be sensitive to packet losses, which are prevalent in real-time communications.
In this paper, we investigate how to elevate the resilience ability of NICs to combat packet losses.
We propose ResiComp, a pioneering neural image compression framework with feature-domain packet loss concealment (PLC).
Motivated by the inherent consistency between generation and compression, we advocate merging the tasks of entropy modeling and PLC into a unified framework focused on latent space context modeling.
To this end, we take inspiration from the impressive generative capabilities of large language models (LLMs), particularly the recent advances of masked visual token modeling (MVTM).
During training, we integrate MVTM to mirror the effects of packet loss, enabling a dual-functional Transformer to restore the masked latents by predicting their missing values and conditional probability mass functions.
Our ResiComp jointly optimizes compression efficiency and loss resilience.
Moreover, ResiComp provides flexible coding modes, allowing for explicitly adjusting the efficiency-resilience trade-off in response to varying Internet or wireless network conditions.
Extensive experiments demonstrate that ResiComp can significantly enhance the NIC's resilience against packet losses, while exhibiting a worthy trade-off between compression efficiency and packet loss resilience.
Submitted 28 February, 2025; v1 submitted 15 February, 2025;
originally announced February 2025.
-
Angular analysis of $B^0\rightarrow K^{*0}e^{+}e^{-}$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1115 additional authors not shown)
Abstract:
An angular analysis of $B^0\rightarrow K^{*0}e^{+}e^{-}$ decays is presented using proton-proton collision data collected by the LHCb experiment at centre-of-mass energies of 7, 8 and 13 TeV, corresponding to an integrated luminosity of 9 fb$^{-1}$. The analysis is performed in the region of the dilepton invariant mass squared of 1.1-6.0 GeV$^{2}/c^{4}$. In addition, a test of lepton flavour universality is performed by comparing the obtained angular observables with those measured in $B^0\rightarrow K^{*0}\mu^{+}\mu^{-}$ decays. In general, the angular observables are found to be consistent with the Standard Model expectations as well as with global analyses of other $b \rightarrow s \ell^{+} \ell^{-}$ processes, where $\ell$ is either a muon or an electron. No sign of lepton-flavour-violating effects is observed.
Submitted 14 February, 2025;
originally announced February 2025.
-
Survey on Single-Image Reflection Removal using Deep Learning Techniques
Authors:
Kangning Yang,
Huiming Sun,
Jie Cai,
Lan Fu,
Jiaming Ding,
Jinlong Li,
Chiu Man Ho,
Zibo Meng
Abstract:
The phenomenon of reflection is quite common in digital images, posing significant challenges for various applications such as computer vision, photography, and image processing. Traditional methods for reflection removal often struggle to achieve clean results while maintaining high fidelity and robustness, particularly in real-world scenarios. Over the past few decades, numerous deep learning-based approaches for reflection removal have emerged, yielding impressive results. In this survey, we conduct a comprehensive review of the current literature by focusing on key venues such as ICCV, ECCV, CVPR, NeurIPS, etc., as these conferences and journals have been central to advances in the field. Our review follows a structured paper selection process, and we critically assess both single-stage and two-stage deep learning methods for reflection removal. The contribution of this survey is three-fold: first, we provide a comprehensive summary of the most recent work on single-image reflection removal; second, we outline task hypotheses, current deep learning techniques, publicly available datasets, and relevant evaluation metrics; and third, we identify key challenges and opportunities in deep learning-based reflection removal, highlighting the potential of this rapidly evolving research area.
Submitted 12 February, 2025;
originally announced February 2025.
-
Spectral Journey: How Transformers Predict the Shortest Path
Authors:
Andrew Cohen,
Andrey Gromov,
Kaiyu Yang,
Yuandong Tian
Abstract:
Decoder-only transformers have led to a step change in the capability of large language models. However, opinions are mixed as to whether they are really planning or reasoning. A path to making progress in this direction is to study the model's behavior in a setting with carefully controlled data, and then interpret the learned representations and reverse-engineer the computation performed internally. We study decoder-only transformer language models trained from scratch to predict shortest paths on simple, connected and undirected graphs. In this setting, the representations and the dynamics learned by the model are interpretable. We present three major results: (1) Two-layer decoder-only language models can learn to predict shortest paths on simple, connected graphs containing up to 10 nodes. (2) Models learn a graph embedding that is correlated with the spectral decomposition of the line graph. (3) Following these insights, we discover a novel approximate path-finding algorithm, Spectral Line Navigator (SLN), that finds shortest paths by greedily selecting nodes in the space of the spectral embedding of the line graph.
Submitted 12 February, 2025;
originally announced February 2025.
-
ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources
Authors:
Jason Wu,
Kang Yang,
Lance Kaplan,
Mani Srivastava
Abstract:
Multimodal deep learning systems are deployed in dynamic scenarios due to the robustness afforded by multiple sensing modalities. Nevertheless, they struggle with varying compute resource availability (due to multi-tenancy, device heterogeneity, etc.) and fluctuating quality of inputs (from sensor feed corruption, environmental noise, etc.). Current multimodal systems employ static resource provisioning and cannot easily adapt when compute resources change over time. Additionally, their reliance on processing sensor data with fixed feature extractors is ill-equipped to handle variations in modality quality. Consequently, uninformative modalities, such as those with high noise, needlessly consume resources better allocated towards other modalities. We propose ADMN, a layer-wise Adaptive Depth Multimodal Network capable of tackling both challenges: it adjusts the total number of active layers across all modalities to meet compute resource constraints, and continually reallocates layers across input modalities according to their modality quality. Our evaluations show that ADMN can match the accuracy of state-of-the-art networks while reducing their floating-point operations by up to 75%.
Submitted 11 February, 2025;
originally announced February 2025.
-
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
Authors:
Yong Lin,
Shange Tang,
Bohan Lyu,
Jiayun Wu,
Hongzhou Lin,
Kaiyu Yang,
Jia Li,
Mengzhou Xia,
Danqi Chen,
Sanjeev Arora,
Chi Jin
Abstract:
We introduce Goedel-Prover, an open-source language model that achieves state-of-the-art (as of April 5 2025) performance in automated formal proof generation for mathematical problems. A key challenge in this field is the scarcity of formalized mathematical statements and proofs, which we address through the following approaches. First, we train LLMs to convert natural language math problems from the Numina dataset to equivalent formal statements in Lean 4. This process creates the dataset Goedel-Pset-v1, which includes 1.64 million formal statements. Next, we develop a large dataset of formal proofs by training a series of provers. Each new prover can prove many statements that previous ones could not, and these new proofs are added to the training set for the next prover. Finally, we obtain the dataset Goedel-Pset-v1-solved, which contains proofs for over 800K statements from Goedel-Pset-v1. Supervised fine-tuning (SFT) of DeepSeek-Prover-V1.5-Base on Goedel-Pset-v1-solved (i.e., no RL) yields a Goedel-Prover-SFT that achieves a success rate of 57.6% (Pass@32) on miniF2F, surpassing the previous leader DeepSeek-Prover-V1.5-RL (trained using SFT + RL on a proprietary dataset) by 7.6%. On PutnamBench, Goedel-Prover-SFT successfully solves 7 problems (Pass@512), ranking first on the leaderboard. We provide extensive discussion of our training methodology, highlighting the key design choices that contribute to Goedel-Prover's strong performance. Further RL training (including DPO) improves Goedel-Prover-SFT's success rate to over 60% (Pass@32) on miniF2F.
To aid future research, we provide extensive discussion of our training methodology and design choices. We also fully open-source our codes, models, and datasets. Additionally, we open-source formal proofs for 29.7K problems in Lean Workbook, nearly doubling the 15.7K solved by prior provers.
Submitted 19 April, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Learning to Synthesize Compatible Fashion Items Using Semantic Alignment and Collocation Classification: An Outfit Generation Framework
Authors:
Dongliang Zhou,
Haijun Zhang,
Kai Yang,
Linlin Liu,
Han Yan,
Xiaofei Xu,
Zhao Zhang,
Shuicheng Yan
Abstract:
The field of fashion compatibility learning has attracted great attention from both the academic and industrial communities in recent years. Many studies have been carried out for fashion compatibility prediction, collocated outfit recommendation, artificial intelligence (AI)-enabled compatible fashion design, and related topics. In particular, AI-enabled compatible fashion design can be used to synthesize compatible fashion items or outfits in order to improve the design experience for designers or the efficacy of recommendations for customers. However, previous generative models for collocated fashion synthesis have generally focused on the image-to-image translation between fashion items of upper and lower clothing. In this paper, we propose a novel outfit generation framework, i.e., OutfitGAN, with the aim of synthesizing a set of complementary items to compose an entire outfit, given one extant fashion item and reference masks of target synthesized items. OutfitGAN includes a semantic alignment module, which is responsible for characterizing the mapping correspondence between the existing fashion items and the synthesized ones, to improve the quality of the synthesized images, and a collocation classification module, which is used to improve the compatibility of a synthesized outfit. In order to evaluate the performance of our proposed models, we built a large-scale dataset consisting of 20,000 fashion outfits. Extensive experimental results on this dataset show that our OutfitGAN can synthesize photo-realistic outfits and outperform state-of-the-art methods in terms of similarity, authenticity and compatibility measurements.
Submitted 5 February, 2025;
originally announced February 2025.
-
CT-UIO: Continuous-Time UWB-Inertial-Odometer Localization Using Non-Uniform B-spline with Fewer Anchors
Authors:
Jian Sun,
Wei Sun,
Genwei Zhang,
Kailun Yang,
Song Li,
Xiangqi Meng,
Na Deng,
Chongbin Tan
Abstract:
Ultra-wideband (UWB) based positioning with fewer anchors has attracted significant research interest in recent years, especially under energy-constrained conditions. However, most existing methods rely on discrete-time representations and smoothness priors to infer a robot's motion states, which often struggle with ensuring multi-sensor data synchronization. In this paper, we present an efficient UWB-Inertial-odometer localization system, utilizing a non-uniform B-spline framework with fewer anchors. Unlike traditional uniform B-spline-based continuous-time methods, we introduce an adaptive knot-span adjustment strategy for non-uniform continuous-time trajectory representation. This is accomplished by adjusting control points dynamically based on movement speed. To enable efficient fusion of IMU and odometer data, we propose an improved Extended Kalman Filter (EKF) with innovation-based adaptive estimation to provide short-term accurate motion prior. Furthermore, to address the challenge of achieving a fully observable UWB localization system under few-anchor conditions, the Virtual Anchor (VA) generation method based on multiple hypotheses is proposed. At the backend, we propose a CT-UIO factor graph with an adaptive sliding window for global trajectory estimation. Comprehensive experiments conducted on corridor and exhibition hall datasets validate the proposed system's high precision and robust performance. The codebase and datasets of this work will be open-sourced at https://github.com/JasonSun623/CT-UIO.
Submitted 10 February, 2025;
originally announced February 2025.
-
GWRF: A Generalizable Wireless Radiance Field for Wireless Signal Propagation Modeling
Authors:
Kang Yang,
Yuning Chen,
Wan Du
Abstract:
We present Generalizable Wireless Radiance Fields (GWRF), a framework for modeling wireless signal propagation at arbitrary 3D transmitter and receiver positions. Unlike previous methods that adapt vanilla Neural Radiance Fields (NeRF) from the optical to the wireless signal domain, requiring extensive per-scene training, GWRF generalizes effectively across scenes. First, a geometry-aware Transformer encoder-based wireless scene representation module incorporates information from geographically proximate transmitters to learn a generalizable wireless radiance field. Second, a neural-driven ray tracing algorithm operates on this field to automatically compute signal reception at the receiver. Experimental results demonstrate that GWRF outperforms existing methods on single scenes and achieves state-of-the-art performance on unseen scenes.
Submitted 8 February, 2025;
originally announced February 2025.
-
Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge
Authors:
Muhammad Imran,
Jonathan R. Krebs,
Vishal Balaji Sivaraman,
Teng Zhang,
Amarjeet Kumar,
Walker R. Ueland,
Michael J. Fassler,
Jinlong Huang,
Xiao Sun,
Lisheng Wang,
Pengcheng Shi,
Maximilian Rokuss,
Michael Baumgartner,
Yannick Kirchhof,
Klaus H. Maier-Hein,
Fabian Isensee,
Shuolin Liu,
Bing Han,
Bong Thanh Nguyen,
Dong-jin Shin,
Park Ji-Woo,
Mathew Choi,
Kwang-Hyun Uhm,
Sung-Jea Ko,
Chanwoong Lee
, et al. (38 additional authors not shown)
Abstract:
Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently available to support the development of multi-class aortic segmentation methods. To address this gap, we organized the AortaSeg24 MICCAI Challenge, introducing the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This dataset was designed to facilitate both model development and validation. The challenge attracted 121 teams worldwide, with participants leveraging state-of-the-art frameworks such as nnU-Net and exploring novel techniques, including cascaded models, data augmentation strategies, and custom loss functions. We evaluated the submitted algorithms using the Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD), highlighting the approaches adopted by the top five performing teams. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms. The annotated dataset, evaluation code, and implementations of the leading methods are publicly available to support further research. All resources can be accessed at https://aortaseg24.grand-challenge.org.
Submitted 7 February, 2025;
originally announced February 2025.
-
Affine Frequency Division Multiplexing: Extending OFDM for Scenario-Flexibility and Resilience
Authors:
Haoran Yin,
Yanqun Tang,
Ali Bemani,
Marios Kountouris,
Yu Zhou,
Xingyao Zhang,
Yuqing Liu,
Gaojie Chen,
Kai Yang,
Fan Liu,
Christos Masouros,
Shuangyang Li,
Giuseppe Caire,
Pei Xiao
Abstract:
Next-generation wireless networks are conceived to provide reliable and high-data-rate communication services for diverse scenarios, such as vehicle-to-vehicle, unmanned aerial vehicles, and satellite networks. The severe Doppler spreads in the underlying time-varying channels induce destructive inter-carrier interference (ICI) in the extensively adopted orthogonal frequency division multiplexing (OFDM) waveform, leading to severe performance degradation. This calls for a new air interface design that can accommodate the severe delay-Doppler spreads in highly dynamic channels while possessing sufficient flexibility to cater to various applications. This article provides a comprehensive overview of a promising chirp-based waveform named affine frequency division multiplexing (AFDM). It features two tunable parameters and achieves optimal diversity order in doubly dispersive channels (DDC). We study the fundamental principle of AFDM, illustrating its intrinsic suitability for DDC. Based on that, several potential applications of AFDM are explored. Furthermore, the major challenges and the corresponding solutions of AFDM are presented, followed by several future research directions. Finally, we draw some instructive conclusions about AFDM, hoping to provide useful inspiration for its development.
Submitted 7 February, 2025;
originally announced February 2025.
-
Search for resonance-enhanced $CP$ and angular asymmetries in the $Λ^+_{c}\to pμ^+μ^-$ decay at LHCb
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1127 additional authors not shown)
Abstract:
The first measurement of the $CP$ asymmetry of the decay rate ($A_{CP}$) and the $CP$ average ($ΣA_{\text{FB}}$) and $CP$ asymmetry ($ΔA_{\text{FB}}$) of the forward-backward asymmetry in the muon system of $Λ^+_c\to pμ^+μ^-$ decays is reported. The measurement is performed using a data sample of proton-proton collisions, recorded by the LHCb experiment from 2016 to 2018 at a center-of-mass energy of 13$\text{ TeV}$, which corresponds to an integrated luminosity of 5.4$\text{ fb}^{-1}$. The asymmetries are measured in two regions of dimuon mass near the $φ$-meson mass peak. The dimuon-mass integrated results are \begin{align*} A_{CP} &= (-1.1 \pm 4.0 \pm 0.5)\%,\\ ΣA_{\text{FB}} &= (\phantom{-}3.9 \pm 4.0 \pm 0.6)\%,\\ ΔA_{\text{FB}} &= (\phantom{-}3.1 \pm 4.0 \pm 0.4)\%, \end{align*} where the first uncertainty is statistical and the second systematic. The results are consistent with the conservation of $CP$ symmetry and the Standard Model expectations.
Submitted 6 February, 2025;
originally announced February 2025.
-
ALMA observations of massive clouds in the central molecular zone: slim filaments tracing parsec-scale shocks
Authors:
Kai Yang,
Xing Lu,
Yichen Zhang,
Xunchuan Liu,
Adam Ginsburg,
Hauyu Baobab Liu,
Yu Cheng,
Siyi Feng,
Tie Liu,
Qizhou Zhang,
Elisabeth A. C. Mills,
Daniel L. Walker,
Shu-ichiro Inutsuka,
Cara Battersby,
Steven N. Longmore,
Xindi Tang,
Jens Kauffmann,
Qilao Gu,
Shanghuo Li,
Qiuyi Luo,
J. M. Diederik Kruijssen,
Thushara Pillai,
Hai-Hua Qiao,
Keping Qiu,
Zhiqiang Shen
Abstract:
The central molecular zone (CMZ) of our Galaxy exhibits widespread emission from SiO and various complex organic molecules (COMs), yet the exact origin of such emission is uncertain. Here we report the discovery of a unique class of long ($>$0.5 pc) and narrow ($<$0.03 pc) filaments in the emission of SiO 5$-$4 and eight additional molecular lines, including several COMs, in our ALMA 1.3 mm spectral line observations toward two massive molecular clouds in the CMZ, which we name slim filaments. However, these filaments are not detected in the 1.3 mm continuum at the 5$σ$ level. Their line-of-sight velocities are coherent and inconsistent with being outflows. The column densities and relative abundances of the detected molecules are statistically similar to those in protostellar outflows but different from those in dense cores within the same clouds. Turbulent pressure in these filaments dominates over self gravity and leads to hydrostatic inequilibrium, indicating that they are a different class of objects than the dense gas filaments in dynamical equilibrium ubiquitously found in nearby molecular clouds. We argue that these newly detected slim filaments are associated with parsec-scale shocks, likely arising from dynamic interactions between shock waves and molecular clouds. The dissipation of the slim filaments may replenish SiO and COMs in the interstellar medium and lead to their widespread emission in the CMZ.
Submitted 6 February, 2025;
originally announced February 2025.
-
Event-aided Semantic Scene Completion
Authors:
Shangwei Guo,
Hao Shi,
Song Wang,
Xiaoting Yin,
Kailun Yang,
Kaiwei Wang
Abstract:
Autonomous driving systems rely on robust 3D scene understanding. Recent advances in Semantic Scene Completion (SSC) for autonomous driving underscore the limitations of RGB-based approaches, which struggle under motion blur, poor lighting, and adverse weather. Event cameras, offering high dynamic range and low latency, address these challenges by providing asynchronous data that complements RGB inputs. We present DSEC-SSC, the first real-world benchmark specifically designed for event-aided SSC, which includes a novel 4D labeling pipeline for generating dense, visibility-aware labels that adapt dynamically to object motion. Our proposed RGB-Event fusion framework, EvSSC, introduces an Event-aided Lifting Module (ELM) that effectively bridges 2D RGB-Event features to 3D space, enhancing view transformation and the robustness of 3D volume construction across SSC models. Extensive experiments on DSEC-SSC and simulated SemanticKITTI-E demonstrate that EvSSC is adaptable to both transformer-based and LSS-based SSC architectures. Notably, evaluations on SemanticKITTI-C demonstrate that EvSSC achieves consistently improved prediction accuracy across five degradation modes and both In-domain and Out-of-domain settings, achieving up to a 52.5% relative improvement in mIoU when the image sensor partially fails. Additionally, we quantitatively and qualitatively validate the superiority of EvSSC under motion blur and extreme weather conditions, where autonomous driving is challenged. The established datasets and our codebase will be made publicly available at https://github.com/Pandapan01/EvSSC.
Submitted 4 February, 2025;
originally announced February 2025.
-
Scalable 3D Gaussian Splatting-Based RF Signal Spatial Propagation Modeling
Authors:
Kang Yang,
Gaofeng Dong,
Sijie Ji,
Wan Du,
Mani Srivastava
Abstract:
Effective network planning and sensing in wireless networks require resource-intensive site surveys for data collection. An alternative is Radio-Frequency (RF) signal spatial propagation modeling, which computes received signals given transceiver positions in a scene (e.g., a conference room). We identify a fundamental trade-off between scalability and fidelity in the state-of-the-art method. To address this issue, we explore leveraging 3D Gaussian Splatting (3DGS), an advanced technique for the image synthesis of 3D scenes in real-time from arbitrary camera poses. By integrating domain-specific insights, we design three components for adapting 3DGS to the RF domain, including Gaussian-based RF scene representation, gradient-guided RF attribute learning, and RF-customized CUDA for ray tracing. Building on them, we develop RFSPM, an end-to-end framework for scalable RF signal Spatial Propagation Modeling. We evaluate RFSPM in four field studies and two applications across RFID, BLE, LoRa, and 5G, covering diverse frequencies, antennas, signals, and scenes. The results show that RFSPM matches the fidelity of the state-of-the-art method while reducing data requirements, training GPU-hours, and inference latency by up to 9.8\,$\times$, 18.6\,$\times$, and 84.4\,$\times$, respectively.
Submitted 3 February, 2025;
originally announced February 2025.
-
BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution
Authors:
Kai Liu,
Kaicheng Yang,
Zheng Chen,
Zhiteng Li,
Yong Guo,
Wenbo Li,
Linghe Kong,
Yulun Zhang
Abstract:
While super-resolution (SR) methods based on diffusion models (DM) have demonstrated inspiring performance, their deployment is impeded by heavy memory and computation demands. Recent work applies two kinds of methods to compress or accelerate the DM. One is to compress the DM into 1-bit, aka binarization, alleviating the storage and computation pressure. The other distills the multi-step DM into only one step, significantly speeding up the inference process. Nonetheless, it remains impossible to deploy DM to resource-limited edge devices. To address this problem, we propose BiMaCoSR, which combines binarization and one-step distillation to obtain extreme compression and acceleration. To prevent the catastrophic collapse of the model caused by binarization, we propose a sparse matrix branch (SMB) and a low rank matrix branch (LRMB). Both auxiliary branches pass the full-precision (FP) information but in different ways. SMB absorbs the extreme values and its output is high rank, carrying abundant FP information. In contrast, the design of LRMB is inspired by LoRA and is initialized with the top r SVD components, outputting a low rank representation. The computation and storage overhead of our proposed branches can be safely ignored. Comprehensive comparison experiments show that BiMaCoSR outperforms current state-of-the-art binarization methods and achieves competitive performance compared with the FP one-step model. BiMaCoSR achieves a 23.8x compression ratio and a 27.4x speedup ratio compared to its FP counterpart. Our code and model are available at https://github.com/Kai-Liu001/BiMaCoSR.
Submitted 3 February, 2025; v1 submitted 1 February, 2025;
originally announced February 2025.
-
Colorful Helly via induced matchings
Authors:
Cosmin Pohoata,
Kevin Yang,
Shengtong Zhang
Abstract:
We establish a theorem regarding the maximum size of an {\it{induced}} matching in the bipartite complement of the incidence graph of a set system $(X,\mathcal{F})$. We show that this quantity plus one provides an upper bound on the colorful Helly number of this set system, i.e. the minimum positive integer $N$ for which the following statement holds: if finite subfamilies $\mathcal{F}_1,\ldots, \mathcal{F}_{N} \subset \mathcal{F}$ are such that $\cap_{F \in \mathcal{F}_{i}} F = \emptyset$ for every $i=1,\ldots,N$, then there exists $F_i \in \mathcal{F}_i$ such that $F_1 \cap \ldots \cap F_{N} = \emptyset$. We will also discuss some natural refinements of this result and applications.
Submitted 29 January, 2025; v1 submitted 28 January, 2025;
originally announced January 2025.
-
RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception
Authors:
Lantao Li,
Kang Yang,
Wenqi Zhang,
Xiaoxue Wang,
Chen Sun
Abstract:
Cooperative perception offers an optimal solution to overcome the perception limitations of single-agent systems by leveraging Vehicle-to-Everything (V2X) communication for data sharing and fusion across multiple agents. However, most existing approaches focus on single-modality data exchange, limiting the potential of both homogeneous and heterogeneous fusion across agents. This overlooks the opportunity to utilize multi-modality data per agent, restricting the system's performance. In the automotive industry, manufacturers adopt diverse sensor configurations, resulting in heterogeneous combinations of sensor modalities across agents. To harness the potential of every possible data source for optimal performance, we design a robust LiDAR and camera cross-modality fusion module, Radian-Glue-Attention (RG-Attn), applicable to both intra-agent cross-modality fusion and inter-agent cross-modality fusion scenarios, owing to the convenient coordinate conversion by transformation matrix and the unified sampling/inversion mechanism. We also propose two different architectures, named Paint-To-Puzzle (PTP) and Co-Sketching-Co-Coloring (CoS-CoCo), for conducting cooperative perception. PTP aims for maximum precision performance and achieves smaller data packet size by limiting cross-agent fusion to a single instance, but requires all participants to be equipped with LiDAR. In contrast, CoS-CoCo supports agents with any configuration (LiDAR-only, camera-only, or both LiDAR and camera), offering better generalization ability. Our approach achieves state-of-the-art (SOTA) performance on both real and simulated cooperative perception datasets. The code is now available at GitHub.
Submitted 31 March, 2025; v1 submitted 28 January, 2025;
originally announced January 2025.
-
Qwen2.5-1M Technical Report
Authors:
An Yang,
Bowen Yu,
Chengyuan Li,
Dayiheng Liu,
Fei Huang,
Haoyan Huang,
Jiandong Jiang,
Jianhong Tu,
Jianwei Zhang,
Jingren Zhou,
Junyang Lin,
Kai Dang,
Kexin Yang,
Le Yu,
Mei Li,
Minmin Sun,
Qin Zhu,
Rui Men,
Tao He,
Weijia Xu,
Wenbiao Yin,
Wenyuan Yu,
Xiafei Qiu,
Xingzhang Ren,
Xinlong Yang
, et al. (3 additional authors not shown)
Abstract:
We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectively enhance long-context performance while reducing training costs.
To promote the use of long-context models among a broader user base, we present and open-source our inference framework. This framework includes a length extrapolation method that can expand the model context lengths by at least four times, or even more, without additional training. To reduce inference costs, we implement a sparse attention method along with chunked prefill optimization for deployment scenarios and a sparsity refinement method to improve precision. Additionally, we detail our optimizations in the inference engine, including kernel optimization, pipeline parallelism, and scheduling optimization, which significantly enhance overall inference performance. By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup in scenarios with 1 million tokens of context. This framework provides an efficient and powerful solution for developing applications that require long-context processing using open-source models.
The Qwen2.5-1M series currently includes the open-source models Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, as well as the API-accessed model Qwen2.5-Turbo. Evaluations show that Qwen2.5-1M models have been greatly improved in long-context tasks without compromising performance in short-context scenarios. Specifically, the Qwen2.5-14B-Instruct-1M model significantly outperforms GPT-4o-mini in long-context tasks and supports contexts eight times longer.
Submitted 25 January, 2025;
originally announced January 2025.
-
Evidence for $B^-\rightarrow D^{**0}τ^-\overline{ν_τ}$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1127 additional authors not shown)
Abstract:
The first evidence for the decay $B^-\rightarrow D^{**0}τ^-\overline{ν_τ}$ is obtained using proton-proton collision data collected by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$, at centre-of-mass energies of 7, 8 and 13 TeV. Here, the $D^{**0}$ meson represents any of the three excited charm mesons $D_{1}(2420)^{0}$, $D_{2}^{*}(2460)^{0}$, and $D_{1}^{'}(2400)^{0}$. The $B^-\rightarrow D^{**0}τ^-\overline{ν_τ}$ signal is measured with a significance of 3.5 $σ$, including systematic uncertainties. The combined branching fraction $BR(B^-\rightarrow D^{**0}_{1,2}τ^-\overline{ν_τ})\times BR(D^{**0}_{1,2}\rightarrow D^{*+}π^-)$, where $D^{**0}_{1,2}$ denotes both $D_{1}(2420)^{0}$ and $D_{2}^{*}(2460)^{0}$ contributions, is measured to be $(0.051\pm0.013(stat)\pm 0.006(syst)\pm 0.009(\rm{ext}) )\%$, where the last uncertainty reflects that of the branching fraction of the normalisation channel $B^-\rightarrow D^{**0}_{1,2}D_s^{(*)-}$. The ratio between the tauonic and muonic semileptonic $B$ decays, with the latter taken from world average values, is also determined and found to be ${\cal R}(D^{**0}_{1,2})=0.13\pm0.03(stat)\pm0.01(syst)\pm0.02\,(\rm{ext})$.
Submitted 21 March, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
Authors:
Zhenghao Lin,
Zihao Tang,
Xiao Liu,
Yeyun Gong,
Yi Cheng,
Qi Chen,
Hang Li,
Ying Xin,
Ziyue Yang,
Kailai Yang,
Yu Yan,
Xiao Liang,
Shuai Lu,
Yiming Huang,
Zheheng Luo,
Lei Qu,
Xuan Feng,
Yaoxiang Wang,
Yuqing Xia,
Feiyang Chen,
Yuting Jiang,
Yasen Hu,
Hao Ni,
Binyang Li,
Guoshuai Zhao
, et al. (9 additional authors not shown)
Abstract:
We introduce Sigma, an efficient large language model specialized for the system domain, empowered by a novel architecture including DiffQKV attention, and pre-trained on our meticulously collected system domain data. DiffQKV attention significantly enhances the inference efficiency of Sigma by optimizing the Query (Q), Key (K), and Value (V) components in the attention mechanism differentially, based on their varying impacts on the model performance and efficiency indicators. Specifically, we (1) conduct extensive experiments that demonstrate the model's varying sensitivity to the compression of K and V components, leading to the development of differentially compressed KV, and (2) propose augmented Q to expand the Q head dimension, which enhances the model's representation capacity with minimal impacts on the inference speed. Rigorous theoretical and empirical analyses reveal that DiffQKV attention significantly enhances efficiency, achieving up to a 33.36% improvement in inference speed over the conventional grouped-query attention (GQA) in long-context scenarios. We pre-train Sigma on 6T tokens from various sources, including 19.5B system domain data that we carefully collect and 1T tokens of synthesized and rewritten data. In general domains, Sigma achieves comparable performance to other state-of-the-art models. In the system domain, we introduce the first comprehensive benchmark AIMicius, where Sigma demonstrates remarkable performance across all tasks, significantly outperforming GPT-4 with an absolute improvement of up to 52.5%.
Submitted 10 February, 2025; v1 submitted 23 January, 2025;
originally announced January 2025.
-
Observation of the $Λ_b^0 \to J/ψΞ^- K^+$ and $Ξ_b^0 \to J/ψΞ^- π^+$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1126 additional authors not shown)
Abstract:
The first observation of the $Ξ_b^0 \to J/ψΞ^- π^+$ decay and the most precise measurement of the branching fraction of the $Λ_b^0 \to J/ψΞ^- K^+$ decay are reported, using proton-proton collision data from the LHCb experiment collected in 2016--2018 at a centre-of-mass energy of 13~TeV, corresponding to an integrated luminosity of 5.4~fb$^{-1}$. Using the $Λ_b^0 \to J/ψΛ$ and $Ξ_b^0 \to J/ψΞ^-$ decays as normalisation channels, the ratios of branching fractions are measured to be: \[ \frac{\mathcal{B}(Λ_b^0 \to J/ψΞ^- K^+)}{\mathcal{B}(Λ_b^0 \to J/ψΛ)} = (1.17 \pm 0.14 \pm 0.08)\times 10^{-2} \, , \] \[ \frac{\mathcal{B}(Ξ_b^0 \to J/ψΞ^- π^+)}{\mathcal{B}(Ξ_b^0 \to J/ψΞ^-)} = (11.9 \pm 1.4 \pm 0.6)\times 10^{-2}\, , \] where the first uncertainty is statistical and the second systematic.
Submitted 22 January, 2025;
originally announced January 2025.
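As a reading aid for the ratios quoted above, the sketch below combines the statistical and systematic uncertainties in quadrature. Treating the two as uncorrelated is an assumption made here for illustration; the abstract does not say how, or whether, they should be combined. The helper name `total_uncertainty` is hypothetical.

```python
import math


def total_uncertainty(stat: float, syst: float) -> float:
    """Combine two uncertainties in quadrature (assumes they are uncorrelated)."""
    return math.hypot(stat, syst)


# Central value, statistical and systematic uncertainty as quoted in the
# abstract, all in units of 10^-2.
ratios = {
    "Lambda_b0 -> J/psi Xi- K+ / Lambda_b0 -> J/psi Lambda": (1.17, 0.14, 0.08),
    "Xi_b0 -> J/psi Xi- pi+ / Xi_b0 -> J/psi Xi-": (11.9, 1.4, 0.6),
}

for name, (value, stat, syst) in ratios.items():
    total = total_uncertainty(stat, syst)
    print(f"{name}: ({value} +/- {total:.2f}) x 10^-2, relative {total / value:.1%}")
```

In both ratios the statistical term is the larger of the two, so it dominates the combined uncertainty under this assumption.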
-
Measurement of the multiplicity dependence of $\mitΥ$ production ratios in $pp$ collisions at $\sqrt{s}=13$ TeV
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1127 additional authors not shown)
Abstract:
The $\mitΥ(\mathrm{2}S)$ and $\mitΥ(\mathrm{3}S)$ production cross-sections are measured relative to that of the $\mitΥ(\mathrm{1}S)$ meson, as a function of charged-particle multiplicity in proton-proton collisions at a centre-of-mass energy of $13$ TeV. The measurement uses data collected by the LHCb experiment in 2018 corresponding to an integrated luminosity of 2 $\text{fb}^{-1}$. Both the $\mitΥ(\mathrm{2}S)$-to-$\mitΥ(\mathrm{1}S)$ and $\mitΥ(\mathrm{3}S)$-to-$\mitΥ(\mathrm{1}S)$ cross-section ratios are found to decrease significantly as a function of event multiplicity, with the $\mitΥ(\mathrm{3}S)$-to-$\mitΥ(\mathrm{1}S)$ ratio showing a steeper decline towards high multiplicity. This hierarchy is qualitatively consistent with the comover model predictions, indicating that final-state interactions play an important role in bottomonia production in high-multiplicity events.
Submitted 23 January, 2025; v1 submitted 21 January, 2025;
originally announced January 2025.
-
Search for charge-parity violation in semileptonically tagged $D^{0} \to K^{+} π^{-}$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1127 additional authors not shown)
Abstract:
An analysis of the flavour oscillations of the charmed neutral meson is presented. The ratio of $D^{0} \to K^{+} π^{-}$ and $D^{0} \to K^{-} π^{+}$ decay rates is measured as a function of the decay time of the $D^{0}$ meson and compared with the charge-conjugated system to search for charge-parity violation. The meson flavour at production is double-tagged by the charges of the muon and pion in the preceding $\overline{B} \to D^{*}(2010)^{+} μ^{-} X$ and ${{D^{*}(2010)^{+}} \to D^{0}π^{+}}$ decays, respectively. These decays are selected from proton-proton collision data collected by the LHCb experiment at a centre-of-mass energy of ${13\,\text{TeV}}$ and corresponding to an integrated luminosity of ${5.4\,\text{fb}^{-1}}$. The flavour oscillation parameters, relating to the differences in mass and width of the mass eigenstates, are found to be ${y^\prime=(5.8\pm1.6)\times10^{-3}}$ and ${(x^\prime)^2=(0.0\pm1.2)\times10^{-4}}$. No evidence for charge-parity violation is seen either in the flavour oscillations or in the decay, where the direct charge-parity asymmetry is measured to be ${A_{D}=(2.3\pm1.7)\,{\%}}$.
Submitted 20 January, 2025;
originally announced January 2025.
-
Design-Agnostic Distributed Timing Fault Injection Monitor With End-to-End Design Automation
Authors:
Yan He,
Yumin Su,
Kaiyuan Yang
Abstract:
Fault injection attacks (FIAs) induce hardware failures in circuits and exploit these faults to compromise the security of the system. It has been demonstrated that FIAs can bypass system security mechanisms, cause faulty outputs, and gain access to secret information. Certain types of FIAs can be mounted with little effort by tampering with clock signals and/or the chip operating conditions. To mitigate such low-cost yet powerful attacks, we propose a fully synthesizable and distributable in situ fault injection monitor that employs a delay-locked loop to track the pulse width of the clock. We further develop a fully automated design framework to optimize and implement the FIA monitors at any process node. Our design is fabricated and verified in 65 nm CMOS technology with a small footprint of 1500 µm². It can lock to clock frequencies from 2 MHz to 1.26 GHz while detecting all 12 types of possible clock glitches, as well as timing FIA injections via the supply voltage, electromagnetic signals, and chip temperature.
Submitted 16 January, 2025;
originally announced January 2025.
-
Boosting Supermassive Black Hole Growth in the Early Universe by Fuzzy Dark Matter Solitons
Authors:
H. -H. Sandy Chiu,
Hsi-Yu Schive,
Hsiang-Yi Karen Yang,
Hsinhao Huang,
Massimo Gaspari
Abstract:
Observations of massive supermassive black holes (SMBHs) in the early universe challenge existing black hole formation models. We propose that soliton cores in fuzzy dark matter (FDM) offer a potential solution to this timing problem. Our FDM cosmological zoom-in simulations confirm that for a particle mass $m_{\rm FDM}\sim 10^{-22}~{\rm eV}$, solitons are well developed at redshift $z \sim 7$ with masses of $\sim10^9~M_\odot$, comparable to the observed SMBHs. We then demonstrate using hydrodynamic simulations that, compared to cold dark matter, these high-$z$ massive FDM solitons with mass $M_s$ can provide additional gravitational potential to accrete gas and boost the Bondi accretion rate of a growing black hole seed with mass $M_{\rm BH}$ by up to two to four orders of magnitude, in the regime of efficient cooling and negligible radiation pressure. This accretion boosting mechanism is effective for $10^{-22}~{\rm eV} \lesssim m_{\rm FDM} \lesssim 10^{-20}~{\rm eV}$ and potentially beyond as long as $M_s > M_{\rm BH}$. The simulation code GAMER is accessible at https://github.com/gamer-project/gamer.
Submitted 15 January, 2025;
originally announced January 2025.
-
DomainDemo: a dataset of domain-sharing activities among different demographic groups on Twitter
Authors:
Kai-Cheng Yang,
Pranav Goel,
Alexi Quintana-Mathé,
Luke Horgan,
Stefan D. McCabe,
Nir Grinberg,
Kenneth Joseph,
David Lazer
Abstract:
Social media play a pivotal role in disseminating web content, particularly during elections, yet our understanding of the association between demographic factors and political discourse online remains limited. Here, we introduce a unique dataset, DomainDemo, linking domains shared on Twitter (X) with the demographic characteristics of associated users, including age, gender, race, political affiliation, and geolocation, from 2011 to 2022. This new resource was derived from a panel of over 1.5 million Twitter users matched against their U.S. voter registration records, facilitating a better understanding of a decade of information flows on one of the most prominent social media platforms and trends in political and public discourse among registered U.S. voters from different sociodemographic groups. By aggregating user demographic information onto the domains, we derive five metrics that provide critical insights into over 129,000 websites. In particular, the localness and partisan audience metrics quantify the domains' geographical reach and ideological orientation, respectively. These metrics show substantial agreement with existing classifications, suggesting the effectiveness and reliability of DomainDemo's approach.
Submitted 14 January, 2025;
originally announced January 2025.
-
Fermion liquids as quantum Hall liquids in phase space: A unified approach for anomalies and responses
Authors:
Jaychandran Padayasi,
Ken K. W. Ma,
Kun Yang
Abstract:
The discovery of many strongly correlated metallic phases has inspired different routes to generalize or go beyond the celebrated Landau Fermi liquid theory. To this end, from universal consideration of symmetries and anomalies, Else, Thorngren and Senthil (ETS) have introduced a class of theories called ersatz Fermi liquids which possess a Fermi surface and satisfy a generalized Luttinger's theorem. In this work, we view all such fermion liquids obeying the Luttinger theorem as incompressible quantum Hall liquids in higher-dimensional phase space and use it as the starting point to derive their effective low-energy field theory. The noncommutativity of phase space motivates us to use the Seiberg-Witten map to derive the field theory in an ordinary (commutative) space and naturally leads to terms that correspond to the correct topological Chern-Simons action postulated by ETS in one, two, and three dimensions. Additionally, our approach also reproduces all the non-topological terms that characterize important contributions to the response, including the semiclassical equations of motion. Finally, our derivations of Chern-Simons terms from the Seiberg-Witten map also verify a longstanding conjecture in noncommutative field theory.
Submitted 11 April, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
-
Average Reward Reinforcement Learning for Wireless Radio Resource Management
Authors:
Kun Yang,
Jing Yang,
Cong Shen
Abstract:
In this paper, we address a crucial but often overlooked issue in applying reinforcement learning (RL) to radio resource management (RRM) in wireless communications: the mismatch between the discounted reward RL formulation and the undiscounted goal of wireless network optimization. To the best of our knowledge, we are the first to systematically investigate this discrepancy, starting with a discussion of the problem formulation followed by simulations that quantify the extent of the gap. To bridge this gap, we introduce the use of average reward RL, a method that aligns more closely with the long-term objectives of RRM. We propose a new method, Average Reward Off-policy Soft Actor-Critic (ARO SAC), an adaptation of the well-known Soft Actor-Critic algorithm to the average reward framework. This new method achieves a significant performance improvement: our simulation results demonstrate a 15% gain in system performance over the traditional discounted reward RL approach, underscoring the potential of average reward RL in enhancing the efficiency and effectiveness of wireless network optimization.
Submitted 11 January, 2025;
originally announced January 2025.
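The mismatch between the discounted and undiscounted objectives described above can be illustrated with a toy calculation. The sketch below is not the ARO SAC algorithm; it only contrasts the discounted return with the long-run average reward on two made-up reward streams. The streams, the discount factor, and the function names are assumptions for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def average_reward(rewards):
    """Undiscounted per-step average of a finite reward sequence."""
    return sum(rewards) / len(rewards)


horizon = 200
gamma = 0.9  # hypothetical discount factor

# Policy A collects a large reward at the first step and little afterwards.
policy_a = [10.0] + [0.5] * (horizon - 1)
# Policy B collects a steady moderate reward at every step.
policy_b = [1.0] * horizon

for name, rewards in (("A", policy_a), ("B", policy_b)):
    print(
        f"policy {name}: discounted return = {discounted_return(rewards, gamma):.3f}, "
        f"average reward = {average_reward(rewards):.4f}"
    )
```

With these assumed numbers the discounted criterion ranks policy A higher, while the average reward criterion ranks policy B higher, which is the kind of disagreement an average reward formulation is meant to avoid.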
-
Study of light-meson resonances decaying to $K^0_{\rm S} K π$ in the $B \to (K^0_{\rm S} K π) K$ channels
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1127 additional authors not shown)
Abstract:
A study is presented of $B^+ \to K^0_{\rm S} K^- π^+ K^-$ and $B^+ \to K^0_{\rm S} K^+ π^- K^+$ decays based on the analysis of proton-proton collision data collected with the LHCb detector at centre-of-mass energies of 7, 8 and 13 TeV, corresponding to an integrated luminosity of $9\,\text{fb}^{-1}$. The $K^0_{\rm S} K π$ invariant-mass distributions of both $B^+$ decay modes show, in the $m(K^0_{\rm S} K π)<1.85$ GeV mass region, a rich spectrum of light-meson resonances, resolved using an amplitude analysis. A complex mixture of $J^{PC}=0^{-+}, 1^{++}$ and $1^{+-}$ resonances is observed, dominated by $η(1405)$, $η(1470)$, $η(1760)$, $f_1(1285)$, $f_1(1420)$ and $h_1(1405)$ resonances. The $K^0_{\rm S} K π$ Dalitz plots are dominated by asymmetric crossing $K^* \bar K$ bands which are different for the two $B^+$ decay modes. This is due to a different interference pattern between the $1^{++}$ and $1^{+-}$ amplitudes in the two channels. Branching fractions are measured for each resonant contribution.
Submitted 11 January, 2025;
originally announced January 2025.
-
SELMA3D challenge: Self-supervised learning for 3D light-sheet microscopy image segmentation
Authors:
Ying Chen,
Rami Al-Maskari,
Izabela Horvath,
Mayar Ali,
Luciano Hoher,
Kaiyuan Yang,
Zengming Lin,
Zhiwei Zhai,
Mengzhe Shen,
Dejin Xun,
Yi Wang,
Tony Xu,
Maged Goubran,
Yunheng Wu,
Kensaku Mori,
Johannes C. Paetzold,
Ali Erturk
Abstract:
Recent innovations in light-sheet microscopy, paired with developments in tissue clearing techniques, enable the 3D imaging of large mammalian tissues with cellular resolution. Combined with the progress in large-scale data analysis, driven by deep learning, these innovations empower researchers to rapidly investigate the morphological and functional properties of diverse biological samples. Segmentation, a crucial preliminary step in the analysis process, can be automated using domain-specific deep learning models with expert-level performance. However, these models exhibit high sensitivity to domain shifts, leading to a significant drop in accuracy when applied to data outside their training distribution. To address this limitation, and inspired by the recent success of self-supervised learning in training generalizable models, we organized the SELMA3D Challenge during the MICCAI 2024 conference. SELMA3D provides a vast collection of light-sheet images from cleared mouse and human brains, comprising 35 large 3D images (each with over $1000^3$ voxels) and 315 annotated small patches for finetuning, preliminary testing and final testing. The dataset encompasses diverse biological structures, including vessel-like and spot-like structures. Five teams participated in all phases of the challenge, and their proposed methods are reviewed in this paper. Quantitative and qualitative results from most participating teams demonstrate that self-supervised learning on large datasets improves segmentation model performance and generalization. We will continue to support and extend SELMA3D as an inaugural MICCAI challenge focused on self-supervised learning for 3D microscopy image segmentation.
Submitted 12 January, 2025; v1 submitted 7 January, 2025;
originally announced January 2025.
-
Search for continuous gravitational waves from known pulsars in the first part of the fourth LIGO-Virgo-KAGRA observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
D. Agarwal,
M. Agathos,
M. Aghaei Abchouyeh,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi,
A. Al-Jodah,
C. Alléné
, et al. (1794 additional authors not shown)
Abstract:
Continuous gravitational wave (CW) emission from neutron stars carries information about their internal structure and equation of state, and it can provide tests of General Relativity. We present a search for CWs from a set of 45 known pulsars in the first part of the fourth LIGO--Virgo--KAGRA observing run, known as O4a. We conducted a targeted search for each pulsar using three independent analysis methods considering the single-harmonic and the dual-harmonic emission models. We find no evidence of a CW signal in O4a data for either model and set upper limits on the signal amplitude and on the ellipticity, which quantifies the asymmetry in the neutron star mass distribution. For the single-harmonic emission model, 29 targets have an upper limit on the amplitude below the theoretical spin-down limit. The lowest upper limit on the amplitude is $6.4\!\times\!10^{-27}$ for the young energetic pulsar J0537-6910, while the lowest constraint on the ellipticity is $8.8\!\times\!10^{-9}$ for the bright nearby millisecond pulsar J0437-4715. Additionally, for a subset of 16 targets we performed a narrowband search that is more robust regarding the emission model, with no evidence of a signal. We also found no evidence of non-standard polarizations as predicted by the Brans-Dicke theory.
Submitted 2 January, 2025;
originally announced January 2025.
-
Is Segment Anything Model 2 All You Need for Surgery Video Segmentation? A Systematic Evaluation
Authors:
Cheng Yuan,
Jian Jiang,
Kunyi Yang,
Lv Wu,
Rui Wang,
Zi Meng,
Haonan Ping,
Ziyu Xu,
Yifan Zhou,
Wanli Song,
Hesheng Wang,
Qi Dou,
Yutong Ban
Abstract:
Surgery video segmentation is an important topic in the surgical AI field. It allows the AI model to understand the spatial information of a surgical scene. Meanwhile, due to the lack of annotated surgical data, surgery segmentation models suffer from limited performance. With the emergence of the SAM2 model, a large foundation model for video segmentation trained on natural videos, zero-shot surgical video segmentation has become more realistic but remains to be explored. In this paper, we systematically evaluate the performance of the SAM2 model on the zero-shot surgery video segmentation task. We conducted experiments under different configurations, including different prompting strategies, robustness, etc. Moreover, we conducted an empirical evaluation of the performance across 9 datasets with 17 different types of surgeries.
Submitted 31 December, 2024;
originally announced January 2025.
-
TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment
Authors:
Ke Yang,
Volodymyr Kindratenko,
ChengXiang Zhai
Abstract:
Training language models (LMs) and their application agents is increasingly costly due to large datasets and models, making test failures difficult to bear. Simplified language environments serve as primordial training and testing grounds, retaining essential commonsense and communication skills but in a more digestible form, potentially enhancing the learning efficiency of LMs, and thus reducing the required model size and data volume for effective training and evaluation. In these simplified language environments, workable strategies for small models, datasets, and agents may be adaptable to larger models, datasets, and agents in complex language environments.
To create such environments, we focus on two aspects: i) minimizing language dataset noise and complexity, and ii) preserving the essential text distribution characteristics. Unlike previous methods, we propose a pipeline to refine text data by eliminating noise, minimizing vocabulary, and maintaining genre-specific patterns (e.g., for books, conversation, code, etc.). Implementing this pipeline with large LMs, we have created a leaner suite of LM training and evaluation datasets: 71M Leaner-Pretrain, 7M Leaner-Instruct, Leaner-Glue for assessing linguistic proficiency, and Leaner-Eval for testing instruction-following ability.
Our experiments show that leaner pre-training boosts LM learning efficiency. Tiny LMs trained on these datasets outperform those trained on original datasets in instruction-following across different language granularity levels. Moreover, the Leaner-Pretrain dataset's alignment with conventional large LM training sets enables resource-optimized analysis of how learning objectives, model architectures, and training techniques impact performance on language modeling and downstream tasks. Our code and datasets are available at https://github.com/EmpathYang/TinyHelen.git.
Submitted 31 December, 2024;
originally announced January 2025.
-
Explainable Semantic Federated Learning Enabled Industrial Edge Network for Fire Surveillance
Authors:
Li Dong,
Yubo Peng,
Feibo Jiang,
Kezhi Wang,
Kun Yang
Abstract:
In fire surveillance, Industrial Internet of Things (IIoT) devices must frequently transmit large volumes of monitoring data, which consumes substantial spectrum resources. Hence, we propose an Industrial Edge Semantic Network (IESN) that allows IIoT devices to send warnings through semantic communication (SC). This requires us to consider (1) data privacy and security, (2) SC model adaptation for heterogeneous devices, and (3) explainability of semantics. First, we present eXplainable Semantic Federated Learning (XSFL) to train the SC model, thus ensuring data privacy and security. Then, we present an Adaptive Client Training (ACT) strategy to provide a specific SC model for each device according to its Fisher information matrix, thus overcoming the heterogeneity. Next, an Explainable SC (ESC) mechanism is designed, which introduces a leakyReLU-based activation mapping to explain the relationship between the extracted semantics and monitoring data. Finally, simulation results demonstrate the effectiveness of XSFL.
Submitted 27 December, 2024;
originally announced December 2024.
-
CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition
Authors:
Kaixing Yang,
Xulong Tang,
Haoyu Wu,
Qinliang Xue,
Biao Qin,
Hongyan Liu,
Zhaoxin Fan
Abstract:
Dance generation is crucial and challenging, particularly in domains like dance performance and virtual gaming. In the current body of literature, most methodologies focus on Solo Music2Dance. While there are efforts directed towards Group Music2Dance, these often suffer from a lack of coherence, resulting in aesthetically poor dance performances. Thus, we introduce CoheDancers, a novel framework for Music-Driven Interactive Group Dance Generation. CoheDancers aims to enhance group dance generation coherence by decomposing it into three key aspects: synchronization, naturalness, and fluidity. Correspondingly, we develop a Cycle-Consistency-based Dance Synchronization strategy to foster music-dance correspondences, an Auto-Regressive-based Exposure Bias Correction strategy to enhance the fluidity of the generated dances, and an Adversarial Training Strategy to augment the naturalness of the group dance output. Collectively, these strategies enable CoheDancers to produce highly coherent group dances with superior quality. Furthermore, to establish better benchmarks for Group Music2Dance, we construct the most diverse and comprehensive open-source dataset to date, I-Dancers, featuring rich dancer interactions, and create comprehensive evaluation metrics. Experimental evaluations on I-Dancers and other extant datasets substantiate that CoheDancers achieves unprecedented state-of-the-art performance. Code will be released.
Submitted 26 December, 2024;
originally announced December 2024.
-
Mitigating Label Noise using Prompt-Based Hyperbolic Meta-Learning in Open-Set Domain Generalization
Authors:
Kunyu Peng,
Di Wen,
Sarfraz M. Saquib,
Yufan Chen,
Junwei Zheng,
David Schneider,
Kailun Yang,
Jiamin Wu,
Alina Roitberg,
Rainer Stiefelhagen
Abstract:
Open-Set Domain Generalization (OSDG) is a challenging task requiring models to accurately predict familiar categories while minimizing confidence for unknown categories to effectively reject them in unseen domains. While the OSDG field has seen considerable advancements, the impact of label noise--a common issue in real-world datasets--has been largely overlooked. Label noise can mislead model optimization, thereby exacerbating the challenges of open-set recognition in novel domains. In this study, we take the first step towards addressing Open-Set Domain Generalization under Noisy Labels (OSDG-NL) by constructing dedicated benchmarks derived from widely used OSDG datasets, including PACS and DigitsDG. We evaluate baseline approaches by integrating techniques from both label denoising and OSDG methodologies, highlighting the limitations of existing strategies in handling label noise effectively. To address these limitations, we propose HyProMeta, a novel framework that integrates hyperbolic category prototypes for label noise-aware meta-learning alongside a learnable new-category agnostic prompt designed to enhance generalization to unseen classes. Our extensive experiments demonstrate the superior performance of HyProMeta compared to state-of-the-art methods across the newly established benchmarks. The source code of this work is released at https://github.com/KPeng9510/HyProMeta.
Submitted 24 December, 2024;
originally announced December 2024.
-
Observation of Thouless pumping of light in quasiperiodic photonic crystals
Authors:
Kai Yang,
Qidong Fu,
Henrique C. Prates,
Peng Wang,
Yaroslav V. Kartashov,
Vladimir V. Konotop,
Fangwei Ye
Abstract:
Topological transport is determined by global properties of the physical media where it occurs and is characterized by quantized amounts of adiabatically transported quantities. Discovered for periodic potentials, it was also explored in disordered and discrete quasi-periodic systems. Here we report on the experimental observation of pumping of a light beam in a genuinely continuous incommensurate photorefractive quasi-crystal emulated by its periodic approximants. We observe a universal character of the transport which is determined by the ratio between periods of the constitutive sublattices, by the sliding angle between them, and by Chern numbers of the excited bands (in the time-coordinate space) of the approximant, for which pumping is adiabatic. This reveals that the properties of quasi-periodic systems determining the topological transport are tightly related to those of their periodic approximants and can be observed and studied in a large variety of physical systems. Our results suggest that the links between quasi-periodic systems and their periodic approximants go beyond the pure mathematical relations: they manifest themselves in physical phenomena which can be explored experimentally.
Submitted 24 December, 2024;
originally announced December 2024.
-
Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions
Authors:
Hang Li,
Tianlong Xu,
Kaiqi Yang,
Yucheng Chu,
Yanling Chen,
Yichi Song,
Qingsong Wen,
Hui Liu
Abstract:
The rise of large language models (LLMs) offers new opportunities for automatic error detection in education, particularly for math word problems (MWPs). While prior studies demonstrate the promise of LLMs as error detectors, they overlook the presence of multiple valid solutions for a single MWP. Our preliminary analysis reveals a significant performance gap between conventional and alternative solutions in MWPs, a phenomenon we term conformity bias in this work. To mitigate this bias, we introduce the Ask-Before-Detect (AskBD) framework, which generates adaptive reference solutions using LLMs to enhance error detection. Experiments on 200 examples of GSM8K show that AskBD effectively mitigates bias and improves performance, especially when combined with reasoning-enhancing techniques like chain-of-thought prompting.
Submitted 21 December, 2024;
originally announced December 2024.
-
Beyond Partisan Leaning: A Comparative Analysis of Political Bias in Large Language Models
Authors:
Tai-Quan Peng,
Kaiqi Yang,
Sanguk Lee,
Hang Li,
Yucheng Chu,
Yuping Lin,
Hui Liu
Abstract:
As large language models (LLMs) become increasingly embedded in civic, educational, and political information environments, concerns about their potential political bias have grown. Prior research often evaluates such bias through simulated personas or predefined ideological typologies, which may introduce artificial framing effects or overlook how models behave in general use scenarios. This study adopts a persona-free, topic-specific approach to evaluate political behavior in LLMs, reflecting how users typically interact with these systems, without ideological role-play or conditioning. We introduce a two-dimensional framework: one axis captures partisan orientation on highly polarized topics (e.g., abortion, immigration), and the other assesses sociopolitical engagement on less polarized issues (e.g., climate change, foreign policy). Using survey-style prompts drawn from the ANES and Pew Research Center, we analyze responses from 43 LLMs developed in the U.S., Europe, China, and the Middle East. We propose an entropy-weighted bias score to quantify both the direction and consistency of partisan alignment, and identify four behavioral clusters through engagement profiles. Findings show most models lean center-left or left ideologically and vary in their nonpartisan engagement patterns. Model scale and openness are not strong predictors of behavior, suggesting that alignment strategy and institutional context play a more decisive role in shaping political expression.
Submitted 10 May, 2025; v1 submitted 21 December, 2024;
originally announced December 2024.
-
Formal Mathematical Reasoning: A New Frontier in AI
Authors:
Kaiyu Yang,
Gabriel Poesia,
Jingxuan He,
Wenda Li,
Kristin Lauter,
Swarat Chaudhuri,
Dawn Song
Abstract:
AI for Mathematics (AI4Math) is not only intriguing intellectually but also crucial for AI-driven discovery in science, engineering, and beyond. Extensive efforts on AI4Math have mirrored techniques in NLP, in particular, training large language models on carefully curated math datasets in text form. As a complementary yet less explored avenue, formal mathematical reasoning is grounded in formal systems such as proof assistants, which can verify the correctness of reasoning and provide automatic feedback. In this position paper, we advocate for formal mathematical reasoning and argue that it is indispensable for advancing AI4Math to the next level. In recent years, we have seen steady progress in using AI to perform formal reasoning, including core tasks such as theorem proving and autoformalization, as well as emerging applications such as verifiable generation of code and hardware designs. However, significant challenges remain to be solved for AI to truly master mathematics and achieve broader impact. We summarize existing progress, discuss open challenges, and envision critical milestones to measure future success. At this inflection point for formal mathematical reasoning, we call on the research community to come together to drive transformative advancements in this field.
Submitted 20 December, 2024;
originally announced December 2024.