-
CoRT: Code-integrated Reasoning within Thinking
Authors:
Chengpeng Li,
Zhengyang Tang,
Ziniu Li,
Mingfeng Xue,
Keqin Bao,
Tian Ding,
Ruoyu Sun,
Benyou Wang,
Xiang Wang,
Junyang Lin,
Dayiheng Liu
Abstract:
Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations. Addressing these limitations through computational tools (e.g., computation libraries and symbolic solvers) is promising, but it introduces a technical challenge: a Code Interpreter (CI) brings external knowledge beyond the model's internal text representations, so combining the two directly is not efficient. This paper introduces CoRT, a post-training framework for teaching LRMs to leverage CI effectively and efficiently. As a first step, we address the data scarcity issue by synthesizing code-integrated reasoning data through Hint-Engineering, which strategically inserts different hints at appropriate positions to optimize LRM-CI interaction. We manually create 30 high-quality samples, upon which we post-train models ranging from 1.5B to 32B parameters, with supervised fine-tuning, rejection fine-tuning and reinforcement learning. Our experimental results demonstrate that Hint-Engineering models achieve 4% and 8% absolute improvements on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B respectively, across five challenging mathematical reasoning datasets. Furthermore, Hint-Engineering models use about 30% fewer tokens for the 32B model and 50% fewer tokens for the 1.5B model compared with the natural language models. The models and code are available at https://github.com/ChengpengLi1003/CoRT.
Submitted 12 June, 2025; v1 submitted 11 June, 2025;
originally announced June 2025.
-
LG-ANNA-Embedding technical report
Authors:
Jooyoung Choi,
Hyun Kim,
Hansol Jang,
Changwook Jun,
Kyunghoon Bae,
Hyewon Choi,
Stanley Jungkyu Choi,
Honglak Lee,
Chulmin Yun
Abstract:
This report presents a unified instruction-based framework for learning generalized text embeddings optimized for both information retrieval (IR) and non-IR tasks. Built upon a decoder-only large language model (Mistral-7B), our approach combines in-context learning, soft supervision, and adaptive hard-negative mining to generate context-aware embeddings without task-specific fine-tuning. Structured instructions and few-shot examples are used to guide the model across diverse tasks, enabling strong performance on classification, semantic similarity, clustering, and reranking benchmarks. To improve semantic discrimination, we employ a soft labeling framework where continuous relevance scores, distilled from a high-performance dense retriever and reranker, serve as fine-grained supervision signals. In addition, we introduce adaptive margin-based hard-negative mining, which filters out semantically ambiguous negatives based on their similarity to positive examples, thereby enhancing training stability and retrieval robustness. Our model is evaluated on the newly introduced MTEB (English, v2) benchmark, covering 41 tasks across seven categories. Results show that our method achieves strong generalization and ranks among the top-performing models by Borda score, outperforming several larger or fully fine-tuned baselines. These findings highlight the effectiveness of combining in-context prompting, soft supervision, and adaptive sampling for scalable, high-quality embedding generation.
Submitted 9 June, 2025;
originally announced June 2025.
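The adaptive margin-based hard-negative mining mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_hard_negatives`, the `margin_ratio` parameter, and the rule of discarding any negative whose score comes within a fixed fraction of the positive's score are all assumptions chosen for illustration. Scores are assumed to be non-negative query-passage similarities.

```python
def filter_hard_negatives(pos_score, neg_scores, margin_ratio=0.95, max_negatives=None):
    """Return indices of negatives to keep, hardest first.

    Hypothetical rule: a negative is treated as semantically ambiguous
    (a possible false negative) when its similarity reaches
    margin_ratio * pos_score, so the cutoff adapts to each positive.
    """
    threshold = pos_score * margin_ratio
    # Keep only negatives that score clearly below the positive.
    kept = [(i, s) for i, s in enumerate(neg_scores) if s < threshold]
    # Hardest (highest-scoring) remaining negatives come first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if max_negatives is not None:
        kept = kept[:max_negatives]
    return [i for i, _ in kept]


if __name__ == "__main__":
    # The candidates scoring 0.79 and 0.90 are dropped as too close to the positive (0.80).
    print(filter_hard_negatives(0.8, [0.79, 0.5, 0.75, 0.9]))
```

Because the threshold scales with the positive's own score, a query with a weakly matching positive discards more candidates than one with a strong positive, which is one plausible reading of "adaptive" here.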
-
MiMo-VL Technical Report
Authors:
Xiaomi LLM-Core Team,
Zihao Yue,
Zhenru Lin,
Yifan Song,
Weikun Wang,
Shuhuai Ren,
Shuhao Gu,
Shicheng Li,
Peidian Li,
Liang Zhao,
Lei Li,
Kainan Bao,
Hao Tian,
Hailin Zhang,
Gang Wang,
Dawei Zhu,
Cici,
Chenhong He,
Bowen Ye,
Bowen Shen,
Zihan Zhang,
Zihan Jiang,
Zhixian Zheng,
Zhichao Song
, et al. (50 additional authors not shown)
Abstract:
We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language models delivering state-of-the-art performance in both general visual understanding and multimodal reasoning. MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B on 35 out of 40 evaluated tasks, and scores 59.4 on OlympiadBench, surpassing models with up to 78B parameters. For GUI grounding applications, it sets a new standard with 56.1 on OSWorld-G, even outperforming specialized models such as UI-TARS. Our training combines four-stage pre-training (2.4 trillion tokens) with Mixed On-policy Reinforcement Learning (MORL) integrating diverse reward signals. We identify the importance of incorporating high-quality reasoning data with long Chain-of-Thought into pre-training stages, and the benefits of mixed RL despite challenges in simultaneous multi-domain optimization. We also contribute a comprehensive evaluation suite covering 50+ tasks to promote reproducibility and advance the field. The model checkpoints and full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-VL.
Submitted 4 June, 2025;
originally announced June 2025.
-
K-order Ranking Preference Optimization for Large Language Models
Authors:
Shihao Cai,
Chongming Gao,
Yang Zhang,
Wentao Shi,
Jizhi Zhang,
Keqin Bao,
Qifan Wang,
Fuli Feng
Abstract:
To adapt large language models (LLMs) to ranking tasks, existing list-wise methods, represented by list-wise Direct Preference Optimization (DPO), focus on optimizing partial-order or full-order list ranking consistency for LLMs to enhance their ranking abilities. However, we argue that optimizing top-K ranking consistency could be more appropriate for real-world applications. There are two main reasons: (1) users are typically concerned with only the top-K results, making top-K ranking more important, and (2) tail items often lack precise feedback, making top-K ranking more reliable. Based on this, we propose K-order Ranking Preference Optimization (KPO) by extending DPO's Plackett-Luce model to accommodate top-K rankings. Additionally, recognizing that the number of important items can vary across queries, we extend KPO to dynamically determine an appropriate K for different samples and introduce a curriculum learning strategy to boost training efficiency. Extensive experiments demonstrate the effectiveness of KPO, highlighting its high sample efficiency and robustness to noise. The code is available at https://github.com/Lanyu0303/KPO.
Submitted 31 May, 2025;
originally announced June 2025.
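For readers unfamiliar with the Plackett-Luce model that the KPO abstract refers to, the sketch below computes the standard Plackett-Luce log-probability that a given list of K items fills the first K ranks in order, leaving the tail unordered. It illustrates the generic top-K likelihood only and is not the KPO training objective; the function names and the use of raw scores (rather than DPO-style implicit rewards) are assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def topk_plackett_luce_logprob(scores, top_k_order):
    """Log-probability that the items in top_k_order take ranks 1..K in that order.

    scores: one real-valued score per candidate item.
    top_k_order: indices of the K best items, best first; items not
    listed are treated as an unordered tail.
    """
    remaining = list(range(len(scores)))
    logp = 0.0
    for idx in top_k_order:
        # Probability of picking idx next among all items not yet ranked.
        logp += scores[idx] - logsumexp([scores[j] for j in remaining])
        remaining.remove(idx)
    return logp


if __name__ == "__main__":
    scores = [0.5, -1.0, 2.0, 0.0]
    print(topk_plackett_luce_logprob(scores, [2, 0]))
```

Setting K to the full list length recovers the full-order Plackett-Luce likelihood, while K = 1 reduces to a softmax over the candidates.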
-
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Authors:
Geon-Hyeong Kim,
Youngsoo Jang,
Yu Jin Kim,
Byoungjip Kim,
Honglak Lee,
Kyunghoon Bae,
Moontae Lee
Abstract:
As Large Language Models (LLMs) continue to advance and find applications across a growing number of fields, ensuring the safety of LLMs has become increasingly critical. To address safety concerns, recent studies have proposed integrating safety constraints into Reinforcement Learning from Human Feedback (RLHF). However, these approaches tend to be complex, as they encompass complicated procedures in RLHF along with additional steps required by the safety constraints. Inspired by Direct Preference Optimization (DPO), we introduce a new algorithm called SafeDPO, which is designed to directly optimize the safety alignment objective in a single stage of policy learning, without requiring relaxation. SafeDPO introduces only one additional hyperparameter to further enhance safety and requires only minor modifications to standard DPO. As a result, it eliminates the need to fit separate reward and cost models or to sample from the language model during fine-tuning, while still enhancing the safety of LLMs. Finally, we demonstrate that SafeDPO achieves competitive performance compared to state-of-the-art safety alignment algorithms, both in terms of aligning with human preferences and improving safety.
Submitted 26 May, 2025;
originally announced May 2025.
-
MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation
Authors:
Xiaoyuan Li,
Keqin Bao,
Yubo Ma,
Moxin Li,
Wenjie Wang,
Rui Men,
Yichang Zhang,
Fuli Feng,
Dayiheng Liu,
Junyang Lin
Abstract:
Recent advances in Large Language Models (LLMs) have shown promising results in complex reasoning tasks. However, current evaluations predominantly focus on single-turn reasoning scenarios, leaving interactive tasks largely unexplored. We attribute this to the absence of comprehensive datasets and scalable automatic evaluation protocols. To fill these gaps, we present MTR-Bench for LLMs' Multi-Turn Reasoning evaluation. Comprising 4 classes, 40 tasks, and 3600 instances, MTR-Bench covers diverse reasoning capabilities, offers fine-grained difficulty levels, and requires multi-turn interactions with the environments. Moreover, MTR-Bench features a fully automated framework spanning both dataset construction and model evaluation, which enables scalable assessment without human intervention. Extensive experiments reveal that even cutting-edge reasoning models fall short on multi-turn, interactive reasoning tasks. Further analysis of these results yields valuable insights for future research in interactive AI systems.
Submitted 25 May, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
-
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Authors:
Yunseok Jang,
Yeda Song,
Sungryull Sohn,
Lajanugen Logeswaran,
Tiange Luo,
Dong-Ki Kim,
Kyunghoon Bae,
Honglak Lee
Abstract:
Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have sparked significant interest in developing GUI visual agents. We introduce MONDAY (Mobile OS Navigation Task Dataset for Agents from YouTube), a large-scale dataset of 313K annotated frames from 20K instructional videos capturing diverse real-world mobile OS navigation across multiple platforms. Models that include MONDAY in their pre-training phases demonstrate robust cross-platform generalization capabilities, consistently outperforming models trained on existing single-OS datasets while achieving an average performance gain of 18.11%p on an unseen mobile OS platform. To enable continuous dataset expansion as mobile platforms evolve, we present an automated framework that leverages publicly available video content to create comprehensive task datasets without manual annotation. Our framework comprises robust OCR-based scene detection (95.04% F1 score), near-perfect UI element detection (99.87% hit ratio), and novel multi-step action identification to extract reliable action sequences across diverse interface configurations. We contribute both the MONDAY dataset and our automated collection framework to facilitate future research in mobile OS navigation.
Submitted 18 May, 2025;
originally announced May 2025.
-
Qwen3 Technical Report
Authors:
An Yang,
Anfeng Li,
Baosong Yang,
Beichen Zhang,
Binyuan Hui,
Bo Zheng,
Bowen Yu,
Chang Gao,
Chengen Huang,
Chenxu Lv,
Chujie Zheng,
Dayiheng Liu,
Fan Zhou,
Fei Huang,
Feng Hu,
Hao Ge,
Haoran Wei,
Huan Lin,
Jialong Tang,
Jian Yang,
Jianhong Tu,
Jianwei Zhang,
Jianxin Yang,
Jiaxi Yang,
Jing Zhou
, et al. (35 additional authors not shown)
Abstract:
In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Experts (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework. This eliminates the need to switch between different models, such as chat-optimized models (e.g., GPT-4o) and dedicated reasoning models (e.g., QwQ-32B), and enables dynamic mode switching based on user queries or chat templates. Meanwhile, Qwen3 introduces a thinking budget mechanism, allowing users to allocate computational resources adaptively during inference, thereby balancing latency and performance based on task complexity. Moreover, by leveraging the knowledge from the flagship models, we significantly reduce the computational resources required to build smaller-scale models, while ensuring their highly competitive performance. Empirical evaluations demonstrate that Qwen3 achieves state-of-the-art results across diverse benchmarks, including code generation, mathematical reasoning, and agent tasks, and is competitive against larger MoE models and proprietary models. Compared to its predecessor Qwen2.5, Qwen3 expands multilingual support from 29 to 119 languages and dialects, enhancing global accessibility through improved cross-lingual understanding and generation capabilities. To facilitate reproducibility and community-driven research and development, all Qwen3 models are publicly accessible under Apache 2.0.
Submitted 14 May, 2025;
originally announced May 2025.
-
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Authors:
LLM-Core Xiaomi,
Bingquan Xia,
Bowen Shen,
Cici,
Dawei Zhu,
Di Zhang,
Gang Wang,
Hailin Zhang,
Huaqiu Liu,
Jiebao Xiao,
Jinhao Dong,
Liang Zhao,
Peidian Li,
Peng Wang,
Shihua Yu,
Shimao Chen,
Weikun Wang,
Wenhan Ma,
Xiangwei Deng,
Yi Huang,
Yifan Song,
Zihan Jiang,
Bowen Ye,
Can Cai
, et al. (40 additional authors not shown)
Abstract:
We present MiMo-7B, a large language model designed for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with an additional Multi-Token Prediction objective for enhanced performance and faster inference. During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrating a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues and employing strategic data resampling to stabilize training. Extensive evaluations show that MiMo-7B-Base possesses exceptional reasoning potential, outperforming even much larger 32B models. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code and general reasoning tasks, surpassing the performance of OpenAI o1-mini. The model checkpoints are available at https://github.com/xiaomimimo/MiMo.
Submitted 5 June, 2025; v1 submitted 12 May, 2025;
originally announced May 2025.
-
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Authors:
Shan Yu,
Jiarong Xing,
Yifan Qiao,
Mingyuan Ma,
Yangmin Li,
Yang Wang,
Shuo Yang,
Zhiqiang Xie,
Shiyi Cao,
Ke Bao,
Ion Stoica,
Harry Xu,
Ying Sheng
Abstract:
Serving large language models (LLMs) is expensive, especially for providers hosting many models, making cost reduction essential. The unique workload patterns of serving multiple LLMs (i.e., multi-LLM serving) create new opportunities and challenges for this task. The long-tail popularity of models and their long idle periods present opportunities to improve utilization through GPU sharing. However, existing GPU sharing systems lack the ability to adjust their resource allocation and sharing policies at runtime, making them ineffective at meeting latency service-level objectives (SLOs) under rapidly fluctuating workloads.
This paper presents Prism, a multi-LLM serving system that unleashes the full potential of GPU sharing to achieve both cost efficiency and SLO attainment. At its core, Prism tackles a key limitation of existing systems: the lack of cross-model memory coordination, which is essential for flexibly sharing GPU memory across models under dynamic workloads. Prism achieves this with two key designs. First, it supports on-demand memory allocation by dynamically mapping physical to virtual memory pages, allowing flexible memory redistribution among models that space- and time-share a GPU. Second, it improves memory efficiency through a two-level scheduling policy that dynamically adjusts sharing strategies based on models' runtime demands. Evaluations on real-world traces show that Prism achieves more than $2\times$ cost savings and $3.3\times$ SLO attainment compared to state-of-the-art systems.
Submitted 12 May, 2025; v1 submitted 6 May, 2025;
originally announced May 2025.
-
MolMole: Molecule Mining from Scientific Literature
Authors:
LG AI Research,
Sehyun Chun,
Jiye Kim,
Ahra Jo,
Yeonsik Jo,
Seungyul Oh,
Seungjun Lee,
Kwangrok Ryoo,
Jongmin Lee,
Seung Hwan Kim,
Byung Jun Kang,
Soonyoung Lee,
Jun Ha Park,
Chanwoo Moon,
Jiwon Ham,
Haein Lee,
Heejae Han,
Jaeseung Byun,
Soojong Do,
Minju Ha,
Dongyun Kim,
Kyunghoon Bae,
Woohyung Lim,
Edward Hwayoung Lee,
Yongmin Park
, et al. (9 additional authors not shown)
Abstract:
The extraction of molecular structures and reaction data from scientific documents is challenging due to their varied, unstructured chemical formats and complex document layouts. To address this, we introduce MolMole, a vision-based deep learning framework that unifies molecule detection, reaction diagram parsing, and optical chemical structure recognition (OCSR) into a single pipeline for automating the extraction of chemical data directly from page-level documents. Recognizing the lack of a standard page-level benchmark and evaluation metric, we also present a testset of 550 pages annotated with molecule bounding boxes, reaction labels, and MOLfiles, along with a novel evaluation metric. Experimental results demonstrate that MolMole outperforms existing toolkits on both our benchmark and public datasets. The benchmark testset will be publicly available, and the MolMole toolkit will be accessible soon through an interactive demo on the LG AI Research website. For commercial inquiries, please contact us at \href{mailto:[email protected]}{contact\[email protected]}.
Submitted 7 May, 2025; v1 submitted 30 April, 2025;
originally announced May 2025.
-
The Muon Collider
Authors:
Carlotta Accettura,
Simon Adrian,
Rohit Agarwal,
Claudia Ahdida,
Chiara Aimè,
Avni Aksoy,
Gian Luigi Alberghi,
Siobhan Alden,
Luca Alfonso,
Muhammad Ali,
Anna Rita Altamura,
Nicola Amapane,
Kathleen Amm,
David Amorim,
Paolo Andreetto,
Fabio Anulli,
Ludovica Aperio Bella,
Rob Appleby,
Artur Apresyan,
Pouya Asadi,
Mohammed Attia Mahmoud,
Bernhard Auchmann,
John Back,
Anthony Badea,
Kyu Jung Bae
, et al. (433 additional authors not shown)
Abstract:
Muons offer a unique opportunity to build a compact high-energy electroweak collider at the 10 TeV scale. A Muon Collider enables direct access to the underlying simplicity of the Standard Model and unparalleled reach beyond it. It will be a paradigm-shifting tool for particle physics representing the first collider to combine the high-energy reach of a proton collider and the high precision of an electron-positron collider, yielding a physics potential significantly greater than the sum of its individual parts. A high-energy muon collider is the natural next step in the exploration of fundamental physics after the HL-LHC and a natural complement to a future low-energy Higgs factory. Such a facility would significantly broaden the scope of particle colliders, engaging the many frontiers of the high energy community.
The last European Strategy for Particle Physics Update and later the Particle Physics Project Prioritisation Panel in the US requested a study of the muon collider, which is being carried out by the International Muon Collider Collaboration. In this comprehensive document we present the physics case and the state of the work on accelerator design and technology, and propose an R&D project that can make the muon collider a reality.
Submitted 30 April, 2025;
originally announced April 2025.
-
Demonstration of highly scaled AlScN ferroelectric diode memory with storage density > 100 Mbit/mm$^2$
Authors:
Zekun Hu,
Hyunmin Cho,
Rajeev Kumar Rai,
Kefei Bao,
Yinuo Zhang,
Yunfei He,
Yaoyang Ji,
Chloe Leblanc,
Kwan-Ho Kim,
Zirun Han,
Zhen Qiu,
Xingyu Du,
Eric A. Stach,
Roy Olsson,
Deep Jariwala
Abstract:
Wurtzite nitride ferroelectric materials have emerged as promising candidates for next-generation memory applications due to their exceptional polarization properties and compatibility with conventional semiconductor processing techniques. Here, we demonstrate the first successful scaling of Aluminum Scandium Nitride (AlScN) ferroelectric diode (FeDiode) memory down to 50 nm device diameters while maintaining functional performance. Using a 20 nm Al0.64Sc0.36N ferroelectric layer, we investigate both metal-insulator-ferroelectric-metal (MIFM) and metal-ferroelectric-metal (MFM) architectures to optimize device performance. Our scaled devices exhibit a previously unreported size-dependent behavior, where switching voltage decreases while breakdown field increases with miniaturization, resulting in an enhanced breakdown-to-coercive field ratio exceeding 2.6 for the smallest structures. This favorable scaling behavior enables reliable operation at reduced dimensions critical for high-density applications. The MIFM devices demonstrate stable 3-bit non-volatile multistate behavior with clearly distinguishable resistance states and retention exceeding $5\times 10^5$ seconds. This combination of scalability and simple structure enables an effective memory density of 100 Mbit/mm$^2$ at a feature size of 50 nm. By achieving 50 nm scaling with enhanced performance metrics, this work establishes AlScN-based FeDiode memory as a highly promising platform for next-generation non-volatile storage with potential for direct integration into CMOS technology.
Submitted 17 April, 2025;
originally announced April 2025.
-
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations
Authors:
Kyungho Bae,
Jinhyung Kim,
Sihaeng Lee,
Soonyoung Lee,
Gunhee Lee,
Jinwoo Choi
Abstract:
In this work, we tackle action-scene hallucination in Video Large Language Models (Video-LLMs), where models incorrectly predict actions based on the scene context or scenes based on observed actions. We observe that existing Video-LLMs often suffer from action-scene hallucination due to two main factors. First, existing Video-LLMs intermingle spatial and temporal features by applying an attention operation across all tokens. Second, they use the standard Rotary Position Embedding (RoPE), which causes the text tokens to overemphasize certain types of tokens depending on their sequential orders. To address these issues, we introduce MASH-VLM, Mitigating Action-Scene Hallucination in Video-LLMs through disentangled spatial-temporal representations. Our approach includes two key innovations: (1) DST-attention, a novel attention mechanism that disentangles the spatial and temporal tokens within the LLM by using masked attention to restrict direct interactions between the spatial and temporal tokens; (2) Harmonic-RoPE, which extends the dimensionality of the positional IDs, allowing the spatial and temporal tokens to maintain balanced positions relative to the text tokens. To evaluate the action-scene hallucination in Video-LLMs, we introduce the UNSCENE benchmark with 1,320 videos and 4,078 QA pairs. Extensive experiments demonstrate that MASH-VLM achieves state-of-the-art results on the UNSCENE benchmark, as well as on existing video understanding benchmarks.
Submitted 20 March, 2025;
originally announced March 2025.
-
EXAONE Deep: Reasoning Enhanced Language Models
Authors:
LG AI Research,
Kyunghoon Bae,
Eunbi Choi,
Kibong Choi,
Stanley Jungkyu Choi,
Yemuk Choi,
Seokhee Hong,
Junwon Hwang,
Hyojin Jeon,
Kijeong Jeon,
Gerrard Jeongwon Jo,
Hyunjik Jo,
Jiyeon Jung,
Hyosang Kim,
Joonkee Kim,
Seonghwan Kim,
Soyeon Kim,
Sunkyoung Kim,
Yireun Kim,
Yongil Kim,
Youchul Kim,
Edward Hwayoung Lee,
Haeju Lee,
Honglak Lee,
Jinsik Lee
, et al. (7 additional authors not shown)
Abstract:
We present the EXAONE Deep series, which exhibits superior capabilities in various reasoning tasks, including math and coding benchmarks. We train our models mainly on a reasoning-specialized dataset that incorporates long streams of thought processes. Evaluation results show that our smaller models, EXAONE Deep 2.4B and 7.8B, outperform other models of comparable size, while the largest model, EXAONE Deep 32B, demonstrates competitive performance against leading open-weight models. All EXAONE Deep models are openly available for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE
Submitted 19 March, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Do Not Trust Licenses You See: Dataset Compliance Requires Massive-Scale AI-Powered Lifecycle Tracing
Authors:
Jaekyeom Kim,
Sungryull Sohn,
Gerrard Jeongwon Jo,
Jihoon Choi,
Kyunghoon Bae,
Hwayoung Lee,
Yongmin Park,
Honglak Lee
Abstract:
This paper argues that a dataset's legal risk cannot be accurately assessed by its license terms alone; instead, tracking dataset redistribution and its full lifecycle is essential. However, this process is too complex for legal experts to handle manually at scale. Tracking dataset provenance, verifying redistribution rights, and assessing evolving legal risks across multiple stages require a level of precision and efficiency that exceeds human capabilities. Addressing this challenge effectively demands AI agents that can systematically trace dataset redistribution, analyze compliance, and identify legal risks. We develop an automated data compliance system called NEXUS and show that AI can perform these tasks with higher accuracy, efficiency, and cost-effectiveness than human experts. Our massive legal analysis of 17,429 unique entities and 8,072 license terms using this approach reveals discrepancies in legal rights between the original datasets before redistribution and their redistributed subsets, underscoring the necessity of data-lifecycle-aware compliance. For instance, we find that out of 2,852 datasets with commercially viable individual license terms, only 605 (21%) are legally permissible for commercialization. This work sets a new standard for AI data governance, advocating for a framework that systematically examines the entire lifecycle of dataset redistribution to ensure transparent, legal, and responsible dataset management.
Submitted 14 March, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Safeguarding AI in Medical Imaging: Post-Hoc Out-of-Distribution Detection with Normalizing Flows
Authors:
Dariush Lotfi,
Mohammad-Ali Nikouei Mahani,
Mohamad Koohi-Moghadam,
Kyongtae Ty Bae
Abstract:
In AI-driven medical imaging, the failure to detect out-of-distribution (OOD) data poses a severe risk to clinical reliability, potentially leading to critical diagnostic errors. Current OOD detection methods often demand impractical retraining or modifications to pre-trained models, hindering their adoption in regulated clinical environments. To address this challenge, we propose a post-hoc normalizing flow-based approach that seamlessly integrates with existing pre-trained models without altering their weights. Our evaluation used a novel in-house built dataset, MedOOD, meticulously curated to simulate clinically relevant distributional shifts, alongside the MedMNIST benchmark dataset. On our in-house MedOOD dataset, our method achieved an AUROC of 84.61%, outperforming state-of-the-art methods like ViM (80.65%) and MDS (80.87%). Similarly, on MedMNIST, it reached an exceptional AUROC of 93.8%, surpassing leading approaches such as ViM (88.08%) and ReAct (87.05%). This superior performance, coupled with its post-hoc integration capability, positions our method as a vital safeguard for enhancing safety in medical imaging workflows. The model and code to build OOD datasets are publicly accessible at https://github.com/dlotfi/MedOODFlow.
Submitted 28 May, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning
Authors:
Xiaoyuan Li,
Moxin Li,
Rui Men,
Yichang Zhang,
Keqin Bao,
Wenjie Wang,
Fuli Feng,
Dayiheng Liu,
Junyang Lin
Abstract:
Large language models (LLMs) have shown remarkable capabilities in commonsense reasoning; however, some variations in questions can trigger incorrect responses. Do these models truly understand commonsense knowledge, or just memorize expression patterns? To investigate this question, we present the first extensive robustness evaluation of LLMs in commonsense reasoning. We introduce HellaSwag-Pro, a large-scale bilingual benchmark consisting of 11,200 cases, by designing and compiling seven types of question variants. To construct this benchmark, we propose a two-stage method to develop Chinese HellaSwag, a finely annotated dataset comprising 12,000 instances across 56 categories. We conduct extensive experiments on 41 representative LLMs, revealing that these LLMs are far from robust in commonsense reasoning. Furthermore, this robustness varies depending on the language in which the LLM is tested. This work establishes a high-quality evaluation benchmark, with extensive experiments offering valuable insights to the community in commonsense reasoning for LLMs.
Submitted 25 May, 2025; v1 submitted 16 February, 2025;
originally announced February 2025.
-
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
Authors:
Xiaoyang Liu,
Kangjie Bao,
Jiashuo Zhang,
Yunqi Liu,
Yuntian Liu,
Yu Chen,
Yang Jiao,
Tao Luo
Abstract:
Autoformalization, the automatic translation of mathematical content from natural language into machine-verifiable formal languages, has seen significant progress driven by advances in large language models (LLMs). Nonetheless, a primary barrier to further improvements is the limited availability of parallel corpora that map informal mathematical text to its formal counterpart. To address this limitation, we propose ATLAS (Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data), a novel data generation framework designed to produce large-scale, high-quality parallel corpora of theorem statements. Distinct from prior approaches, ATLAS begins with a concept repository, accelerates the improvement of student model through expert iteration combined with knowledge distillation, and introduces two novel augmentation strategies that exploit the structural characteristics of formal languages. With the proposed ATLAS running for 10 iterations, we construct an undergraduate-level dataset comprising 117k theorem statements and develop ATLAS Translator, which demonstrates statistically significant improvements over both the HERALD Translator and the Kimina-Autoformalizer across all benchmarks ($p<0.05$, two-sided t-test), achieving a new state of the art. The datasets, model, and code will be released to the public soon.
Submitted 19 May, 2025; v1 submitted 8 February, 2025;
originally announced February 2025.
-
Mol-LLM: Multimodal Generalist Molecular LLM with Improved Graph Utilization
Authors:
Chanhui Lee,
Hanbum Ko,
Yuheon Song,
YongJun Jeong,
Rodrigo Hormazabal,
Sehui Han,
Kyunghoon Bae,
Sungbin Lim,
Sungwoong Kim
Abstract:
Recent advances in large language models (LLMs) have led to models that tackle diverse molecular tasks, such as chemical reaction prediction and molecular property prediction. Large-scale molecular instruction-tuning datasets have enabled sequence-only (e.g., SMILES or SELFIES) generalist molecular LLMs, and researchers are now exploring multimodal approaches that incorporate molecular structural information for further gains. However, a genuinely multimodal, generalist LLM that covers a broad spectrum of molecular tasks has yet to be fully investigated. We observe that naive next token prediction training ignores graph-structural information, limiting an LLM's ability to exploit molecular graphs. To address this, we propose (i) Molecular structure Preference Optimization (MolPO), which facilitates graph usage by optimizing preferences between pairs of correct and perturbed molecular structures, and (ii) an advanced graph encoder with a tailored pre-training strategy to improve the effect of graph utilization by MolPO. Building on these contributions, we introduce Mol-LLM, the first multimodal generalist model that (a) handles a broad spectrum of molecular tasks among molecular LLMs, (b) explicitly leverages molecular-structure information, and (c) takes advantage of extensive instruction tuning. Mol-LLM attains state-of-the-art or comparable results across the most comprehensive molecular-LLM benchmark, even on out-of-distribution datasets for reaction and property prediction, where it surpasses prior generalist molecular LLMs by a large margin.
Submitted 26 May, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
Toward Task Generalization via Memory Augmentation in Meta-Reinforcement Learning
Authors:
Kaixi Bao,
Chenhao Li,
Yarden As,
Andreas Krause,
Marco Hutter
Abstract:
Agents trained via reinforcement learning (RL) often struggle to perform well on tasks that differ from those encountered during training. This limitation presents a challenge to the broader deployment of RL in diverse and dynamic task settings. In this work, we introduce memory augmentation, a memory-based RL approach to improve task generalization. Our approach leverages task-structured augmentations to simulate plausible out-of-distribution scenarios and incorporates memory mechanisms to enable context-aware policy adaptation. Trained on a predefined set of tasks, our policy demonstrates the ability to generalize to unseen tasks through memory augmentation without requiring additional interactions with the environment. Through extensive simulation experiments and real-world hardware evaluations on legged locomotion tasks, we demonstrate that our approach achieves zero-shot generalization to unseen tasks while maintaining robust in-distribution performance and high sample efficiency.
Submitted 7 May, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
Simons Observatory: Characterization of the Large Aperture Telescope Receiver
Authors:
Tanay Bhandarkar,
Saianeesh K. Haridas,
Jeff Iuliano,
Anna Kofman,
Alex Manduca,
Karen Perez Sarmiento,
John Orlowski-Scherer,
Thomas P. Satterthwaite,
Yuhan Wang,
Zeeshan Ahmed,
Jason E. Austermann,
Kyuyoung Bae,
Gabriele Coppi,
Mark J. Devlin,
Simon R Dicker,
Peter N. Dow,
Shannon M. Duff,
Daniel Dutcher,
Nicholas Galitzki,
Jon E. Gudmundsson,
Shawn W. Henderson,
Johannes Hubmayr,
Bradley R. Johnson,
Matthew A. Koc,
Brian J. Koopman
, et al. (19 additional authors not shown)
Abstract:
The Simons Observatory (SO) is a ground-based cosmic microwave background (CMB) survey experiment that currently consists of three 0.42m small-aperture telescopes (SATs) and one 6m large-aperture telescope (LAT), located at an elevation of 5200m in the Atacama Desert in Chile. At the LAT's focal plane, SO will install >62,000 transition-edge sensor detectors across 13 optics tubes (OTs) within the Large Aperture Telescope Receiver (LATR), the largest cryogenic camera ever built to observe the CMB. Here we report on the validation of the LATR in the laboratory and the subsequent dark testing and validation within the LAT. We show that the LATR meets cryogenic, optical, and detector specifications required for high-sensitivity measurements of the CMB. At the time of writing, the LATR is installed in the LAT with six OTs (corresponding to >31,000 detectors), and the LAT mirrors and remaining seven OTs are undergoing development.
Submitted 15 January, 2025;
originally announced January 2025.
-
Anisotropic Band Flattening in Twisted Bilayer of M-Valley MXenes
Authors:
Kejie Bao,
Huan Wang,
Zhaochen Liu,
Jing Wang
Abstract:
Experimental studies on moiré materials have predominantly focused on twisted hexagonal lattices with low-energy states near the $Γ$- or K-points. These materials, characterized by isotropic low-energy dispersion, are fundamentally distinct from those with anisotropic properties. Here we introduce a series of semiconducting transition metal carbides (MXenes) $M_2$C$T_2$ ($M$ = Ti, Zr, Hf, Sc, Y; $T$ = O, F, Cl) as a novel platform for M-valley moiré materials. Taking Ti$_2$CO$_2$ and Zr$_2$CO$_2$ as representative examples, large-scale \emph{ab initio} calculations show that their AB-stacked twisted homobilayer features three M-valleys related by three-fold rotational symmetry, with time-reversal symmetry and giant anisotropic band flattening. We derive a simplified moiré Hamiltonian for these systems and conduct a detailed analysis of their band structures, where the origins of anisotropic band flattening are clearly elucidated. This research broadens the scope of moiré materials, where the valley- and spin-degenerate two-dimensional array of quasi-one-dimensional systems could serve as a potential platform for realizing many interesting correlated phases.
Submitted 27 December, 2024;
originally announced December 2024.
-
Qwen2.5 Technical Report
Authors:
Qwen,
An Yang,
Baosong Yang,
Beichen Zhang,
Binyuan Hui,
Bo Zheng,
Bowen Yu,
Chengyuan Li,
Dayiheng Liu,
Fei Huang,
Haoran Wei,
Huan Lin,
Jian Yang,
Jianhong Tu,
Jianwei Zhang,
Jianxin Yang,
Jiaxi Yang,
Jingren Zhou,
Junyang Lin,
Kai Dang,
Keming Lu,
Keqin Bao,
Kexin Yang,
Le Yu
, et al. (19 additional authors not shown)
Abstract:
In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present Qwen2.5 LLM series in rich sizes. Open-weight offerings include base and instruction-tuned models, with quantized versions available. In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio. Qwen2.5 has demonstrated top-tier performance on a wide range of benchmarks evaluating language understanding, reasoning, mathematics, coding, human preference alignment, etc. Specifically, the open-weight flagship Qwen2.5-72B-Instruct outperforms a number of open and proprietary models and demonstrates competitive performance to the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o respectively. Additionally, as the foundation, Qwen2.5 models have been instrumental in training specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.
Submitted 2 January, 2025; v1 submitted 19 December, 2024;
originally announced December 2024.
-
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
Authors:
LG AI Research,
Soyoung An,
Kyunghoon Bae,
Eunbi Choi,
Kibong Choi,
Stanley Jungkyu Choi,
Seokhee Hong,
Junwon Hwang,
Hyojin Jeon,
Gerrard Jeongwon Jo,
Hyunjik Jo,
Jiyeon Jung,
Yountae Jung,
Hyosang Kim,
Joonkee Kim,
Seonghwan Kim,
Soyeon Kim,
Sunkyoung Kim,
Yireun Kim,
Yongil Kim,
Youchul Kim,
Edward Hwayoung Lee,
Haeju Lee,
Honglak Lee,
Jinsik Lee
, et al. (8 additional authors not shown)
Abstract:
This technical report introduces the EXAONE 3.5 instruction-tuned language models, developed and released by LG AI Research. The EXAONE 3.5 language models are offered in three configurations: 32B, 7.8B, and 2.4B. These models feature several standout capabilities: 1) exceptional instruction following capabilities in real-world scenarios, achieving the highest scores across seven benchmarks, 2) outstanding long-context comprehension, attaining the top performance in four benchmarks, and 3) competitive results compared to state-of-the-art open models of similar sizes across nine general benchmarks. The EXAONE 3.5 language models are open to anyone for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE. For commercial use, please reach out to the official contact point of LG AI Research: [email protected].
Submitted 9 December, 2024; v1 submitted 6 December, 2024;
originally announced December 2024.
-
Uncorrectable-error-injection based reliable and secure quantum communication
Authors:
IlKwon Sohn,
Boseon Kim,
Kwangil Bae,
Wooyeong Song,
Chankyun Lee,
Kabgyun Jeong,
Wonhyuk Lee
Abstract:
Quantum networks aim to connect distant quantum devices, such as quantum computers. In this context, a critical requirement is the secure and reliable transmission of arbitrary quantum states. Quantum teleportation is widely used to transmit arbitrary quantum states. However, it requires entanglement swapping and purification to distribute entanglement over long distances, introducing significant overhead and complexity. These challenges limit its practicality for real-world quantum communication networks. To address this limitation, we propose a novel scheme for directly transmitting quantum states encoded using error-correction codes. The proposed scheme leverages the robustness of quantum error correction codes to ensure secure and reliable quantum communication. By encoding quantum states with error-correction codes and strategically injecting uncorrectable errors, we enhance the security and reliability of the transmission process. Our approach reduces the overhead associated with entanglement distribution and provides a high tolerance for transmission errors. This study presents an advancement in practical and scalable quantum communication networks.
Submitted 21 November, 2024;
originally announced November 2024.
-
MuCol Milestone Report No. 5: Preliminary Parameters
Authors:
Carlotta Accettura,
Simon Adrian,
Rohit Agarwal,
Claudia Ahdida,
Chiara Aimé,
Avni Aksoy,
Gian Luigi Alberghi,
Siobhan Alden,
Luca Alfonso,
Nicola Amapane,
David Amorim,
Paolo Andreetto,
Fabio Anulli,
Rob Appleby,
Artur Apresyan,
Pouya Asadi,
Mohammed Attia Mahmoud,
Bernhard Auchmann,
John Back,
Anthony Badea,
Kyu Jung Bae,
E. J. Bahng,
Lorenzo Balconi,
Fabrice Balli,
Laura Bandiera
, et al. (369 additional authors not shown)
Abstract:
This document comprises a collection of updated preliminary parameters for the key parts of the muon collider. The updated preliminary parameters follow on from the October 2023 Tentative Parameters Report. Particular attention has been given to regions of the facility that are believed to hold greater technical uncertainty in their design and that have a strong impact on the cost and power consumption of the facility. The data is collected from a collaborative spreadsheet and transferred to Overleaf.
Submitted 5 November, 2024;
originally announced November 2024.
-
Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning
Authors:
Keqin Bao,
Ming Yan,
Yang Zhang,
Jizhi Zhang,
Wenjie Wang,
Fuli Feng,
Xiangnan He
Abstract:
Frequently updating Large Language Model (LLM)-based recommender systems to adapt to new user interests -- as done for traditional ones -- is impractical due to high training costs, even with acceleration methods. This work explores adapting to dynamic user interests without any model updates by leveraging In-Context Learning (ICL), which allows LLMs to learn new tasks from few-shot examples provided in the input. Using new-interest examples as the ICL few-shot examples, LLMs may learn real-time interest directly, avoiding the need for model updates. However, existing LLM-based recommenders often lose the in-context learning ability during recommendation tuning, while the original LLM's in-context learning lacks recommendation-specific focus. To address this, we propose RecICL, which customizes recommendation-specific in-context learning for real-time recommendations. RecICL organizes training examples in an in-context learning format, ensuring that in-context learning ability is preserved and aligned with the recommendation task during tuning.
Extensive experiments demonstrate RecICL's effectiveness in delivering real-time recommendations without requiring model updates. Our code is available at https://github.com/ym689/rec_icl.
Submitted 30 October, 2024;
originally announced October 2024.
-
Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation
Authors:
Yang Zhang,
Juntao You,
Yimeng Bai,
Jizhi Zhang,
Keqin Bao,
Wenjie Wang,
Tat-Seng Chua
Abstract:
Recent advancements in recommender systems have focused on leveraging Large Language Models (LLMs) to improve user preference modeling, yielding promising outcomes. However, current LLM-based approaches struggle to fully leverage user behavior sequences, resulting in suboptimal preference modeling for personalized recommendations. In this study, we propose a novel Counterfactual Fine-Tuning (CFT) method to address this issue by explicitly emphasizing the role of behavior sequences when generating recommendations. Specifically, we employ counterfactual reasoning to identify the causal effects of behavior sequences on model output and introduce a task that directly fits the ground-truth labels based on these effects, achieving the goal of explicit emphasis. Additionally, we develop a token-level weighting mechanism to adjust the emphasis strength for different item tokens, reflecting the diminishing influence of behavior sequences from earlier to later tokens during predicting an item. Extensive experiments on real-world datasets demonstrate that CFT effectively improves behavior sequence modeling. Our codes are available at https://github.com/itsmeyjt/CFT.
Submitted 30 October, 2024;
originally announced October 2024.
-
Agentic Feedback Loop Modeling Improves Recommendation and User Simulation
Authors:
Shihao Cai,
Jizhi Zhang,
Keqin Bao,
Chongming Gao,
Qifan Wang,
Fuli Feng,
Xiangnan He
Abstract:
Large language model-based agents are increasingly applied in the recommendation field due to their extensive knowledge and strong planning capabilities. While prior research has primarily focused on enhancing either the recommendation agent or the user agent individually, the collaborative interaction between the two has often been overlooked. Towards this research gap, we propose a novel framework that emphasizes the feedback loop process to facilitate the collaboration between the recommendation agent and the user agent. Specifically, the recommendation agent refines its understanding of user preferences by analyzing the feedback from the user agent on the item recommendation. Conversely, the user agent further identifies potential user interests based on the items and recommendation reasons provided by the recommendation agent. This iterative process enhances the ability of both agents to infer user behaviors, enabling more effective item recommendations and more accurate user simulations. Extensive experiments on three datasets demonstrate the effectiveness of the agentic feedback loop: the agentic feedback loop yields an average improvement of 11.52% over the single recommendation agent and 21.12% over the single user agent. Furthermore, the results show that the agentic feedback loop does not exacerbate popularity or position bias, which are typically amplified by the real-world feedback loop, highlighting its robustness. The source code is available at https://github.com/Lanyu0303/AFL.
Submitted 1 May, 2025; v1 submitted 25 October, 2024;
originally announced October 2024.
-
Federated Learning with Label-Masking Distillation
Authors:
Jianghu Lu,
Shikun Li,
Kexin Bao,
Pengju Wang,
Zhenxing Qian,
Shiming Ge
Abstract:
Federated learning provides a privacy-preserving manner to collaboratively train models on data distributed over multiple local clients via the coordination of a global server. In this paper, we focus on label distribution skew in federated learning, where due to the different user behavior of the client, label distributions between different clients are significantly different. When faced with such cases, most existing methods will lead to a suboptimal optimization due to the inadequate utilization of label distribution information in clients. Inspired by this, we propose a label-masking distillation approach termed FedLMD to facilitate federated learning via perceiving the various label distributions of each client. We classify the labels into majority and minority labels based on the number of examples per class during training. The client model learns the knowledge of majority labels from local data. The process of distillation masks out the predictions of majority labels from the global model, so that it can focus more on preserving the minority label knowledge of the client. A series of experiments show that the proposed approach can achieve state-of-the-art performance in various cases. Moreover, considering the limited resources of the clients, we propose a variant FedLMD-Tf that does not require an additional teacher, which outperforms previous lightweight approaches without increasing computational costs. Our code is available at https://github.com/wnma3mz/FedLMD.
Submitted 19 September, 2024;
originally announced September 2024.
-
AutoPET III Challenge: PET/CT Semantic Segmentation
Authors:
Reza Safdari,
Mohammad Koohi-Moghaddam,
Kyongtae Tyler Bae
Abstract:
In this study, we implemented a two-stage deep learning-based approach to segment lesions in PET/CT images for the AutoPET III challenge. The first stage utilized a DynUNet model for coarse segmentation, identifying broad regions of interest. The second stage refined this segmentation using an ensemble of SwinUNETR, SegResNet, and UNet models. Preprocessing involved resampling images to a common resolution and normalization, while data augmentation techniques such as affine transformations and intensity adjustments were applied to enhance model generalization. The dataset was split into 80% training and 20% validation, excluding healthy cases. This method leverages multi-stage segmentation and model ensembling to achieve precise lesion segmentation, aiming to improve robustness and overall performance.
Submitted 19 September, 2024;
originally announced September 2024.
-
Charged Higgs Boson Phenomenology in the Dark Z mediated Fermionic Dark Matter Model
Authors:
Kyu Jung Bae,
Jinn-Ouk Gong,
Dong-Won Jung,
Kang Young Lee,
Chaehyun Yu,
Chan Beom Park
Abstract:
We study the phenomenology of the charged Higgs boson, $H^\pm$, appearing in the fermionic dark matter model mediated by the dark $Z$ boson. This model favors a light dark $Z$ boson, $Z'$, and a light additional neutral Higgs boson, $h$. We find that $H^\pm \to W^\pm h$ and $H^\pm \to W^\pm Z'$ are the dominant decay channels. Thus the promising final states are trilepton signals, $e μμ$ or $μμμ$, following $Z' \to μ^+ μ^-$ decays and leptonic decays of the $W^\pm$ boson. The charged Higgs boson will be produced from the top quark decays $t \to b H^\pm$ following $t \bar{t}$ production, if $H^\pm$ is light. If $H^\pm$ is heavier than the top quark, the dominant production processes are associated productions with either $Z'$ or $h$, $pp \to W^\star \to H^\pm h$ and $pp \to W^\star \to H^\pm Z'$. We explore the discovery potential of the charged Higgs boson at the LHC. We also discuss the implications of dark matter in relation to the charged Higgs phenomenology.
Submitted 19 September, 2024; v1 submitted 11 September, 2024;
originally announced September 2024.
-
Consecutive Flat Chern Bands and Correlated States in Monolayer ReAg$_2$Cl$_6$
Authors:
Kejie Bao,
Rui Shi,
Huan Wang,
Jiaxuan Guo,
Jing Wang
Abstract:
We theoretically propose that van der Waals monolayer ReAg$_2$Cl$_6$ has four consecutive flat Chern bands in the 120$^\circ$ spiral antiferromagnetic ground state. The nontrivial topology of these Chern bands emerges from the synergy between Re $t_{2g}$ band folding with non-collinear spin configuration and spin-orbit coupling. By constructing maximally localized Wannier functions directly from first-principles calculations, a tight-binding model is developed to describe the consecutive Chern bands. Interestingly, many-body exact diagonalization and entanglement spectrum analysis suggest that correlated states such as the fractional Chern insulator and charge density wave may appear in these Chern bands at $1/3$ filling. Furthermore, the spin configurations and band topology of the Chern bands are tunable by an external magnetic field. The general physics from the $d$ orbitals here applies to a large class of materials such as ReAg$_2$Br$_6$, ReAu$_2$I$_6$ and ReCu$_2X_6$ ($X$=Cl, Br, I). These notable predictions in pristine 2D materials, if realized experimentally, could offer a new playground for exploring correlated topological states at elevated temperature.
Submitted 21 December, 2024; v1 submitted 3 September, 2024;
originally announced September 2024.
-
Designing generalized elegant Bell inequalities in high dimension from a quantum bound
Authors:
Kwangil Bae,
Junghee Ryu,
Ilkwon Sohn,
Wonhyuk Lee
Abstract:
The elegant Bell inequality is well known for its distinctive property of being maximally violated by maximal entanglement, mutually unbiased bases, and symmetric informationally complete positive operator-valued measure elements. Despite its significance in quantum information theory, demonstrated on the basis of this unique violation feature, it remains the only known inequality with this characteristic. We present a method to construct Bell inequalities with a violation feature analogous to that of the elegant Bell inequality in higher local dimension from a simple analytic quantum bound. A Bell inequality with the generalized violation feature is derived in three dimensions for the first time. It exhibits larger violation than existing Bell inequalities of similar classes, including the original elegant Bell inequality, while requiring an arguably small number of measurements.
Submitted 12 February, 2025; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Multi-task Heterogeneous Graph Learning on Electronic Health Records
Authors:
Tsai Hor Chan,
Guosheng Yin,
Kyongtae Bae,
Lequan Yu
Abstract:
Learning electronic health records (EHRs) has received emerging attention because of its capability to facilitate accurate medical diagnosis. Since the EHRs contain enriched information specifying complex interactions between entities, modeling EHRs with graphs is shown to be effective in practice. The EHRs, however, present a great degree of heterogeneity, sparsity, and complexity, which hamper the performance of most of the models applied to them. Moreover, existing approaches modeling EHRs often focus on learning the representations for a single task, overlooking the multi-task nature of EHR analysis problems and resulting in limited generalizability across different tasks. In view of these limitations, we propose a novel framework for EHR modeling, namely MulT-EHR (Multi-Task EHR), which leverages a heterogeneous graph to mine the complex relations and model the heterogeneity in the EHRs. To mitigate the large degree of noise, we introduce a denoising module based on the causal inference framework to adjust for severe confounding effects and reduce noise in the EHR data. Additionally, since our model adopts a single graph neural network for simultaneous multi-task prediction, we design a multi-task learning module to leverage the inter-task knowledge to regularize the training process. Extensive empirical studies on MIMIC-III and MIMIC-IV datasets validate that the proposed method consistently outperforms the state-of-the-art designs in four popular EHR analysis tasks -- drug recommendation, and predictions of the length of stay, mortality, and readmission. Thorough ablation studies demonstrate the robustness of our method upon variations to key components and hyperparameters.
Submitted 14 August, 2024;
originally announced August 2024.
-
EXAONE 3.0 7.8B Instruction Tuned Language Model
Authors:
LG AI Research,
Soyoung An,
Kyunghoon Bae,
Eunbi Choi,
Stanley Jungkyu Choi,
Yemuk Choi,
Seokhee Hong,
Yeonjung Hong,
Junwon Hwang,
Hyojin Jeon,
Gerrard Jeongwon Jo,
Hyunjik Jo,
Jiyeon Jung,
Yountae Jung,
Euisoon Kim,
Hyosang Kim,
Joonkee Kim,
Seonghwan Kim,
Soyeon Kim,
Sunkyoung Kim,
Yireun Kim,
Youchul Kim,
Edward Hwayoung Lee,
Haeju Lee
, et al. (14 additional authors not shown)
Abstract:
We introduce EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovations. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly competitive real-world performance with instruction-following capability against other state-of-the-art open models of similar size. Our comparative analysis shows that EXAONE 3.0 excels particularly in Korean, while achieving compelling performance across general tasks and complex reasoning. With its strong real-world effectiveness and bilingual proficiency, we hope that EXAONE keeps contributing to advancements in Expert AI. Our EXAONE 3.0 instruction-tuned model is available at https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
Submitted 13 August, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
HADES: Detecting Active Directory Attacks via Whole Network Provenance Analytics
Authors:
Qi Liu,
Kaibin Bao,
Wajih Ul Hassan,
Veit Hagenmeyer
Abstract:
Due to its crucial role in identity and access management in modern enterprise networks, Active Directory (AD) is a top target of Advanced Persistent Threat (APT) actors. Conventional intrusion detection systems (IDS) excel at identifying malicious behaviors caused by malware, but often fail to detect stealthy attacks launched by APT actors. Recent advances in provenance-based IDS (PIDS) show promise by exposing malicious system activities in causal attack graphs. However, existing approaches are restricted to intra-machine tracing and are unable to reveal the scope of attackers' traversal inside a network. We propose HADES, the first PIDS capable of performing accurate causality-based cross-machine tracing by leveraging a novel concept called logon session based execution partitioning to overcome several challenges in cross-machine tracing. We design HADES as an efficient on-demand tracing system, which performs whole-network tracing only when it first identifies an authentication anomaly signifying an ongoing AD attack, for which we introduce a novel lightweight authentication anomaly detection model rooted in our extensive analysis of AD attacks. To triage attack alerts, we present a new algorithm integrating two key insights we identified in AD attacks. Our evaluations show that HADES outperforms both popular open source detection systems and a prominent commercial AD attack detector.
Submitted 26 July, 2024;
originally announced July 2024.
-
Accurate and Scalable Detection and Investigation of Cyber Persistence Threats
Authors:
Qi Liu,
Muhammad Shoaib,
Mati Ur Rehman,
Kaibin Bao,
Veit Hagenmeyer,
Wajih Ul Hassan
Abstract:
In Advanced Persistent Threat (APT) attacks, achieving stealthy persistence within target systems is often crucial for an attacker's success. This persistence allows adversaries to maintain prolonged access, often evading detection mechanisms. Recognizing its pivotal role in the APT lifecycle, this paper introduces Cyber Persistence Detector (CPD), a novel system dedicated to detecting cyber persistence through provenance analytics. CPD is founded on the insight that persistent operations typically manifest in two phases: the "persistence setup" and the subsequent "persistence execution". By causally relating these phases, we enhance our ability to detect persistent threats. First, CPD discerns setups signaling an impending persistent threat and then traces processes linked to remote connections to identify persistence execution activities. A key feature of our system is the introduction of pseudo-dependency edges (pseudo-edges), which effectively connect these disjoint phases using data provenance analysis, and expert-guided edges, which enable faster tracing and reduced log size. These edges empower us to detect persistence threats accurately and efficiently. Moreover, we propose a novel alert triage algorithm that further reduces false positives associated with persistence threats. Evaluations conducted on well-known datasets demonstrate that our system reduces the average false positive rate by 93% compared to state-of-the-art methods.
Submitted 26 July, 2024;
originally announced July 2024.
-
Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition
Authors:
Ke Bao,
Chonghuan Yang
Abstract:
Named entity recognition in the in-domain supervised and few-shot settings has been extensively discussed in the NLP community and has made significant progress. However, cross-domain NER, a more common task in practical scenarios, still poses a challenge for most NER methods. Previous research efforts in that area primarily focus on knowledge transfer, such as correlating label information from source to target domains, but few works pay attention to the problem of label conflict. In this study, we introduce a label alignment and reassignment approach, namely LAR, to address this issue for enhanced cross-domain named entity recognition, which includes two core procedures: label alignment between source and target domains and label reassignment for type inference. The process of label reassignment can be significantly enhanced by integrating an advanced large-scale language model such as ChatGPT. We conduct an extensive range of experiments on NER datasets involving both supervised and zero-shot scenarios. Empirical results demonstrate the validity of our method, with remarkable performance under the supervised and zero-shot out-of-domain settings compared to SOTA methods.
Submitted 24 July, 2024;
originally announced July 2024.
-
Interim report for the International Muon Collider Collaboration (IMCC)
Authors:
C. Accettura,
S. Adrian,
R. Agarwal,
C. Ahdida,
C. Aimé,
A. Aksoy,
G. L. Alberghi,
S. Alden,
N. Amapane,
D. Amorim,
P. Andreetto,
F. Anulli,
R. Appleby,
A. Apresyan,
P. Asadi,
M. Attia Mahmoud,
B. Auchmann,
J. Back,
A. Badea,
K. J. Bae,
E. J. Bahng,
L. Balconi,
F. Balli,
L. Bandiera,
C. Barbagallo
, et al. (362 additional authors not shown)
Abstract:
The International Muon Collider Collaboration (IMCC) [1] was established in 2020 following the recommendations of the European Strategy for Particle Physics (ESPP) and the implementation of the European Strategy for Particle Physics-Accelerator R&D Roadmap by the Laboratory Directors Group [2], hereinafter referred to as the European LDG roadmap. The Muon Collider Study (MuC) covers the accelerator complex, detectors and physics for a future muon collider. In 2023, European Commission support was obtained for a design study of a muon collider (MuCol) [3]. This project started on 1st March 2023, with work-packages aligned with the overall muon collider studies. In preparation of and during the 2021-22 U.S. Snowmass process, the muon collider project parameters, technical studies and physics performance studies were performed and presented in great detail. Recently, the P5 panel [4] in the U.S. recommended a muon collider R&D, proposed to join the IMCC and envisages that the U.S. should prepare to host a muon collider, calling this their "muon shot". In the past, the U.S. Muon Accelerator Programme (MAP) [5] has been instrumental in studies of concepts and technologies for a muon collider.
Submitted 28 January, 2025; v1 submitted 17 July, 2024;
originally announced July 2024.
-
Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation
Authors:
Keqin Bao,
Jizhi Zhang,
Yang Zhang,
Xinyue Huo,
Chong Chen,
Fuli Feng
Abstract:
Adapting Large Language Models (LLMs) for recommendation requires careful consideration of the decoding process, given the inherent differences between generating items and natural language. Existing approaches often directly apply LLMs' original decoding methods. However, we find these methods encounter significant challenges: 1) amplification bias -- where standard length normalization inflates scores for items containing tokens with generation probabilities close to 1 (termed ghost tokens), and 2) homogeneity issue -- generating multiple similar or repetitive items for a user. To tackle these challenges, we introduce a new decoding approach named Debiasing-Diversifying Decoding (D3). D3 disables length normalization for ghost tokens to alleviate amplification bias, and it incorporates a text-free assistant model to encourage tokens less frequently generated by LLMs for counteracting recommendation homogeneity. Extensive experiments on real-world datasets demonstrate the method's effectiveness in enhancing accuracy and diversity. The code is available at https://github.com/SAI990323/DecodingMatters.
Submitted 5 November, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
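The amplification bias described in this abstract can be illustrated with a small numeric sketch. This is not the authors' D3 implementation: the function names and the token probabilities below are hypothetical, and the code only contrasts a length-normalized score with a raw summed log-probability for two items, one of which carries extra near-certain ("ghost") tokens.

```python
import math
from typing import List


def length_normalized_score(token_logprobs: List[float]) -> float:
    """Mean log-probability per token (standard length normalization)."""
    return sum(token_logprobs) / len(token_logprobs)


def raw_score(token_logprobs: List[float]) -> float:
    """Summed log-probability, i.e. length normalization disabled."""
    return sum(token_logprobs)


# Hypothetical items: both start with the same uncertain token (p = 0.1).
# item_b is followed by two ghost tokens whose probability is close to 1.
item_a = [math.log(0.1)]
item_b = [math.log(0.1), math.log(0.999), math.log(0.999)]

# With normalization, the ghost tokens dilute the one uncertain token,
# so item_b looks much better than item_a.
norm_a = length_normalized_score(item_a)
norm_b = length_normalized_score(item_b)

# Without normalization, the two items score almost identically.
raw_a = raw_score(item_a)
raw_b = raw_score(item_b)

print(f"normalized: a={norm_a:.3f} b={norm_b:.3f}")
print(f"raw sum:    a={raw_a:.3f} b={raw_b:.3f}")
```

Under these assumed numbers the normalized score favors the item padded with ghost tokens, while the raw sum treats the two items as nearly equal, which is the effect the abstract attributes to length normalization.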
-
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Authors:
Shihao Cai,
Keqin Bao,
Hangyu Guo,
Jizhi Zhang,
Jun Song,
Bo Zheng
Abstract:
Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source datasets and related efforts are either too challenging for direct model learning or suffer from misalignment between text and images. To overcome this issue, we introduce a novel pipeline that leverages GPT-4 and GPT-4V to generate relatively basic geometry problems with aligned text and images, facilitating model learning. We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset. Experimental results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks. The code is available at https://github.com/Lanyu0303/GeoGPT4V_Project
Submitted 17 June, 2024;
originally announced June 2024.
-
Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning
Authors:
Janghoon Han,
Changho Lee,
Joongbo Shin,
Stanley Jungkyu Choi,
Honglak Lee,
Kynghoon Bae
Abstract:
Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in instruction tuning, we perform instruction tuning individually for two distinct language meta-datasets. Subsequently, we assess the performance on unseen tasks in a language different from the one used for training. To facilitate this investigation, we introduce a novel non-English meta-dataset named "KORANI" (Korean Natural Instruction), comprising 51 Korean benchmarks. Moreover, we design cross-lingual templates to mitigate discrepancies in language and instruction-format of the template between training and inference within the cross-lingual setting. Our experiments reveal consistent improvements through cross-lingual generalization in both English and Korean, outperforming baseline by average scores of 20.7\% and 13.6\%, respectively. Remarkably, these enhancements are comparable to those achieved by monolingual instruction tuning and even surpass them in some tasks. The result underscores the significance of relevant data acquisition across languages over linguistic congruence with unseen tasks during instruction tuning.
Submitted 13 June, 2024;
originally announced June 2024.
-
Emergent Moiré fringes in direct-grown quasicrystal
Authors:
Jingwei Li,
Kejie Bao,
Honglin Sun,
Xingxu Yan,
Ting Huang,
Qicheng Zhang,
Yaoqiang Zhou,
Zhenjing Liu,
Paul Masih Das,
Jiawen You,
Jiong Zhao,
Jianbin Xu,
Xiaoqing Pan,
Yongli Mi,
Junyi Zhu,
Zhaoli Gao
Abstract:
Quasicrystals represent a category of rarely structured solids that challenge traditional periodicity in crystal materials. Recent advancements in the synthesis of two-dimensional (2D) van der Waals materials have paved the way for exploring the unique physical properties of these systems. Here, we report on the synthesis of 2D quasicrystals featuring 30° alternating twist angles between multiple graphene layers, using chemical vapor deposition (CVD). Strikingly, we observed periodic Moiré patterns in the quasicrystal, a finding that has not been previously reported in traditional alloy-based quasicrystals. The Moiré periodicity, varying with the parity of the constituent layers, aligns with the theoretical predictions that suggest a stress cancellation mechanism in force. The emergence of Moiré fringes is attributed to the spontaneous mismatched lattice constant in the oriented graphene layers, proving the existence of atomic relaxation. This phenomenon, which has been largely understudied in graphene systems with large twist angles, has now been validated through our use of scanning transmission electron microscopy (STEM). Our CVD-grown Moiré quasicrystal provides an ideal platform for exploring the unusual physical properties that arise from Moiré periodicity within quasicrystals.
Submitted 11 June, 2024;
originally announced June 2024.
-
Text-like Encoding of Collaborative Information in Large Language Models for Recommendation
Authors:
Yang Zhang,
Keqin Bao,
Ming Yan,
Wenjie Wang,
Fuli Feng,
Xiangnan He
Abstract:
When adapting Large Language Models for Recommendation (LLMRec), it is crucial to integrate collaborative information. Existing methods achieve this by learning collaborative embeddings in LLMs' latent space from scratch or by mapping from external models. However, they fail to represent the information in a text-like format, which may not align optimally with LLMs. To bridge this gap, we introduce BinLLM, a novel LLMRec method that seamlessly integrates collaborative information through text-like encoding. BinLLM converts collaborative embeddings from external models into binary sequences -- a specific text format that LLMs can understand and operate on directly, facilitating the direct usage of collaborative information in text-like format by LLMs. Additionally, BinLLM provides options to compress the binary sequence using dot-decimal notation to avoid excessively long lengths. Extensive experiments validate that BinLLM introduces collaborative information in a manner better aligned with LLMs, resulting in enhanced performance. We release our code at https://github.com/zyang1580/BinLLM.
Submitted 5 June, 2024;
originally announced June 2024.
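The text-like encoding described in this abstract can be sketched roughly as follows. This is an illustrative guess, not the BinLLM code: the abstract does not say how embeddings are binarized or how bits are grouped, so the sign-based thresholding, the 8-bit grouping, and the right-padding with zeros are all assumptions, as are the function names.

```python
from typing import List


def binarize(embedding: List[float]) -> str:
    """Map each dimension to '1' if positive, else '0' (assumed thresholding rule)."""
    return "".join("1" if x > 0 else "0" for x in embedding)


def to_dot_decimal(bits: str, group: int = 8) -> str:
    """Compress a bit string by reading each group of bits as a decimal number.

    Groups are joined with dots, as in IPv4 dot-decimal notation. If the
    length is not a multiple of `group`, the string is right-padded with
    zeros (an assumption made only for this sketch).
    """
    remainder = len(bits) % group
    if remainder:
        bits = bits + "0" * (group - remainder)
    chunks = [bits[i:i + group] for i in range(0, len(bits), group)]
    return ".".join(str(int(chunk, 2)) for chunk in chunks)


# Hypothetical 16-dimensional collaborative embedding for one user or item.
embedding = [0.3, -1.2, 0.0, 2.5, -0.1, 0.7, 0.9, -3.0,
             1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

bits = binarize(embedding)        # 16 characters, one per dimension
compact = to_dot_decimal(bits)    # two dot-separated decimals
print(bits, "->", compact)
```

With 8-bit groups, a 16-character binary sequence shrinks to two short decimal numbers, which is the kind of length reduction the abstract refers to.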
-
The Simons Observatory: Studies of Detector Yield and Readout Noise From the First Large-Scale Deployment of Microwave Multiplexing at the Large Aperture Telescope
Authors:
Thomas P. Satterthwaite,
Zeeshan Ahmed,
Kyuyoung Bae,
Mark Devlin,
Simon Dicker,
Shannon M. Duff,
Daniel Dutcher,
Saianeesh K. Haridas,
Shawn W. Henderson,
Johannes Hubmayr,
Bradley R. Johnson,
Anna Kofman,
Jack Lashner,
Michael J. Link,
Tammy J. Lucas,
Alex Manduca,
Michael D. Niemack,
John Orlowski-Scherer,
Tristan Pinsonneault-Marotte,
Max Silva-Feaver,
Suzanne Staggs,
Eve M. Vavagiakis,
Yuhan Wang,
Kaiwen Zheng
Abstract:
The Simons Observatory is a new ground-based cosmic microwave background experiment, which is currently being commissioned in Chile's Atacama Desert. During its survey, the observatory's small aperture telescopes will map 10% of the sky in bands centered at frequencies ranging from 27 to 280 GHz to constrain cosmic inflation models, and its large aperture telescope will map 40% of the sky in the same bands to constrain cosmological parameters and use weak lensing to study large-scale structure. To achieve these science goals, the Simons Observatory is deploying these telescopes' receivers with 60,000 state-of-the-art superconducting transition-edge sensor bolometers for its first five year survey. Reading out this unprecedented number of cryogenic sensors, however, required the development of a novel readout system. The SMuRF electronics were developed to enable high-density readout of superconducting sensors using cryogenic microwave SQUID multiplexing technology. The commissioning of the SMuRF systems at the Simons Observatory is the largest deployment to date of microwave multiplexing technology for transition-edge sensors. In this paper, we show that a significant fraction of the systems deployed so far to the Simons Observatory's large aperture telescope meet baseline specifications for detector yield and readout noise in this early phase of commissioning.
Submitted 3 June, 2024;
originally announced June 2024.
-
Optimality conditions at infinity for nonsmooth minimax programming
Authors:
Nguyen Van Tuyen,
Kwan Deok Bae,
Do Sang Kim
Abstract:
This paper is devoted to the study of optimality conditions at infinity in nonsmooth minimax programming problems and applications. By means of the limiting subdifferential and normal cone at infinity, we derive necessary and sufficient optimality conditions of Karush--Kuhn--Tucker type for nonsmooth minimax programming problems with constraints. The obtained results are applied to a nonsmooth vector optimization problem.
Submitted 16 May, 2024;
originally announced May 2024.
-
A Mixture of Experts Approach to 3D Human Motion Prediction
Authors:
Edmund Shieh,
Joshua Lee Franco,
Kang Min Bae,
Tej Lalvani
Abstract:
This project addresses the challenge of human motion prediction, a critical area for applications such as autonomous vehicle movement detection. Previous works have emphasized the need for low inference times to provide real time performance for applications like these. Our primary objective is to critically evaluate existing model architectures, identifying their advantages and opportunities for improvement by replicating the state-of-the-art (SOTA) Spatio-Temporal Transformer model as best as possible given computational constraints. These models have surpassed the limitations of RNN-based models and have demonstrated the ability to generate plausible motion sequences over both short and long term horizons through the use of spatio-temporal representations. We also propose a novel architecture to address challenges of real time inference speed by incorporating a Mixture of Experts (MoE) block within the Spatial-Temporal (ST) attention layer. The particular variation that is used is Soft MoE, a fully-differentiable sparse Transformer that has shown promising ability to enable larger model capacity at lower inference cost. We make our code publicly available at https://github.com/edshieh/motionprediction
Submitted 9 May, 2024;
originally announced May 2024.
-
Instruction Matters: A Simple yet Effective Task Selection for Optimized Instruction Tuning of Specific Tasks
Authors:
Changho Lee,
Janghoon Han,
Seonghyeon Ye,
Stanley Jungkyu Choi,
Honglak Lee,
Kyunghoon Bae
Abstract:
Instruction tuning has been proven effective in enhancing zero-shot generalization across various tasks and in improving the performance of specific tasks. For task-specific improvements, strategically selecting and training on related tasks that provide meaningful supervision is crucial, as this approach enhances efficiency and prevents performance degradation from learning irrelevant tasks. In this light, we introduce a simple yet effective task selection method that leverages instruction information alone to identify relevant tasks, optimizing instruction tuning for specific tasks. Our method is significantly more efficient than traditional approaches, which require complex measurements of pairwise transferability between tasks or the creation of data samples for the target task. Additionally, by aligning the model with the unique instructional template style of the meta-dataset, we enhance its ability to granularly discern relevant tasks, leading to improved overall performance. Experimental results demonstrate that training on a small set of tasks, chosen solely based on the instructions, results in substantial improvements in performance on benchmarks such as P3, Big-Bench, NIV2, and Big-Bench Hard. Significantly, these improvements surpass those achieved by prior task selection methods, highlighting the superiority of our approach.
Submitted 16 October, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.