
Showing 1–50 of 161 results for author: Bao, K

  1. arXiv:2506.09820  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    CoRT: Code-integrated Reasoning within Thinking

    Authors: Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, Dayiheng Liu

    Abstract: Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations. Addressing these limitations through computational tools (e.g., computation libraries and symbolic solvers) is promising, but it introduces a technical challenge:…

    Submitted 12 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

    Comments: work in progress

  2. arXiv:2506.07438  [pdf, other]

    cs.CL

    LG-ANNA-Embedding technical report

    Authors: Jooyoung Choi, Hyun Kim, Hansol Jang, Changwook Jun, Kyunghoon Bae, Hyewon Choi, Stanley Jungkyu Choi, Honglak Lee, Chulmin Yun

    Abstract: This report presents a unified instruction-based framework for learning generalized text embeddings optimized for both information retrieval (IR) and non-IR tasks. Built upon a decoder-only large language model (Mistral-7B), our approach combines in-context learning, soft supervision, and adaptive hard-negative mining to generate context-aware embeddings without task-specific fine-tuning. Structur…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 10 pages

  3. arXiv:2506.03569  [pdf, ps, other]

    cs.CL

    MiMo-VL Technical Report

    Authors: Xiaomi LLM-Core Team: Zihao Yue, Zhenru Lin, Yifan Song, Weikun Wang, Shuhuai Ren, Shuhao Gu, Shicheng Li, Peidian Li, Liang Zhao, Lei Li, Kainan Bao, Hao Tian, Hailin Zhang, Gang Wang, Dawei Zhu, Cici, Chenhong He, Bowen Ye, Bowen Shen, Zihan Zhang, Zihan Jiang, Zhixian Zheng, Zhichao Song, et al. (50 additional authors not shown)

    Abstract: We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language models delivering state-of-the-art performance in both general visual understanding and multimodal reasoning. MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B on 35 out of 40 evaluated tasks, and scores 59.4 on OlympiadBench, surpassing models with up to 78B parameters. For GUI grounding applications, it sets a new standard with…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 32 pages

  4. arXiv:2506.00441  [pdf, ps, other]

    cs.IR

    K-order Ranking Preference Optimization for Large Language Models

    Authors: Shihao Cai, Chongming Gao, Yang Zhang, Wentao Shi, Jizhi Zhang, Keqin Bao, Qifan Wang, Fuli Feng

    Abstract: To adapt large language models (LLMs) to ranking tasks, existing list-wise methods, represented by list-wise Direct Preference Optimization (DPO), focus on optimizing partial-order or full-order list ranking consistency for LLMs to enhance their ranking abilities. However, we argue that optimizing top-K ranking consistency could be more appropriate for real-world applications. There are two main r…

    Submitted 31 May, 2025; originally announced June 2025.

  5. arXiv:2505.20065  [pdf, ps, other]

    cs.LG cs.AI

    SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety

    Authors: Geon-Hyeong Kim, Youngsoo Jang, Yu Jin Kim, Byoungjip Kim, Honglak Lee, Kyunghoon Bae, Moontae Lee

    Abstract: As Large Language Models (LLMs) continue to advance and find applications across a growing number of fields, ensuring the safety of LLMs has become increasingly critical. To address safety concerns, recent studies have proposed integrating safety constraints into Reinforcement Learning from Human Feedback (RLHF). However, these approaches tend to be complex, as they encompass complicated procedure…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 34 pages

  6. arXiv:2505.17123  [pdf, ps, other]

    cs.CL

    MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation

    Authors: Xiaoyuan Li, Keqin Bao, Yubo Ma, Moxin Li, Wenjie Wang, Rui Men, Yichang Zhang, Fuli Feng, Dayiheng Liu, Junyang Lin

    Abstract: Recent advances in Large Language Models (LLMs) have shown promising results in complex reasoning tasks. However, current evaluations predominantly focus on single-turn reasoning scenarios, leaving interactive tasks largely unexplored. We attribute it to the absence of comprehensive datasets and scalable automatic evaluation protocols. To fill these gaps, we present MTR-Bench for LLMs' Multi-Turn…

    Submitted 25 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: Under Review

  7. arXiv:2505.12632  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents

    Authors: Yunseok Jang, Yeda Song, Sungryull Sohn, Lajanugen Logeswaran, Tiange Luo, Dong-Ki Kim, Kyunghoon Bae, Honglak Lee

    Abstract: Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have sparked significant interest in developing GUI visual agents. We introduce MONDAY (Mobile OS Navigation Task Dataset for Agents from YouTube), a large-scale dataset of 313K annotated frames from 20K instructional videos capturing diverse real-world mobile OS navigation across multiple platforms. Models that…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  8. arXiv:2505.09388  [pdf, other]

    cs.CL

    Qwen3 Technical Report

    Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, et al. (35 additional authors not shown)

    Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration…

    Submitted 14 May, 2025; originally announced May 2025.

  9. arXiv:2505.07608  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

    Authors: LLM-Core Xiaomi: Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, et al. (40 additional authors not shown)

    Abstract: We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with additional Multi-Token Prediction objective…

    Submitted 5 June, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  10. arXiv:2505.04021  [pdf, other]

    cs.DC cs.AI cs.LG cs.PF

    Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving

    Authors: Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Yangmin Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng

    Abstract: Serving large language models (LLMs) is expensive, especially for providers hosting many models, making cost reduction essential. The unique workload patterns of serving multiple LLMs (i.e., multi-LLM serving) create new opportunities and challenges for this task. The long-tail popularity of models and their long idle periods present opportunities to improve utilization through GPU sharing. Howeve…

    Submitted 12 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  11. arXiv:2505.03777  [pdf, other]

    cs.LG

    MolMole: Molecule Mining from Scientific Literature

    Authors: LG AI Research, Sehyun Chun, Jiye Kim, Ahra Jo, Yeonsik Jo, Seungyul Oh, Seungjun Lee, Kwangrok Ryoo, Jongmin Lee, Seung Hwan Kim, Byung Jun Kang, Soonyoung Lee, Jun Ha Park, Chanwoo Moon, Jiwon Ham, Haein Lee, Heejae Han, Jaeseung Byun, Soojong Do, Minju Ha, Dongyun Kim, Kyunghoon Bae, Woohyung Lim, Edward Hwayoung Lee, Yongmin Park, et al. (9 additional authors not shown)

    Abstract: The extraction of molecular structures and reaction data from scientific documents is challenging due to their varied, unstructured chemical formats and complex document layouts. To address this, we introduce MolMole, a vision-based deep learning framework that unifies molecule detection, reaction diagram parsing, and optical chemical structure recognition (OCSR) into a single pipeline for automat…

    Submitted 7 May, 2025; v1 submitted 30 April, 2025; originally announced May 2025.

    Comments: 15 pages, 12 figures

  12. arXiv:2504.21417  [pdf, other]

    physics.acc-ph hep-ex hep-ph physics.ins-det

    The Muon Collider

    Authors: Carlotta Accettura, Simon Adrian, Rohit Agarwal, Claudia Ahdida, Chiara Aimé, Avni Aksoy, Gian Luigi Alberghi, Siobhan Alden, Luca Alfonso, Muhammad Ali, Anna Rita Altamura, Nicola Amapane, Kathleen Amm, David Amorim, Paolo Andreetto, Fabio Anulli, Ludovica Aperio Bella, Rob Appleby, Artur Apresyan, Pouya Asadi, Mohammed Attia Mahmoud, Bernhard Auchmann, John Back, Anthony Badea, Kyu Jung Bae, et al. (433 additional authors not shown)

    Abstract: Muons offer a unique opportunity to build a compact high-energy electroweak collider at the 10 TeV scale. A Muon Collider enables direct access to the underlying simplicity of the Standard Model and unparalleled reach beyond it. It will be a paradigm-shifting tool for particle physics representing the first collider to combine the high-energy reach of a proton collider and the high precision of an…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 406 pages, supplementary report to the European Strategy for Particle Physics - 2026 update

  13. arXiv:2504.13283  [pdf]

    cond-mat.mes-hall

    Demonstration of highly scaled AlScN ferroelectric diode memory with storage density > 100 Mbit/mm$^2$

    Authors: Zekun Hu, Hyunmin Cho, Rajeev Kumar Rai, Kefei Bao, Yinuo Zhang, Yunfei He, Yaoyang Ji, Chloe Leblanc, Kwan-Ho Kim, Zirun Han, Zhen Qiu, Xingyu Du, Eric A. Stach, Roy Olsson, Deep Jariwala

    Abstract: Wurtzite nitride ferroelectric materials have emerged as promising candidates for next-generation memory applications due to their exceptional polarization properties and compatibility with conventional semiconductor processing techniques. Here, we demonstrate the first successful scaling of Aluminum Scandium Nitride (AlScN) ferroelectric diode (FeDiode) memory down to 50 nm device diameters while…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 4 figures and 1 table

  14. arXiv:2503.15871  [pdf, other]

    cs.CV

    MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations

    Authors: Kyungho Bae, Jinhyung Kim, Sihaeng Lee, Soonyoung Lee, Gunhee Lee, Jinwoo Choi

    Abstract: In this work, we tackle action-scene hallucination in Video Large Language Models (Video-LLMs), where models incorrectly predict actions based on the scene context or scenes based on observed actions. We observe that existing Video-LLMs often suffer from action-scene hallucination due to two main factors. First, existing Video-LLMs intermingle spatial and temporal features by applying an attention…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Accepted for CVPR 2025

  15. arXiv:2503.12524  [pdf, other]

    cs.CL cs.AI

    EXAONE Deep: Reasoning Enhanced Language Models

    Authors: LG AI Research, Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Junwon Hwang, Hyojin Jeon, Kijeong Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Yongil Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, Honglak Lee, Jinsik Lee, et al. (7 additional authors not shown)

    Abstract: We present EXAONE Deep series, which exhibits superior capabilities in various reasoning tasks, including math and coding benchmarks. We train our models mainly on the reasoning-specialized dataset that incorporates long streams of thought processes. Evaluation results show that our smaller models, EXAONE Deep 2.4B and 7.8B, outperform other models of comparable size, while the largest model, EXAO…

    Submitted 19 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2412.04862, arXiv:2408.03541

  16. arXiv:2503.02784  [pdf, other]

    cs.CY cs.AI

    Do Not Trust Licenses You See: Dataset Compliance Requires Massive-Scale AI-Powered Lifecycle Tracing

    Authors: Jaekyeom Kim, Sungryull Sohn, Gerrard Jeongwon Jo, Jihoon Choi, Kyunghoon Bae, Hwayoung Lee, Yongmin Park, Honglak Lee

    Abstract: This paper argues that a dataset's legal risk cannot be accurately assessed by its license terms alone; instead, tracking dataset redistribution and its full lifecycle is essential. However, this process is too complex for legal experts to handle manually at scale. Tracking dataset provenance, verifying redistribution rights, and assessing evolving legal risks across multiple stages require a leve…

    Submitted 14 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  17. arXiv:2502.11638  [pdf]

    cs.CV

    Safeguarding AI in Medical Imaging: Post-Hoc Out-of-Distribution Detection with Normalizing Flows

    Authors: Dariush Lotfi, Mohammad-Ali Nikouei Mahani, Mohamad Koohi-Moghadam, Kyongtae Ty Bae

    Abstract: In AI-driven medical imaging, the failure to detect out-of-distribution (OOD) data poses a severe risk to clinical reliability, potentially leading to critical diagnostic errors. Current OOD detection methods often demand impractical retraining or modifications to pre-trained models, hindering their adoption in regulated clinical environments. To address this challenge, we propose a post-hoc norma…

    Submitted 28 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  18. arXiv:2502.11393  [pdf, other]

    cs.CL

    HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning

    Authors: Xiaoyuan Li, Moxin Li, Rui Men, Yichang Zhang, Keqin Bao, Wenjie Wang, Fuli Feng, Dayiheng Liu, Junyang Lin

    Abstract: Large language models (LLMs) have shown remarkable capabilities in commonsense reasoning; however, some variations in questions can trigger incorrect responses. Do these models truly understand commonsense knowledge, or just memorize expression patterns? To investigate this question, we present the first extensive robustness evaluation of LLMs in commonsense reasoning. We introduce HellaSwag-Pro,…

    Submitted 25 May, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: ACL 2025 Findings

  19. arXiv:2502.05567  [pdf, other]

    cs.CL cs.AI cs.LG

    ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

    Authors: Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang, Yunqi Liu, Yuntian Liu, Yu Chen, Yang Jiao, Tao Luo

    Abstract: Autoformalization, the automatic translation of mathematical content from natural language into machine-verifiable formal languages, has seen significant progress driven by advances in large language models (LLMs). Nonetheless, a primary barrier to further improvements is the limited availability of parallel corpora that map informal mathematical text to its formal counterpart. To address this lim…

    Submitted 19 May, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

  20. arXiv:2502.02810  [pdf, other]

    cs.LG cs.AI physics.chem-ph q-bio.BM

    Mol-LLM: Multimodal Generalist Molecular LLM with Improved Graph Utilization

    Authors: Chanhui Lee, Hanbum Ko, Yuheon Song, YongJun Jeong, Rodrigo Hormazabal, Sehui Han, Kyunghoon Bae, Sungbin Lim, Sungwoong Kim

    Abstract: Recent advances in large language models (LLMs) have led to models that tackle diverse molecular tasks, such as chemical reaction prediction and molecular property prediction. Large-scale molecular instruction-tuning datasets have enabled sequence-only (e.g., SMILES or SELFIES) generalist molecular LLMs, and researchers are now exploring multimodal approaches that incorporate molecular structural…

    Submitted 26 May, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

    Comments: 9 pages, 5 figures

  21. arXiv:2502.01521  [pdf, other]

    cs.LG cs.AI cs.RO

    Toward Task Generalization via Memory Augmentation in Meta-Reinforcement Learning

    Authors: Kaixi Bao, Chenhao Li, Yarden As, Andreas Krause, Marco Hutter

    Abstract: Agents trained via reinforcement learning (RL) often struggle to perform well on tasks that differ from those encountered during training. This limitation presents a challenge to the broader deployment of RL in diverse and dynamic task settings. In this work, we introduce memory augmentation, a memory-based RL approach to improve task generalization. Our approach leverages task-structured augmenta…

    Submitted 7 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  22. arXiv:2501.09241  [pdf, other]

    astro-ph.IM

    Simons Observatory: Characterization of the Large Aperture Telescope Receiver

    Authors: Tanay Bhandarkar, Saianeesh K. Haridas, Jeff Iuliano, Anna Kofman, Alex Manduca, Karen Perez Sarmiento, John Orlowski-Scherer, Thomas P. Satterthwaite, Yuhan Wang, Zeeshan Ahmed, Jason E. Austermann, Kyuyoung Bae, Gabriele Coppi, Mark J. Devlin, Simon R Dicker, Peter N. Dow, Shannon M. Duff, Daniel Dutcher, Nicholas Galitzki, Jon E. Gudmundsson, Shawn W. Henderson, Johannes Hubmayr, Bradley R. Johnson, Matthew A. Koc, Brian J. Koopman, et al. (19 additional authors not shown)

    Abstract: The Simons Observatory (SO) is a ground-based cosmic microwave background (CMB) survey experiment that currently consists of three 0.42m small-aperture telescopes (SATs) and one 6m large-aperture telescope (LAT), located at an elevation of 5200m in the Atacama Desert in Chile. At the LAT's focal plane, SO will install >62,000 transition-edge sensor detectors across 13 optics tubes (OTs) within the…

    Submitted 15 January, 2025; originally announced January 2025.

  23. arXiv:2412.19613  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Anisotropic Band Flattening in Twisted Bilayer of M-Valley MXenes

    Authors: Kejie Bao, Huan Wang, Zhaochen Liu, Jing Wang

    Abstract: Experimental studies on moiré materials have predominantly focused on twisted hexagonal lattice with low-energy states near the $Γ$- or K-points. These materials, characterized by isotropic low-energy dispersion, are fundamentally distinct from those with anisotropic properties. Here we introduce a series of semiconducting transition metal carbides (MXenes) $M_2$C$T_2$ ($M$ = Ti, Zr, Hf, Sc, Y;…

    Submitted 27 December, 2024; originally announced December 2024.

  24. arXiv:2412.15115  [pdf, other]

    cs.CL

    Qwen2.5 Technical Report

    Authors: Qwen: An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, et al. (19 additional authors not shown)

    Abstract: In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This pr…

    Submitted 2 January, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

  25. arXiv:2412.04862  [pdf, other]

    cs.CL

    EXAONE 3.5: Series of Large Language Models for Real-world Use Cases

    Authors: LG AI Research, Soyoung An, Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Seokhee Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Yongil Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, Honglak Lee, Jinsik Lee, et al. (8 additional authors not shown)

    Abstract: This technical report introduces the EXAONE 3.5 instruction-tuned language models, developed and released by LG AI Research. The EXAONE 3.5 language models are offered in three configurations: 32B, 7.8B, and 2.4B. These models feature several standout capabilities: 1) exceptional instruction following capabilities in real-world scenarios, achieving the highest scores across seven benchmarks, 2) ou…

    Submitted 9 December, 2024; v1 submitted 6 December, 2024; originally announced December 2024.

    Comments: arXiv admin note: text overlap with arXiv:2408.03541

  26. arXiv:2411.14710  [pdf, ps, other]

    quant-ph

    Uncorrectable-error-injection based reliable and secure quantum communication

    Authors: IlKwon Sohn, Boseon Kim, Kwangil Bae, Wooyeong Song, Chankyun Lee, Kabgyun Jeong, Wonhyuk Lee

    Abstract: Quantum networks aim to communicate distant quantum devices, such as quantum computers. In this context, a critical requirement is the secure and reliable transmission of arbitrary quantum states. Quantum teleportation is widely used to transmit arbitrary quantum states. However, it requires entanglement swapping and purification to distribute entanglements over long distances, introducing signifi…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: 7 pages, 4 figures

  27. MuCol Milestone Report No. 5: Preliminary Parameters

    Authors: Carlotta Accettura, Simon Adrian, Rohit Agarwal, Claudia Ahdida, Chiara Aimé, Avni Aksoy, Gian Luigi Alberghi, Siobhan Alden, Luca Alfonso, Nicola Amapane, David Amorim, Paolo Andreetto, Fabio Anulli, Rob Appleby, Artur Apresyan, Pouya Asadi, Mohammed Attia Mahmoud, Bernhard Auchmann, John Back, Anthony Badea, Kyu Jung Bae, E. J. Bahng, Lorenzo Balconi, Fabrice Balli, Laura Bandiera, et al. (369 additional authors not shown)

    Abstract: This document is comprised of a collection of updated preliminary parameters for the key parts of the muon collider. The updated preliminary parameters follow on from the October 2023 Tentative Parameters Report. Particular attention has been given to regions of the facility that are believed to hold greater technical uncertainty in their design and that have a strong impact on the cost and power…

    Submitted 5 November, 2024; originally announced November 2024.

  28. arXiv:2410.23136  [pdf, other]

    cs.IR

    Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning

    Authors: Keqin Bao, Ming Yan, Yang Zhang, Jizhi Zhang, Wenjie Wang, Fuli Feng, Xiangnan He

    Abstract: Frequently updating Large Language Model (LLM)-based recommender systems to adapt to new user interests -- as done for traditional ones -- is impractical due to high training costs, even with acceleration methods. This work explores adapting to dynamic user interests without any model updates by leveraging In-Context Learning (ICL), which allows LLMs to learn new tasks from few-shot examples provi…

    Submitted 30 October, 2024; originally announced October 2024.

  29. arXiv:2410.22809  [pdf, other]

    cs.IR cs.AI

    Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation

    Authors: Yang Zhang, Juntao You, Yimeng Bai, Jizhi Zhang, Keqin Bao, Wenjie Wang, Tat-Seng Chua

    Abstract: Recent advancements in recommender systems have focused on leveraging Large Language Models (LLMs) to improve user preference modeling, yielding promising outcomes. However, current LLM-based approaches struggle to fully leverage user behavior sequences, resulting in suboptimal preference modeling for personalized recommendations. In this study, we propose a novel Counterfactual Fine-Tuning (CFT)…

    Submitted 30 October, 2024; originally announced October 2024.

  30. arXiv:2410.20027  [pdf, other]

    cs.IR cs.AI

    Agentic Feedback Loop Modeling Improves Recommendation and User Simulation

    Authors: Shihao Cai, Jizhi Zhang, Keqin Bao, Chongming Gao, Qifan Wang, Fuli Feng, Xiangnan He

    Abstract: Large language model-based agents are increasingly applied in the recommendation field due to their extensive knowledge and strong planning capabilities. While prior research has primarily focused on enhancing either the recommendation agent or the user agent individually, the collaborative interaction between the two has often been overlooked. Towards this research gap, we propose a novel framewo…

    Submitted 1 May, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

  31. arXiv:2409.13136  [pdf, other]

    cs.LG cs.CR cs.CV

    Federated Learning with Label-Masking Distillation

    Authors: Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, Shiming Ge

    Abstract: Federated learning provides a privacy-preserving manner to collaboratively train models on data distributed over multiple local clients via the coordination of a global server. In this paper, we focus on label distribution skew in federated learning, where due to the different user behavior of the client, label distributions between different clients are significantly different. When faced with su…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: Accepted by ACM MM 2023

  32. arXiv:2409.13006  [pdf]

    eess.IV cs.CV

    AutoPET III Challenge: PET/CT Semantic Segmentation

    Authors: Reza Safdari, Mohammad Koohi-Moghaddam, Kyongtae Tyler Bae

    Abstract: In this study, we implemented a two-stage deep learning-based approach to segment lesions in PET/CT images for the AutoPET III challenge. The first stage utilized a DynUNet model for coarse segmentation, identifying broad regions of interest. The second stage refined this segmentation using an ensemble of SwinUNETR, SegResNet, and UNet models. Preprocessing involved resampling images to a common r…

    Submitted 19 September, 2024; originally announced September 2024.

  33. arXiv:2409.07688  [pdf, other]

    hep-ph

    Charged Higgs Boson Phenomenology in the Dark Z mediated Fermionic Dark Matter Model

    Authors: Kyu Jung Bae, Jinn-Ouk Gong, Dong-Won Jung, Kang Young Lee, Chaehyun Yu, Chan Beom Park

    Abstract: We study the phenomenology of the charged Higgs boson, $H^\pm$, appearing in the fermionic dark matter model mediated by the dark $Z$ boson. This model is in favor of the light dark $Z$ boson, $Z'$, and the light additional neutral Higgs boson, $h$. We find that $H^\pm \to W^\pm h$ and the $H^\pm \to W^\pm Z'$ are dominant decay channels. Thus the promising final states are trilepton signals,…

    Submitted 19 September, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: 12 pages, 4 figures

    Report number: APCTP Pre2024-015

  34. arXiv:2409.01741  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci cond-mat.str-el

    Consecutive Flat Chern Bands and Correlated States in Monolayer ReAg$_2$Cl$_6$

    Authors: Kejie Bao, Rui Shi, Huan Wang, Jiaxuan Guo, Jing Wang

    Abstract: We theoretically propose that van der Waals monolayer ReAg$_2$Cl$_6$ have four consecutive flat Chern bands in the 120$^\circ$ spiral antiferromagnetic ground state. The nontrivial topology of these Chern bands emerges from the synergy between Re $t_{2g}$ band folding with non-collinear spin configuration and spin-orbit coupling. By constructing maximally localized Wannier functions directly from…

    Submitted 21 December, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  35. arXiv:2408.11391  [pdf, other]

    quant-ph

    Designing generalized elegant Bell inequalities in high dimension from a quantum bound

    Authors: Kwangil Bae, Junghee Ryu, Ilkwon Sohn, Wonhyuk Lee

    Abstract: Elegant Bell inequality is well known for its distinctive property, being maximally violated by maximal entanglement, mutually unbiased bases, and symmetric informationally complete positive operator-valued measure elements. Despite its significance in quantum information theory demonstrated based on its unique violation feature, it remains the only known one with the characteristic. We present a…

    Submitted 12 February, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

  36. arXiv:2408.07569  [pdf, other]

    cs.LG cs.AI

    Multi-task Heterogeneous Graph Learning on Electronic Health Records

    Authors: Tsai Hor Chan, Guosheng Yin, Kyongtae Bae, Lequan Yu

    Abstract: Learning electronic health records (EHRs) has received emerging attention because of its capability to facilitate accurate medical diagnosis. Since the EHRs contain enriched information specifying complex interactions between entities, modeling EHRs with graphs is shown to be effective in practice. The EHRs, however, present a great degree of heterogeneity, sparsity, and complexity, which hamper t…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by Neural Networks

  37. arXiv:2408.03541  [pdf, ps, other]

    cs.CL cs.AI

    EXAONE 3.0 7.8B Instruction Tuned Language Model

    Authors: LG AI Research: Soyoung An, Kyunghoon Bae, Eunbi Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Yeonjung Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Euisoon Kim, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, et al. (14 additional authors not shown)

    Abstract: We introduce EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovations. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly compet…

    Submitted 13 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  38. arXiv:2407.18858  [pdf, other]

    cs.CR

    HADES: Detecting Active Directory Attacks via Whole Network Provenance Analytics

    Authors: Qi Liu, Kaibin Bao, Wajih Ul Hassan, Veit Hagenmeyer

    Abstract: Due to its crucial role in identity and access management in modern enterprise networks, Active Directory (AD) is a top target of Advanced Persistence Threat (APT) actors. Conventional intrusion detection systems (IDS) excel at identifying malicious behaviors caused by malware, but often fail to detect stealthy attacks launched by APT actors. Recent advance in provenance-based IDS (PIDS) shows pro…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 13 pages

  39. arXiv:2407.18832  [pdf, other]

    cs.CR

    Accurate and Scalable Detection and Investigation of Cyber Persistence Threats

    Authors: Qi Liu, Muhammad Shoaib, Mati Ur Rehman, Kaibin Bao, Veit Hagenmeyer, Wajih Ul Hassan

    Abstract: In Advanced Persistent Threat (APT) attacks, achieving stealthy persistence within target systems is often crucial for an attacker's success. This persistence allows adversaries to maintain prolonged access, often evading detection mechanisms. Recognizing its pivotal role in the APT lifecycle, this paper introduces Cyber Persistence Detector (CPD), a novel system dedicated to detecting cyber persi…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 16 pages

  40. arXiv:2407.17344  [pdf, other]

    cs.CL

    Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition

    Authors: Ke Bao, Chonghuan Yang

    Abstract: Named entity recognition in the in-domain supervised and few-shot settings has been extensively discussed in the NLP community and has made significant progress. However, cross-domain NER, a more common task in practical scenarios, still poses a challenge for most NER methods. Previous research efforts in that area primarily focus on knowledge transfer such as correlating label information from source…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures

  41. arXiv:2407.12450  [pdf, other]

    physics.acc-ph hep-ex

    Interim report for the International Muon Collider Collaboration (IMCC)

    Authors: C. Accettura, S. Adrian, R. Agarwal, C. Ahdida, C. Aimé, A. Aksoy, G. L. Alberghi, S. Alden, N. Amapane, D. Amorim, P. Andreetto, F. Anulli, R. Appleby, A. Apresyan, P. Asadi, M. Attia Mahmoud, B. Auchmann, J. Back, A. Badea, K. J. Bae, E. J. Bahng, L. Balconi, F. Balli, L. Bandiera, C. Barbagallo, et al. (362 additional authors not shown)

    Abstract: The International Muon Collider Collaboration (IMCC) [1] was established in 2020 following the recommendations of the European Strategy for Particle Physics (ESPP) and the implementation of the European Strategy for Particle Physics-Accelerator R&D Roadmap by the Laboratory Directors Group [2], hereinafter referred to as the European LDG roadmap. The Muon Collider Study (MuC) covers the accele…

    Submitted 28 January, 2025; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: This document summarises the International Muon Collider Collaboration (IMCC) progress and status of the Muon Collider R&D programme

  42. arXiv:2406.14900  [pdf, other]

    cs.IR

    Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation

    Authors: Keqin Bao, Jizhi Zhang, Yang Zhang, Xinyue Huo, Chong Chen, Fuli Feng

    Abstract: Adapting Large Language Models (LLMs) for recommendation requires careful consideration of the decoding process, given the inherent differences between generating items and natural language. Existing approaches often directly apply LLMs' original decoding methods. However, we find these methods encounter significant challenges: 1) amplification bias -- where standard length normalization inflates…

    Submitted 5 November, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at EMNLP 2024 Main Conference

  43. arXiv:2406.11503  [pdf, other]

    cs.CV cs.CL

    GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

    Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng

    Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source da…

    Submitted 17 June, 2024; originally announced June 2024.

  44. arXiv:2406.08796  [pdf, other]

    cs.CL

    Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning

    Authors: Janghoon Han, Changho Lee, Joongbo Shin, Stanley Jungkyu Choi, Honglak Lee, Kyunghoon Bae

    Abstract: Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in ins…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024 (Camera-ready), by Janghoon Han and Changho Lee, with equal contribution

  45. arXiv:2406.07068  [pdf]

    cond-mat.mtrl-sci

    Emergent Moiré fringes in direct-grown quasicrystal

    Authors: Jingwei Li, Kejie Bao, Honglin Sun, Xingxu Yan, Ting Huang, Qicheng Zhang, Yaoqiang Zhou, Zhenjing Liu, Paul Masih Das, Jiawen You, Jiong Zhao, Jianbin Xu, Xiaoqing Pan, Yongli Mi, Junyi Zhu, Zhaoli Gao

    Abstract: Quasicrystals represent a category of rarely structured solids that challenge traditional periodicity in crystal materials. Recent advancements in the synthesis of two-dimensional (2D) van der Waals materials have paved the way for exploring the unique physical properties of these systems. Here, we report on the synthesis of 2D quasicrystals featuring 30° alternating twist angles between multiple…

    Submitted 11 June, 2024; originally announced June 2024.

  46. arXiv:2406.03210  [pdf, other]

    cs.IR

    Text-like Encoding of Collaborative Information in Large Language Models for Recommendation

    Authors: Yang Zhang, Keqin Bao, Ming Yan, Wenjie Wang, Fuli Feng, Xiangnan He

    Abstract: When adapting Large Language Models for Recommendation (LLMRec), it is crucial to integrate collaborative information. Existing methods achieve this by learning collaborative embeddings in LLMs' latent space from scratch or by mapping from external models. However, they fail to represent the information in a text-like format, which may not align optimally with LLMs. To bridge this gap, we introduc…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024

    ACM Class: H.3.3

  47. The Simons Observatory: Studies of Detector Yield and Readout Noise From the First Large-Scale Deployment of Microwave Multiplexing at the Large Aperture Telescope

    Authors: Thomas P. Satterthwaite, Zeeshan Ahmed, Kyuyoung Bae, Mark Devlin, Simon Dicker, Shannon M. Duff, Daniel Dutcher, Saianeesh K. Haridas, Shawn W. Henderson, Johannes Hubmayr, Bradley R. Johnson, Anna Kofman, Jack Lashner, Michael J. Link, Tammy J. Lucas, Alex Manduca, Michael D. Niemack, John Orlowski-Scherer, Tristan Pinsonneault-Marotte, Max Silva-Feaver, Suzanne Staggs, Eve M. Vavagiakis, Yuhan Wang, Kaiwen Zheng

    Abstract: The Simons Observatory is a new ground-based cosmic microwave background experiment, which is currently being commissioned in Chile's Atacama Desert. During its survey, the observatory's small aperture telescopes will map 10% of the sky in bands centered at frequencies ranging from 27 to 280 GHz to constrain cosmic inflation models, and its large aperture telescope will map 40% of the sky in the s…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures, 1 table. To be presented at SPIE Astronomical Telescopes + Instrumentation 2024

    Journal ref: Proc. SPIE 13102, Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy XII. 1310223 (2024)

  48. arXiv:2405.09869  [pdf, ps, other]

    math.OC

    Optimality conditions at infinity for nonsmooth minimax programming

    Authors: Nguyen Van Tuyen, Kwan Deok Bae, Do Sang Kim

    Abstract: This paper is devoted to the study of optimality conditions at infinity in nonsmooth minimax programming problems and applications. By means of the limiting subdifferential and normal cone at infinity, we derive necessary and sufficient optimality conditions of Karush--Kuhn--Tucker type for nonsmooth minimax programming problems with constraints. The obtained results are applied to a nonsmooth vector o…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 18 pages

    MSC Class: 90C47; 49K35; 49J52; 90C29

  49. arXiv:2405.06088  [pdf, other]

    cs.CV

    A Mixture of Experts Approach to 3D Human Motion Prediction

    Authors: Edmund Shieh, Joshua Lee Franco, Kang Min Bae, Tej Lalvani

    Abstract: This project addresses the challenge of human motion prediction, a critical area for applications such as autonomous vehicle movement detection. Previous works have emphasized the need for low inference times to provide real-time performance for applications like these. Our primary objective is to critically evaluate existing model architectures, identifying their advantages and opportunities…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 16 pages, 6 figures

  50. arXiv:2404.16418  [pdf, other]

    cs.CL

    Instruction Matters: A Simple yet Effective Task Selection for Optimized Instruction Tuning of Specific Tasks

    Authors: Changho Lee, Janghoon Han, Seonghyeon Ye, Stanley Jungkyu Choi, Honglak Lee, Kyunghoon Bae

    Abstract: Instruction tuning has been proven effective in enhancing zero-shot generalization across various tasks and in improving the performance of specific tasks. For task-specific improvements, strategically selecting and training on related tasks that provide meaningful supervision is crucial, as this approach enhances efficiency and prevents performance degradation from learning irrelevant tasks. In t…

    Submitted 16 October, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: EMNLP 2024 (Camera-ready), by Janghoon Han and Changho Lee, with equal contribution