Showing 1–4 of 4 results for author: Xuyang, S

Searching in archive cs.
  1. arXiv:2506.10972  [pdf, ps, other]

    cs.LG cs.AI

    Farseer: A Refined Scaling Law in Large Language Models

    Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

    Abstract: Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing…

    Submitted 14 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 34

    ACM Class: I.2

  2. arXiv:2505.14597  [pdf, ps, other]

    cs.CL

    Success is in the Details: Evaluate and Enhance Details Sensitivity of Code LLMs through Counterfactuals

    Authors: Xianzhen Luo, Qingfu Zhu, Zhiming Zhang, Mingzheng Xu, Tianhao Cheng, Yixuan Wang, Zheng Chu, Shijie Xuyang, Zhiyuan Ma, YuanTao Fan, Wanxiang Che

    Abstract: Code Sensitivity refers to the ability of Code LLMs to recognize and respond to details changes in problem descriptions. While current code benchmarks and instruction data focus on difficulty and diversity, sensitivity is overlooked. We first introduce the CTF-Code benchmark, constructed using counterfactual perturbations, minimizing input changes while maximizing output changes. The evaluation sh…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Code & Model is https://github.com/Luowaterbi/CTF-Instruct

  3. arXiv:2505.11441  [pdf, ps, other]

    cs.CL

    Is Compression Really Linear with Code Intelligence?

    Authors: Xianzhen Luo, Shijie Xuyang, Tianhao Cheng, Zheng Chu, Houyi Li, ziqi wang, Siming Huang, Qingfu Zhu, Qiufeng Wang, Xiangyu Zhang, Shuigeng Zhou, Wanxiang Che

    Abstract: Understanding the relationship between data compression and the capabilities of Large Language Models (LLMs) is crucial, especially in specialized domains like code intelligence. Prior work posited a linear relationship between compression and general intelligence. However, it overlooked the multifaceted nature of code that encompasses diverse programming languages and tasks, and struggled with fa…

    Submitted 4 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

    Comments: work in progress

  4. arXiv:2503.04715  [pdf, other]

    cs.LG cs.AI

    Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining

    Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

    Abstract: The impressive capabilities of Large Language Models (LLMs) across diverse tasks are now well-established, yet their effective deployment necessitates careful hyperparameter optimization. Through extensive empirical studies involving grid searches across diverse configurations, we discover universal scaling laws governing these hyperparameters: optimal learning rate follows a power-law relationshi…

    Submitted 21 May, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: 22 pages

    ACM Class: F.2.2; I.2.7