
Showing 1–50 of 121 results for author: Xia, K

  1. arXiv:2506.13585  [pdf, ps, other]

    cs.CL cs.LG

    MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

    Authors: MiniMax, Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, Haigang Zhou, et al. (103 additional authors not shown)

    Abstract: We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-source our MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1

  2. arXiv:2506.05117  [pdf, ps, other]

    cs.RO

    Realizing Text-Driven Motion Generation on NAO Robot: A Reinforcement Learning-Optimized Control Pipeline

    Authors: Zihan Xu, Mengxian Hu, Kaiyan Xiao, Qin Fang, Chengju Liu, Qijun Chen

    Abstract: Human motion retargeting for humanoid robots, transferring human motion data to robots for imitation, presents significant challenges but offers considerable potential for real-world applications. Traditionally, this process relies on human demonstrations captured through pose estimation or motion capture systems. In this paper, we explore a text-driven approach to mapping human motion to humanoid…

    Submitted 5 June, 2025; originally announced June 2025.

  3. arXiv:2505.24645  [pdf]

    cs.RO physics.app-ph

    Intrinsic static/dynamic triboelectric pressure sensor for continuous and event-triggered control

    Authors: Kequan Xia, Song Yang, Jianguo Lu, Min Yu

    Abstract: Conventional pressure sensors often integrate two distinct mechanisms to detect static and dynamic stimuli, hindering the development of high fidelity human-machine interfaces. Here, we present an intrinsic static/dynamic triboelectric sensor (iSD Sensor) capable of reliably perceiving both continuous static pressure and transient mechanical shocks through a DC/AC signal decoupling strategy. By pa…

    Submitted 4 July, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

  4. arXiv:2505.12269  [pdf]

    econ.GN cs.AI cs.CL math.LO q-fin.GN

    Vague Knowledge: Evidence from Analyst Reports

    Authors: Kerry Xiao, Amy Zang

    Abstract: People in the real world often possess vague knowledge of future payoffs, for which quantification is not feasible or desirable. We argue that language, with differing ability to convey vague information, plays an important but less-known role in representing subjective expectations. Empirically, we find that in their reports, analysts include useful information in linguistic expressions but not n…

    Submitted 24 May, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

    MSC Class: 03B48; 03B65; 03E02; 03E15; 03E72; 18E45; 28A05; 62F15; 68T01; 68T35; 68T50; 91G30; ACM Class: F.4; I.2.3; I.2.4; I.2.7; J.1; J.4; J.5

  5. arXiv:2505.09586  [pdf, ps, other]

    cs.LG

    Rhomboid Tiling for Geometric Graph Deep Learning

    Authors: Yipeng Zhang, Longlong Li, Kelin Xia

    Abstract: Graph Neural Networks (GNNs) have proven effective for learning from graph-structured data through their neighborhood-based message passing framework. Many hierarchical graph clustering pooling methods modify this framework by introducing clustering-based strategies, enabling the construction of more expressive and powerful models. However, all of these message passing frameworks heavily rely on th…

    Submitted 14 May, 2025; originally announced May 2025.

  6. arXiv:2505.09174  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Quotient Complex Transformer (QCformer) for Perovskite Data Analysis

    Authors: Xinyu You, Xiang Liu, Chuan-Shen Hu, Kelin Xia, Tze Chien Sum

    Abstract: The discovery of novel functional materials is crucial in addressing the challenges of sustainable energy generation and climate change. Hybrid organic-inorganic perovskites (HOIPs) have gained attention for their exceptional optoelectronic properties in photovoltaics. Recently, geometric deep learning, particularly graph neural networks (GNNs), has shown strong potential in predicting material pr…

    Submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2505.09107  [pdf, other]

    astro-ph.IM astro-ph.EP astro-ph.SR cs.DC

    Architecture of Tianyu Software: Relative Photometry as a Case Study

    Authors: Yicheng Rui, Yifan Xuan, Shuyue Zheng, Kexin Li, Kaiming Cui, Kai Xiao, Jie Zheng, Jun Kai Ng, Hongxuan Jiang, Fabo Feng, Qinghui Sun

    Abstract: The Tianyu telescope, a one-meter robotic optical survey instrument to be constructed in Lenghu, Qinghai, China, is designed for detecting transiting exoplanets, variable stars and transients. It requires highly automated, optimally distributed, easily extendable, and highly flexible software to enable the data processing for the raw data at rates exceeding 500MB/s. In this work, we introduce the a…

    Submitted 14 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: 18 pages, 10 figures, 6 tables, accepted for publication in PASP

  8. arXiv:2505.06285  [pdf, other]

    eess.SP cs.CV

    FEMSN: Frequency-Enhanced Multiscale Network for fault diagnosis of rotating machinery under strong noise environments

    Authors: Yuhan Yuan, Xiaomo Jiang, Yanfeng Han, Ke Xiao

    Abstract: Rolling bearings are critical components of rotating machinery, and their proper functioning is essential for industrial production. Most existing condition monitoring methods focus on extracting discriminative features from time-domain signals to assess bearing health status. However, under complex operating conditions, periodic impulsive characteristics related to fault information are often obs…

    Submitted 7 May, 2025; originally announced May 2025.

  9. arXiv:2505.04622  [pdf, other]

    cs.GR cs.CV

    PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

    Authors: Jingwen Ye, Yuze He, Yanning Zhou, Yiqin Zhu, Kaiwen Xiao, Yong-Jin Liu, Wei Yang, Xiao Han

    Abstract: Shape primitive abstraction, which decomposes complex 3D shapes into simple geometric elements, plays a crucial role in human visual cognition and has broad applications in computer vision and graphics. While recent advances in 3D content generation have shown remarkable progress, existing primitive abstraction methods either rely on geometric optimization with limited semantic understanding or le…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: SIGGRAPH 2025. 14 pages, 15 figures

  10. arXiv:2504.16053  [pdf, other]

    cs.CL cs.AI

    LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement

    Authors: Zhifan Ye, Kejing Xia, Yonggan Fu, Xin Dong, Jihoon Hong, Xiangchi Yuan, Shizhe Diao, Jan Kautz, Pavlo Molchanov, Yingyan Celine Lin

    Abstract: State space models (SSMs) have emerged as an efficient alternative to Transformer models for language modeling, offering linear computational complexity and constant memory usage as context length increases. However, despite their efficiency in handling long contexts, recent studies have shown that SSMs, such as Mamba models, generally underperform compared to Transformers in long-context understa…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Accepted by ICLR 2025

  11. arXiv:2504.09174  [pdf, other]

    cs.CG math.AC math.AT

    Commutative algebra-enhanced topological data analysis

    Authors: Chuanshen Hu, Yu Wang, Kelin Xia, Ke Ye, Yipeng Zhang

    Abstract: Topological Data Analysis (TDA) combines computational topology and data science to extract and analyze intrinsic topological and geometric structures in data sets in a metric space. While persistent homology (PH), a widely used tool in TDA that tracks the lifespan information of topological features through a filtration process, has shown its effectiveness in applications, it is inherently li…

    Submitted 12 April, 2025; originally announced April 2025.

  12. arXiv:2504.07405  [pdf, other]

    cs.CV

    FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation

    Authors: Linyan Huang, Haonan Lin, Yanning Zhou, Kaiwen Xiao

    Abstract: With the rapid advancement of 2D generative models, preserving subject identity while enabling diverse editing has emerged as a critical research focus. Existing methods typically face inherent trade-offs between identity preservation and personalized manipulation. We introduce FlexIP, a novel framework that decouples these objectives through two dedicated components: a Personalization Adapter for…

    Submitted 9 April, 2025; originally announced April 2025.

  13. arXiv:2504.02286  [pdf, other]

    cs.CV

    Moment Quantization for Video Temporal Grounding

    Authors: Xiaolong Sun, Le Wang, Sanping Zhou, Liushuai Shi, Kun Xia, Mengnan Liu, Yabing Wang, Gang Hua

    Abstract: Video temporal grounding is a critical video understanding task, which aims to localize moments relevant to a language description. The challenge of this task lies in distinguishing relevant and irrelevant moments. Previous methods, which focus on learning continuous features, exhibit weak differentiation between foreground and background features. In this paper, we propose a novel Moment-Quantization b…

    Submitted 3 April, 2025; originally announced April 2025.

  14. arXiv:2504.02281  [pdf, other]

    cs.CE

    Parallel Market Environments for FinRL Contests

    Authors: Keyi Wang, Nikolaus Holzer, Ziyi Xia, Yupeng Cao, Jiechao Gao, Anwar Walid, Kairong Xiao, Xiao-Yang Liu Yanglet

    Abstract: Financial reinforcement learning (FinRL) has emerged as a promising paradigm for sequential decision-making in financial engineering. However, applying RL in real-world trading tasks remains challenging due to the non-stationarity of financial data, low signal-to-noise ratios, and various market frictions. Although numerous FinRL methods have been developed for tasks such as trading and portfolio…

    Submitted 16 May, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2501.10709

  15. arXiv:2503.20835  [pdf, other]

    cs.CL

    Comprehensive Manuscript Assessment with Text Summarization Using 69707 articles

    Authors: Qichen Sun, Yuxing Lu, Kun Xia, Li Chen, He Sun, Jinzhuo Wang

    Abstract: Rapid and efficient assessment of the future impact of research articles is a significant concern for both authors and reviewers. The most common standard for measuring the impact of academic papers is the number of citations. In recent years, numerous efforts have been undertaken to predict citation counts within various citation windows. However, most of these studies focus solely on a specific…

    Submitted 26 March, 2025; originally announced March 2025.

  16. arXiv:2503.19936  [pdf, other]

    cs.CV

    VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs

    Authors: Kelaiti Xiao, Liang Yang, Paerhati Tulajiang, Hongfei Lin

    Abstract: This paper introduces VisualQuest, a novel image dataset designed to assess the ability of large language models (LLMs) to interpret non-traditional, stylized imagery. Unlike conventional photographic benchmarks, VisualQuest challenges models with images that incorporate abstract, symbolic, and metaphorical elements, requiring the integration of domain-specific knowledge and advanced reasoning. Th…

    Submitted 24 March, 2025; originally announced March 2025.

  17. The Immersive Archive: Archival Strategies for the Sensorama & Sutherland HMD

    Authors: Zeynep Abes, Nathan Fairchild, Spencer Lin, Michael Wahba, Katrina Xiao, Scott S. Fisher

    Abstract: The Immersive Archive is an initiative dedicated to preserving and restoring groundbreaking works from across Extended Reality (XR) history. Originating at the University of Southern California's Mobile and Environmental Media Lab, this archive is committed to developing and exhibiting simulations of influential XR devices that have shaped immersive media over time. This paper examines the challen…

    Submitted 17 March, 2025; originally announced March 2025.

    Journal ref: Proc. IEEE Conf. AI & XR, 2025, pp. 307-312

  18. ArtInsight: Enabling AI-Powered Artwork Engagement for Mixed Visual-Ability Families

    Authors: Arnavi Chheda-Kothary, Ritesh Kanchi, Chris Sanders, Kevin Xiao, Aditya Sengupta, Melanie Kneitmix, Jacob O. Wobbrock, Jon E. Froehlich

    Abstract: We introduce ArtInsight, a novel AI-powered system to facilitate deeper engagement with child-created artwork in mixed visual-ability families. ArtInsight leverages large language models (LLMs) to craft a respectful and thorough initial description of a child's artwork, and provides: creative AI-generated descriptions for a vivid overview, audio recording to capture the child's own description of…

    Submitted 10 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: 21 pages, 30th International Conference on Intelligent User Interfaces (IUI 2025)

    Journal ref: 30th International Conference on Intelligent User Interfaces 2025

  19. arXiv:2502.09791  [pdf]

    cond-mat.mtrl-sci cs.CV

    Atom identification in bilayer moire materials with Gomb-Net

    Authors: Austin C. Houston, Sumner B. Harris, Hao Wang, Yu-Chuan Lin, David B. Geohegan, Kai Xiao, Gerd Duscher

    Abstract: Moire patterns in van der Waals bilayer materials complicate the analysis of atomic-resolution images, hindering the atomic-scale insight typically attainable with scanning transmission electron microscopy. Here, we report a method to detect the positions and identities of atoms in each of the individual layers that compose twisted bilayer heterostructures. We developed a deep learning model, Gomb…

    Submitted 26 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

  20. arXiv:2502.06207  [pdf, other]

    cs.CL cs.AI

    Is LLM an Overconfident Judge? Unveiling the Capabilities of LLMs in Detecting Offensive Language with Annotation Disagreement

    Authors: Junyu Lu, Kai Ma, Kaichun Wang, Kelaiti Xiao, Roy Ka-Wei Lee, Bo Xu, Liang Yang, Hongfei Lin

    Abstract: Large Language Models (LLMs) have become essential for offensive language detection, yet their ability to handle annotation disagreement remains underexplored. Disagreement samples, which arise from subjective interpretations, pose a unique challenge due to their ambiguous nature. Understanding how LLMs process these cases, particularly their confidence levels, can offer insight into their alignme…

    Submitted 18 May, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: 18 pages, accepted at ACL 2025

  21. arXiv:2502.05902  [pdf, other]

    cs.CV

    Fast Omni-Directional Image Super-Resolution: Adapting the Implicit Image Function with Pixel and Semantic-Wise Spherical Geometric Priors

    Authors: Xuelin Shen, Yitong Wang, Silin Zheng, Kang Xiao, Wenhan Yang, Xu Wang

    Abstract: In the context of Omni-Directional Image (ODI) Super-Resolution (SR), the unique challenge arises from the non-uniform oversampling characteristics caused by EquiRectangular Projection (ERP). Considerable efforts in designing complex spherical convolutions or polyhedron reprojection offer significant performance improvements but at the expense of cumbersome processing procedures and slower inferen…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: 9 pages, 4 figures, AAAI 2025

  22. arXiv:2501.18841  [pdf, other]

    cs.LG cs.CR

    Trading Inference-Time Compute for Adversarial Robustness

    Authors: Wojciech Zaremba, Evgenia Nitishinskaya, Boaz Barak, Stephanie Lin, Sam Toyer, Yaodong Yu, Rachel Dias, Eric Wallace, Kai Xiao, Johannes Heidecke, Amelia Glaese

    Abstract: We conduct experiments on the impact of increasing inference-time compute in reasoning models (specifically OpenAI o1-preview and o1-mini) on their robustness to adversarial attacks. We find that across a variety of attacks, increased inference-time compute leads to improved robustness. In many cases (with important exceptions), the fraction of model samples where the attack succeeds tends to zero…

    Submitted 30 January, 2025; originally announced January 2025.

  23. arXiv:2501.12174  [pdf, other]

    cs.LG

    BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks

    Authors: Zhuang Li, Qiuping Yi, Zongcheng Ji, Yijian Lu, Yanqi Li, Keyang Xiao, Hongliang Liang

    Abstract: The rapid growth of Large Language Models (LLMs) raises concerns about distinguishing AI-generated text from human content. Existing watermarking techniques, like KGW, struggle with low watermark strength and stringent false-positive requirements. Our analysis reveals that current methods rely on coarse estimates of non-watermarked text, limiting watermark detectability. To address this, we propo…

    Submitted 21 May, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

  24. arXiv:2501.10963  [pdf, other]

    cs.CE

    Open FinLLM Leaderboard: Towards Financial AI Readiness

    Authors: Shengyuan Colin Lin, Felix Tian, Keyi Wang, Xingjian Zhao, Jimin Huang, Qianqian Xie, Luca Borella, Matt White, Christina Dan Wang, Kairong Xiao, Xiao-Yang Liu Yanglet, Li Deng

    Abstract: Financial large language models (FinLLMs) with multimodal capabilities are envisioned to revolutionize applications across business, finance, accounting, and auditing. However, real-world adoption requires robust benchmarks of FinLLMs' and FinAgents' performance. Maintaining an open leaderboard is crucial for encouraging innovative adoption and improving model effectiveness. In collaboration with…

    Submitted 29 April, 2025; v1 submitted 19 January, 2025; originally announced January 2025.

  25. arXiv:2501.10709  [pdf, other]

    cs.CE cs.AI stat.ML

    Revisiting Ensemble Methods for Stock Trading and Crypto Trading Tasks at ACM ICAIF FinRL Contest 2023-2024

    Authors: Nikolaus Holzer, Keyi Wang, Kairong Xiao, Xiao-Yang Liu Yanglet

    Abstract: Reinforcement learning has demonstrated great potential for performing financial tasks. However, it faces two major challenges: policy instability and sampling bottlenecks. In this paper, we revisit ensemble methods with massively parallel simulations on graphics processing units (GPUs), significantly enhancing the computational efficiency and robustness of trained models in volatile financial mar…

    Submitted 18 January, 2025; originally announced January 2025.

  26. arXiv:2501.10404  [pdf, other]

    eess.SP cs.LG

    Automated Detection of Epileptic Spikes and Seizures Incorporating a Novel Spatial Clustering Prior

    Authors: Hanyang Dong, Shurong Sheng, Xiongfei Wang, Jiahong Gao, Yi Sun, Wanli Yang, Kuntao Xiao, Pengfei Teng, Guoming Luan, Zhao Lv

    Abstract: A Magnetoencephalography (MEG) time-series recording consists of multi-channel signals collected by superconducting sensors, with each signal's intensity reflecting magnetic field changes over time at the sensor location. Automating epileptic MEG spike detection significantly reduces manual assessment time and effort, yielding substantial clinical benefits. Existing research addresses MEG spike de…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: 8 pages, 6 figures, accepted by BIBM 2024

  27. arXiv:2501.08313  [pdf, other]

    cs.CL cs.CV

    MiniMax-01: Scaling Foundation Models with Lightning Attention

    Authors: MiniMax, Aonian Li, Bangwei Gong, Bo Yang, Boji Shan, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, Gengxin Li, Guojun Zhang, Haohai Sun, Houze Dong, Jiadai Zhu, Jiaqi Zhuang, Jiayuan Song, Jin Zhu, Jingtao Han, Jingyang Li, Junbin Xie, Junhao Xu, Junjie Yan, et al. (65 additional authors not shown)

    Abstract: We introduce MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. The core lies in lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters, o…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-sourced our MiniMax-01 at https://github.com/MiniMax-AI

  28. arXiv:2412.18693  [pdf, other]

    cs.LG cs.AI cs.CL

    Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning

    Authors: Alex Beutel, Kai Xiao, Johannes Heidecke, Lilian Weng

    Abstract: Automated red teaming can discover rare model failures and generate challenging examples that can be used for training or evaluation. However, a core challenge in automated red teaming is ensuring that the attacks are both diverse and effective. Prior methods typically succeed in optimizing either for diversity or for effectiveness, but rarely both. In this paper, we provide methods that enable au…

    Submitted 24 December, 2024; originally announced December 2024.

  29. arXiv:2412.16720  [pdf, other]

    cs.AI

    OpenAI o1 System Card

    Authors: OpenAI, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, et al. (238 additional authors not shown)

    Abstract: The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-ar…

    Submitted 21 December, 2024; originally announced December 2024.

  30. arXiv:2412.11814  [pdf, other]

    cs.CL

    EventSum: A Large-Scale Event-Centric Summarization Dataset for Chinese Multi-News Documents

    Authors: Mengna Zhu, Kaisheng Zeng, Mao Wang, Kaiming Xiao, Lei Hou, Hongbin Huang, Juanzi Li

    Abstract: In real life, many dynamic events, such as major disasters and large-scale sports events, evolve continuously over time. Obtaining an overview of these events can help people quickly understand the situation and respond more effectively. This is challenging because the key information of the event is often scattered across multiple documents, involving complex event knowledge understanding and rea…

    Submitted 3 January, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: Extended version for paper accepted to AAAI 2025

  31. Privacy-Preserving Brain-Computer Interfaces: A Systematic Review

    Authors: K. Xia, W. Duch, Y. Sun, K. Xu, W. Fang, H. Luo, Y. Zhang, D. Sang, X. Xu, F-Y Wang, D. Wu

    Abstract: A brain-computer interface (BCI) establishes a direct communication pathway between the human brain and a computer. It has been widely used in medical diagnosis, rehabilitation, education, entertainment, etc. Most research so far focuses on making BCIs more accurate and reliable, but much less attention has been paid to their privacy. Developing a commercial BCI system usually requires close colla…

    Submitted 15 December, 2024; originally announced December 2024.

    Journal ref: IEEE Trans. on Computational Social Systems, 10(5):2312-2324, 2023

  32. arXiv:2412.11159  [pdf, other]

    cs.CE

    A Report on Financial Regulations Challenge at COLING 2025

    Authors: Keyi Wang, Jaisal Patel, Charlie Shen, Daniel Kim, Andy Zhu, Alex Lin, Luca Borella, Cailean Osborne, Matt White, Steve Yang, Kairong Xiao, Xiao-Yang Liu Yanglet

    Abstract: Financial large language models (FinLLMs) have been applied to various tasks in business, finance, accounting, and auditing. Complex financial regulations and standards, with which LLMs must comply, are critical to financial services. However, FinLLMs' performance in understanding and interpreting financial regulations has rarely been studied. Therefore, we organize the Regulations Challenge, a sha…

    Submitted 12 January, 2025; v1 submitted 15 December, 2024; originally announced December 2024.

    Comments: 8 pages, 4 tables

  33. arXiv:2412.08896  [pdf, other]

    cs.CV

    LV-CadeNet: Long View Feature Convolution-Attention Fusion Encoder-Decoder Network for Clinical MEG Spike Detection

    Authors: Kuntao Xiao, Xiongfei Wang, Pengfei Teng, Yi Sun, Wanli Yang, Liang Zhang, Hanyang Dong, Guoming Luan, Shurong Sheng

    Abstract: It is widely acknowledged that the epileptic foci can be pinpointed by source localizing interictal epileptic discharges (IEDs) via Magnetoencephalography (MEG). However, manual detection of IEDs, which appear as spikes in MEG data, is extremely labor intensive and requires considerable professional expertise, limiting the broader adoption of MEG technology. Numerous studies have focused on automa…

    Submitted 11 December, 2024; originally announced December 2024.

    ACM Class: I.4.6; I.5.1; J.3

  34. Semi-Supervised Transfer Boosting (SS-TrBoosting)

    Authors: Lingfei Deng, Changming Zhao, Zhenbang Du, Kun Xia, Dongrui Wu

    Abstract: Semi-supervised domain adaptation (SSDA) aims at training a high-performance model for a target domain using few labeled target data, many unlabeled target data, and plenty of auxiliary data from a source domain. Previous works in SSDA mainly focused on learning transferable representations across domains. However, it is difficult to find a feature space where the source and target domains share t…

    Submitted 4 December, 2024; originally announced December 2024.

    Journal ref: IEEE Trans. on Artificial Intelligence, 5(7):3431-3444, 2024

  35. arXiv:2411.13887  [pdf, other]

    math.AT cond-mat.mtrl-sci cs.CG math.MG stat.ML

    A cohomology-based Gromov-Hausdorff metric approach for quantifying molecular similarity

    Authors: JunJie Wee, Xue Gong, Wilderich Tuschmann, Kelin Xia

    Abstract: We introduce, for the first time, a cohomology-based Gromov-Hausdorff ultrametric method to analyze 1-dimensional and higher-dimensional (co)homology groups, focusing on loops, voids, and higher-dimensional cavity structures in simplicial complexes, to address typical clustering questions arising in molecular data analysis. The Gromov-Hausdorff distance quantifies the dissimilarity between two met…

    Submitted 25 February, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

    Comments: 16 pages, 4 figures, 1 table

    MSC Class: 55N31; 68U05; 92E10; 62H30; 55U10; ACM Class: G.2.2; I.1.1; I.5.3; J.2

  36. arXiv:2411.12135  [pdf, other]

    stat.ML cs.LG

    Exact Risk Curves of signSGD in High-Dimensions: Quantifying Preconditioning and Noise-Compression Effects

    Authors: Ke Liang Xiao, Noah Marshall, Atish Agarwala, Elliot Paquette

    Abstract: In recent years, signSGD has garnered interest both as a practical optimizer and as a simple model to understand adaptive optimizers like Adam. Though there is a general consensus that signSGD acts to precondition optimization and reshapes noise, quantitatively understanding these effects in theoretically solvable settings remains difficult. We present an analysis of signSGD in a high dimensio…

    Submitted 21 February, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  37. arXiv:2411.05738  [pdf, other]

    cs.CV

    StdGEN: Semantic-Decomposed 3D Character Generation from Single Images

    Authors: Yuze He, Yanning Zhou, Wang Zhao, Zhongkai Wu, Kaiwen Xiao, Wei Yang, Yong-Jin Liu, Xiao Han

    Abstract: We present StdGEN, an innovative pipeline for generating semantically decomposed high-quality 3D characters from single images, enabling broad applications in virtual reality, gaming, and filmmaking, etc. Unlike previous methods which struggle with limited decomposability, unsatisfactory quality, and long optimization times, StdGEN features decomposability, effectiveness and efficiency; i.e., it g…

    Submitted 5 March, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

    Comments: CVPR 2025. 13 pages, 10 figures

  38. arXiv:2411.00064  [pdf, other]

    cs.SD cs.AI

    The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings

    Authors: Kangxiang Xia, Dake Guo, Jixun Yao, Liumeng Xue, Hanzhao Li, Shuai Wang, Zhao Guo, Lei Xie, Qingqing Zhang, Lei Luo, Minghui Dong, Peng Sun

    Abstract: The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge aims to benchmark and advance zero-shot spontaneous style voice cloning, particularly focusing on generating spontaneous behaviors in conversational speech. The challenge comprises two tracks: an unconstrained track without limitation on data and model usage, and a constrained track only allowing the use of constrained open-source datase…

    Submitted 31 October, 2024; originally announced November 2024.

    Comments: accepted by ISCSLP 2024

  39. arXiv:2410.23815  [pdf, other]

    cs.SD cs.AI eess.AS

    The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge

    Authors: Dake Guo, Jixun Yao, Xinfa Zhu, Kangxiang Xia, Zhao Guo, Ziyu Zhang, Yao Wang, Jie Liu, Lei Xie

    Abstract: This paper presents the NPU-HWC system submitted to the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC). Our system consists of two modules: a speech generator for Track 1 and a background audio generator for Track 2. In Track 1, we employ Single-Codec to tokenize the speech into discrete tokens and use a language-model-based approach to achieve zero-shot speaking…

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: accepted by ISCSLP 2024

  40. Customized FinGPT Search Agents Using Foundation Models

    Authors: Felix Tian, Ajay Byadgi, Daniel Kim, Daochen Zha, Matt White, Kairong Xiao, Xiao-Yang Liu Yanglet

    Abstract: Current large language models (LLMs) have proven useful for analyzing financial data, but most existing models, such as BloombergGPT and FinGPT, lack customization for specific user needs. In this paper, we address this gap by developing FinGPT Search Agents tailored for two types of users: individuals and institutions. For individuals, we leverage Retrieval-Augmented Generation (RAG) to integrate…

    Submitted 20 October, 2024; originally announced October 2024.

    Journal ref: 5th ACM International Conference on AI in Finance, 2024

  41. arXiv:2410.12425  [pdf, other]

    cs.LG

    Perseus: Leveraging Common Data Patterns with Curriculum Learning for More Robust Graph Neural Networks

    Authors: Kaiwen Xia, Huijun Wu, Duanyu Li, Min Xie, Ruibo Wang, Wenzhe Zhang

    Abstract: Graph Neural Networks (GNNs) excel at handling graph data but remain vulnerable to adversarial attacks. Existing defense methods typically rely on assumptions like graph sparsity and homophily to either preprocess the graph or guide structure learning. However, preprocessing methods often struggle to accurately distinguish between normal edges and adversarial perturbations, leading to suboptimal r…

    Submitted 16 October, 2024; originally announced October 2024.

  42. arXiv:2410.11323  [pdf, other]

    cs.LG q-bio.QM

    KA-GNN: Kolmogorov-Arnold Graph Neural Networks for Molecular Property Prediction

    Authors: Longlong Li, Yipeng Zhang, Guanghui Wang, Kelin Xia

    Abstract: As key models in geometric deep learning, graph neural networks have demonstrated enormous power in molecular data analysis. Recently, a specially designed learning scheme, known as the Kolmogorov-Arnold Network (KAN), has shown unique potential for improving model accuracy, efficiency, and explainability. Here we propose the first non-trivial Kolmogorov-Arnold Network-based Graph Neural Networks…

    Submitted 18 December, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

  43. arXiv:2410.04765  [pdf, other]

    cond-mat.mtrl-sci cs.AI cs.LG

    Molecular topological deep learning for polymer property prediction

    Authors: Cong Shen, Yipeng Zhang, Fei Han, Kelin Xia

    Abstract: Accurate and efficient prediction of polymer properties is of key importance for polymer design. Traditional experimental tools and density functional theory (DFT)-based simulations for polymer property evaluation are both expensive and time-consuming. Recently, a large number of graph-based molecular models have emerged and demonstrated huge potential in molecular data analysis. Even with the g…

    Submitted 7 October, 2024; originally announced October 2024.

  44. arXiv:2408.09920  [pdf, other]

    cs.CV cs.MM eess.IV

    Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement

    Authors: Kang Xiao, Xu Wang, Yulin He, Baoliang Chen, Xuelin Shen

    Abstract: Full-reference image quality assessment (FR-IQA) models generally operate by measuring the visual differences between a degraded image and its reference. However, existing FR-IQA models, including both the classical ones (e.g., PSNR and SSIM) and deep-learning-based measures (e.g., LPIPS and DISTS), still exhibit limitations in capturing the full perception characteristics of the human visual system (HV…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 6 pages, 5 figures, accepted by ICME2024

  45. arXiv:2407.16996  [pdf, other]

    cs.CE math.AT

    Quotient complex (QC)-based machine learning for 2D perovskite design

    Authors: Chuan-Shen Hu, Rishikanta Mayengbam, Kelin Xia, Tze Chien Sum

    Abstract: With remarkable stability and exceptional optoelectronic properties, two-dimensional (2D) halide layered perovskites hold immense promise for revolutionizing photovoltaic technology. Presently, inadequate representations have substantially impeded the design and discovery of 2D perovskites. In this context, we introduce a novel computational topology framework termed the quotient complex (QC), whi…

    Submitted 24 July, 2024; originally announced July 2024.

  46. arXiv:2407.14177  [pdf, other]

    cs.CV

    EVLM: An Efficient Vision-Language Model for Visual Understanding

    Authors: Kaibing Chen, Dong Shen, Hanwen Zhong, Huasong Zhong, Kui Xia, Di Xu, Wei Yuan, Yifei Hu, Bin Wen, Tianke Zhang, Changyi Liu, Dewen Fan, Huihui Xiao, Jiahong Wu, Fan Yang, Size Li, Di Zhang

    Abstract: In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the language models alongside textual tokens. However, when dealing with long sequences of visual signals or inputs such as videos, the self-attention mechanism of language models can lead to sig…

    Submitted 19 July, 2024; originally announced July 2024.

  47. arXiv:2407.08974  [pdf, other]

    q-bio.QM cs.LG math.GN q-bio.BM

    Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction

    Authors: Joshua Zhi En Tan, JunJie Wee, Xue Gong, Kelin Xia

    Abstract: Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced…

    Submitted 15 April, 2025; v1 submitted 12 July, 2024; originally announced July 2024.

    MSC Class: 62P10; 92C40; 68T07; 55U10

    ACM Class: J.3; I.2.6

  48. arXiv:2406.13265  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Molecule Graph Networks with Many-body Equivariant Interactions

    Authors: Zetian Mao, Chuan-Shen Hu, Jiawen Li, Chen Liang, Diptesh Das, Masato Sumita, Kelin Xia, Koji Tsuda

    Abstract: Message passing neural networks have demonstrated significant efficacy in predicting molecular interactions. Introducing equivariant vectorial representations augments expressivity by capturing geometric data symmetries, thereby improving model accuracy. However, two-body bond vectors in opposition may cancel each other out during message passing, leading to the loss of directional information on…

    Submitted 21 January, 2025; v1 submitted 19 June, 2024; originally announced June 2024.

  49. arXiv:2406.11733  [pdf, other]

    stat.ML cs.LG

    To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions

    Authors: Noah Marshall, Ke Liang Xiao, Atish Agarwala, Elliot Paquette

    Abstract: The success of modern machine learning is due in part to the adaptive optimization methods that have been developed to deal with the difficulties of training large models over complex datasets. One such method is gradient clipping: a practical procedure with limited theoretical underpinnings. In this work, we study clipping in a least squares problem under streaming SGD. We develop a theoretical a…

    Submitted 6 October, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  50. arXiv:2406.04680  [pdf, other]

    eess.IV cs.CV

    MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome

    Authors: Yixin Huang, Yiqi Jin, Ke Tao, Kaijian Xia, Jianfeng Gu, Lei Yu, Lan Du, Cunjian Chen

    Abstract: May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-t…

    Submitted 7 June, 2024; originally announced June 2024.