
Showing 1–50 of 106 results for author: Ji, L

Searching in archive cs.
  1. arXiv:2507.02454

    cs.CV

    Weakly-supervised Contrastive Learning with Quantity Prompts for Moving Infrared Small Target Detection

    Authors: Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Jianghong Huang, Mao Ye

    Abstract: Different from general object detection, moving infrared small target detection faces huge challenges due to tiny target size and weak background contrast. Currently, most existing methods are fully supervised, heavily relying on a large number of manual target-wise annotations. However, manually annotating video sequences is often expensive and time-consuming, especially for low-quality infrared f…

    Submitted 3 July, 2025; originally announced July 2025.

  2. arXiv:2506.23127

    cs.CL cs.AI

    Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning

    Authors: Zhaoye Fei, Li Ji, Siyin Wang, Junhao Shi, Jingjing Gong, Xipeng Qiu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they face significant challenges in embodied task planning scenarios that require continuous environmental understanding and action generation. Existing approaches generate open-loop action scripts based on static knowledge, making it difficult to learn causal relationships between actions and environm…

    Submitted 29 June, 2025; originally announced June 2025.

  3. arXiv:2506.17892

    cs.CV cs.LG

    BeltCrack: the First Sequential-image Industrial Conveyor Belt Crack Detection Dataset and Its Baseline with Triple-domain Feature Learning

    Authors: Jianghong Huang, Luping Ji, Xin Ma, Mao Ye

    Abstract: Conveyor belts are important equipment in modern industry, widely applied in production and manufacturing. Their health is critical to operational efficiency and safety. Cracks are a major threat to belt health. Currently, for safety reasons, intelligent detection of belt cracks is attracting increasing attention. To implement intelligent detection with machine learning, real crack sa…

    Submitted 24 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

    Comments: 14 pages, 10 figures

  4. arXiv:2506.14907

    cs.CV cs.AI

    PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning

    Authors: Yizhen Zhang, Yang Ding, Shuoshuo Zhang, Xinchen Zhang, Haoling Li, Zhong-zhi Li, Peijie Wang, Jie Wu, Lei Ji, Yelong Shen, Yujiu Yang, Yeyun Gong

    Abstract: Inspired by the impressive reasoning capabilities demonstrated by reinforcement learning approaches like DeepSeek-R1, emerging research has begun exploring the use of reinforcement learning (RL) to enhance vision-language models (VLMs) for multimodal reasoning tasks. However, most existing multimodal reinforcement learning approaches remain limited to spatial reasoning within single-image c…

    Submitted 17 June, 2025; originally announced June 2025.

  5. arXiv:2506.02678

    cs.CL cs.CE math.NA

    TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression

    Authors: Zhong-Zhi Li, Xiao Liang, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Yeyun Gong, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu

    Abstract: Large Language Models (LLMs) have recently achieved remarkable progress by leveraging Reinforcement Learning and extended Chain-of-Thought (CoT) techniques. However, the challenge of performing efficient language reasoning, especially during inference with extremely long outputs, has drawn increasing attention from the research community. In this work, we propose a dynamic ratio-based training pip…

    Submitted 14 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  6. arXiv:2505.06987

    cs.CL cs.AI

    Convert Language Model into a Value-based Strategic Planner

    Authors: Xiaoyu Wang, Yue Zhao, Qingqing Gu, Zhonglin Jiang, Xiaokai Chen, Yong Chen, Luo Ji

    Abstract: Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have made remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, and therefore provide a suboptimal solution for long-term satisfaction. To address such an issue, we leverage t…

    Submitted 17 June, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

    Comments: 13 pages, 6 figures, Accepted by ACL 2025 Industry Track

  7. arXiv:2505.01022

    cs.SE

    Detecting the Root Cause Code Lines in Bug-Fixing Commits by Heterogeneous Graph Learning

    Authors: Liguo Ji, Chenchen Li, Shenglin Wang, Furui Zhan

    Abstract: With the continuous growth in the scale and complexity of software systems, defect remediation has become increasingly difficult and costly. Automated defect prediction tools can proactively identify software changes prone to defects within software projects, thereby enhancing software development efficiency. However, existing work in heterogeneous and complex software projects continues to face c…

    Submitted 13 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

  8. arXiv:2504.13993

    cs.IR cs.AI cs.LG

    CPR: Leveraging LLMs for Topic and Phrase Suggestion to Facilitate Comprehensive Product Reviews

    Authors: Ekta Gujral, Apurva Sinha, Lishi Ji, Bijayani Sanghamitra Mishra

    Abstract: Consumers often rely heavily on online product reviews, analyzing both quantitative ratings and textual descriptions to assess product quality. However, existing research has not adequately addressed how to systematically encourage the creation of comprehensive reviews that capture both customer sentiment and detailed product feature analysis. This paper presents CPR, a novel methodology that leve…

    Submitted 18 April, 2025; originally announced April 2025.

  9. arXiv:2504.11837

    cs.CL cs.AI

    FiSMiness: A Finite State Machine Based Paradigm for Emotional Support Conversations

    Authors: Yue Zhao, Qingqing Gu, Xiaoyu Wang, Teng Chen, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have made remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, and therefore provide a suboptimal solution for long-term satisfaction. To address such an issue, we leverage t…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: accepted by CMCL

  10. arXiv:2503.21802

    stat.AP cs.LG stat.ML

    Structured and sparse partial least squares coherence for multivariate cortico-muscular analysis

    Authors: Jingyao Sun, Qilu Zhang, Di Ma, Tianyu Jia, Shijie Jia, Xiaoxue Zhai, Ruimou Xie, Ping-Ju Lin, Zhibin Li, Yu Pan, Linhong Ji, Chong Li

    Abstract: Multivariate cortico-muscular analysis has recently emerged as a promising approach for evaluating the corticospinal neural pathway. However, current multivariate approaches encounter challenges such as high dimensionality and limited sample sizes, thus restricting their further applications. In this paper, we propose a structured and sparse partial least squares coherence algorithm (ssPLSC) to ex…

    Submitted 14 June, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  11. arXiv:2503.19123

    cs.CL cs.AI

    Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling

    Authors: Haebin Shin, Lei Ji, Xiao Liu, Yeyun Gong

    Abstract: Using large teacher models to guide the training of smaller student models has become the prevailing paradigm for efficient and effective learning. However, vocabulary mismatches between teacher and student language models pose significant challenges in language modeling, resulting in divergent token sequences and output distributions. To overcome these limitations, we propose Vocabulary-agnostic…

    Submitted 24 March, 2025; originally announced March 2025.

  12. arXiv:2503.16068

    cs.CV

    PoseTraj: Pose-Aware Trajectory Control in Video Diffusion

    Authors: Longbin Ji, Lei Zhong, Pengfei Wei, Changjian Li

    Abstract: Recent advancements in trajectory-guided video generation have achieved notable progress. However, existing models still face challenges in generating object motions with potentially changing 6D poses under wide-range rotations, due to limited 3D understanding. To address this problem, we introduce PoseTraj, a pose-aware video dragging model for generating 3D-aligned motion from 2D trajectories. O…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Code, data and project page: https://robingg1.github.io/Pose-Traj/

  13. arXiv:2503.01699

    cs.CE

    Camera Measurement of Blood Oxygen Saturation

    Authors: Jiankai Tang, Xin Liu, Daniel McDuff, Zhang Jiang, Hongming Hu, Luxi Zhou, Nodoka Nagao, Haruta Suzuki, Yuki Nagahama, Wei Li, Linhong Ji, Yuanchun Shi, Izumi Nishidate, Yuntao Wang

    Abstract: Blood oxygen saturation (SpO2) is a crucial vital sign routinely monitored in medical settings. Traditional methods require dedicated contact sensors, limiting accessibility and comfort. This study presents a deep learning framework for contactless SpO2 measurement using an off-the-shelf camera, addressing challenges related to lighting variations and skin tone diversity. We conducted two large-sc…

    Submitted 3 March, 2025; originally announced March 2025.

  14. arXiv:2502.06269

    cs.IR

    Progressive Collaborative and Semantic Knowledge Fusion for Generative Recommendation

    Authors: Longtao Xiao, Haozhao Wang, Cheng Wang, Linfei Ji, Yifan Wang, Jieming Zhu, Zhenhua Dong, Rui Zhang, Ruixuan Li

    Abstract: With the recent surge in interest surrounding generative paradigms, generative recommendation has increasingly attracted the attention of researchers in the recommendation community. This paradigm generally consists of two stages. In the first stage, pretrained semantic embeddings or collaborative ID embeddings are quantized to create item codes, aiming to capture and preserve rich semantic or col…

    Submitted 10 February, 2025; originally announced February 2025.

  15. arXiv:2502.06178

    math.OC cs.LG stat.ML

    Bayesian Optimization by Kernel Regression and Density-based Exploration

    Authors: Tansheng Zhu, Hongyu Zhou, Ke Jin, Xusheng Xu, Qiufan Yuan, Lijie Ji

    Abstract: Bayesian optimization is highly effective for optimizing expensive-to-evaluate black-box functions, but it faces significant computational challenges due to the high computational complexity of Gaussian processes, which results in a total time complexity that is quartic with respect to the number of iterations. To address this limitation, we propose the Bayesian Optimization by Kernel regression a…

    Submitted 15 March, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  16. arXiv:2412.18966

    cs.CV cs.AI cs.LG

    ModelGrow: Continual Text-to-Video Pre-training with Model Expansion and Language Understanding Enhancement

    Authors: Zhefan Rao, Liya Ji, Yazhou Xing, Runtao Liu, Zhaoyang Liu, Jiaxin Xie, Ziqiao Peng, Yingqing He, Qifeng Chen

    Abstract: Text-to-video (T2V) generation has gained significant attention recently. However, the costs of training a T2V model from scratch remain persistently high, and there is considerable room for improving the generation performance, especially under limited computation resources. This work explores the continual general pre-training of text-to-video models, enabling the model to "grow" its abilities b…

    Submitted 25 December, 2024; originally announced December 2024.

    Comments: 18 pages

  17. arXiv:2412.16615

    cs.IR cs.CL cs.LG

    Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval

    Authors: Luo Ji, Feixiang Guo, Teng Chen, Qingqing Gu, Xiaoyu Wang, Ningyuan Xi, Yihong Wang, Peng Yu, Yue Zhao, Hongyang Lei, Zhonglin Jiang, Yong Chen

    Abstract: Despite the recent advancement in Retrieval-Augmented Generation (RAG) systems, most retrieval methodologies are often developed for factual retrieval, which assumes that the query and positive documents are semantically similar. In this paper, we instead propose and study a more challenging type of retrieval task, called hidden rationale retrieval, in which query and document are not similar but can be in…

    Submitted 9 April, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

    Comments: 10 pages, 3 figures, ECIR 2025

  18. arXiv:2412.11466

    cs.LG stat.ML

    Mining In-distribution Attributes in Outliers for Out-of-distribution Detection

    Authors: Yutian Lei, Luping Ji, Pei Liu

    Abstract: Out-of-distribution (OOD) detection is indispensable for deploying reliable machine learning systems in real-world scenarios. Recent works, using auxiliary outliers in training, have shown good potential. However, they seldom consider the intrinsic correlations between in-distribution (ID) and OOD data. In this work, we discover an obvious correlation that OOD data usually possesses significant ID…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  19. arXiv:2412.05342

    cs.CL cs.AI

    Multi-Party Supervised Fine-tuning of Language Models for Multi-Party Dialogue Generation

    Authors: Xiaoyu Wang, Ningyuan Xi, Teng Chen, Qingqing Gu, Yue Zhao, Xiaokai Chen, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Large Language Models (LLMs) are usually fine-tuned to participate in dyadic or two-party dialogues and therefore cannot adapt well to multi-party dialogues (MPD), which hinders their application in scenarios such as multi-person meetings, discussions, and daily communication. Previous LLM-based research mainly focuses on multi-agent frameworks, while their base LLMs are still pairwisely fine…

    Submitted 11 June, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

    Comments: Accepted by IJCNN 2025

  20. arXiv:2411.15927

    cs.CL cs.AI

    Generative Prompt Internalization

    Authors: Haebin Shin, Lei Ji, Yeyun Gong, Sungdong Kim, Eunbi Choi, Minjoon Seo

    Abstract: Prompts used in recent large language model based applications are often fixed and lengthy, leading to significant computational overhead. To address this challenge, we propose Generative Prompt Internalization (GenPI), a lightweight method that employs a joint training approach. GenPI not only replicates the behavior of models with prompt inputs but also generates the content of the prompt along…

    Submitted 24 March, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

    Comments: NAACL 2025 (Main Conference)

  21. arXiv:2411.06655

    cs.CL cs.AI

    Explore the Reasoning Capability of LLMs in the Chess Testbed

    Authors: Shu Wang, Lei Ji, Renxi Wang, Wenxiao Zhao, Haokun Liu, Yifan Hou, Ying Nian Wu

    Abstract: Reasoning is a central capability of human intelligence. In recent years, with the advent of large-scale datasets, pretrained large language models have emerged with new capabilities, including reasoning. However, these models still struggle with long-term, complex reasoning tasks, such as playing chess. Based on the observation that expert chess players employ a dual approach combining long-term…

    Submitted 28 February, 2025; v1 submitted 10 November, 2024; originally announced November 2024.

    Comments: NAACL 2025 Main Conference. Data and models are available: https://mate-chess.github.io/

  22. arXiv:2410.23231

    cs.CV

    LGU-SLAM: Learnable Gaussian Uncertainty Matching with Deformable Correlation Sampling for Deep Visual SLAM

    Authors: Yucheng Huang, Luping Ji, Hudong Liu, Mao Ye

    Abstract: Deep visual Simultaneous Localization and Mapping (SLAM) techniques, e.g., DROID, have made significant advancements by leveraging deep visual odometry on dense flow fields. In general, they heavily rely on global visual similarity matching. However, the ambiguous similarity interference in uncertain regions could often lead to excessive noise in correspondences, ultimately misleading SLAM in geom…

    Submitted 30 October, 2024; originally announced October 2024.

  23. Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification

    Authors: Jiaxiang Gou, Luping Ji, Pei Liu, Mao Ye

    Abstract: Whole Slide Image (WSI) classification has significant applications in clinical pathology, e.g., tumor identification and cancer diagnosis. Currently, most research attention is focused on Multiple Instance Learning (MIL) using static datasets. One of the most obvious weaknesses of these methods is that they cannot efficiently preserve and utilize previously learned knowledge. With any new da…

    Submitted 25 December, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: Accepted by AAAI 2025

  24. arXiv:2410.03439

    cs.CL

    ToolGen: Unified Tool Retrieval and Calling via Generation

    Authors: Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li

    Abstract: As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on inputting tool descriptions as context, which is constrained by context length and requires separate, often inefficient, retrieval mechanisms. We introduce ToolGen, a paradigm shift that integrates tool knowled…

    Submitted 29 March, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: ICLR 2025

    ACM Class: I.2.7

  25. arXiv:2409.12059

    cs.CL cs.AI cs.LG

    MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning

    Authors: Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Yue Zhao, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Large language models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms. Recently, several studies have enhanced the thinking ability of language models, but most of them are not data-driven or training-based. In this paper, we are motivated by the cognitive mechanism in the natural world, and design a novel model archi…

    Submitted 25 April, 2025; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: 19 pages, 7 figures

  26. arXiv:2409.09369

    cs.CV

    Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology

    Authors: Pei Liu, Luping Ji, Jiaxiang Gou, Bo Fu, Mao Ye

    Abstract: Histopathology Whole-Slide Images (WSIs) provide an important tool to assess cancer prognosis in computational pathology (CPATH). While existing survival analysis (SA) approaches have made exciting progress, they are generally limited to adopting highly-expressive network architectures and only coarse-grained patient-level labels to learn visual prognostic representations from gigapixel WSIs. Such…

    Submitted 11 February, 2025; v1 submitted 14 September, 2024; originally announced September 2024.

    Comments: Accepted to ICLR 2025

  27. arXiv:2409.07416

    cs.IR cs.AI cs.LG

    Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation

    Authors: Luo Ji, Gao Liu, Mingyang Yin, Hongxia Yang, Jingren Zhou

    Abstract: Modern listwise recommendation systems need to consider both long-term user perceptions and short-term interest shifts. Reinforcement learning can be applied to recommendation to study such a problem but is also subject to a large search space, sparse user feedback, and long interactive latency. Motivated by recent progress in hierarchical reinforcement learning, we propose a novel framework called m…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 18 pages, 4 figures

  28. arXiv:2409.07341

    cs.LG cs.AI cs.RO

    Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence

    Authors: Luo Ji, Runji Lin

    Abstract: Interactive artificial intelligence in the motion control field is an interesting topic, especially when universal knowledge is adaptive to multiple tasks and universal environments. Despite there being increasing efforts in the field of Reinforcement Learning (RL) with the aid of transformers, most of them might be limited by the offline training pipeline, which prohibits exploration and generali…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 12 pages, 6 figures

  29. arXiv:2409.06624

    cs.CL cs.AI cs.LG

    A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

    Authors: Ningyuan Xi, Yetao Wu, Kun Fan, Teng Chen, Qingqing Gu, Peng Yu, Jinxian Qu, Chenxi Liu, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Large Language Models (LLMs) often need Continual Pre-Training (CPT) to acquire unfamiliar language skills or adapt to new domains. The huge training cost of CPT calls for a careful choice of key hyper-parameters such as the mixture ratio of the additional language or domain corpus. However, there is no systematic study that bridges the gap between the optimal mixture ratio and the actual mode…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 11 pages, 4 figures

  30. arXiv:2409.06601

    cs.CL cs.LG

    LaMsS: When Large Language Models Meet Self-Skepticism

    Authors: Yetao Wu, Yihong Wang, Teng Chen, Ningyuan Xi, Qingqing Gu, Hongyang Lei, Luo Ji

    Abstract: Hallucination is a major challenge for large language models (LLMs), preventing their further application in some fields. Human skeptical thinking could help LLMs with self-cognition and self-reflection and alleviate their hallucinations. Inspired by this consideration, we propose a novel approach called LaMsS, which combines the semantic understanding capability of LLMs with self-s…

    Submitted 25 April, 2025; v1 submitted 10 September, 2024; originally announced September 2024.

    Comments: 11 pages, 6 figures, ICLR 2025 Workshop SSI-FM

  31. arXiv:2409.05929

    cs.LG cs.AI

    M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Embedding Predictive Architecture

    Authors: Hongyang Lei, Xiaolong Cheng, Qi Qin, Dan Wang, Kun Fan, Huazhen Huang, Qingqing Gu, Yetao Wu, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Current multimodal learning strategies primarily optimize in the original token space. Such a framework is easy to incorporate with the backbone of a pretrained language model, but might result in modality collapse. To alleviate such issues, we leverage the Joint-Embedding Predictive Architecture (JEPA) on the multimodal tasks, which converts the input embedding into the output embedding space by a…

    Submitted 18 June, 2025; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 16 pages, 5 figures. ICML 2025

  32. arXiv:2409.01908

    stat.ME cs.LG q-fin.ST stat.AP stat.ML

    Bayesian CART models for aggregate claim modeling

    Authors: Yaojun Zhang, Lanpeng Ji, Georgios Aivaliotis, Charles C. Taylor

    Abstract: This paper proposes three types of Bayesian CART (or BCART) models for aggregate claim amount, namely, frequency-severity models, sequential models and joint models. We propose a general framework for the BCART models applicable to data with multivariate responses, which is particularly useful for the joint BCART models with a bivariate response: the number of claims and aggregate claim amount. To…

    Submitted 3 September, 2024; originally announced September 2024.

  33. arXiv:2408.03060

    cs.CV cs.GR

    MGFs: Masked Gaussian Fields for Meshing Building based on Multi-View Images

    Authors: Tengfei Wang, Zongqian Zhan, Rui Xia, Linxia Ji, Xin Wang

    Abstract: Over the last few decades, image-based building surface reconstruction has garnered substantial research interest and has been applied across various fields, such as heritage preservation, architectural planning, etc. Compared to the traditional photogrammetric and NeRF-based solutions, recently, Gaussian fields-based methods have exhibited significant potential in generating surface meshes due to…

    Submitted 6 August, 2024; originally announced August 2024.

  34. arXiv:2407.07289

    cs.CV

    Deformable Feature Alignment and Refinement for Moving Infrared Dim-small Target Detection

    Authors: Dengyan Luo, Yanping Xiang, Hu Wang, Luping Ji, Shuai Li, Mao Ye

    Abstract: The detection of moving infrared dim-small targets has been a challenging and prevalent research topic. The current state-of-the-art methods are mainly based on ConvLSTM to aggregate information from adjacent frames to facilitate the detection of the current frame. However, these methods implicitly utilize motion information only in the training stage and fail to explicitly explore motion compensa…

    Submitted 9 July, 2024; originally announced July 2024.

  35. Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

    Authors: Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Mao Ye

    Abstract: As a sub-field of object detection, moving infrared small target detection presents significant challenges due to tiny target sizes and low contrast against backgrounds. Existing methods primarily rely on features extracted only from the spatio-temporal domain. The frequency domain has hardly been considered, although it has been widely applied in image processing. To extend feature sourc…

    Submitted 5 September, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by IEEE TGRS

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing 2024

  36. arXiv:2405.15343

    cs.CV

    Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features

    Authors: Lichuan Ji, Yingqi Lin, Zhenhua Huang, Yan Han, Xiaogang Xu, Jiafei Wu, Chong Wang, Zhe Liu

    Abstract: The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those involving Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video…

    Submitted 24 May, 2024; originally announced May 2024.

  37. arXiv:2405.07652

    cs.HC cs.AI

    G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

    Authors: Zeyu Wang, Yuanchun Shi, Yuntao Wang, Yuchen Yao, Kun Yan, Yuhan Wang, Lei Ji, Xuhai Xu, Chun Yu

    Abstract: Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze, a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables, remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual fie…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 25 pages, 12 figures

  38. arXiv:2405.04405

    cs.LG

    Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation

    Authors: Pei Liu, Luping Ji

    Abstract: Uncertainty estimation (UE), as an effective means of quantifying predictive uncertainty, is crucial for safe and reliable decision-making, especially in high-risk scenarios. Existing UE schemes usually assume that there are completely-labeled samples to support fully-supervised learning. In practice, however, many UE tasks often have no sufficiently-labeled data to use, such as the Multiple Insta…

    Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  39. arXiv:2401.11430

    cs.CV

    Exploring Diffusion Time-steps for Unsupervised Representation Learning

    Authors: Zhongqi Yue, Jiankun Wang, Qianru Sun, Lei Ji, Eric I-Chao Chang, Hanwang Zhang

    Abstract: Representation learning is all about discovering the hidden modular attributes that generate the data faithfully. We explore the potential of Denoising Diffusion Probabilistic Model (DM) in unsupervised learning of the modular attributes. We build a theoretical framework that connects the diffusion time-steps and the hidden attributes, which serves as an effective inductive bias for unsupervised l…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024

  40. arXiv:2401.09454

    cs.CV cs.AI cs.CL cs.LG

    Voila-A: Aligning Vision-Language Models with User's Gaze Attention

    Authors: Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma

    Abstract: In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper…

    Submitted 22 December, 2023; originally announced January 2024.

  41. arXiv:2312.17072

    cs.IR cs.LG

    An Adaptive Framework of Geographical Group-Specific Network on O2O Recommendation

    Authors: Luo Ji, Jiayu Mao, Hailong Shi, Qian Li, Yunfei Chu, Hongxia Yang

    Abstract: Online to offline recommendation strongly correlates with the user and service's spatiotemporal information, therefore calling for a higher degree of model personalization. The traditional methodology is based on a uniform model structure trained by collected centralized data, which is unlikely to capture all user patterns over different geographical areas or time periods. To tackle this challenge…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 7 pages, 4 figures, Accepted by ECIR 2024

  42. arXiv:2312.13108

    cs.CV

    ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

    Authors: Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou

    Abstract: Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Model (LLM) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper…

    Submitted 1 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Project Page: https://showlab.github.io/assistgui/

  43. arXiv:2310.18652  [pdf, other]

    cs.CL cs.AI cs.CV

    EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

    Authors: Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric I-Chao Chang, Tackeun Kim, Edward Choi

    Abstract: Electronic Health Records (EHRs), which contain patients' medical histories in various multi-modal formats, often overlook the potential for joint reasoning across imaging and table modalities underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop o…

    Submitted 25 December, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023 Datasets and Benchmarks Track (10 pages for main text, 4 pages for references, 39 pages for supplementary materials)

  44. arXiv:2310.11285  [pdf, ps, other]

    cs.DM

    Construction of optimal flag codes by MRD codes

    Authors: Shuangqing Liu, Shuhui Yu, Lijun Ji

    Abstract: Flag codes have received a lot of attention due to their application in random network coding. In 2021, Alonso-González et al. constructed optimal $(n,\mathcal{A})$-optimum distance flag codes (ODFCs) for $\mathcal{A}\subseteq \{1,2,\ldots,k,n-k,\ldots,n-1\}$ with $k\in \mathcal{A}$ and $k\mid n$. In this paper, we introduce a new construction of $(n,\mathcal{A})_q$-ODFCs by maximum rank-metric codes,…

    Submitted 11 October, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 23 pages

    MSC Class: 94B99

  45. arXiv:2309.16609  [pdf, other]

    cs.CL

    Qwen Technical Report

    Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, et al. (23 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Q…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 59 pages, 5 figures

  46. arXiv:2309.07141  [pdf]

    eess.SP cs.AI cs.LG

    Design of Recognition and Evaluation System for Table Tennis Players' Motor Skills Based on Artificial Intelligence

    Authors: Zhuo-yong Shi, Ye-tao Jia, Ke-xin Zhang, Ding-han Wang, Long-meng Ji, Yong Wu

    Abstract: With the rapid development of electronic science and technology, the research on wearable devices is constantly updated, but for now, it is not comprehensive for wearable devices to recognize and analyze the movement of specific sports. Based on this, this paper improves wearable devices of table tennis sport, and realizes the pattern recognition and evaluation of table tennis players' motor skill…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 34 pages, 16 figures

    MSC Class: 93-01

    ACM Class: G.1; H.4

  47. arXiv:2308.15016  [pdf, other]

    cs.CV

    C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

    Authors: Longbin Ji, Pengfei Wei, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin

    Abstract: Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from issues such as unstable training and temporal inconsistency, particularly in generating high-fidelity and comprehensive gestures. Additionally, these methods lack effective control over speaker identity and temporal editing of the generated gestures. Focusing on capturing temporal…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 12 pages, 6 figures, 7 tables

  48. arXiv:2307.07893  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Anomaly Detection in Automated Fibre Placement: Learning with Data Limitations

    Authors: Assef Ghamisi, Todd Charter, Li Ji, Maxime Rivard, Gil Lund, Homayoun Najjaran

    Abstract: Conventional defect detection systems in Automated Fibre Placement (AFP) typically rely on end-to-end supervised learning, necessitating a substantial number of labelled defective samples for effective training. However, the scarcity of such labelled data poses a challenge. To overcome this limitation, we present a comprehensive framework for defect detection and localization in Automated Fibre Pl…

    Submitted 14 August, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Journal ref: Frontiers in Manufacturing Technology, 2024, 4, 1277152

  49. arXiv:2307.07409  [pdf, other]

    cs.CL cs.AI eess.IV

    KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization

    Authors: Gangwoo Kim, Hajung Kim, Lei Ji, Seongsu Bae, Chanhwi Kim, Mujeen Sung, Hyunjae Kim, Kun Yan, Eric Chang, Jaewoo Kang

    Abstract: In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Published at BioNLP workshop @ ACL 2023

  50. Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification

    Authors: Pei Liu, Luping Ji, Xinyu Zhang, Feng Ye

    Abstract: Given the special situation of modeling gigapixel images, multiple instance learning (MIL) has become one of the most important frameworks for Whole Slide Image (WSI) classification. In current practice, most MIL networks often face two unavoidable problems in training: i) insufficient WSI data and ii) the sample memorization inclination inherent in neural networks. These problems may hinder MIL m…

    Submitted 2 November, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: 12 pages, 6 figures, 10 tables