Showing 1–50 of 105 results for author: Jia, Q

  1. arXiv:2505.08179  [pdf, ps, other]

    cs.LG cs.AI

    Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL

    Authors: Zhikun Tao, Gang Xiong, He Fang, Zhen Shen, Yunjun Han, Qing-Shan Jia

    Abstract: Offline safe reinforcement learning (OSRL), which derives constraint-satisfying policies from pre-collected datasets, offers a promising avenue for deploying RL in safety-critical real-world domains such as robotics. However, the majority of existing approaches emphasize only short-term safety, neglecting long-horizon considerations. Consequently, they may violate safety constraints and fail to ensure sus… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  2. arXiv:2504.18604  [pdf, other]

    cs.AI

    A Cognitive-Mechanistic Human Reliability Analysis Framework: A Nuclear Power Plant Case Study

    Authors: Xingyu Xiao, Peng Chen, Jiejuan Tong, Shunshun Liu, Hongru Zhao, Jun Zhao, Qianqian Jia, Jingang Liang, Haitao Wang

    Abstract: Traditional human reliability analysis (HRA) methods, such as IDHEAS-ECA, rely on expert judgment and empirical rules that often overlook the cognitive underpinnings of human error. Moreover, conducting human-in-the-loop experiments for advanced nuclear power plants is increasingly impractical due to novel interfaces and limited operational data. This study proposes a cognitive-mechanistic framewo… ▽ More

    Submitted 5 May, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  3. arXiv:2504.12444  [pdf]

    eess.SY cs.LG physics.chem-ph

    Enhanced Battery Capacity Estimation in Data-Limited Scenarios through Swarm Learning

    Authors: Jiawei Zhang, Yu Zhang, Wei Xu, Yifei Zhang, Weiran Jiang, Qi Jiao, Yao Ren, Ziyou Song

    Abstract: Data-driven methods have shown potential in electric-vehicle battery management tasks such as capacity estimation, but their deployment is bottlenecked by poor performance in data-limited scenarios. Sharing battery data among algorithm developers can enable accurate and generalizable data-driven models. However, an effective battery management framework that simultaneously ensures data privacy and… ▽ More

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: This paper has been accepted for presentation at the 2025 IEEE Transportation Electrification Conference & Expo (ITEC)

  4. arXiv:2503.10127  [pdf, other]

    cs.CV

    PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models

    Authors: Runze He, Bo Cheng, Yuhang Ma, Qingxiang Jia, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Liebucha Wu, Dawei Leng, Yuhui Yin

    Abstract: In this paper, we propose a unified layout planning and image generation model, PlanGen, which can pre-plan spatial layout conditions before generating images. Unlike previous diffusion-based models that treat layout planning and layout-to-image as two separate models, PlanGen jointly models the two tasks into one autoregressive transformer using only next-token prediction. PlanGen integrates layo… ▽ More

    Submitted 30 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: 15 pages, 12 figures, project page: https://360cvgroup.github.io/PlanGen

  5. arXiv:2503.06053  [pdf, other]

    cs.CV cs.AI

    DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation

    Authors: Runze Zhang, Guoguang Du, Xiaochuan Li, Qi Jia, Liang Jin, Lu Liu, Jingjing Wang, Cong Xu, Zhenhua Guo, Yaqian Zhao, Xiaoli Gong, Rengang Li, Baoyu Fan

    Abstract: Spatio-temporal consistency is a critical research topic in video generation. A qualified generated video segment must ensure plot plausibility and coherence while maintaining visual consistency of objects and scenes across varying viewpoints. Prior research, especially in open-source projects, primarily focuses on either temporal or spatial consistency, or their basic combination, such as appendi… ▽ More

    Submitted 7 March, 2025; originally announced March 2025.

  6. arXiv:2503.05064  [pdf, other]

    cs.RO cs.AI

    Perceiving, Reasoning, Adapting: A Dual-Layer Framework for VLM-Guided Precision Robotic Manipulation

    Authors: Qingxuan Jia, Guoqin Tang, Zeyuan Huang, Zixuan Hao, Ning Ji, Shihang Yin, Gang Chen

    Abstract: Vision-Language Models (VLMs) demonstrate remarkable potential in robotic manipulation, yet challenges persist in executing complex fine manipulation tasks with high speed and precision. While excelling at high-level planning, existing VLM methods struggle to guide robots through precise sequences of fine motor actions. To address this limitation, we introduce a progressive VLM planning algorithm… ▽ More

    Submitted 6 March, 2025; originally announced March 2025.

  7. arXiv:2503.03743  [pdf, other]

    cs.AI

    CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning

    Authors: Yuqi Zhou, Shuai Wang, Sunhao Dai, Qinglin Jia, Zhaocheng Du, Zhenhua Dong, Jun Xu

    Abstract: The advancement of visual language models (VLMs) has enhanced mobile device operations, allowing simulated human-like actions to address user requirements. Current VLM-based mobile operating assistants can be structured into three levels: task, subtask, and action. The subtask level, linking high-level goals with low-level executable actions, is crucial for task completion but faces two challenges… ▽ More

    Submitted 5 March, 2025; originally announced March 2025.

  8. Model Adaptation: Unsupervised Domain Adaptation without Source Data

    Authors: Rui Li, Qianfen Jiao, Wenming Cao, Hau-San Wong, Si Wu

    Abstract: In this paper, we investigate a challenging unsupervised domain adaptation setting -- unsupervised model adaptation. We aim to explore how to rely only on unlabeled target data to improve performance of an existing source prediction model on the target domain, since labeled source data may not be available in some real-world scenarios due to data privacy issues. For this purpose, we propose a new… ▽ More

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: Accepted by CVPR 2020

    Journal ref: https://openaccess.thecvf.com/content_CVPR_2020/html/Li_Model_Adaptation_Unsupervised_Domain_Adaptation_Without_Source_Data_CVPR_2020_paper.html

  9. arXiv:2502.08903  [pdf, other]

    cs.RO cs.AI

    3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning

    Authors: Guoqin Tang, Qingxuan Jia, Zeyuan Huang, Gang Chen, Ning Ji, Zhipeng Yao

    Abstract: Vision-language models (VLMs) have achieved remarkable success in scene understanding and perception tasks, enabling robots to plan and execute actions adaptively in dynamic environments. However, most multimodal large language models lack robust 3D scene localization capabilities, limiting their effectiveness in fine-grained robotic operations. Additionally, challenges such as low recognition acc… ▽ More

    Submitted 12 February, 2025; originally announced February 2025.

  10. arXiv:2502.08661  [pdf, other]

    cs.CL cs.AI

    Few-shot LLM Synthetic Data with Distribution Matching

    Authors: Jiyuan Ren, Zhaocheng Du, Zhihao Wen, Qinglin Jia, Sunhao Dai, Chuhan Wu, Zhenhua Dong

    Abstract: As large language models (LLMs) advance, their ability to perform in-context learning and few-shot language generation has improved significantly. This has spurred using LLMs to produce high-quality synthetic data to enhance the performance of smaller models like online retrievers or weak LLMs. However, LLM-generated synthetic data often differs from the real data in key language attributes (e.g.,… ▽ More

    Submitted 14 February, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

    Comments: 10 pages, 5 figures, accepted at WWW 2025

  11. arXiv:2502.00022  [pdf, other]

    cs.AI cs.HC

    A Dynamic and High-Precision Method for Scenario-Based HRA Synthetic Data Collection in Multi-Agent Collaborative Environments Driven by LLMs

    Authors: Xingyu Xiao, Peng Chen, Qianqian Jia, Jiejuan Tong, Jingang Liang, Haitao Wang

    Abstract: HRA (Human Reliability Analysis) data is crucial for advancing HRA methodologies. However, existing data collection methods lack the necessary granularity, and most approaches fail to capture dynamic features. Additionally, many methods require expert knowledge as input, making them time-consuming and labor-intensive. To address these challenges, we propose a new paradigm for the automated collect… ▽ More

    Submitted 16 January, 2025; originally announced February 2025.

  12. arXiv:2501.15880  [pdf, ps, other]

    cs.IT eess.SP

    Movable Antennas Meet Intelligent Reflecting Surface: Friends or Foes?

    Authors: Xin Wei, Weidong Mei, Qingqing Wu, Qiaoran Jia, Boyu Ning, Zhi Chen, Jun Fang

    Abstract: Movable antenna (MA) and intelligent reflecting surface (IRS) are considered promising technologies for the next-generation wireless communication systems due to their shared channel reconfiguration capabilities. This, however, raises a fundamental question: Does the performance gain of MAs over conventional fixed-position antennas (FPAs) still exist in the presence of the IRS? To answer this ques… ▽ More

    Submitted 27 January, 2025; originally announced January 2025.

  13. arXiv:2412.17754  [pdf, other]

    cs.SE

    ADC: Enhancing Function Calling Via Adversarial Datasets and Code Line-Level Feedback

    Authors: Wei Zhang, Yi Zhang, Li Zhu, Qianghuai Jia, Feijun Jiang, Hongcheng Guo, Zhoujun Li, Mengping Zhou

    Abstract: Large Language Models (LLMs) have made significant strides in Natural Language Processing and coding, yet they struggle with robustness and accuracy in complex function calls. To tackle these challenges, this paper introduces ADC, an innovative approach that enhances LLMs' ability to follow function formats and match complex parameters. ADC utilizes a high-quality code fine-tuning dataset with lin… ▽ More

    Submitted 24 December, 2024; v1 submitted 23 December, 2024; originally announced December 2024.

  14. arXiv:2412.17574  [pdf, other]

    cs.CV cs.AI

    HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data

    Authors: Ting Zhou, Daoyuan Chen, Qirui Jiao, Bolin Ding, Yaliang Li, Ying Shen

    Abstract: In the domain of Multimodal Large Language Models (MLLMs), achieving human-centric video understanding remains a formidable challenge. Existing benchmarks primarily emphasize object and action recognition, often neglecting the intricate nuances of human emotions, behaviors, and speech-visual alignment within video content. We present HumanVBench, an innovative benchmark meticulously crafted to bri… ▽ More

    Submitted 11 March, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: 22 pages, 23 figures, 7 tables

  15. arXiv:2412.17365  [pdf, other]

    cs.CL cs.AI

    Boosting LLM via Learning from Data Iteratively and Selectively

    Authors: Qi Jia, Siyu Ren, Ziheng Qin, Fuzhao Xue, Jinjie Ni, Yang You

    Abstract: Datasets nowadays are generally constructed from multiple sources and using different synthetic techniques, making data de-noising and de-duplication crucial before being used for post-training. In this work, we propose to perform instruction tuning by iterative data selection (\ApproachName{}). We measure the quality of a sample from complexity and diversity simultaneously. Instead of calculating… ▽ More

    Submitted 23 December, 2024; originally announced December 2024.

  16. arXiv:2412.11512  [pdf, other]

    cs.CV

    SpatialMe: Stereo Video Conversion Using Depth-Warping and Blend-Inpainting

    Authors: Jiale Zhang, Qianxi Jia, Yang Liu, Wei Zhang, Wei Wei, Xin Tian

    Abstract: Stereo video conversion aims to transform monocular videos into immersive stereo format. Despite the advancements in novel view synthesis, there still remain two major challenges: i) difficulty of achieving high-fidelity and stable results, and ii) insufficiency of high-quality stereo video data. In this paper, we introduce SpatialMe, a novel stereo video conversion framework based on depth-warping… ▽ More

    Submitted 16 December, 2024; originally announced December 2024.

  17. arXiv:2412.11068  [pdf, other]

    cs.IR cs.AI

    RecSys Arena: Pair-wise Recommender System Evaluation with Large Language Models

    Authors: Zhuo Wu, Qinglin Jia, Chuhan Wu, Zhaocheng Du, Shuai Wang, Zan Wang, Zhenhua Dong

    Abstract: Evaluating the quality of recommender systems is critical for algorithm design and optimization. Most evaluation methods are computed based on offline metrics for quick algorithm evolution, since online experiments are usually risky and time-consuming. However, offline evaluation usually cannot fully reflect users' preference for the outcome of different recommendation algorithms, and the results… ▽ More

    Submitted 15 December, 2024; originally announced December 2024.

  18. arXiv:2411.01739  [pdf, other]

    cs.CV

    Not Just Object, But State: Compositional Incremental Learning without Forgetting

    Authors: Yanyi Zhang, Binglin Qiu, Qi Jia, Yu Liu, Ran He

    Abstract: Most incremental learners excessively prioritize coarse classes of objects while neglecting various kinds of states (e.g. color and material) attached to the objects. As a result, they are limited in the ability to reason fine-grained compositionality of state-object pairs. To remedy this limitation, we propose a novel task called Compositional Incremental Learning (composition-IL), enabling the m… ▽ More

    Submitted 27 March, 2025; v1 submitted 3 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  19. arXiv:2410.15171  [pdf, other]

    cs.AI

    Linguistic Fuzzy Information Evolution with Random Leader Election Mechanism for Decision-Making Systems

    Authors: Qianlei Jia, Witold Pedrycz

    Abstract: Linguistic fuzzy information evolution is crucial in understanding information exchange among agents. However, different agent weights may lead to different convergence results in the classic DeGroot model. Similarly, in the Hegselmann-Krause bounded confidence model (HK model), changing the confidence threshold values of agents can lead to differences in the final results. To address these limita… ▽ More

    Submitted 19 October, 2024; originally announced October 2024.

  20. arXiv:2410.01733  [pdf, other]

    cs.CL

    Visual Perception in Text Strings

    Authors: Qi Jia, Xiang Yue, Shanshan Huang, Ziheng Qin, Yizhu Liu, Bill Yuchen Lin, Yang You

    Abstract: Understanding visual semantics embedded in consecutive characters is a crucial capability for both large language models (LLMs) and multi-modal large language models (MLLMs). This type of artifact possesses the unique characteristic that identical information can be readily formulated in both texts and images, making them a significant proxy for analyzing modern LLMs' and MLLMs' capabilities in mo… ▽ More

    Submitted 2 October, 2024; originally announced October 2024.

  21. arXiv:2409.07641  [pdf, ps, other]

    cs.CL

    SimulBench: Evaluating Language Models with Creative Simulation Tasks

    Authors: Qi Jia, Xiang Yue, Tianyu Zheng, Jie Huang, Bill Yuchen Lin

    Abstract: We introduce SimulBench, a benchmark designed to evaluate large language models (LLMs) across a diverse collection of creative simulation scenarios, such as acting as a Linux terminal or playing text games with users. While these simulation tasks serve as effective measures of an LLM's general intelligence, they are seldom incorporated into existing benchmarks. A major challenge is to develop an e… ▽ More

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Website: https://simulbench.github.io/

  22. Learning Resilient Formation Control of Drones with Graph Attention Network

    Authors: Jiaping Xiao, Xu Fang, Qianlei Jia, Mir Feroskhan

    Abstract: The rapid advancement of drone technology has significantly impacted various sectors, including search and rescue, environmental surveillance, and industrial inspection. Multidrone systems offer notable advantages such as enhanced efficiency, scalability, and redundancy over single-drone operations. Despite these benefits, ensuring resilient formation control in dynamic and adversarial environment… ▽ More

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: This work has been submitted to the IEEE for possible publication

    Journal ref: IEEE Internet of Things Journal, 2025

  23. arXiv:2408.12130  [pdf, other]

    cs.AI

    S-EPOA: Overcoming the Indistinguishability of Segments with Skill-Driven Preference-Based Reinforcement Learning

    Authors: Ni Mu, Yao Luan, Yiqin Yang, Bo Xu, Qing-shan Jia

    Abstract: Preference-based reinforcement learning (PbRL) stands out by utilizing human preferences as a direct reward signal, eliminating the need for intricate reward engineering. However, despite its potential, traditional PbRL methods are often constrained by the indistinguishability of segments, which impedes the learning process. In this paper, we introduce Skill-Enhanced Preference Optimization Algori… ▽ More

    Submitted 13 May, 2025; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: IJCAI 2025

  24. arXiv:2408.10135  [pdf, other]

    cs.CV

    $R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement

    Authors: Haoyang Wang, Liming Liu, Quanlu Jia, Jiangkai Wu, Haodan Zhang, Peiheng Wang, Xinggong Zhang

    Abstract: Mesh reconstruction based on Neural Radiance Fields (NeRF) is popular in a variety of applications such as computer graphics, virtual reality, and medical imaging due to its efficiency in handling complex geometric structures and facilitating real-time rendering. However, existing works often fail to capture fine geometric details accurately and struggle with optimizing rendering quality. To addre… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  25. arXiv:2408.09333  [pdf, other]

    cs.CL

    SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama

    Authors: Jing Tang, Quanlu Jia, Yuqiang Xie, Zeyu Gong, Xiang Wen, Jiayi Zhang, Yalong Guo, Guibin Chen, Jiangping Yang

    Abstract: Generating high-quality shooting scripts containing information such as scene and shot language is essential for short drama script generation. We collect 6,660 popular short drama episodes from the Internet, each with an average of 100 short episodes, and the total number of short episodes is about 80,000, with a total duration of about 2,000 hours and totaling 10 terabytes (TB). We perform keyfr… ▽ More

    Submitted 28 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

    Comments: 18 pages, 12 figures

  26. Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection

    Authors: Yixin Guo, Yu Liu, Jianghao Li, Weimin Wang, Qi Jia

    Abstract: A zero-shot human-object interaction (HOI) detector is capable of generalizing to HOI categories not encountered during training. Inspired by the impressive zero-shot capabilities offered by CLIP, the latest methods strive to leverage CLIP embeddings for improving zero-shot HOI detection. However, these embedding-based methods train the classifier on seen classes only, inevitably resulting in seen-… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  27. arXiv:2408.04594  [pdf, other]

    cs.CV cs.AI

    Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

    Authors: Qirui Jiao, Daoyuan Chen, Yilun Huang, Bolin Ding, Yaliang Li, Ying Shen

    Abstract: High-performance Multimodal Large Language Models (MLLMs) are heavily dependent on data quality. To advance fine-grained image recognition within MLLMs, we introduce a novel data synthesis method inspired by contrastive learning and image difference captioning. Our key idea involves challenging the model to discern both matching and distinct elements by scrutinizing object differences in detailed… ▽ More

    Submitted 19 December, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 22 pages, 10 figures, 16 tables

  28. arXiv:2407.06115  [pdf, other]

    cs.CV cs.AI cs.CL

    Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline

    Authors: Qi Jia, Baoyu Fan, Cong Xu, Lu Liu, Liang Jin, Guoguang Du, Zhenhua Guo, Yaqian Zhao, Xuanjing Huang, Rengang Li

    Abstract: Existing video multi-modal sentiment analysis mainly focuses on the sentiment expression of people within the video, yet often neglects the induced sentiment of viewers while watching the videos. Induced sentiment of viewers is essential for inferring the public response to videos and has broad application in analyzing public societal sentiment, effectiveness of advertising and other areas. The micro… ▽ More

    Submitted 15 May, 2024; originally announced July 2024.

  29. arXiv:2405.13711  [pdf, other]

    cs.LG cs.AI math.DS physics.ao-ph

    VAE-Var: Variational-Autoencoder-Enhanced Variational Assimilation

    Authors: Yi Xiao, Qilong Jia, Wei Xue, Lei Bai

    Abstract: Data assimilation refers to a set of algorithms designed to compute the optimal estimate of a system's state by refining the prior prediction (known as background states) using observed data. Variational assimilation methods rely on the maximum likelihood approach to formulate a variational cost, with the optimal state estimate derived by minimizing this cost. Although traditional variational meth… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  30. arXiv:2405.12892  [pdf, other]

    cs.IR cs.LG

    Retrievable Domain-Sensitive Feature Memory for Multi-Domain Recommendation

    Authors: Yuang Zhao, Zhaocheng Du, Qinglin Jia, Linxuan Zhang, Zhenhua Dong, Ruiming Tang

    Abstract: With the increase in the business scale and number of domains in online advertising, multi-domain ad recommendation has become a mainstream solution in the industry. The core of multi-domain recommendation is effectively modeling the commonalities and distinctions among domains. Existing works are dedicated to designing model architectures for implicit multi-domain modeling while overlooking an in… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  31. arXiv:2405.06283  [pdf, other]

    cs.CV

    Novel Class Discovery for Ultra-Fine-Grained Visual Categorization

    Authors: Yu Liu, Yaqi Cai, Qi Jia, Binglin Qiu, Weimin Wang, Nan Pu

    Abstract: Ultra-fine-grained visual categorization (Ultra-FGVC) aims at distinguishing highly similar sub-categories within fine-grained objects, such as different soybean cultivars. Compared to traditional fine-grained visual categorization, Ultra-FGVC encounters more hurdles due to the small inter-class and large intra-class variation. Given these challenges, relying on human annotation for Ultra-FGVC is… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures

  32. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  33. arXiv:2404.12611  [pdf, other]

    cs.CV

    Rethinking Clothes Changing Person ReID: Conflicts, Synthesis, and Optimization

    Authors: Junjie Li, Guanshuo Wang, Fufu Yu, Yichao Yan, Qiong Jia, Shouhong Ding, Xingdong Sheng, Yunhui Liu, Xiaokang Yang

    Abstract: Clothes-changing person re-identification (CC-ReID) aims to retrieve images of the same person wearing different outfits. Mainstream research focuses on designing advanced model structures and strategies to capture identity information independent of clothing. However, the same-clothes discrimination as the standard ReID learning objective in CC-ReID is persistently ignored in previous research.… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  34. arXiv:2403.09671  [pdf, other]

    cs.DC cs.AI

    CoRaiS: Lightweight Real-Time Scheduler for Multi-Edge Cooperative Computing

    Authors: Yujiao Hu, Qingmin Jia, Jinchao Chen, Yuan Yao, Yan Pan, Renchao Xie, F. Richard Yu

    Abstract: Multi-edge cooperative computing that combines constrained resources of multiple edges into a powerful resource pool has the potential to deliver great benefits, such as tremendous computing power, improved response time, and more diversified services. However, the mass heterogeneous resources composition and lack of scheduling strategies make the modeling and cooperating of multi-edge computing sys… ▽ More

    Submitted 20 May, 2024; v1 submitted 4 February, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  35. arXiv:2403.05924  [pdf, other]

    cs.CV

    CSCNET: Class-Specified Cascaded Network for Compositional Zero-Shot Learning

    Authors: Yanyi Zhang, Qi Jia, Xin Fan, Yu Liu, Ran He

    Abstract: Attribute and object (A-O) disentanglement is a fundamental and critical problem for Compositional Zero-shot Learning (CZSL), whose aim is to recognize novel A-O compositions based on foregone knowledge. Existing methods based on disentangled representation learning lose sight of the contextual dependency between the A-O primitive pairs. Inspired by this, we propose a novel A-O disentangled framew… ▽ More

    Submitted 13 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: ICASSP 2024

  36. arXiv:2402.17655  [pdf, other]

    cs.LG

    Confidence-Aware Multi-Field Model Calibration

    Authors: Yuang Zhao, Chuhan Wu, Qinglin Jia, Hong Zhu, Jia Yan, Libin Zong, Linxuan Zhang, Zhenhua Dong, Muyu Zhang

    Abstract: Accurately predicting the probabilities of user feedback, such as clicks and conversions, is critical for advertisement ranking and bidding. However, there often exist unwanted mismatches between predicted probabilities and true likelihoods due to the rapid shift of data distributions and intrinsic model biases. Calibration aims to address this issue by post-processing model predictions, and field… ▽ More

    Submitted 21 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  37. arXiv:2402.16870  [pdf, other]

    cs.NI

    Pioneering Deterministic Scheduling and Network Structure Optimization for Time-Critical Computing Tasks in Industrial IoT

    Authors: Yujiao Hu, Yining Zhu, Huayu Zhang, Yan Pan, Qingmin Jia, Renchao Xie, Gang Yang, F. Richard Yu

    Abstract: The Industrial Internet of Things (IIoT) has become a critical technology to accelerate the process of digital and intelligent transformation of industries. As the cooperative relationship between smart devices in IIoT becomes more complex, getting deterministic responses of IIoT periodic time-critical computing tasks becomes a crucial and nontrivial problem. However, few current works in cloud/ed… ▽ More

    Submitted 23 January, 2024; originally announced February 2024.

    Comments: Under Review

  38. arXiv:2402.02381  [pdf, other]

    cs.NI cs.AI

    Empowering Computing and Networks Convergence System with Distributed Cooperative Routing

    Authors: Yujiao Hu, Qingmin Jia, Meng Shen, Renchao Xie, Tao Huang, F. Richard Yu

    Abstract: The emergence of intelligent applications and recent advances in the fields of computing and networks are driving the development of computing and networks convergence (CNC) system. However, existing research has failed to achieve comprehensive scheduling optimization of computing and network resources. This shortfall leaves some requirements of computing requests unable to be guaranteed in an e… ▽ More

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE Network

  39. arXiv:2401.17981  [pdf, other]

    cs.CV cs.AI

    From Training-Free to Adaptive: Empirical Insights into MLLMs' Understanding of Detection Information

    Authors: Qirui Jiao, Daoyuan Chen, Yilun Huang, Yaliang Li, Ying Shen

    Abstract: Despite the impressive capabilities of Multimodal Large Language Models (MLLMs) in integrating text and image modalities, challenges remain in accurately interpreting detailed visual elements. Vision detection models excel at recognizing fine-grained image details, prompting researchers to use them to enhance MLLMs. One effective strategy is to infuse detection information in text format, which ha… ▽ More

    Submitted 19 December, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: 32 pages, 22 tables, 7 figures

  40. arXiv:2401.17812  [pdf, other]

    cs.NI cs.AI

    Deterministic Computing Power Networking: Architecture, Technologies and Prospects

    Authors: Qingmin Jia, Yujiao Hu, Xiaomao Zhou, Qianpiao Ma, Kai Guo, Huayu Zhang, Renchao Xie, Tao Huang, Yunjie Liu

    Abstract: With the development of new Internet services such as computation-intensive and delay-sensitive tasks, the traditional "Best Effort" network transmission mode has been greatly challenged. The network system is urgently required to provide end-to-end transmission determinacy and computing determinacy for new applications to ensure the safe and efficient operation of services. Based on the research… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

  41. arXiv:2401.17268  [pdf, other]

    cs.CL cs.AI cs.LG

    Weaver: Foundation Models for Creative Writing

    Authors: Tiannan Wang, Jiamin Chen, Qingrui Jia, Shuai Wang, Ruoyu Fang, Huilin Wang, Zhaowei Gao, Chunzhao Xie, Chuou Xu, Jihong Dai, Yibin Liu, Jialong Wu, Shengwei Ding, Long Li, Zhiwei Huang, Xinle Deng, Teng Yu, Gangan Ma, Han Xiao, Zixin Chen, Danjun Xiang, Yunxia Wang, Yuanyuan Zhu, Yi Xiao, Jing Wang, et al. (21 additional authors not shown)

    Abstract: This work introduces Weaver, our first family of large language models (LLMs) dedicated to content creation. Weaver is pre-trained on a carefully selected corpus that focuses on improving the writing capabilities of large language models. We then fine-tune Weaver for creative and professional writing purposes and align it to the preference of professional writers using a suite of novel methods for… ▽ More

    Submitted 30 January, 2024; originally announced January 2024.

  42. arXiv:2401.06477  [pdf, other]

    cs.CL cs.AI

    Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation

    Authors: Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Xinrun Du, Qi Jia, Chenghua Lin, Wenhao Huang, Jie Fu, Ge Zhang

    Abstract: In this paper, we introduce Kun, a novel approach for creating high-quality instruction-tuning datasets for large language models (LLMs) without relying on manual annotations. Adapting a self-training algorithm based on instruction back-translation and answer polishment, Kun leverages unlabelled data from diverse sources such as Wudao, Wanjuan, and SkyPile to generate a substantial dataset of over…

    Submitted 5 November, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 12 pages, 12 figures

  43. Industrial Internet of Things Intelligence Empowering Smart Manufacturing: A Literature Review

    Authors: Yujiao Hu, Qingmin Jia, Yuao Yao, Yong Lee, Mengjie Lee, Chenyi Wang, Xiaomao Zhou, Renchao Xie, F. Richard Yu

    Abstract: The fiercely competitive business environment and increasingly personalized customization needs are driving the digital transformation and upgrading of the manufacturing industry. IIoT intelligence, which can provide innovative and efficient solutions for various aspects of the manufacturing value chain, illuminates the path of transformation for the manufacturing industry. It's time to provide a…

    Submitted 21 February, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted by IoTJ

    Journal ref: IEEE Internet of Things Journal, 2024

  44. arXiv:2310.13349  [pdf, other]

    stat.ML cs.CV cs.LG

    DeepFDR: A Deep Learning-based False Discovery Rate Control Method for Neuroimaging Data

    Authors: Taehyo Kim, Hai Shu, Qiran Jia, Mony J. de Leon

    Abstract: Voxel-based multiple testing is widely used in neuroimaging data analysis. Traditional false discovery rate (FDR) control methods often ignore the spatial dependence among the voxel-based tests and thus suffer from substantial loss of testing power. While recent spatial FDR control methods have emerged, their validity and optimality remain questionable when handling the complex spatial dependencie…

    Submitted 10 March, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), PMLR 238:946-954, 2024

  45. arXiv:2310.11648  [pdf, other]

    cs.CL

    Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model

    Authors: Qi Jia, Siyu Ren, Yizhu Liu, Kenny Q. Zhu

    Abstract: Despite tremendous improvements in natural language generation, summarization models still suffer from the unfaithfulness issue. Previous work evaluates faithfulness either using models trained on the other tasks or in-domain synthetic data, or prompting a large model such as ChatGPT. This paper proposes to do zero-shot faithfulness evaluation simply with a moderately-sized foundation language mod…

    Submitted 14 December, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023

  46. arXiv:2310.10068  [pdf, other]

    cs.CV

    Generalizable Person Search on Open-world User-Generated Video Content

    Authors: Junjie Li, Guanshuo Wang, Yichao Yan, Fufu Yu, Qiong Jia, Jie Qin, Shouhong Ding, Xiaokang Yang

    Abstract: Person search is a challenging task that involves detecting and retrieving individuals from a large set of un-cropped scene images. Existing person search applications are mostly trained and deployed in the same-origin scenarios. However, collecting and annotating training samples for each scene is often difficult due to the limitation of resources and the labor cost. Moreover, large-scale intra-d…

    Submitted 16 October, 2023; originally announced October 2023.

  47. arXiv:2310.08152  [pdf, other]

    cs.CL

    Context Compression for Auto-regressive Transformers with Sentinel Tokens

    Authors: Siyu Ren, Qi Jia, Kenny Q. Zhu

    Abstract: The quadratic complexity of the attention module makes it gradually become the bulk of compute in Transformer-based LLMs during generation. Moreover, the excessive key-value cache that arises when dealing with long inputs also brings severe issues on memory footprint and inference latency. In this work, we propose a plug-and-play approach that is able to incrementally compress the intermediate act…

    Submitted 15 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: To appear at EMNLP 2023 main conference

  48. SoccerNet 2023 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

    Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, fo…

    Submitted 12 September, 2023; originally announced September 2023.

  49. Unbiased Delayed Feedback Label Correction for Conversion Rate Prediction

    Authors: Yifan Wang, Peijie Sun, Min Zhang, Qinglin Jia, Jingjie Li, Shaoping Ma

    Abstract: Conversion rate prediction is critical to many online applications such as digital display advertising. To capture dynamic data distribution, industrial systems often require retraining models on recent data daily or weekly. However, the delay of conversion behavior usually leads to incorrect labeling, which is called delayed feedback problem. Existing work may fail to introduce the correct inform…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted by KDD 2023

  50. arXiv:2307.03034  [pdf, other]

    stat.ML cs.LG

    PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models

    Authors: Keqin Liu, Qizhen Jia, Chengzhong Zhang

    Abstract: In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on certain feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noises. By establishing a general probabilistic model for dynamics of feedback/observation, we formulate the problem as a restless bandit with a coun…

    Submitted 22 April, 2025; v1 submitted 6 July, 2023; originally announced July 2023.