
Showing 1–50 of 147 results for author: Ge, J

Searching in archive cs.
  1. arXiv:2504.19261  [pdf, other]

    cs.CV

    Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting

    Authors: Xiaofeng Jin, Yan Fang, Matteo Frosi, Jianfei Ge, Jiangjian Xiao, Matteo Matteucci

    Abstract: Scene view synthesis, which generates novel views from limited perspectives, is increasingly vital for applications like virtual reality, augmented reality, and robotics. Unlike object-based tasks, such as generating 360° views of a car, scene view synthesis handles entire environments where non-uniform observations pose unique challenges for stable rendering quality. To address this issue, we pro…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 8 pages, 8 figures

    MSC Class: 65D18; 68U05 ACM Class: I.3.7; I.4.8

  2. arXiv:2504.18765  [pdf, other]

    cs.AI

    A Vision for Auto Research with LLM Agents

    Authors: Chengwei Liu, Chong Wang, Jiayue Cao, Jingquan Ge, Kun Wang, Lvye Zhang, Ming-Ming Cheng, Penghai Zhao, Tianlin Li, Xiaojun Jia, Xiang Li, Xinfeng Li, Yang Liu, Yebo Feng, Yihao Huang, Yijia Xu, Yuqiang Sun, Zhenhong Zhou, Zhengzi Xu

    Abstract: This paper introduces Agent-Based Auto Research, a structured multi-agent framework designed to automate, coordinate, and optimize the full lifecycle of scientific research. Leveraging the capabilities of large language models (LLMs) and modular agent collaboration, the system spans all major research phases, including literature review, ideation, methodology planning, experimentation, paper writi…

    Submitted 25 April, 2025; originally announced April 2025.

  3. arXiv:2504.15327  [pdf, other]

    cs.RO cs.LG

    Advancing Embodied Intelligence in Robotic-Assisted Endovascular Procedures: A Systematic Review of AI Solutions

    Authors: Tianliang Yao, Bo Lu, Markus Kowarschik, Yixuan Yuan, Hubin Zhao, Sebastien Ourselin, Kaspar Althoefer, Junbo Ge, Peng Qi

    Abstract: Endovascular procedures have revolutionized the treatment of vascular diseases thanks to minimally invasive solutions that significantly reduce patient recovery time and enhance clinical outcomes. However, the precision and dexterity required during these procedures poses considerable challenges for interventionists. Robotic systems have emerged offering transformative solutions, addressing issues…

    Submitted 23 April, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: 41 pages, 7 figures

  4. arXiv:2504.13169  [pdf, other]

    cs.CV

    Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

    Authors: Tsung-Han Wu, Heekyung Lee, Jiaxin Ge, Joseph E. Gonzalez, Trevor Darrell, David M. Chan

    Abstract: Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations, where they generate descriptions of nonexistent objects, actions, or concepts, posing significant risks in safety-critical applications. Existing hallucination mitigation methods typically follow one of two paradigms: generation adjustment, which modifies decoding behavior to align text with vi…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Preprint. Project Page: https://reverse-vlm.github.io

  5. arXiv:2504.10479  [pdf, other]

    cs.CV

    InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

    Authors: Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, et al. (26 additional authors not shown)

    Abstract: We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. Rather than adapting a text-only large language model (LLM) into a multimodal large language model (MLLM) that supports visual inputs, InternVL3 jointly acquires multimodal and linguistic capabilities from both diverse multimodal data and pure-text corpora during a single p…

    Submitted 18 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: Technical Report

  6. arXiv:2504.04419  [pdf, other]

    cs.RO cs.AI

    Driving-RAG: Driving Scenarios Embedding, Search, and RAG Applications

    Authors: Cheng Chang, Jingwei Ge, Jiazhe Guo, Zelin Guo, Binghong Jiang, Li Li

    Abstract: Driving scenario data play an increasingly vital role in the development of intelligent vehicles and autonomous driving. Accurate and efficient scenario data search is critical for both online vehicle decision-making and planning, and offline scenario generation and simulations, as it allows for leveraging the scenario experiences to improve the overall performance. Especially with the application…

    Submitted 6 April, 2025; originally announced April 2025.

  7. arXiv:2503.18108  [pdf, other]

    cs.RO cs.CV

    Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving

    Authors: Junhao Ge, Zuhong Liu, Longteng Fan, Yifan Jiang, Jiaqi Su, Yiming Li, Zhejun Zhang, Siheng Chen

    Abstract: End-to-end (E2E) autonomous driving (AD) models require diverse, high-quality data to perform well across various driving scenarios. However, collecting large-scale real-world data is expensive and time-consuming, making high-fidelity synthetic data essential for enhancing data diversity and model robustness. Existing driving simulators for synthetic data generation have significant limitations: g…

    Submitted 23 March, 2025; originally announced March 2025.

  8. arXiv:2503.16545  [pdf, other]

    cs.CY cs.CL

    EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?

    Authors: Xinyan Chen, Jiaxin Ge, Hongming Dai, Qiang Zhou, Qiuxuan Feng, Jingtong Hu, Yizhou Wang, Jiaming Liu, Shanghang Zhang

    Abstract: Empathy is fundamental to human interactions, yet it remains unclear whether embodied agents can provide human-like empathetic support. Existing works have studied agents' tasks solving and social interactions abilities, but whether agents can understand empathetic needs and conduct empathetic behaviors remains overlooked. To address this, we introduce EmpathyAgent, the first benchmark to evaluate…

    Submitted 19 March, 2025; originally announced March 2025.

  9. arXiv:2503.13983  [pdf, other]

    cs.CV

    SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability

    Authors: Jiankang Wang, Zhihan Zhang, Zhihang Liu, Yang Li, Jiannan Ge, Hongtao Xie, Yongdong Zhang

    Abstract: Multimodal large language models (MLLMs) have made remarkable progress in either temporal or spatial localization. However, they struggle to perform spatio-temporal video grounding. This limitation stems from two major challenges. Firstly, it is difficult to extract accurate spatio-temporal information of each frame in the video. Secondly, the substantial number of visual tokens makes it challengi…

    Submitted 11 April, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

  10. arXiv:2503.12698  [pdf, other]

    eess.IV cs.CV

    A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT

    Authors: Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Dandan Zheng, Heng Guo, Puyang Wang, Ke Yan, Yirui Wang, Qinji Yu, Zi Li, Minfeng Xu, Jianfeng Zhang, Haoshen Li, Jia Ge, Tsung-Ying Ho, Bing-Shen Huang, Tashan Ai, Kuaile Zhao, Na Shen, Qifeng Wang, Yun Bian, Tingyu Wu, Peng Du, Hua Zhang, Feng-Ming Kong, et al. (9 additional authors not shown)

    Abstract: Precision medicine in the quantitative management of chronic diseases and oncology would be greatly improved if the Computed Tomography (CT) scan of any patient could be segmented, parsed and analyzed in a precise and detailed way. However, there is no such fully annotated CT dataset with all anatomies delineated for training because of the exceptionally high manual cost, the need for specialized…

    Submitted 16 March, 2025; originally announced March 2025.

  11. arXiv:2503.10410  [pdf, other]

    cs.CV

    RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation

    Authors: Yuwen Du, Anning Hu, Zichen Chao, Yifan Lu, Junhao Ge, Genjia Liu, Weitao Wu, Lanjun Wang, Siheng Chen

    Abstract: Roadside Collaborative Perception refers to a system where multiple roadside units collaborate to pool their perceptual data, assisting vehicles in enhancing their environmental awareness. Existing roadside perception methods concentrate on model design but overlook data issues like calibration errors, sparse information, and multi-view consistency, leading to poor performance on recent published…

    Submitted 13 March, 2025; originally announced March 2025.

  12. arXiv:2503.04826  [pdf, other]

    eess.IV cs.CV

    Rethinking Few-Shot Medical Image Segmentation by SAM2: A Training-Free Framework with Augmentative Prompting and Dynamic Matching

    Authors: Haiyue Zu, Jun Ge, Heting Xiao, Jile Xie, Zhangzhe Zhou, Yifan Meng, Jiayi Ni, Junjie Niu, Linlin Zhang, Li Ni, Huilin Yang

    Abstract: The reliance on large labeled datasets presents a significant challenge in medical image segmentation. Few-shot learning offers a potential solution, but existing methods often still require substantial training data. This paper proposes a novel approach that leverages the Segment Anything Model 2 (SAM2), a vision foundation model with strong video segmentation capabilities. We conceptualize 3D me…

    Submitted 5 March, 2025; originally announced March 2025.

  13. arXiv:2503.04722  [pdf, other]

    cs.CL cs.AI cs.LG

    Enough Coin Flips Can Make LLMs Act Bayesian

    Authors: Ritwik Gupta, Rodolfo Corona, Jiaxin Ge, Eric Wang, Dan Klein, Trevor Darrell, David M. Chan

    Abstract: Large language models (LLMs) exhibit the ability to generalize given few-shot examples in their input prompt, an emergent capability known as in-context learning (ICL). We investigate whether LLMs utilize ICL to perform structured reasoning in ways that are consistent with a Bayesian framework or rely on pattern matching. Using a controlled setting of biased coin flips, we find that: (1) LLMs ofte…

    Submitted 6 March, 2025; originally announced March 2025.

  14. arXiv:2503.02265  [pdf, other]

    cs.RO

    Towards Fluorescence-Guided Autonomous Robotic Partial Nephrectomy on Novel Tissue-Mimicking Hydrogel Phantoms

    Authors: Ethan Kilmer, Joseph Chen, Jiawei Ge, Preksha Sarda, Richard Cha, Kevin Cleary, Lauren Shepard, Ahmed Ezzat Ghazi, Paul Maria Scheikl, Axel Krieger

    Abstract: Autonomous robotic systems hold potential for improving renal tumor resection accuracy and patient outcomes. We present a fluorescence-guided robotic system capable of planning and executing incision paths around exophytic renal tumors with a clinically relevant resection margin. Leveraging point cloud observations, the system handles irregular tumor shapes and distinguishes healthy from tumorous…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 8 pages. 7 figures. Preprint of an article accepted for publication in the Journal of Medical Robotics Research, 2025. Copyright World Scientific Publishing Company [https://worldscientific.com/worldscinet/jmrr]

  15. arXiv:2502.18586  [pdf, other]

    cs.RO cs.AI cs.CV

    Autonomous Vision-Guided Resection of Central Airway Obstruction

    Authors: M. E. Smith, N. Yilmaz, T. Watts, P. M. Scheikl, J. Ge, A. Deguet, A. Kuntz, A. Krieger

    Abstract: Existing tracheal tumor resection methods often lack the precision required for effective airway clearance, and robotic advancements offer new potential for autonomous resection. We present a vision-guided, autonomous approach for palliative resection of tracheal tumors. This system models the tracheal surface with a fifth-degree polynomial to plan tool trajectories, while a custom Faster R-CNN se…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: Submitted to World Scientific, Journal of Medical Robotics Research (JMRR) 2025. 10 pages, 11 figures

  16. arXiv:2502.13663  [pdf, ps, other]

    cs.IT eess.SP

    User Association and Coordinated Beamforming in Cognitive Aerial-Terrestrial Networks: A Safe Reinforcement Learning Approach

    Authors: Zizhen Zhou, Jungang Ge, Ying-Chang Liang

    Abstract: Cognitive aerial-terrestrial networks (CATNs) offer a solution to spectrum scarcity by sharing spectrum between aerial and terrestrial networks. However, aerial users (AUs) experience significant interference from numerous terrestrial base stations (BSs). To alleviate such interference, we investigate a user association and coordinated beamforming (CBF) problem in CATN, where the aerial network se…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  17. arXiv:2502.06453  [pdf, other]

    cs.LG cs.AI cs.CL

    MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations

    Authors: Kaixuan Huang, Jiacheng Guo, Zihao Li, Xiang Ji, Jiawei Ge, Wenzhe Li, Yingqing Guo, Tianle Cai, Hui Yuan, Runzhe Wang, Yue Wu, Ming Yin, Shange Tang, Yangsibo Huang, Chi Jin, Xinyun Chen, Chiyuan Zhang, Mengdi Wang

    Abstract: Large language models have demonstrated impressive performance on challenging mathematical reasoning tasks, which has triggered the discussion of whether the performance is achieved by true reasoning capability or memorization. To investigate this question, prior work has constructed mathematical benchmarks when questions undergo simple perturbations -- modifications that still preserve the underl…

    Submitted 12 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: v2: fix bugs in Fig. 1

  18. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  19. arXiv:2501.13411  [pdf, other]

    cs.SE

    VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework

    Authors: He Kong, Die Hu, Jingguo Ge, Liangxiong Li, Tong Li, Bingzhen Wu

    Abstract: Penetration testing is a vital practice for identifying and mitigating vulnerabilities in cybersecurity systems, but its manual execution is labor-intensive and time-consuming. Existing large language model (LLM)-assisted or automated penetration testing approaches often suffer from inefficiencies, such as a lack of contextual understanding and excessive, unstructured data generation. This paper p…

    Submitted 23 January, 2025; originally announced January 2025.

  20. arXiv:2501.09980  [pdf]

    cs.CV cs.AI cs.LG

    Aneumo: A Large-Scale Comprehensive Synthetic Dataset of Aneurysm Hemodynamics

    Authors: Xigui Li, Yuanye Zhou, Feiyang Xiao, Xin Guo, Yichi Zhang, Chen Jiang, Jianchao Ge, Xiansheng Wang, Qimeng Wang, Taiwei Zhang, Chensen Lin, Yuan Cheng, Yuan Qi

    Abstract: Intracranial aneurysm (IA) is a common cerebrovascular disease that is usually asymptomatic but may cause severe subarachnoid hemorrhage (SAH) if ruptured. Although clinical practice is usually based on individual factors and morphological features of the aneurysm, its pathophysiology and hemodynamic mechanisms remain controversial. To address the limitations of current research, this study constr…

    Submitted 17 January, 2025; originally announced January 2025.

  21. arXiv:2501.07783  [pdf, other]

    cs.CV cs.CL

    Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

    Authors: Zhaokai Wang, Xizhou Zhu, Xue Yang, Gen Luo, Hao Li, Changyao Tian, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

    Abstract: Image pyramids are widely adopted in top-performing methods to obtain multi-scale features for precise visual perception and understanding. However, current image pyramids use the same large-scale model to process multiple resolutions of images, leading to significant computational cost. To address this challenge, we propose a novel network architecture, called Parameter-Inverted Image Pyramid Net…

    Submitted 13 January, 2025; originally announced January 2025.

  22. arXiv:2501.06719  [pdf, other]

    cs.RO eess.SY

    Hierarchical Sampling-based Planner with LTL Constraints and Text Prompting

    Authors: Jingzhan Ge, Zi-Hao Zhang, Sheng-En Huang

    Abstract: This project introduces a hierarchical planner integrating Linear Temporal Logic (LTL) constraints with natural language prompting for robot motion planning. The framework decomposes maps into regions, generates directed graphs, and converts them into transition systems for high-level planning. Text instructions are translated into LTL formulas and converted to Deterministic Finite Automata (DFA)…

    Submitted 12 January, 2025; originally announced January 2025.

    Comments: 8 pages, 17 figures

  23. arXiv:2501.00912  [pdf, other]

    cs.CV cs.CL

    AutoPresent: Designing Structured Visuals from Scratch

    Authors: Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell

    Abstract: Designing structured visuals such as presentation slides is essential for communicative needs, necessitating both content creation and visual planning skills. In this work, we tackle the challenge of automated slide generation, where models produce slide presentations from natural language (NL) instructions. We first introduce the SlidesBench benchmark, the first benchmark for slide generation wit…

    Submitted 1 January, 2025; originally announced January 2025.

  24. arXiv:2412.09799  [pdf, other]

    cs.CV cs.AI

    CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection

    Authors: Qibo Chen, Weizhong Jin, Jianyue Ge, Mengdi Liu, Yuchao Yan, Jian Jiang, Li Yu, Xuanjiang Guo, Shuchang Li, Jianzhong Chen

    Abstract: Recent research on universal object detection aims to introduce language in a SoTA closed-set detector and then generalize the open-set concepts by constructing large-scale (text-region) datasets for training. However, these methods face two main challenges: (i) how to efficiently use the prior information in the prompts to genericise objects and (ii) how to reduce alignment bias in the downstream…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI2025

  25. arXiv:2412.09616  [pdf, other]

    cs.CV

    V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

    Authors: Junqi Ge, Ziyi Chen, Jintao Lin, Jinguo Zhu, Xihui Liu, Jifeng Dai, Xizhou Zhu

    Abstract: Vision-Language Models (VLMs) have shown promising capabilities in handling various multimodal tasks, yet they struggle in long-context scenarios, particularly in tasks involving videos, high-resolution images, or lengthy image-text documents. In our work, we first conduct an empirical analysis of the long-context capabilities of VLMs using our augmented long-context multimodal datasets. Our findi…

    Submitted 12 December, 2024; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: The code and models will be available at https://github.com/OpenGVLab/V2PE

  26. arXiv:2412.09596  [pdf, other]

    cs.CV cs.AI cs.CL

    InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

    Authors: Pan Zhang, Xiaoyi Dong, Yuhang Cao, Yuhang Zang, Rui Qian, Xilin Wei, Lin Chen, Yifei Li, Junbo Niu, Shuangrui Ding, Qipeng Guo, Haodong Duan, Xin Chen, Han Lv, Zheng Nie, Min Zhang, Bin Wang, Wenwei Zhang, Xinyue Zhang, Jiaye Ge, Wei Li, Jingwen Li, Zhongying Tu, Conghui He, Xingcheng Zhang, et al. (4 additional authors not shown)

    Abstract: Creating AI systems that can interact with environments over long periods, similar to human cognition, has been a longstanding research goal. Recent advancements in multimodal large language models (MLLMs) have made significant strides in open-world understanding. However, the challenge of continuous and simultaneous streaming perception, memory, and reasoning remains largely unexplored. Current M…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: Github Repo: https://github.com/InternLM/InternLM-XComposer/tree/main/InternLM-XComposer-2.5-OmniLive

  27. arXiv:2412.07924  [pdf, other]

    cs.CY

    A large language model-based approach to quantifying the effects of social determinants in liver transplant decisions

    Authors: Emily Robitschek, Asal Bastani, Kathryn Horwath, Savyon Sordean, Mark J. Pletcher, Jennifer C. Lai, Sergio Galletta, Elliott Ash, Jin Ge, Irene Y. Chen

    Abstract: Patient life circumstances, including social determinants of health (SDOH), shape both health outcomes and care access, contributing to persistent disparities across gender, race, and socioeconomic status. Liver transplantation exemplifies these challenges, requiring complex eligibility and allocation decisions where SDOH directly influence patient evaluation. We developed an artificial intelligen…

    Submitted 9 January, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

    Comments: Spotlight Paper, ML4H 2024; Leonidas H. Berry Health Equity Research Award, ACG 2024; Plenary Presentation, AASLD 2024

  28. arXiv:2412.05271  [pdf, other]

    cs.CV

    Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

    Authors: Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, et al. (17 additional authors not shown)

    Abstract: We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series that builds upon InternVL 2.0, maintaining its core model architecture while introducing significant enhancements in training and testing strategies as well as data quality. In this work, we delve into the relationship between model scaling and performance, systematically exploring the performance trends in vision…

    Submitted 13 January, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

    Comments: Technical Report

  29. arXiv:2411.18290  [pdf, other]

    eess.IV cs.CV

    Leveraging Semantic Asymmetry for Precise Gross Tumor Volume Segmentation of Nasopharyngeal Carcinoma in Planning CT

    Authors: Zi Li, Ying Chen, Zeli Chen, Yanzhou Su, Tai Ma, Tony C. W. Mok, Yan-Jie Zhou, Yunhai Bai, Zhinlin Zheng, Le Lu, Yirui Wang, Jia Ge, Xianghua Ye, Senxiang Yan, Dakai Jin

    Abstract: In the radiation therapy of nasopharyngeal carcinoma (NPC), clinicians typically delineate the gross tumor volume (GTV) using non-contrast planning computed tomography to ensure accurate radiation dose delivery. However, the low contrast between tumors and adjacent normal tissues necessitates that radiation oncologists manually delineate the tumors, often relying on diagnostic MRI for guidance. …

    Submitted 18 December, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

  30. arXiv:2411.02715  [pdf, other]

    cs.CV

    CIT: Rethinking Class-incremental Semantic Segmentation with a Class Independent Transformation

    Authors: Jinchao Ge, Bowen Zhang, Akide Liu, Minh Hieu Phan, Qi Chen, Yangyang Shu, Yang Zhao

    Abstract: Class-incremental semantic segmentation (CSS) requires that a model learn to segment new classes without forgetting how to segment previous ones: this is typically achieved by distilling the current knowledge and incorporating the latest data. However, bypassing iterative distillation by directly transferring outputs of initial classes to the current learning task is not supported in existing clas…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: 11 pages, 5 figures

  31. arXiv:2411.02619  [pdf, other]

    cs.RO cs.CV

    Tracking Tumors under Deformation from Partial Point Clouds using Occupancy Networks

    Authors: Pit Henrich, Jiawei Liu, Jiawei Ge, Samuel Schmidgall, Lauren Shepard, Ahmed Ezzat Ghazi, Franziska Mathis-Ullrich, Axel Krieger

    Abstract: To track tumors during surgery, information from preoperative CT scans is used to determine their position. However, as the surgeon operates, the tumor may be deformed which presents a major hurdle for accurately resecting the tumor, and can lead to surgical inaccuracy, increased operation time, and excessive margins. This issue is particularly pronounced in robot-assisted partial nephrectomy (RAP…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: Accepted at IROS 2024

  32. arXiv:2410.13647  [pdf]

    cs.CE cs.MM

    Multimodal growth and development assessment model

    Authors: Ying Li, Zichen Song, Zijie Gong, Sitan Huang, Jiewei Ge

    Abstract: With the development of social economy and the improvement of people's attention to health, the growth and development of children and adolescents has become an important indicator to measure the level of national health. Therefore, accurate and timely assessment of children's growth and development has become increasingly important. At the same time, global health inequalities, especially child m…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 7 Pages 7 Figures

  33. arXiv:2410.06351  [pdf, other]

    cs.SE

    Moving Faster and Reducing Risk: Using LLMs in Release Deployment

    Authors: Rui Abreu, Vijayaraghavan Murali, Peter C Rigby, Chandra Maddila, Weiyan Sun, Jun Ge, Kaavya Chinniah, Audris Mockus, Megh Mehta, Nachiappan Nagappan

    Abstract: Release engineering has traditionally focused on continuously delivering features and bug fixes to users, but at a certain scale, it becomes impossible for a release engineering team to determine what should be released. At Meta's scale, the responsibility appropriately and necessarily falls back on the engineer writing and reviewing the code. To address this challenge, we developed models of diff…

    Submitted 8 October, 2024; originally announced October 2024.

  34. arXiv:2408.14744  [pdf, other]

    cs.CV cs.AI

    RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models

    Authors: Junyao Ge, Xu Zhang, Yang Zheng, Kaitai Guo, Jimin Liang

    Abstract: Abundant, well-annotated multimodal data in remote sensing are pivotal for aligning complex visual remote sensing (RS) scenes with human language, enabling the development of specialized vision language models across diverse RS interpretation tasks. However, annotating RS images with rich linguistic semantics at scale demands expertise in RS and substantial human labor, making it costly and often…

    Submitted 16 April, 2025; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: Submitted to ISPRS

    ACM Class: I.4.8; I.2.10

  35. arXiv:2408.13491  [pdf, other]

    cs.CV

    ESA: Annotation-Efficient Active Learning for Semantic Segmentation

    Authors: Jinchao Ge, Zeyu Zhang, Minh Hieu Phan, Bowen Zhang, Akide Liu, Yang Zhao

    Abstract: Active learning enhances annotation efficiency by selecting the most revealing samples for labeling, thereby reducing reliance on extensive human input. Previous methods in semantic segmentation have centered on individual pixels or small areas, neglecting the rich patterns in natural images and the power of advanced pre-trained models. To address these challenges, we propose three key contributio…

    Submitted 24 August, 2024; originally announced August 2024.

  36. arXiv:2408.09474  [pdf, other]

    cs.CR cs.CL cs.CV

    Image-Based Geolocation Using Large Vision-Language Models

    Authors: Yi Liu, Junchen Ding, Gelei Deng, Yuekang Li, Tianwei Zhang, Weisong Sun, Yaowen Zheng, Jingquan Ge, Yang Liu

    Abstract: Geolocation is now a vital aspect of modern life, offering numerous benefits but also presenting serious privacy concerns. The advent of large vision-language models (LVLMs) with advanced image-processing capabilities introduces new risks, as these models can inadvertently reveal sensitive geolocation information. This paper presents the first in-depth study analyzing the challenges posed by tradi…

    Submitted 18 August, 2024; originally announced August 2024.

  37. arXiv:2408.07894  [pdf, other]

    cs.NI cs.LG

    System States Forecasting of Microservices with Dynamic Spatio-Temporal Data

    Authors: Yifei Xu, Jingguo Ge, Haina Tang, Shuai Ding, Tong Li, Hui Li

    Abstract: In the AIOps (Artificial Intelligence for IT Operations) era, accurately forecasting system states is crucial. In microservices systems, this task encounters the challenge of dynamic and complex spatio-temporal relationships among microservice instances, primarily due to dynamic deployments, diverse call paths, and cascading effects among instances. Current time-series forecasting methods, which f…

    Submitted 14 August, 2024; originally announced August 2024.

  38. arXiv:2407.07365  [pdf, other]

    cs.CV

    High-Resolution Cloud Detection Network

    Authors: Jingsheng Li, Tianxiang Xue, Jiayi Zhao, Jingmin Ge, Yufang Min, Wei Su, Kun Zhan

    Abstract: The complexity of clouds, particularly in terms of texture detail at high resolutions, has not been well explored by most existing cloud detection networks. This paper introduces the High-Resolution Cloud Detection Network (HR-cloud-Net), which utilizes a hierarchical high-resolution integration approach. HR-cloud-Net integrates a high-resolution representation module, layer-wise cascaded feature…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Journal of Electronic Imaging

  39. arXiv:2407.05463  [pdf, other]

    cs.CL

    Training Task Experts through Retrieval Based Distillation

    Authors: Jiaxin Ge, Xueying Jia, Vijay Viswanathan, Hongyin Luo, Graham Neubig

    Abstract: One of the most reliable ways to create deployable models for specialized tasks is to obtain an adequate amount of high-quality task-specific data. However, for specialized tasks, often such datasets do not exist. Existing methods address this by creating such data from large language models (LLMs) and then distilling such knowledge into smaller models. However, these methods are limited by the qu…

    Submitted 7 July, 2024; originally announced July 2024.

  40. arXiv:2407.03595  [pdf, other]

    econ.GN cs.LG

    Machine Learning for Economic Forecasting: An Application to China's GDP Growth

    Authors: Yanqing Yang, Xingcheng Xu, Jinfeng Ge, Yan Xu

    Abstract: This paper aims to explore the application of machine learning in forecasting Chinese macroeconomic variables. Specifically, it employs various machine learning models to predict the quarterly real GDP growth of China, and analyzes the factors contributing to the performance differences among these models. Our findings indicate that the average forecast errors of machine learning models are genera…

    Submitted 3 July, 2024; originally announced July 2024.

  41. arXiv:2406.14887  [pdf, other]

    cs.CL

    InternLM-Law: An Open Source Chinese Legal Large Language Model

    Authors: Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge

    Abstract: While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field. In this paper, we introduce InternLM-Law, a specialized LLM tailored for addressing diverse legal queries related to Chinese laws, spanning from responding to standard legal questions (e.g., l…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Our dataset, code and models will be released at https://github.com/InternLM/InternLM-Law

  42. arXiv:2406.04330  [pdf, other]

    cs.CV

    Parameter-Inverted Image Pyramid Networks

    Authors: Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

    Abstract: Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PII…

    Submitted 28 October, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  43. arXiv:2406.04201  [pdf, ps, other]

    cs.LG cs.MA math.OC stat.ML

    Securing Equal Share: A Principled Approach for Learning Multiplayer Symmetric Games

    Authors: Jiawei Ge, Yuanhao Wang, Wenzhe Li, Chi Jin

    Abstract: This paper examines multiplayer symmetric constant-sum games with more than two players in a competitive setting, including examples like Mahjong, Poker, and various board and video games. In contrast to two-player zero-sum games, equilibria in multiplayer games are neither unique nor non-exploitable, failing to provide meaningful guarantees when competing against opponents who play different equi…

    Submitted 2 October, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  44. arXiv:2405.17418  [pdf, other]

    cs.CV

    A Self-Correcting Vision-Language-Action Model for Fast and Slow System Manipulation

    Authors: Chenxuan Li, Jiaming Liu, Guanqun Wang, Xiaoqi Li, Sixiang Chen, Liang Heng, Chuyan Xiong, Jiaxin Ge, Renrui Zhang, Kaichen Zhou, Shanghang Zhang

    Abstract: Recently, some studies have integrated Multimodal Large Language Models into robotic manipulation, constructing vision-language-action models (VLAs) to interpret multimodal information and predict SE(3) poses. While VLAs have shown promising progress, they may suffer from failures when faced with novel and complex tasks. To emulate human-like reasoning for more robust manipulation, we propose the…

    Submitted 18 March, 2025; v1 submitted 27 May, 2024; originally announced May 2024.

  45. arXiv:2405.10302  [pdf, other]

    stat.ME cs.LG math.ST stat.ML

    Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift

    Authors: Jiawei Ge, Debarghya Mukherjee, Jianqing Fan

    Abstract: As machine learning models are increasingly deployed in dynamic environments, it becomes paramount to assess and quantify uncertainties associated with distribution shifts. A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance. The prediction interval, which captures the range of likely outcomes for a given prediction, se…

    Submitted 7 October, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  46. arXiv:2405.04966  [pdf, other]

    cs.IT cs.CV cs.MA

    Communication-Efficient Collaborative Perception via Information Filling with Codebook

    Authors: Yue Hu, Juntong Peng, Sifei Liu, Junhao Ge, Si Liu, Siheng Chen

    Abstract: Collaborative perception empowers each agent to improve its perceptual ability through the exchange of perceptual messages with other agents. It inherently results in a fundamental trade-off between perception ability and communication cost. To address this bottleneck issue, our core idea is to optimize the collaborative messages from two key aspects: representation and selection. The proposed cod…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 10 pages, Accepted by CVPR 2024

  47. arXiv:2405.00696  [pdf, other]

    cs.RO

    Life-long Learning and Testing for Automated Vehicles via Adaptive Scenario Sampling as A Continuous Optimization Process

    Authors: Jingwei Ge, Pengbo Wang, Cheng Chang, Yi Zhang, Danya Yao, Li Li

    Abstract: Sampling critical testing scenarios is an essential step in intelligence testing for Automated Vehicles (AVs). However, due to the lack of prior knowledge on the distribution of critical scenarios in sampling space, we can hardly efficiently find the critical scenarios or accurately evaluate the intelligence of AVs. To solve this problem, we formulate the testing as a continuous optimization proce…

    Submitted 28 March, 2024; originally announced May 2024.

  48. arXiv:2404.16611  [pdf, ps, other]

    cs.IT eess.SP

    Towards Symbiotic SAGIN Through Inter-operator Resource and Service Sharing: Joint Orchestration of User Association and Radio Resources

    Authors: Shizhao He, Jungang Ge, Ying-Chang Liang, Dusit Niyato

    Abstract: The space-air-ground integrated network (SAGIN) is a pivotal architecture to support ubiquitous connectivity in the upcoming 6G era. Inter-operator resource and service sharing is a promising way to realize such a huge network, utilizing resources efficiently and reducing construction costs. Given the rationality of operators, the configuration of resources and services in SAGIN should focus on bo…

    Submitted 25 April, 2024; originally announced April 2024.

  49. arXiv:2404.09496  [pdf, other]

    cs.CV

    Towards Collaborative Autonomous Driving: Simulation Platform and End-to-End System

    Authors: Genjia Liu, Yue Hu, Chenxin Xu, Weibo Mao, Junhao Ge, Zhengxiang Huang, Yifan Lu, Yinda Xu, Junkai Xia, Yafei Wang, Siheng Chen

    Abstract: Vehicle-to-everything-aided autonomous driving (V2X-AD) has a huge potential to provide a safer driving solution. Despite extensive researches in transportation and communication to support V2X-AD, the actual utilization of these infrastructures and communication resources in enhancing driving performances remains largely unexplored. This highlights the necessity of collaborative autonomous drivin…

    Submitted 9 April, 2025; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE T-PAMI

  50. arXiv:2404.06201  [pdf, other]

    cs.SE cs.AI

    Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning

    Authors: Zhihao Lin, Wei Ma, Tao Lin, Yaowen Zheng, Jingquan Ge, Jun Wang, Jacques Klein, Tegawende Bissyande, Yang Liu, Li Li

    Abstract: Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showcasing their efficacy in code understanding and beyond. Like traditional SE tools, open-source collaboration is key in realising the excellent products. However, with AI models, the essential need is in data. The collaboration of these AI-based SE models hinges on maximising the sources of high-…

    Submitted 9 April, 2024; originally announced April 2024.