Showing 1–15 of 15 results for author: Rong, X

Searching in archive cs.
  1. arXiv:2503.18366  [pdf, other]

    cs.RO

    Reinforcement Learning for Adaptive Planner Parameter Tuning: A Perspective on Hierarchical Architecture

    Authors: Lu Wangtao, Wei Yufei, Xu Jiadong, Jia Wenhao, Li Liang, Xiong Rong, Wang Yue

    Abstract: Automatic parameter tuning methods for planning algorithms, which integrate pipeline approaches with learning-based techniques, are regarded as promising due to their stability and capability to handle highly constrained environments. While existing parameter tuning methods have demonstrated considerable success, further performance improvements require a more structured approach. In this paper, w…

    Submitted 24 March, 2025; originally announced March 2025.

  2. arXiv:2503.04543  [pdf, other]

    cs.CL cs.AI

    Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model

    Authors: Wenke Huang, Jian Liang, Xianda Guo, Yiyang Fang, Guancheng Wan, Xuankun Rong, Chi Wen, Zekun Shi, Qingyun Li, Didi Zhu, Yanbiao Ma, Ke Liang, Bin Yang, He Li, Jiawei Shao, Mang Ye, Bo Du

    Abstract: Multi-modal Large Language Models (MLLMs) integrate visual and linguistic reasoning to address complex tasks such as image captioning and visual question answering. While MLLMs demonstrate remarkable versatility, MLLMs appears limited performance on special applications. But tuning MLLMs for downstream tasks encounters two key challenges: Task-Expert Specialization, where distribution shifts betwe…

    Submitted 6 March, 2025; originally announced March 2025.

  3. arXiv:2502.14881  [pdf, other]

    cs.CR cs.CV

    A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations

    Authors: Mang Ye, Xuankun Rong, Wenke Huang, Bo Du, Nenghai Yu, Dacheng Tao

    Abstract: With the rapid advancement of Large Vision-Language Models (LVLMs), ensuring their safety has emerged as a crucial area of research. This survey provides a comprehensive analysis of LVLM safety, covering key aspects such as attacks, defenses, and evaluation methods. We introduce a unified framework that integrates these interrelated components, offering a holistic perspective on the vulnerabilitie…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: 22 pages, 2 figures

  4. arXiv:2502.14224  [pdf, other]

    eess.AS cs.SD

    Adaptive Convolution for CNN-based Speech Enhancement Models

    Authors: Dahan Wang, Xiaobin Rong, Shiruo Sun, Yuxiang Hu, Changbao Zhu, Jing Lu

    Abstract: Deep learning-based speech enhancement methods have significantly improved speech quality and intelligibility. Convolutional neural networks (CNNs) have been proven to be essential components of many high-performance models. In this paper, we introduce adaptive convolution, an efficient and versatile convolutional module that enhances the model's capability to adaptively represent speech signals.…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  5. Finding Reproducible and Prognostic Radiomic Features in Variable Slice Thickness Contrast Enhanced CT of Colorectal Liver Metastases

    Authors: Jacob J. Peoples, Mohammad Hamghalam, Imani James, Maida Wasim, Natalie Gangai, Hyunseon Christine Kang, X. John Rong, Yun Shin Chun, Richard K. G. Do, Amber L. Simpson

    Abstract: Establishing the reproducibility of radiomic signatures is a critical step in the path to clinical adoption of quantitative imaging biomarkers; however, radiomic signatures must also be meaningfully related to an outcome of clinical importance to be of value for personalized medicine. In this study, we analyze both the reproducibility and prognostic value of radiomic features extracted from the li…

    Submitted 19 January, 2025; originally announced January 2025.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:032

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2025)

  6. arXiv:2408.10502  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Asymptotic Classification Error for Heavy-Tailed Renewal Processes

    Authors: Xinhui Rong, Victor Solo

    Abstract: Despite the widespread occurrence of classification problems and the increasing collection of point process data across many disciplines, study of error probability for point process classification only emerged very recently. Here, we consider classification of renewal processes. We obtain asymptotic expressions for the Bhattacharyya bound on misclassification error probabilities for heavy-tailed…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 11 pages, 2 figures

  7. arXiv:2407.00949  [pdf, ps, other]

    cs.CV eess.IV

    SpectralKAN: Kolmogorov-Arnold Network for Hyperspectral Images Change Detection

    Authors: Yanheng Wang, Xiaohan Yu, Yongsheng Gao, Jianjun Sha, Jian Wang, Lianru Gao, Yonggang Zhang, Xianhui Rong

    Abstract: It has been verified that deep learning methods, including convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers, can accurately extract features from hyperspectral images (HSIs). These algorithms perform exceptionally well on HSIs change detection (HSIs-CD). However, the downside of these impressive results is the enormous number of parameters, FLOPs, GPU memory, tr…

    Submitted 1 July, 2024; originally announced July 2024.

  8. arXiv:2210.05828  [pdf, other]

    cs.CV cs.AI cs.LG

    AMICO: Amodal Instance Composition

    Authors: Peiye Zhuang, Jia-bin Huang, Ayush Saraf, Xuejian Rong, Changil Kim, Denis Demandolx

    Abstract: Image composition aims to blend multiple objects to form a harmonized image. Existing approaches often assume precisely segmented and intact objects. Such assumptions, however, are hard to satisfy in unconstrained scenarios. We present Amodal Instance Composition for compositing imperfect -- potentially incomplete and/or coarsely segmented -- objects onto a target image. We first develop object sh…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to BMVC 2021, 20 pages, 12 figures

  9. arXiv:2204.09860  [pdf]

    cs.CV cs.IR cs.MM

    Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information

    Authors: Zhiqiang Yuan, Wenkai Zhang, Changyuan Tian, Xuee Rong, Zhengyuan Zhang, Hongqi Wang, Kun Fu, Xian Sun

    Abstract: Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent research hotspot due to its ability of enabling fast and flexible information extraction on remote sensing (RS) images. However, current RSCTIR methods mainly focus on global features of RS images, which leads to the neglect of local features that reflect target relationships and saliency. In this article, we fi…

    Submitted 20 April, 2022; originally announced April 2022.

    Journal ref: in IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-16, 2022, Art no. 5620616

  10. arXiv:2012.05901  [pdf, other]

    cs.CV

    Robust Consistent Video Depth Estimation

    Authors: Johannes Kopf, Xuejian Rong, Jia-Bin Huang

    Abstract: We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video. We integrate a learning-based depth prior, in the form of a convolutional neural network trained for single-image depth estimation, with geometric optimization, to estimate a smooth camera trajectory as well as detailed and stable depth reconstruction. Our algorithm combines two complementar…

    Submitted 21 June, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: Project website: https://robust-cvd.github.io/

  11. arXiv:2004.12498  [pdf, other]

    cs.CV

    Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes

    Authors: Haiyan Wang, Xuejian Rong, Liang Yang, Jinglun Feng, Jizhong Xiao, Yingli Tian

    Abstract: The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation, especially for scenes in the wild with varieties of different objects. To alleviate this issue, we propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision. Different with numerous preceding multi-v…

    Submitted 17 May, 2020; v1 submitted 26 April, 2020; originally announced April 2020.

    Comments: 13 pages, 8 figures, Under review as a journal paper at CVIU

  12. arXiv:1811.12297  [pdf, other]

    cs.CV cs.LG

    Incremental Scene Synthesis

    Authors: Benjamin Planche, Xuejian Rong, Ziyan Wu, Srikrishna Karanam, Harald Kosch, YingLi Tian, Jan Ernst, Andreas Hutter

    Abstract: We present a method to incrementally generate complete 2D or 3D scenes with the following properties: (a) it is globally consistent at each step according to a learned scene prior, (b) real observations of a scene can be incorporated while observing global consistency, (c) unobserved regions can be hallucinated locally in consistence with previous observations, hallucinations and global priors, an…

    Submitted 13 November, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

    Journal ref: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

  13. arXiv:1810.11367  [pdf, other]

    cs.CL cs.HC cs.LG stat.ML

    LAMVI-2: A Visual Tool for Comparing and Tuning Word Embedding Models

    Authors: Xin Rong, Joshua Luckson, Eytan Adar

    Abstract: Tuning machine learning models, particularly deep learning architectures, is a complex process. Automated hyperparameter tuning algorithms often depend on specific optimization metrics. However, in many situations, a developer trades one metric against another: accuracy versus overfitting, precision versus recall, smaller models and accuracy, etc. With deep learning, not only are the model's repre…

    Submitted 22 October, 2018; originally announced October 2018.

  14. arXiv:1411.2738  [pdf, other]

    cs.CL

    word2vec Parameter Learning Explained

    Authors: Xin Rong

    Abstract: The word2vec model and application by Mikolov et al. have attracted a great amount of attention in recent two years. The vector representations of words learned by word2vec models have been shown to carry semantic meanings and are useful in various NLP tasks. As an increasing number of researchers would like to experiment with word2vec or similar techniques, I notice that there lacks a material th…

    Submitted 5 June, 2016; v1 submitted 11 November, 2014; originally announced November 2014.

  15. arXiv:1010.3177  [pdf]

    cs.AI

    Introduction to the iDian

    Authors: Xin Rong

    Abstract: The iDian (previously named as the Operation Agent System) is a framework designed to enable computer users to operate software in natural language. Distinct from current speech-recognition systems, our solution supports format-free combinations of orders, and is open to both developers and customers. We used a multi-layer structure to build the entire framework, approached rule-based natural lang…

    Submitted 15 October, 2010; originally announced October 2010.

    Comments: 4 pages