
Showing 1–50 of 176 results for author: Ge, X

  1. arXiv:2504.20938  [pdf, other]

    cs.LG cs.CL

    Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition

    Authors: Zhengfu He, Junxuan Wang, Rui Lin, Xuyang Ge, Wentao Shu, Qiong Tang, Junping Zhang, Xipeng Qiu

    Abstract: We propose Low-Rank Sparse Attention (Lorsa), a sparse replacement model of Transformer attention layers to disentangle original Multi Head Self Attention (MHSA) into individually comprehensible components. Lorsa is designed to address the challenge of attention superposition to understand attention-mediated interaction between features in different token positions. We show that Lorsa heads find c…

    Submitted 29 April, 2025; originally announced April 2025.

  2. arXiv:2504.18114  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection

    Authors: Atharva Kulkarni, Yuan Zhang, Joel Ruben Antony Moniz, Xiou Ge, Bo-Hsiang Tseng, Dhivya Piraviperumal, Swabha Swayamdipta, Hong Yu

    Abstract: Hallucinations pose a significant obstacle to the reliability and widespread adoption of language models, yet their accurate measurement remains a persistent challenge. While many task- and domain-specific metrics have been proposed to assess faithfulness and factuality concerns, the robustness and generalization of these metrics are still untested. In this paper, we conduct a large-scale empirica…

    Submitted 25 April, 2025; originally announced April 2025.

  3. arXiv:2504.14788  [pdf, ps, other]

    cs.IR

    The 1st EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval

    Authors: Junchen Fu, Xuri Ge, Xin Xin, Haitao Yu, Yue Feng, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

    Abstract: Multimodal representation learning has garnered significant attention in the AI community, largely due to the success of large pre-trained multimodal foundation models like LLaMA, GPT, Mistral, and CLIP. These models have achieved remarkable performance across various tasks of multimodal information retrieval (MIR), including web search, cross-modal retrieval, and recommender systems, etc. However…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: WWW2025 Workshop Summary

  4. arXiv:2504.10351  [pdf, other]

    cs.CV

    Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis

    Authors: Kaiwen Zheng, Xuri Ge, Junchen Fu, Jun Peng, Joemon M. Jose

    Abstract: Multimodal foundation models have significantly improved feature representation by integrating information from multiple modalities, making them highly suitable for a broader set of applications. However, the exploration of multimodal facial representation for understanding perception has been limited. Understanding and analyzing facial states, such as Action Units (AUs) and emotions, require a co…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by ICME2025

    Journal ref: ICME2025

  5. arXiv:2504.10307  [pdf, other]

    cs.IR

    CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation

    Authors: Junchen Fu, Yongxin Ni, Joemon M. Jose, Ioannis Arapakis, Kaiwen Zheng, Youhua Li, Xuri Ge

    Abstract: Multimodal Foundation Models (MFMs) excel at representing diverse raw modalities (e.g., text, images, audio, videos, etc.). As recommender systems increasingly incorporate these modalities, leveraging MFMs to generate better representations has great potential. However, their application in sequential recommendation remains largely unexplored. This is primarily because mainstream adaptation method…

    Submitted 14 April, 2025; originally announced April 2025.

  6. arXiv:2504.08875  [pdf, other]

    q-bio.QM cs.HC cs.LG stat.AP

    DataMap: A Portable Application for Visualizing High-Dimensional Data

    Authors: Xijin Ge

    Abstract: Motivation: The visualization and analysis of high-dimensional data are essential in biomedical research. There is a need for secure, scalable, and reproducible tools to facilitate data exploration and interpretation. Results: We introduce DataMap, a browser-based application for visualization of high-dimensional data using heatmaps, principal component analysis (PCA), and t-distributed stochastic…

    Submitted 11 April, 2025; originally announced April 2025.

  7. arXiv:2503.19717  [pdf, other]

    cs.LG cs.AI

    Invertible Koopman neural operator for data-driven modeling of partial differential equations

    Authors: Yuhong Jin, Andong Cong, Lei Hou, Qiang Gao, Xiangdong Ge, Chonglong Zhu, Yongzhi Feng, Jun Li

    Abstract: Koopman operator theory is a popular candidate for data-driven modeling because it provides a global linearization representation for nonlinear dynamical systems. However, existing Koopman operator-based methods suffer from shortcomings in constructing the well-behaved observable function and its inverse and are inefficient enough when dealing with partial differential equations (PDEs). To address…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 25 pages, 10 figures

  8. arXiv:2502.12945  [pdf, other]

    cs.CL cs.CV

    LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation

    Authors: Junchen Fu, Xuri Ge, Kaiwen Zheng, Ioannis Arapakis, Xin Xin, Joemon M. Jose

    Abstract: Popular Micro-videos, dominant on platforms like TikTok and YouTube, hold significant commercial value. The rise of high-quality AI-generated content has spurred interest in AI-driven micro-video creation. However, despite the advanced capabilities of large language models (LLMs) like ChatGPT and DeepSeek in text generation and reasoning, their potential to assist the creation of popular micro-vid…

    Submitted 18 February, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  9. arXiv:2501.18913  [pdf, other]

    cs.CV

    Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior

    Authors: Tongda Xu, Xiyan Cai, Xinjie Zhang, Xingtong Ge, Dailan He, Ming Sun, Jingjing Liu, Ya-Qin Zhang, Jian Li, Yan Wang

    Abstract: Recent advancements in diffusion models have been leveraged to address inverse problems without additional training, and Diffusion Posterior Sampling (DPS) (Chung et al., 2022a) is among the most popular approaches. Previous analyses suggest that DPS accomplishes posterior sampling by approximating the conditional score. While in this paper, we demonstrate that the conditional score approximation…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: ICLR 2025

  10. arXiv:2501.07884  [pdf]

    cs.LG q-bio.QM

    MD-Syn: Synergistic drug combination prediction based on the multidimensional feature fusion method and attention mechanisms

    Authors: XinXin Ge, Yi-Ting Lee, Shan-Ju Yeh

    Abstract: Drug combination therapies have shown promising therapeutic efficacy in complex diseases and have demonstrated the potential to reduce drug resistance. However, the huge number of possible drug combinations makes it difficult to screen them all in traditional experiments. In this study, we proposed MD-Syn, a computational framework, which is based on the multidimensional feature fusion method and…

    Submitted 14 January, 2025; originally announced January 2025.

  11. arXiv:2501.02524  [pdf, other]

    cs.AR

    A Full-System Simulation Framework for CXL-Based SSD Memory System

    Authors: Yaohui Wang, Zicong Wang, Fanfeng Meng, Yanjing Wang, Yang Ou, Lizhou Wu, Wentao Hong, Xuran Ge, Jijun Cao

    Abstract: Compute eXpress Link (CXL) is a promising technology for memory disaggregation and expansion. Especially, CXL makes it more effectively for large-capacity storage devices such as Solid State Drive (SSD) to be deployed in the memory pool. However, CXL-based SSDs are still in early stages, necessitating the development of reliable simulation tools. In this paper, we propose CXL-SSD-Sim, the first op…

    Submitted 5 January, 2025; originally announced January 2025.

  12. arXiv:2412.20151  [pdf, ps, other]

    cs.NI cs.DC

    Contention-Aware Microservice Deployment in Collaborative Mobile Edge Networks

    Authors: Xinlei Ge, Yang Li, Xing Zhang, Yukun Sun, Yunji Zhao

    Abstract: As an emerging computing paradigm, mobile edge computing (MEC) provides processing capabilities at the network edge, aiming to reduce latency and improve user experience. Meanwhile, the advancement of containerization technology facilitates the deployment of microservice-based applications via edge node collaboration, ensuring highly efficient service delivery. However, existing research overlooks…

    Submitted 1 January, 2025; v1 submitted 28 December, 2024; originally announced December 2024.

  13. arXiv:2412.18393  [pdf, other]

    cs.SE

    Static Code Analyzer Recommendation via Preference Mining

    Authors: Xiuting Ge, Chunrong Fang, Xuanye Li, Ye Shang, Mengyao Zhang, Ya Pan

    Abstract: Static Code Analyzers (SCAs) have played a critical role in software quality assurance. However, SCAs with various static analysis techniques suffer from different levels of false positives and false negatives, thereby yielding the varying performance in SCAs. To detect more defects in a given project, it is a possible way to use more available SCAs for scanning this project. Due to producing unac…

    Submitted 24 December, 2024; originally announced December 2024.

  14. arXiv:2412.07292  [pdf, other]

    cs.MM cs.CL

    Multimodal Sentiment Analysis Based on Causal Reasoning

    Authors: Fuhai Chen, Pengpeng Huang, Xuri Ge, Jie Huang, Zishuo Bao

    Abstract: With the rapid development of multimedia, the shift from unimodal textual sentiment analysis to multimodal image-text sentiment analysis has obtained academic and industrial attention in recent years. However, multimodal sentiment analysis is affected by unimodal data bias, e.g., text sentiment is misleading due to explicit sentiment semantic, leading to low accuracy in the final sentiment classif…

    Submitted 10 December, 2024; originally announced December 2024.

  15. arXiv:2411.02992  [pdf, other]

    cs.IR cs.CV

    Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation

    Authors: Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Kaiwen Zheng, Yongxin Ni, Joemon M. Jose

    Abstract: Multimodal foundation models (MFMs) have revolutionized sequential recommender systems through advanced representation learning. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt these models, studies often prioritize parameter efficiency, neglecting GPU memory and training speed. To address this, we introduced the IISAN framework, significantly enhancing efficiency. However,…

    Submitted 5 November, 2024; originally announced November 2024.

    Comments: The extension of IISAN in SIGIR2024

  16. arXiv:2410.20598  [pdf, ps, other]

    cs.IR

    R^3AG: First Workshop on Refined and Reliable Retrieval Augmented Generation

    Authors: Zihan Wang, Xuri Ge, Joemon M. Jose, Haitao Yu, Weizhi Ma, Zhaochun Ren, Xin Xin

    Abstract: Retrieval-augmented generation (RAG) has gained wide attention as the key component to improve generative models with external knowledge augmentation from information retrieval. It has shown great prominence in enhancing the functionality and performance of large language model (LLM)-based applications. However, with the comprehensive application of RAG, more and more problems and limitations have…

    Submitted 5 November, 2024; v1 submitted 27 October, 2024; originally announced October 2024.

    Comments: R^3AG workshop overview at SIGIR-AP 2024

  17. arXiv:2410.20526  [pdf, other]

    cs.LG cs.CL

    Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

    Authors: Zhengfu He, Wentao Shu, Xuyang Ge, Lingjie Chen, Junxuan Wang, Yunhua Zhou, Frances Liu, Qipeng Guo, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang, Xipeng Qiu

    Abstract: Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for extracting sparse representations from language models, yet scalable training remains a significant challenge. We introduce a suite of 256 SAEs, trained on each layer and sublayer of the Llama-3.1-8B-Base model, with 32K and 128K features. Modifications to a state-of-the-art SAE variant, Top-K SAEs, are evaluated across…

    Submitted 27 October, 2024; originally announced October 2024.

    Comments: 22pages, 12 figures

  18. arXiv:2410.13613  [pdf, other]

    cs.CV cs.GR

    MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes

    Authors: Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, Jun Zhang

    Abstract: 4D Gaussian Splatting (4DGS) has recently emerged as a promising technique for capturing complex dynamic 3D scenes with high fidelity. It utilizes a 4D Gaussian representation and a GPU-friendly rasterizer, enabling rapid rendering speeds. Despite its advantages, 4DGS faces significant challenges, notably the requirement of millions of 4D Gaussians, each with extensive associated attributes, leadi…

    Submitted 17 October, 2024; originally announced October 2024.

  19. HpEIS: Learning Hand Pose Embeddings for Multimedia Interactive Systems

    Authors: Songpei Xu, Xuri Ge, Chaitanya Kaul, Roderick Murray-Smith

    Abstract: We present a novel Hand-pose Embedding Interactive System (HpEIS) as a virtual sensor, which maps users' flexible hand poses to a two-dimensional visual space using a Variational Autoencoder (VAE) trained on a variety of hand poses. HpEIS enables visually interpretable and guidable support for user explorations in multimedia collections, using only a camera as an external hand pose acquisition dev…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: 6 pages, 8 figures, 3 tables

  20. arXiv:2410.06672  [pdf, other]

    cs.CL

    Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures

    Authors: Junxuan Wang, Xuyang Ge, Wentao Shu, Qiong Tang, Yunhua Zhou, Zhengfu He, Xipeng Qiu

    Abstract: The hypothesis of Universality in interpretability suggests that different neural networks may converge to implement similar algorithms on similar tasks. In this work, we investigate two mainstream architectures for language modeling, namely Transformers and Mambas, to explore the extent of their mechanistic similarity. We propose to use Sparse Autoencoders (SAEs) to isolate interpretable features…

    Submitted 10 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: 22 pages, 13 figures

  21. arXiv:2409.08572  [pdf, other]

    cs.CV

    DiffFAS: Face Anti-Spoofing via Generative Diffusion Models

    Authors: Xinxu Ge, Xin Liu, Zitong Yu, Jingang Shi, Chun Qi, Jie Li, Heikki Kälviäinen

    Abstract: Face anti-spoofing (FAS) plays a vital role in preventing face recognition (FR) systems from presentation attacks. Nowadays, FAS systems face the challenge of domain shift, impacting the generalization performance of existing FAS methods. In this paper, we rethink about the inherence of domain shift and deconstruct it into two factors: image style and image quality. Quality influences the purity o…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: ECCV 24

  22. arXiv:2409.08069  [pdf, other]

    cs.AI cs.CL

    TravelAgent: An AI Assistant for Personalized Travel Planning

    Authors: Aili Chen, Xuyang Ge, Ziquan Fu, Yanghua Xiao, Jiangjie Chen

    Abstract: As global tourism expands and artificial intelligence technology advances, intelligent travel planning services have emerged as a significant research focus. Within dynamic real-world travel scenarios with multi-dimensional constraints, services that support users in automatically creating practical and customized travel itineraries must address three key objectives: Rationality, Comprehensiveness…

    Submitted 12 September, 2024; originally announced September 2024.

  23. CooTest: An Automated Testing Approach for V2X Communication Systems

    Authors: An Guo, Xinyu Gao, Zhenyu Chen, Yuan Xiao, Jiakai Liu, Xiuting Ge, Weisong Sun, Chunrong Fang

    Abstract: Perceiving the complex driving environment precisely is crucial to the safe operation of autonomous vehicles. With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) collaboration has the potential to address limitations in sensing distant objects and occlusion for a single-agent perception system. However, despite spectacular progress, several co…

    Submitted 29 August, 2024; originally announced August 2024.

    Journal ref: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA '24), September 16--20, 2024, Vienna, Austria

  24. arXiv:2408.04283  [pdf, other]

    eess.SP cs.LG

    Prompt-Assisted Semantic Interference Cancellation on Moderate Interference Channels

    Authors: Zian Meng, Qiang Li, Ashish Pandharipande, Xiaohu Ge

    Abstract: The performance of conventional interference management strategies degrades when interference power is comparable to signal power. We consider a new perspective on interference management using semantic communication. Specifically, a multi-user semantic communication system is considered on moderate interference channels (ICs), for which a novel framework of deep learning-based prompt-assisted sem…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 5 pages, 5 figures

  25. Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning

    Authors: Xuri Ge, Junchen Fu, Fuhai Chen, Shan An, Nicu Sebe, Joemon M. Jose

    Abstract: Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis. Current mainstream FAU recognition models have a notable limitation, i.e., focusing only on the accuracy of AU recognition and overlooking explanations of corresponding AU states. In this paper, we propos…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, 4 tables

    Journal ref: ACM Multimedia 2024

  26. arXiv:2407.00719  [pdf]

    cs.CR cs.DC cs.LG

    A Whole-Process Certifiably Robust Aggregation Method Against Backdoor Attacks in Federated Learning

    Authors: Anqi Zhou, Yezheng Liu, Yidong Chai, Hongyi Zhu, Xinyue Ge, Yuanchun Jiang, Meng Wang

    Abstract: Federated Learning (FL) has garnered widespread adoption across various domains such as finance, healthcare, and cybersecurity. Nonetheless, FL remains under significant threat from backdoor attacks, wherein malicious actors insert triggers into trained models, enabling them to perform certain tasks while still meeting FL's primary objectives. In response, robust aggregation methods have been prop…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 14 pages

  27. arXiv:2406.18579  [pdf, other]

    cs.CV cs.IR

    Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching

    Authors: Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose

    Abstract: Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-ob…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 22pages, 5 Figures, 6 tables, the extension of CMSEI in WACV23, and submitted to ACM TIST. arXiv admin note: text overlap with arXiv:2210.08908

  28. arXiv:2406.04496  [pdf, other]

    cs.CL cs.AI cs.LG

    Time Sensitive Knowledge Editing through Efficient Finetuning

    Authors: Xiou Ge, Ali Mousavi, Edouard Grave, Armand Joulin, Kun Qian, Benjamin Han, Mostafa Arefiyan, Yunyao Li

    Abstract: Large Language Models (LLMs) have demonstrated impressive capability in different tasks and are bringing transformative changes to many domains. However, keeping the knowledge in LLMs up-to-date remains a challenge once pretraining is complete. It is thus essential to design effective methods to both update obsolete knowledge and induce new knowledge into LLMs. Existing locate-and-edit knowledge e…

    Submitted 22 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: ACL 2024 main

  29. arXiv:2405.16701  [pdf, other]

    cs.CV

    Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

    Authors: Tong Shi, Xuri Ge, Joemon M. Jose, Nicolas Pugeault, Paul Henderson

    Abstract: Capturing complex temporal relationships between video and audio modalities is vital for Audio-Visual Emotion Recognition (AVER). However, existing methods lack attention to local details, such as facial state changes between video frames, which can reduce the discriminability of features and thus lower recognition accuracy. In this paper, we propose a Detail-Enhanced Intra- and Inter-modal Intera…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Submitted to 27th International Conference of Pattern Recognition (ICPR 2024)

  30. arXiv:2405.14293  [pdf, other]

    cs.GT

    Sybil-Proof Mechanism for Information Propagation with Budgets

    Authors: Junjie Zheng, Xu Ge, Bin Li, Dengji Zhao

    Abstract: This paper examines the problem of distributing rewards on social networks to improve the efficiency of crowdsourcing tasks for sponsors. To complete the tasks efficiently, we aim to design reward mechanisms that incentivize early-joining agents to invite more participants to the tasks. Nonetheless, participants could potentially engage in strategic behaviors, e.g., not inviting others to the task…

    Submitted 23 May, 2024; originally announced May 2024.

  31. arXiv:2405.13868  [pdf, other]

    cs.LG cs.CL

    Automatically Identifying Local and Global Circuits with Linear Computation Graphs

    Authors: Xuyang Ge, Fukang Zhu, Wentao Shu, Junxuan Wang, Zhengfu He, Xipeng Qiu

    Abstract: Circuit analysis of any certain model behavior is a central task in mechanistic interpretability. We introduce our circuit discovery pipeline with Sparse Autoencoders (SAEs) and a variant called Transcoders. With these two modules inserted into the model, the model's computation graph with respect to OV and MLP circuits becomes strictly linear. Our methods do not require linear approximation to co…

    Submitted 21 July, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  32. arXiv:2405.07472  [pdf, other]

    cs.CV

    GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting

    Authors: Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, Dian Shao

    Abstract: The increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic extensively covered in 2D VTON. Thanks to advances in 3D scene editing, a 2D diffusion model has now been ada…

    Submitted 23 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: On-going work

  33. arXiv:2404.18362  [pdf, other]

    eess.SY cs.LG

    Physics-informed Convolutional Neural Network for Microgrid Economic Dispatch

    Authors: Xiaoyu Ge, Javad Khazaei

    Abstract: The variability of renewable energy generation and the unpredictability of electricity demand create a need for real-time economic dispatch (ED) of assets in microgrids. However, solving numerical optimization problems in real-time can be incredibly challenging. This study proposes using a convolutional neural network (CNN) based on deep learning to address these challenges. Compared to traditiona…

    Submitted 1 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  34. 3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

    Authors: Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose

    Abstract: In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval. 3SHNet highlights the salient identification of prominent objects and their spatial locations within the visual modality, thus allowing the integration of visual semantics-spatial interactions and maintaining indep…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted Information Processing and Management (IP&M), 10 pages, 9 figures and 8 tables

    Journal ref: Information Processing & Management, Volume 61, Issue 4, July 2024, 103716

  35. arXiv:2404.06364  [pdf, other]

    cs.CL

    SurveyAgent: A Conversational System for Personalized and Efficient Research Survey

    Authors: Xintao Wang, Jiangjie Chen, Nianqi Li, Lida Chen, Xinfeng Yuan, Wei Shi, Xuyang Ge, Rui Xu, Yanghua Xiao

    Abstract: In the rapidly advancing research fields such as AI, managing and staying abreast of the latest scientific literature has become a significant challenge for researchers. Although previous efforts have leveraged AI to assist with literature searches, paper recommendations, and question-answering, a comprehensive support system that addresses the holistic needs of researchers has been lacking. This…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 6 pages

  36. arXiv:2404.04848  [pdf, other]

    eess.IV cs.AI cs.CV

    Task-Aware Encoder Control for Deep Video Compression

    Authors: Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin

    Abstract: Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an…

    Submitted 20 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  37. IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT

    Authors: Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Jie Wang, Joemon M. Jose

    Abstract: Multimodal foundation models are transformative in sequential recommender systems, leveraging powerful representation learning capabilities. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt foundation models for recommendation tasks, most research prioritizes parameter efficiency, often overlooking critical factors like GPU memory efficiency and training speed. Addressing thi…

    Submitted 21 July, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR2024

  38. arXiv:2403.08551  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

    Authors: Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang

    Abstract: Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation an…

    Submitted 9 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV 2024. Project Page:https://xingtongge.github.io/GaussianImage-page/ Code: https://github.com/Xinjie-Q/GaussianImage

  39. arXiv:2403.08505  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

    Authors: Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang

    Abstract: Existing learning-based stereo image codec adopt sophisticated transformation with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compressi…

    Submitted 8 February, 2025; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI 2025

  40. AI as a Child of Mother Earth: Regrounding Human-AI Interaction in Ecological Thinking

    Authors: Chunchen Xu, Xiao Ge

    Abstract: The anthropocentric cultural idea that humans are active agents exerting control over their environments has been largely normalized and inscribed in practices, policies, and products of contemporary industrialized societies. This view underlies a human-ecology relationship based on resource and knowledge extraction. To create a more sustainable and equitable future, it is essential to consider al…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: To appear in alt.chi in CHI24

  41. arXiv:2403.06734  [pdf, other]

    cs.AI cs.CL cs.CV

    Real-Time Multimodal Cognitive Assistant for Emergency Medical Services

    Authors: Keshara Weerasinghe, Saahith Janapati, Xueren Ge, Sion Kim, Sneha Iyer, John A. Stankovic, Homa Alemzadeh

    Abstract: Emergency Medical Services (EMS) responders often operate under time-sensitive conditions, facing cognitive overload and inherent risks, requiring essential skills in critical thinking and rapid decision-making. This paper presents CognitiveEMS, an end-to-end wearable cognitive assistant system that can act as a collaborative virtual partner engaging in the real-time acquisition and analysis of mu…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  42. How Culture Shapes What People Want From AI

    Authors: Xiao Ge, Chunchen Xu, Daigo Misaki, Hazel Rose Markus, Jeanne L Tsai

    Abstract: There is an urgent need to incorporate the perspectives of culturally diverse groups into AI developments. We present a novel conceptual framework for research that aims to expand, reimagine, and reground mainstream visions of AI using independent and interdependent cultural models of the self and the environment. Two survey studies support this framework and provide preliminary evidence that peop…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: To appear at CHI 2024

  43. arXiv:2403.02716  [pdf, other]

    cs.SE

    Pre-trained Model-based Actionable Warning Identification: A Feasibility Study

    Authors: Xiuting Ge, Chunrong Fang, Quanjun Zhang, Daoyuan Wu, Bowen Yu, Qirui Zheng, An Guo, Shangwei Lin, Zhihong Zhao, Yang Liu, Zhenyu Chen

    Abstract: Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers. Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, these approaches still face the problem of restricted performance due to the direct reliance on a limited number of labeled warnings to develo…

    Submitted 5 March, 2024; originally announced March 2024.

  44. arXiv:2402.18152  [pdf, other]

    eess.IV cs.AI cs.CV

    Boosting Neural Representations for Videos with a Conditional Decoder

    Authors: Xinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang

    Abstract: Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting frame…

    Submitted 16 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accept by CVPR 2024

  45. arXiv:2402.15276  [pdf, other]

    cs.IR cs.AI cs.CV

    CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora

    Authors: Zijun Long, Xuri Ge, Richard Mccreadie, Joemon Jose

    Abstract: Text-to-image retrieval aims to find the relevant images based on a text query, which is important in various use-cases, such as digital libraries, e-commerce, and multimedia databases. Although Multimodal Large Language Models (MLLMs) demonstrate state-of-the-art performance, they exhibit limitations in handling large-scale, diverse, and ambiguous real-world needs of retrieval, due to the computa…

    Submitted 2 April, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  46. arXiv:2402.12201  [pdf, other]

    cs.LG

    Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT

    Authors: Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu

    Abstract: Sparse dictionary learning has been a rapidly growing technique in mechanistic interpretability to attack superposition and extract more human-understandable features from model activations. We ask a further question based on the extracted more monosemantic features: How do we recognize circuits connecting the enormous amount of dictionary features? We propose a circuit discovery framework alterna…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 24 pages, 13 figures. Not final version. Better dictionary training in progress

  47. arXiv:2401.01007  [pdf, other]

    cs.NI cs.AI cs.DC

    Towards Net-Zero Carbon Emissions in Network AI for 6G and Beyond

    Authors: Peng Zhang, Yong Xiao, Yingyu Li, Xiaohu Ge, Guangming Shi, Yang Yang

    Abstract: A global effort has been initiated to reduce the worldwide greenhouse gas (GHG) emissions, primarily carbon emissions, by half by 2030 and reach net-zero by 2050. The development of 6G must also be compliant with this goal. Unfortunately, developing a sustainable and net-zero emission systems to meet the users' fast growing demands on mobile services, especially smart services and applications, ma…

    Submitted 18 September, 2023; originally announced January 2024.

    Journal ref: published as Early Access at the IEEE Communications Magazine, 2023 (URL: https://ieeexplore.ieee.org/abstract/document/10247147)

  48. arXiv:2312.01751  [pdf, other]

    cs.DC

    Joint Task Partitioning and Parallel Scheduling in Device-Assisted Mobile Edge Networks

    Authors: Yang Li, Xinlei Ge, Bo Lei, Xing Zhang, Wenbo Wang

    Abstract: With the development of the Internet of Things (IoT), certain IoT devices have the capability to not only accomplish their own tasks but also simultaneously assist other resource-constrained devices. Therefore, this paper considers a device-assisted mobile edge computing system that leverages auxiliary IoT devices to alleviate the computational burden on the edge computing server and enhance the o…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted to IEEE Internet of Things Journal

  49. arXiv:2312.00324  [pdf, other]

    cs.SE

    Machine Learning for Actionable Warning Identification: A Comprehensive Survey

    Authors: Xiuting Ge, Chunrong Fang, Xuanye Li, Weisong Sun, Daoyuan Wu, Juan Zhai, Shangwei Lin, Zhihong Zhao, Yang Liu, Zhenyu Chen

    Abstract: Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers. With recent advances in Machine Learning (ML), various approaches have been proposed to incorporate ML techniques into AWI. These ML-based AWI approaches, benefiting from ML's strong ability to learn subtle and previously unseen patterns from historical data, have demonstrated superior…

    Submitted 6 October, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: Accepted by CSUR

  50. arXiv:2311.07107  [pdf, other]

    cs.SE

    A Survey of Source Code Search: A 3-Dimensional Perspective

    Authors: Weisong Sun, Chunrong Fang, Yifei Ge, Yuling Hu, Yuchen Chen, Quanjun Zhang, Xiuting Ge, Yang Liu, Zhenyu Chen

    Abstract: (Source) code search is widely concerned by software engineering researchers because it can improve the productivity and quality of software development. Given a functionality requirement usually described in a natural language sentence, a code search system can retrieve code snippets that satisfy the requirement from a large-scale code corpus, e.g., GitHub. To realize effective and efficient code…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: submitted to ACM Transactions on Software Engineering and Methodology

    MSC Class: 68-04; ACM Class: D.2.3