
Showing 1–50 of 284 results for author: Lu, T

Searching in archive cs.
  1. arXiv:2505.07654  [pdf, ps, other]

    eess.IV cs.CV

    Breast Cancer Classification in Deep Ultraviolet Fluorescence Images Using a Patch-Level Vision Transformer Framework

    Authors: Pouya Afshin, David Helminiak, Tongtong Lu, Tina Yen, Julie M. Jorns, Mollie Patton, Bing Yu, Dong Hye Ye

    Abstract: Breast-conserving surgery (BCS) aims to completely remove malignant lesions while maximizing healthy tissue preservation. Intraoperative margin assessment is essential to achieve a balance between thorough cancer resection and tissue conservation. A deep ultraviolet fluorescence scanning microscope (DUV-FSM) enables rapid acquisition of whole surface images (WSIs) for excised tissue, providing con…

    Submitted 12 May, 2025; originally announced May 2025.

  2. arXiv:2504.19373  [pdf, other]

    cs.CR cs.AI

    Doxing via the Lens: Revealing Privacy Leakage in Image Geolocation for Agentic Multi-Modal Large Reasoning Model

    Authors: Weidi Luo, Qiming Zhang, Tianyu Lu, Xiaogeng Liu, Yue Zhao, Zhen Xiang, Chaowei Xiao

    Abstract: The increasing capabilities of agentic multi-modal large reasoning models, such as ChatGPT o3, have raised critical concerns regarding privacy leakage through inadvertent image geolocation. In this paper, we conduct the first systematic and controlled study on the potential privacy risks associated with visual reasoning abilities of ChatGPT o3. We manually collect and construct a dataset comprisin…

    Submitted 29 April, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

  3. arXiv:2504.15271  [pdf, other]

    cs.CV

    Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

    Authors: Guo Chen, Zhiqi Li, Shihao Wang, Jindong Jiang, Yicheng Liu, Lidong Lu, De-An Huang, Wonmin Byeon, Matthieu Le, Tuomas Rintamaki, Tyler Poon, Max Ehrlich, Tong Lu, Limin Wang, Bryan Catanzaro, Jan Kautz, Andrew Tao, Zhiding Yu, Guilin Liu

    Abstract: We introduce Eagle 2.5, a family of frontier vision-language models (VLMs) for long-context multimodal learning. Our work addresses the challenges in long video comprehension and high-resolution image understanding, introducing a generalist framework for both tasks. The proposed training framework incorporates Automatic Degrade Sampling and Image Area Preservation, two techniques that preserve con…

    Submitted 21 April, 2025; originally announced April 2025.

  4. arXiv:2504.15134  [pdf, other]

    cs.CV

    Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation

    Authors: Xiao Zhang, Lu Zou, Tao Lu, Yuan Yao, Zhangjin Huang, Guoping Wang

    Abstract: Category-level object pose estimation aims to predict the 6D pose and size of previously unseen instances from predefined categories, requiring strong generalization across diverse object instances. Although many previous methods attempt to mitigate intra-class variations, they often struggle with instances exhibiting complex geometries or significant deviations from canonical shapes. To address t…

    Submitted 26 April, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

  5. arXiv:2504.14208  [pdf, other]

    cs.IR

    FedCIA: Federated Collaborative Information Aggregation for Privacy-Preserving Recommendation

    Authors: Mingzhe Han, Dongsheng Li, Jiafeng Xia, Jiahao Liu, Hansu Gu, Peng Zhang, Ning Gu, Tun Lu

    Abstract: Recommendation algorithms rely on user historical interactions to deliver personalized suggestions, which raises significant privacy concerns. Federated recommendation algorithms tackle this issue by combining local model training with server-side model aggregation, where most existing algorithms use a uniform weighted summation to aggregate item embeddings from different client models. This appro…

    Submitted 19 April, 2025; originally announced April 2025.

  6. Active Learning of Symbolic NetKAT Automata

    Authors: Mark Moeller, Tiago Ferreira, Thomas Lu, Nate Foster, Alexandra Silva

    Abstract: NetKAT is a domain-specific programming language and logic that has been successfully used to specify and verify the behavior of packet-switched networks. This paper develops techniques for automatically learning NetKAT models of unknown networks using active learning. Prior work has explored active learning for a wide range of automata (e.g., deterministic, register, Büchi, timed, etc.) and also d…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: Appearing in PLDI 2025

  7. arXiv:2504.10918  [pdf, other]

    cs.HC

    Adaptive Human-Agent Teaming: A Review of Empirical Studies from the Process Dynamics Perspective

    Authors: Mengyao Wang, Jiayun Wu, Shuai Ma, Nuo Li, Peng Zhang, Ning Gu, Tun Lu

    Abstract: The rapid advancement of AI, including Large Language Models, has propelled autonomous agents forward, accelerating the human-agent teaming (HAT) paradigm to leverage complementary strengths. However, HAT research remains fragmented, often focusing on isolated team development phases or specific challenges like trust calibration while overlooking the real-world need for adaptability. Addressing th…

    Submitted 15 April, 2025; originally announced April 2025.

  8. arXiv:2504.10479  [pdf, other]

    cs.CV

    InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

    Authors: Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, et al. (26 additional authors not shown)

    Abstract: We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. Rather than adapting a text-only large language model (LLM) into a multimodal large language model (MLLM) that supports visual inputs, InternVL3 jointly acquires multimodal and linguistic capabilities from both diverse multimodal data and pure-text corpora during a single p…

    Submitted 18 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: Technical Report

  9. arXiv:2504.10466  [pdf, other]

    cs.CV

    Art3D: Training-Free 3D Generation from Flat-Colored Illustration

    Authors: Xiaoyan Cong, Jiayi Shen, Zekun Li, Rao Fu, Tao Lu, Srinath Sridhar

    Abstract: Large-scale pre-trained image-to-3D generative models have exhibited remarkable capabilities in diverse shape generations. However, most of them struggle to synthesize plausible 3D assets when the reference image is flat-colored, like hand drawings, because such images lack a 3D illusion, even though they are often the most user-friendly input modality in art content creation. To this end, we propose Art3D, a traini…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Technical Report. Course Project of Brown CSCI 1430 Computer Vision. Project Page: https://joy-jy11.github.io/

  10. arXiv:2504.07433  [pdf, other]

    cs.CL

    From Token to Line: Enhancing Code Generation with a Long-Term Perspective

    Authors: Tingwei Lu, Yangning Li, Liyuan Wang, Binghuai Lin, Jiwei Tang, Wanshi Xu, Hai-Tao Zheng, Yinghui Li, Bingxu An, Zhao Wei, Yong Xu

    Abstract: The emergence of large language models (LLMs) has significantly promoted the development of the code generation task, sparking a surge in pertinent literature. Current research is hindered by redundant generation results and a tendency to overfit local patterns in the short term. Although existing studies attempt to alleviate the issue by adopting a multi-token prediction strategy, there remains limit…

    Submitted 18 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  11. arXiv:2504.05755  [pdf, other]

    cs.HC cs.AI econ.GN

    Unraveling Human-AI Teaming: A Review and Outlook

    Authors: Bowen Lou, Tian Lu, T. S. Raghu, Yingjie Zhang

    Abstract: Artificial Intelligence (AI) is advancing at an unprecedented pace, with clear potential to enhance decision-making and productivity. Yet, the collaborative decision-making process between humans and AI remains underdeveloped, often falling short of its transformative possibilities. This paper explores the evolution of AI agents from passive tools to active collaborators in human-AI teams, emphasi…

    Submitted 9 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  12. arXiv:2504.02698  [pdf, other]

    cs.LG cs.AI q-bio.QM

    SCMPPI: Supervised Contrastive Multimodal Framework for Predicting Protein-Protein Interactions

    Authors: Shengrui Xu, Tianchi Lu, Zikun Wang, Jixiu Zhai

    Abstract: Protein-protein interaction (PPI) prediction plays a pivotal role in deciphering cellular functions and disease mechanisms. To address the limitations of traditional experimental methods and existing computational approaches in cross-modal feature fusion and false-negative suppression, we propose SCMPPI, a novel supervised contrastive multimodal framework. By effectively integrating sequence-based…

    Submitted 27 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: 20 pages, 9 figures, conference

    MSC Class: 92C40; 68T07

    ACM Class: I.2.6; J.3

  13. arXiv:2503.22231  [pdf, other]

    cs.CV

    CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving

    Authors: Yishen Ji, Ziyue Zhu, Zhenxin Zhu, Kaixin Xiong, Ming Lu, Zhiqi Li, Lijun Zhou, Haiyang Sun, Bing Wang, Tong Lu

    Abstract: Recent progress in driving video generation has shown significant potential for enhancing self-driving systems by providing scalable and controllable training data. Although pretrained state-of-the-art generation models, guided by 2D layout conditions (e.g., HD maps and bounding boxes), can produce photorealistic driving videos, achieving controllable multi-view videos with high 3D consistency rem…

    Submitted 5 April, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  14. arXiv:2503.21841  [pdf]

    cs.CV

    HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

    Authors: Jingtao Li, Yingyi Liu, Xinyu Wang, Yunning Peng, Chen Sun, Shaoyu Wang, Zhendong Sun, Tian Ke, Xiao Jiang, Tangwei Lu, Anran Zhao, Yanfei Zhong

    Abstract: Advanced interpretation of hyperspectral remote sensing images benefits many precise Earth observation tasks. Recently, visual foundation models have advanced remote sensing interpretation, but they concentrate on RGB and multispectral images. Due to the varied hyperspectral channels, existing foundation models would require image-by-image tuning, imposing great pressure on hardware and time…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR2025

  15. arXiv:2503.18962  [pdf, other]

    cs.SI cs.LG

    Representative Ranking for Deliberation in the Public Sphere

    Authors: Manon Revel, Smitha Milli, Tyler Lu, Jamelle Watson-Daniels, Max Nickel

    Abstract: Online comment sections, such as those on news sites or social media, have the potential to foster informal public deliberation. However, this potential is often undermined by the frequency of toxic or low-quality exchanges that occur in these settings. To combat this, platforms increasingly leverage algorithmic ranking to facilitate higher-quality discussions, e.g., by using civility classifiers…

    Submitted 19 March, 2025; originally announced March 2025.

  16. arXiv:2503.17938  [pdf, other]

    cs.CV

    Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning

    Authors: Xiang Fang, Shihua Zhang, Hao Zhang, Tao Lu, Huabing Zhou, Jiayi Ma

    Abstract: Two-view correspondence learning aims to discern true and false correspondences between image pairs by recognizing their underlying different information. Previous methods either treat the information equally or require the explicit storage of the entire context, tending to be laborious in real-world scenarios. Inspired by Mamba's inherent selectivity, we propose CorrMamba, a Corr…

    Submitted 23 March, 2025; originally announced March 2025.

  17. arXiv:2503.03511  [pdf, other]

    cs.RO cs.AI

    NeuGrasp: Generalizable Neural Surface Reconstruction with Background Priors for Material-Agnostic Object Grasp Detection

    Authors: Qingyu Fan, Yinghao Cai, Chao Li, Wenzhe He, Xudong Zheng, Tao Lu, Bin Liang, Shuo Wang

    Abstract: Robotic grasping in scenes with transparent and specular objects presents great challenges for methods relying on accurate depth information. In this paper, we introduce NeuGrasp, a neural surface reconstruction method that leverages background priors for material-agnostic grasp detection. NeuGrasp integrates transformers and global prior volumes to aggregate multi-view features with spatial encod…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 7 pages, 5 figures. IEEE International Conference on Robotics and Automation (ICRA) 2025

    ACM Class: I.2.9; I.2.10

  18. arXiv:2503.02151  [pdf, other]

    cs.HC

    YouthCare: Building a Personalized Collaborative Video Censorship Tool to Support Parent-Child Joint Media Engagement

    Authors: Wenxin Zhao, Fangyu Yu, Peng Zhang, Hansu Gu, Lin Wang, Siyuan Qiao, Tun Lu, Ning Gu

    Abstract: To mitigate the negative impacts of online videos on teenagers, existing research and platforms have implemented various parental mediation mechanisms, such as Parent-Child Joint Media Engagement (JME). However, JME generally relies heavily on parents' time, knowledge, and experience. To fill this gap, we aim to design an automatic tool to help parents/children censor videos more effectively and e…

    Submitted 3 March, 2025; originally announced March 2025.

  19. arXiv:2503.01358  [pdf, other]

    cs.HC

    RemiHaven: Integrating "In-Town" and "Out-of-Town" Peers to Provide Personalized Reminiscence Support for Older Drifters

    Authors: Xuechen Zhang, Changyang He, Peng Zhang, Hansu Gu, Ning Gu, Qi Shen, Zhan Hu, Tun Lu

    Abstract: With increasing social mobility and an aging society, more older adults in China are migrating to new cities, known as "older drifters." Due to fewer social connections and cultural adaptation challenges, they face negative emotions such as loneliness and depression. While reminiscence-based interventions have been used to improve older adults' psychological well-being, challenges such as the lack…

    Submitted 3 March, 2025; originally announced March 2025.

  20. arXiv:2502.19893  [pdf, other]

    math.NA cs.LG

    A Multiple Transferable Neural Network Method with Domain Decomposition for Elliptic Interface Problems

    Authors: Tianzheng Lu, Lili Ju, Liyong Zhu

    Abstract: The transferable neural network (TransNet) is a two-layer shallow neural network with pre-determined and uniformly distributed neurons in the hidden layer, and, in particular, least-squares solvers can be used to compute the parameters of its output layer when it is applied to the solution of partial differential equations. In this paper, we integrate the TransNet technique with the nonoverlapping doma…

    Submitted 27 February, 2025; originally announced February 2025.

  21. arXiv:2502.15610  [pdf, other]

    cs.LG cs.AI

    A general language model for peptide identification

    Authors: Jixiu Zhai, Tianchi Lu, Haitian Zhong, Ziyang Xu, Yuhuan Liu, Shengrui Xu, Jingwan Wang, Dan Huang

    Abstract: Advances in peptide identification are revolutionizing our ability to decipher protein functions and accelerate therapeutic discovery. We present PDeepPP, a deep learning framework that integrates pretrained protein language models with parallel transformer-CNN architectures, achieving state-of-the-art performance in peptide characterization tasks. The model's hybrid architecture demonstrates uniq…

    Submitted 17 April, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 21 pages, 9 figures, 4 tables, submitted to arXiv

    MSC Class: 92C40; 68T07

    ACM Class: I.2.6; J.3

  22. arXiv:2502.13845  [pdf, other]

    cs.IR cs.AI

    Improving LLM-powered Recommendations with Personalized Information

    Authors: Jiahao Liu, Xueshuo Yan, Dongsheng Li, Guangping Zhang, Hansu Gu, Peng Zhang, Tun Lu, Li Shang, Ning Gu

    Abstract: Due to the lack of explicit reasoning modeling, existing LLM-powered recommendations fail to leverage LLMs' reasoning capabilities effectively. In this paper, we propose a pipeline called CoT-Rec, which integrates two key Chain-of-Thought (CoT) processes -- user preference analysis and item perception analysis -- into LLM-powered recommendations, thereby enhancing the utilization of LLMs' reasonin…

    Submitted 18 April, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted by SIGIR 2025, 7 pages

  23. arXiv:2502.13843  [pdf, other]

    cs.IR cs.AI

    AgentCF++: Memory-enhanced LLM-based Agents for Popularity-aware Cross-domain Recommendations

    Authors: Jiahao Liu, Shengkang Gu, Dongsheng Li, Guangping Zhang, Mingzhe Han, Hansu Gu, Peng Zhang, Tun Lu, Li Shang, Ning Gu

    Abstract: LLM-based user agents, which simulate user interaction behavior, are emerging as a promising approach to enhancing recommender systems. In real-world scenarios, users' interactions often exhibit cross-domain characteristics and are influenced by others. However, the memory design in current methods causes user agents to introduce significant irrelevant information during decision-making in cross-d…

    Submitted 18 April, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted by SIGIR 2025, 6 pages

  24. arXiv:2502.13840  [pdf, ps, other]

    cs.IR cs.AI

    Unbiased Collaborative Filtering with Fair Sampling

    Authors: Jiahao Liu, Dongsheng Li, Hansu Gu, Peng Zhang, Tun Lu, Li Shang, Ning Gu

    Abstract: Recommender systems leverage extensive user interaction data to model preferences; however, directly modeling these data may introduce biases that disproportionately favor popular items. In this paper, we demonstrate that popularity bias arises from the influence of propensity factors during training. Building on this insight, we propose a fair sampling (FS) method that ensures each user and each…

    Submitted 18 April, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted by SIGIR 2025, 5 pages

  25. arXiv:2502.07461  [pdf, other]

    cs.SD cs.AI

    JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata

    Authors: Abhinaba Roy, Renhang Liu, Tongyu Lu, Dorien Herremans

    Abstract: We introduce JamendoMaxCaps, a large-scale music-caption dataset featuring over 200,000 freely licensed instrumental tracks from the renowned Jamendo platform. The dataset includes captions generated by a state-of-the-art captioning model, enhanced with imputed metadata. We also introduce a retrieval system that leverages both musical features and metadata to identify similar songs, which are then…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: 8 pages, 5 figures

  26. Ego vs. Exo and Active vs. Passive: Investigating the Effects of Viewpoint and Navigation on Spatial Immersion and Understanding in Immersive Storytelling

    Authors: Tao Lu, Qian Zhu, Tiffany Ma, Wong Kam-Kwai, Anlan Xie, Alex Endert, Yalong Yang

    Abstract: Visual storytelling combines visuals and narratives to communicate important insights. While web-based visual storytelling is well-established, leveraging the next generation of digital technologies for visual storytelling, specifically immersive technologies, remains underexplored. We investigated the impact of the story viewpoint (from the audience's perspective) and navigation (when progressing…

    Submitted 6 February, 2025; originally announced February 2025.

  27. arXiv:2502.04522  [pdf, other]

    cs.SD cs.AI eess.AS

    ImprovNet -- Generating Controllable Musical Improvisations with Iterative Corruption Refinement

    Authors: Keshav Bhandari, Sungkyun Chang, Tongyu Lu, Fareza R. Enus, Louis B. Bradshaw, Dorien Herremans, Simon Colton

    Abstract: Despite deep learning's remarkable advances in style transfer across various domains, generating controllable performance-level musical style transfer for complete symbolically represented musical works remains a challenging area of research. Much of this is owed to limited datasets, especially for genres such as jazz, and the lack of unified models that can handle multiple music generation tasks.…

    Submitted 15 May, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

    Comments: 10 pages, 6 figures, IJCNN 2025 conference

  28. arXiv:2501.14818  [pdf, other]

    cs.CV cs.AI cs.LG

    Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models

    Authors: Zhiqi Li, Guo Chen, Shilong Liu, Shihao Wang, Vibashan VS, Yishen Ji, Shiyi Lan, Hao Zhang, Yilin Zhao, Subhashree Radhakrishnan, Nadine Chang, Karan Sapra, Amala Sanjay Deshmukh, Tuomas Rintamaki, Matthieu Le, Ilia Karmanov, Lukas Voegtle, Philipp Fischer, De-An Huang, Timo Roman, Tong Lu, Jose M. Alvarez, Bryan Catanzaro, Jan Kautz, Andrew Tao, et al. (2 additional authors not shown)

    Abstract: Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, s…

    Submitted 20 January, 2025; originally announced January 2025.

  29. arXiv:2501.14762  [pdf, other]

    cs.CY cs.SI

    Linked Data on Geo-annotated Events and Use Cases for the Resilience of Ukraine

    Authors: Manar Attar, Shuai Wang, Ronald Siebes, Eirik Kultorp, Zhisheng Huang, Tianyang Lu

    Abstract: The mission of resilience of Ukrainian cities calls for international collaboration with the scientific community to increase the quality of information by identifying and integrating information from various news and social media sources. Linked Data technology can be used to unify, enrich, and integrate data from multiple sources. In our work, we focus on datasets about damaging events in Ukrain…

    Submitted 24 December, 2024; originally announced January 2025.

    Comments: The paper is an extended version of our 2023 paper titled Converting and Enriching Geo-annotated Event Data: Integrating Information for Ukraine Resilience presented at the ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL conference). The use cases were presented at the BNAIC-BeNeLearn Joint International Scientific Conferences on A.I. and Machine Learning

    ACM Class: E.2; I.7.0; J.0

  30. arXiv:2501.07071  [pdf, other]

    cs.AI

    Value Compass Leaderboard: A Platform for Fundamental and Validated Evaluation of LLMs Values

    Authors: Jing Yao, Xiaoyuan Yi, Shitong Duan, Jindong Wang, Yuzhuo Bai, Muhua Huang, Peng Zhang, Tun Lu, Zhicheng Dou, Maosong Sun, Xing Xie

    Abstract: As Large Language Models (LLMs) achieve remarkable breakthroughs, aligning their values with humans has become imperative for their responsible development and customized applications. However, there is still a lack of evaluations of LLM values that fulfill three desirable goals. (1) Value Clarification: We expect to clarify the underlying values of LLMs precisely and comprehensively, while current evalu…

    Submitted 13 January, 2025; originally announced January 2025.

  31. arXiv:2501.04012  [pdf, other]

    cs.MM cs.LG

    FlexCache: Flexible Approximate Cache System for Video Diffusion

    Authors: Desen Sun, Henry Tian, Tim Lu, Sihang Liu

    Abstract: Text-to-Video applications receive increasing attention from the public. Among these, diffusion models have emerged as the most prominent approach, offering impressive quality in visual content generation. However, they still suffer from substantial computational complexity, often requiring several minutes to generate a single video. While prior research has addressed the computational overhead in…

    Submitted 17 December, 2024; originally announced January 2025.

  32. arXiv:2412.16938  [pdf, other]

    cs.CV

    ImagineMap: Enhanced HD Map Construction with SD Maps

    Authors: Yishen Ji, Zhiqi Li, Tong Lu

    Abstract: Track Mapless requires models to process multi-view images and Standard-Definition (SD) maps, outputting lane and traffic element perceptions along with their topological relationships. We propose a novel architecture that integrates SD map priors to improve lane line and area detection performance. Inspired by TopoMLP, our model employs a two-stage structure: perception and reasoning. The downstre…

    Submitted 22 December, 2024; originally announced December 2024.

    Comments: 4 pages, 1 figure, technical report

  33. arXiv:2412.16537  [pdf, other]

    cs.CR

    Accelerating Private Large Transformers Inference through Fine-grained Collaborative Computation

    Authors: Yuntian Chen, Zhanyong Tang, Tianpei Lu, Bingsheng Zhang, Zhiying Shi, Zheng Wang

    Abstract: Homomorphic encryption (HE) and secret sharing (SS) enable computations on encrypted data, providing significant privacy benefits for large transformer-based models (TBM) in sensitive sectors like medicine and finance. However, private TBM inference incurs significant costs due to the coarse-grained application of HE and SS. We present FASTLMPI, a new approach to accelerate private TBM inference t…

    Submitted 25 January, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

    Comments: 14 pages (including a 4-page appendix; 14 figures)

  34. arXiv:2412.13547  [pdf, other]

    cs.CV

    Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields

    Authors: Tao Lu, Ankit Dhiman, R Srinath, Emre Arslan, Angela Xing, Yuanbo Xiangli, R Venkatesh Babu, Srinath Sridhar

    Abstract: Novel-view synthesis is an important problem in computer vision with applications in 3D reconstruction, mixed reality, and robotics. Recent methods like 3D Gaussian Splatting (3DGS) have become the preferred method for this task, providing high-quality novel views in real time. However, training a 3DGS model is slow, often taking 30 minutes for a scene with 200 views. In contrast, our…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: Project page: https://ivl.cs.brown.edu/research/turbo-gs

  35. arXiv:2412.12075  [pdf, other]

    cs.CV

    CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

    Authors: Guo Chen, Yicheng Liu, Yifei Huang, Yuping He, Baoqi Pei, Jilan Xu, Yali Wang, Tong Lu, Limin Wang

    Abstract: Most existing video understanding benchmarks for multimodal large language models (MLLMs) focus only on short videos. The few benchmarks for long video understanding often rely solely on multiple-choice questions (MCQs). However, because of the inherent limitation of MCQ-based evaluation and the increasing reasoning ability of MLLMs, models can give the correct answer purely by combi…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: 14 pages, 9 figures

  36. arXiv:2412.09624  [pdf, other]

    cs.CV cs.RO

    GenEx: Generating an Explorable World

    Authors: Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen

    Abstract: Understanding, navigating, and exploring the 3D physical real world has long been a central challenge in the development of artificial intelligence. In this work, we take a step toward this goal by introducing GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination that forms priors (expectations) about the surrounding environments. GenEx genera…

    Submitted 20 January, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: Website: GenEx.world

  37. arXiv:2412.07660  [pdf, other]

    cs.CV

    Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians

    Authors: Yixuan Li, Xingjian Ran, Linning Xu, Tao Lu, Mulin Yu, Zhenzhi Wang, Yuanbo Xiangli, Dahua Lin, Bo Dai

    Abstract: Buildings are primary components of cities, often featuring repeated elements such as windows and doors. Traditional 3D building asset creation is labor-intensive and requires specialized skills to develop design rules. Recent generative models for building creation often overlook these patterns, leading to low visual fidelity and limited scalability. Drawing inspiration from procedural modeling t…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: Project page: https://city-super.github.io/procgs/

  38. arXiv:2412.07247  [pdf, other]

    cs.CV

    Driving with InternVL: Outstanding Champion in the Track on Driving with Language of the Autonomous Grand Challenge at CVPR 2024

    Authors: Jiahan Li, Zhiqi Li, Tong Lu

    Abstract: This technical report describes the methods we employed for the Driving with Language track of the CVPR 2024 Autonomous Grand Challenge. We utilized a powerful open-source multimodal model, InternVL-1.5, and conducted a full-parameter fine-tuning on the competition dataset, DriveLM-nuScenes. To effectively handle the multi-view images of nuScenes and seamlessly inherit InternVL's outstanding multi…

    Submitted 10 December, 2024; originally announced December 2024.

  39. arXiv:2412.05271  [pdf, other]

    cs.CV

    Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

    Authors: Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, et al. (17 additional authors not shown)

    Abstract: We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series that builds upon InternVL 2.0, maintaining its core model architecture while introducing significant enhancements in training and testing strategies as well as data quality. In this work, we delve into the relationship between model scaling and performance, systematically exploring the performance trends in vision…

    Submitted 13 January, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

    Comments: Technical Report

  40. arXiv:2412.04690  [pdf, other]

    cs.CL cs.AI

    LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs

    Authors: Xuan Chen, Tong Lu, Zhichun Wang

    Abstract: Entity Alignment (EA) seeks to identify and match corresponding entities across different Knowledge Graphs (KGs), playing a crucial role in knowledge fusion and integration. Embedding-based entity alignment (EA) has recently gained considerable attention, resulting in the emergence of many innovative approaches. Initially, these approaches concentrated on learning entity embeddings based on the st…

    Submitted 5 December, 2024; originally announced December 2024.

  41. arXiv:2412.01745

    cs.CV

    Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

    Authors: Lihan Jiang, Kerui Ren, Mulin Yu, Linning Xu, Junting Dong, Tao Lu, Feng Zhao, Dahua Lin, Bo Dai

    Abstract: Seamless integration of both aerial and street view images remains a significant challenge in neural scene reconstruction and rendering. Existing methods predominantly focus on a single domain, limiting their applications in immersive environments, which demand extensive free view exploration with large view changes both horizontally and vertically. We introduce Horizon-GS, a novel approach built up…

    Submitted 2 December, 2024; originally announced December 2024.

  42. arXiv:2412.00813

    cs.IR

    Oracle-guided Dynamic User Preference Modeling for Sequential Recommendation

    Authors: Jiafeng Xia, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, Li Shang, Ning Gu

    Abstract: Sequential recommendation methods can capture dynamic user preferences from user historical interactions to achieve better performance. However, most existing methods only use past information extracted from user historical interactions to train the models, leading to the deviations of user preference modeling. Besides past information, future information is also available during training, which c…

    Submitted 1 December, 2024; originally announced December 2024.

  43. arXiv:2411.11844

    cs.CV cs.RO

    Generative World Explorer

    Authors: Taiming Lu, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen

    Abstract: Planning with partial observation is a central challenge in embodied AI. A majority of prior works have tackled this challenge by developing agents that physically explore their environment to update their beliefs about the world state. In contrast, humans can imagine unseen parts of the world through a mental exploration and revise their beliefs with imagined observations. S…

    Submitted 19 November, 2024; v1 submitted 18 November, 2024; originally announced November 2024.

    Comments: Website: generative-world-explorer.github.io

  44. arXiv:2411.09287

    cs.CR

    The Communication-Friendly Privacy-Preserving Machine Learning against Malicious Adversaries

    Authors: Tianpei Lu, Bingsheng Zhang, Lichun Li, Kui Ren

    Abstract: With the increasing emphasis on privacy regulations, such as GDPR, protecting individual privacy and ensuring compliance have become critical concerns for both individuals and organizations. Privacy-preserving machine learning (PPML) is an innovative approach that allows for secure data analysis while safeguarding sensitive information. It enables organizations to extract valuable insights from da…

    Submitted 14 November, 2024; originally announced November 2024.

  45. arXiv:2411.01844

    cs.HC cs.AI cs.SI

    DeMod: A Holistic Tool with Explainable Detection and Personalized Modification for Toxicity Censorship

    Authors: Yaqiong Li, Peng Zhang, Hansu Gu, Tun Lu, Siyuan Qiao, Yubo Shu, Yiyang Shao, Ning Gu

    Abstract: Although there have been automated approaches and tools supporting toxicity censorship for social posts, most of them focus on detection. Toxicity censorship is a complex process, wherein detection is just an initial task and a user can have further needs such as rationale understanding and content modification. For this problem, we conduct a needfinding study to investigate people's diverse needs…

    Submitted 4 November, 2024; originally announced November 2024.

  46. arXiv:2410.16261

    cs.CV

    Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

    Authors: Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang

    Abstract: Multimodal large language models (MLLMs) have demonstrated impressive performance in vision-language tasks across a broad spectrum of domains. However, the large model scale and associated high computational costs pose significant challenges for training and deploying MLLMs on consumer-grade GPUs or edge devices, thereby hindering their widespread application. In this work, we introduce Mini-Inter…

    Submitted 7 November, 2024; v1 submitted 21 October, 2024; originally announced October 2024.

    Comments: Technical report

  47. arXiv:2410.11829

    cs.CV

    MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

    Authors: Yue Cao, Yangzhou Liu, Zhe Chen, Guangchen Shi, Wenhai Wang, Danhuai Zhao, Tong Lu

    Abstract: Despite significant advancements in Multimodal Large Language Models (MLLMs) for understanding complex human intentions through cross-modal interactions, capturing intricate image details remains challenging. Previous methods integrating multiple vision encoders to enhance visual detail introduce redundancy and computational overhead. We observe that most MLLMs utilize only the last-layer feature…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 11 pages, 6 figures, technical report

  48. arXiv:2410.05411

    cs.IR cs.HC

    Filtering Discomforting Recommendations with Large Language Models

    Authors: Jiahao Liu, Yiyang Shao, Peng Zhang, Dongsheng Li, Hansu Gu, Chao Chen, Longzhi Du, Tun Lu, Ning Gu

    Abstract: Personalized algorithms can inadvertently expose users to discomforting recommendations, potentially triggering negative consequences. The subjectivity of discomfort and the black-box nature of these algorithms make it challenging to effectively identify and filter such content. To address this, we first conducted a formative study to understand users' practices and expectations regarding discomfo…

    Submitted 23 January, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted by WWW 2025, 16 pages, full version

  49. arXiv:2409.19272

    cs.CL

    Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios

    Authors: Jiwei Tang, Jin Xu, Tingwei Lu, Zhicheng Zhang, Yiming Zhao, Lin Hai, Hai-Tao Zheng

    Abstract: Large language models (LLMs) demonstrate exceptional capabilities in various scenarios. However, they suffer from much redundant information and are sensitive to the position of key information in long context scenarios. To address these challenges, we present Perception Compressor, a training-free prompt compression framework. It includes a perception retriever that leverages guiding questions an…

    Submitted 7 February, 2025; v1 submitted 28 September, 2024; originally announced September 2024.

    Comments: Accepted at NAACL 2025 Findings

  50. arXiv:2409.18429

    cs.IT eess.SP

    Joint Optimization of Data- and Model-Driven Probing Beams and Beam Predictor

    Authors: Tianheng Lu, Fan Meng, Zhilei Zhang, Yongming Huang, Cheng Zhang, Xiaoyu Bai

    Abstract: Hierarchical search in millimeter-wave (mmWave) communications incurs significant beam training overhead and delay, especially in a dynamic environment. Deep learning-enabled beam prediction is promising to significantly mitigate the overhead and delay, efficiently utilizing the site-specific channel prior. In this work, we propose to jointly optimize a data- and model-driven probe beam module and…

    Submitted 26 September, 2024; originally announced September 2024.