
Showing 1–50 of 134 results for author: Yan, T

Searching in archive cs.
  1. arXiv:2506.22756 [pdf, ps, other]

    cs.CV cs.RO

    RoboPearls: Editable Video Simulation for Robot Manipulation

    Authors: Tao Tang, Likui Zhang, Youpeng Wen, Kaidong Zhang, Jia-Wang Bian, Xia Zhou, Tianyi Yan, Kun Zhan, Peng Jia, Hefeng Wu, Liang Lin, Xiaodan Liang

    Abstract: The development of generalist robot manipulation policies has seen significant progress, driven by large-scale demonstration data across diverse environments. However, the high cost and inefficiency of collecting real-world demonstrations hinder the scalability of data acquisition. While existing simulation platforms enable controlled environments for robotic learning, the challenge of bridging th…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: ICCV 2025

  2. arXiv:2506.06205 [pdf, other]

    cs.RO cs.AI

    Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

    Authors: Sheng Chen, Peiyu He, Jiaxin Hu, Ziyang Liu, Yansheng Wang, Tao Xu, Chi Zhang, Chongchong Zhang, Chao An, Shiyu Cai, Duo Cao, Kangping Chen, Shuai Chu, Tianwei Chu, Mingdi Dan, Min Du, Weiwei Fang, Pengyou Fu, Junkai Hu, Xiaowei Jiang, Zhaodi Jiang, Fuxuan Li, Jun Li, Minghui Li, Mingyao Li, et al. (46 additional authors not shown)

    Abstract: Modern robot navigation systems encounter difficulties in diverse and complex indoor environments. Traditional approaches rely on multiple modules with small models or rule-based systems and thus lack adaptability to new environments. To address this, we developed Astra, a comprehensive dual-model architecture, Astra-Global and Astra-Local, for mobile robot navigation. Astra-Global, a multimodal L…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Astra Technical Report

  3. arXiv:2504.20525 [pdf, other]

    cs.CV

    Geometry-aware Temporal Aggregation Network for Monocular 3D Lane Detection

    Authors: Huan Zheng, Wencheng Han, Tianyi Yan, Cheng-zhong Xu, Jianbing Shen

    Abstract: Monocular 3D lane detection aims to estimate the 3D positions of lanes from frontal-view (FV) images. However, current monocular 3D lane detection methods suffer from two limitations: inaccurate geometric information in the predicted 3D lanes and difficulty in maintaining lane integrity. To address these issues, we seek to fully exploit the potential of multiple input frames. First, we aim…

    Submitted 29 April, 2025; originally announced April 2025.

  4. arXiv:2504.15449 [pdf, other]

    cs.DC

    Tracing Cross-chain Transactions between EVM-based Blockchains: An Analysis of Ethereum-Polygon Bridges

    Authors: Tao Yan, Chuanshan Huang, Claudio J. Tessone

    Abstract: Ethereum's scalability has been a major concern due to its limited transaction throughput and high fees. To address these limitations, Polygon has emerged as a sidechain solution that facilitates asset transfers between Ethereum and Polygon, thereby improving scalability and reducing costs. However, current cross-chain transactions, particularly those between Ethereum and Polygon, lack transparenc…

    Submitted 21 April, 2025; originally announced April 2025.

  5. arXiv:2503.07834 [pdf, other]

    cs.CE

    Network Analysis of Uniswap: Centralization and Fragility in the Decentralized Exchange Market

    Authors: Tao Yan, Claudio J. Tessone

    Abstract: Uniswap is a Decentralized Exchange (DEX) protocol that facilitates automatic token exchange without the need for traditional order books. Every pair of tokens forms a liquidity pool on Uniswap, and each token can be paired with any other token to create liquidity pools. This characteristic motivates us to employ a complex network approach to analyze the features of the Uniswap market. This re…

    Submitted 10 March, 2025; originally announced March 2025.

  6. arXiv:2503.07170 [pdf, other]

    cs.CL cs.AI

    DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

    Authors: Ming Wang, Fang Wang, Minghao Hu, Li He, Haiyang Wang, Jun Zhang, Tianwei Yan, Li Li, Zhunchen Luo, Wei Luo, Xiaoying Bai, Guotong Geng

    Abstract: Long-form article generation (LFAG) presents challenges such as maintaining logical consistency, comprehensive topic coverage, and narrative coherence across extended articles. Existing datasets often lack both the hierarchical structure and fine-grained annotation needed to effectively decompose tasks, resulting in shallow, disorganized article generation. To address these limitations, we introdu…

    Submitted 10 March, 2025; originally announced March 2025.

  7. arXiv:2502.20475 [pdf, other]

    cs.CL cs.AI cs.LG

    Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries

    Authors: Tianyi Lorena Yan, Robin Jia

    Abstract: To answer one-to-many factual queries (e.g., listing cities of a country), a language model (LM) must simultaneously recall knowledge and avoid repeating previous answers. How are these two subtasks implemented and integrated internally? Across multiple datasets and models, we identify a promote-then-suppress mechanism: the model first recalls all answers, and then suppresses previously generated…

    Submitted 5 March, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

  8. arXiv:2502.14378 [pdf, ps, other]

    cs.IT

    Extremal Self-Dual Codes and Linear Complementary Dual Codes from Double Circulant Codes

    Authors: Wenyu Han, Tongjiang Yan, Ming Yan

    Abstract: This paper explores extremal self-dual double circulant (DC) codes and linear complementary dual (LCD) codes of arbitrary length over the Galois field $\mathbb{F}_2$. We establish the sufficient and necessary conditions for DC codes and bordered DC codes to be self-dual and identify the conditions for self-dual DC codes of length up to 44 to be extremal or non-extremal. Additionally, the self-duali…

    Submitted 20 February, 2025; originally announced February 2025.

  9. arXiv:2502.09696 [pdf, other]

    cs.CV

    ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models

    Authors: Jonathan Roberts, Mohammad Reza Taesiri, Ansh Sharma, Akash Gupta, Samuel Roberts, Ioana Croitoru, Simion-Vlad Bogolin, Jialu Tang, Florian Langer, Vyas Raina, Vatsal Raina, Hanyi Xiong, Vishaal Udandarao, Jingyi Lu, Shiyang Chen, Sam Purkis, Tianshuo Yan, Wenye Lin, Gyungin Shin, Qiaochu Yang, Anh Totti Nguyen, David I. Atkinson, Aaditya Baranwal, Alexandru Coca, Mikah Dang, et al. (9 additional authors not shown)

    Abstract: Large Multimodal Models (LMMs) exhibit major shortfalls when interpreting images and, by some measures, have poorer spatial cognition than small children or animals. Despite this, they attain high scores on many popular visual benchmarks, with headroom rapidly eroded by an ongoing surge of model progress. To address this, there is a pressing need for difficult benchmarks that remain relevant for l…

    Submitted 6 March, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: 20 pages, 13 figures

  10. arXiv:2412.20637 [pdf, other]

    cs.CL

    Knowledge Editing for Large Language Model with Knowledge Neuronal Ensemble

    Authors: Yongchang Li, Yujin Zhu, Tao Yan, Shijian Fan, Gang Wu, Liang Xu

    Abstract: As real-world knowledge is constantly evolving, ensuring the timeliness and accuracy of a model's knowledge is crucial. This has made knowledge editing in large language models increasingly important. However, existing knowledge editing methods face several challenges, including parameter localization coupling, imprecise localization, and a lack of dynamic interaction across layers. In this paper,…

    Submitted 29 December, 2024; originally announced December 2024.

    Comments: 26 pages, 5 figures, 2 tables

    MSC Class: 68T50

  11. arXiv:2412.17226 [pdf, other]

    cs.CV cs.RO

    OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving

    Authors: Tianyi Yan, Junbo Yin, Xianpeng Lang, Ruigang Yang, Cheng-Zhong Xu, Jianbing Shen

    Abstract: To enhance autonomous driving safety in complex scenarios, various methods have been proposed to simulate LiDAR point cloud data. Nevertheless, these methods often face challenges in producing high-quality, diverse, and controllable foreground objects. To address the needs of object-aware tasks in 3D perception, we introduce OLiDM, a novel framework capable of generating high-fidelity LiDAR data a…

    Submitted 22 December, 2024; originally announced December 2024.

    Comments: AAAI 2025, https://yanty123.github.io/OLiDM

  12. arXiv:2412.15623 [pdf, other]

    cs.CR cs.AI

    JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs

    Authors: Hongyi Li, Jiawei Ye, Jie Wu, Tianjie Yan, Chu Wang, Zhixin Li

    Abstract: Large Language Models (LLMs) aligned with human feedback have recently garnered significant attention. However, they remain vulnerable to jailbreak attacks, where adversaries manipulate prompts to induce harmful outputs. Exploring jailbreak attacks enables us to investigate the vulnerabilities of LLMs and further guides us in enhancing their security. Unfortunately, existing techniques mainly rely…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  13. arXiv:2412.10707 [pdf, other]

    cs.CV cs.MM

    MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt

    Authors: Yuhao Wang, Xuehu Liu, Tianyu Yan, Yang Liu, Aihua Zheng, Pingping Zhang, Huchuan Lu

    Abstract: Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performance in traditional single-modal object ReID tasks. However, they remain unexplored for multi-modal object ReID. Furthermore, current multi-modal aggregation metho…

    Submitted 14 December, 2024; originally announced December 2024.

    Comments: This work is accepted by AAAI2025. More modifications may be performed

  14. arXiv:2412.08289 [pdf, other]

    cs.LG

    k-HyperEdge Medoids for Clustering Ensemble

    Authors: Feijiang Li, Jieting Wang, Liuya Zhang, Yuhua Qian, Shuai Jin, Tao Yan, Liang Du

    Abstract: Clustering ensemble has been a popular research topic in data science due to its ability to improve the robustness of single clustering methods. Many clustering ensemble methods have been proposed, most of which can be categorized into clustering-view and sample-view methods. The clustering-view method is generally efficient, but it could be affected by the unreliability that exists in base cl…

    Submitted 11 December, 2024; originally announced December 2024.

  15. arXiv:2412.00060 [pdf, other]

    cs.CV cs.AI

    MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image

    Authors: Shezheng Song, Chengxiang He, Shasha Li, Shan Zhao, Chengyu Wang, Tianwei Yan, Xiaopeng Li, Qian Wan, Jun Ma, Jie Yu, Xiaoguang Mao

    Abstract: Multimodal large language models (MLLMs) have shown remarkable progress in high-level semantic tasks such as visual question answering, image captioning, and emotion recognition. However, despite advancements, there remains a lack of standardized benchmarks for evaluating MLLMs' performance in multi-object sentiment analysis, a key task in semantic understanding. To address this gap, we introduce M…

    Submitted 25 November, 2024; originally announced December 2024.

  16. arXiv:2411.11252 [pdf, other]

    cs.RO cs.CV

    DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation

    Authors: Tianyi Yan, Dongming Wu, Wencheng Han, Junpeng Jiang, Xia Zhou, Kun Zhan, Cheng-zhong Xu, Jianbing Shen

    Abstract: Autonomous driving evaluation requires simulation environments that closely replicate actual road conditions, including real-world sensory data and responsive feedback loops. However, many existing simulations need to predict waypoints along fixed routes on public datasets or synthetic photorealistic data, i.e., open-loop simulation usually lacks the ability to assess dynamic decision-making. While…

    Submitted 17 November, 2024; originally announced November 2024.

    Comments: https://yanty123.github.io/DrivingSphere/

  17. arXiv:2410.18096 [pdf, other]

    cs.IR cs.AI cs.CL cs.CV

    $M^3EL$: A Multi-task Multi-topic Dataset for Multi-modal Entity Linking

    Authors: Fang Wang, Shenglin Yin, Xiaoying Bai, Minghao Hu, Tianwei Yan, Yi Liang

    Abstract: Multi-modal Entity Linking (MEL) is a fundamental component for various downstream tasks. However, existing MEL datasets suffer from small scale, scarcity of topic types and limited coverage of tasks, making them incapable of effectively enhancing the entity linking capabilities of multi-modal models. To address these obstacles, we propose a dataset construction pipeline and publish $M^3EL$, a lar…

    Submitted 8 October, 2024; originally announced October 2024.

  18. arXiv:2409.08062 [pdf, other]

    cs.LG cs.RO

    Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning

    Authors: Teng Yan, Zhendong Ruan, Yaobang Cai, Yu Han, Wenxian Li, Yang Zhang

    Abstract: As a data-driven paradigm, offline reinforcement learning (Offline RL) has been formulated as sequence modeling, where the Decision Transformer (DT) has demonstrated exceptional capabilities. Unlike previous reinforcement learning methods that fit value functions or compute policy gradients, DT adjusts the autoregressive model based on the expected returns, past states, and actions, using a causal…

    Submitted 12 September, 2024; originally announced September 2024.

  19. arXiv:2408.12049 [pdf, ps, other]

    cs.IT

    Research on the Construction of Maximum Distance Separable Codes via Arbitrary twisted Generalized Reed-Solomon Codes

    Authors: Chun'e Zhao, Wenping Ma, Tongjiang Yan, Yuhua Sun

    Abstract: Maximum distance separable (MDS) codes have significant combinatorial and cryptographic applications due to their optimality. Generalized Reed-Solomon (GRS) codes are the most prominent MDS codes. Twisted generalized Reed-Solomon (TGRS) codes may not necessarily be MDS. It is meaningful to study the conditions under which TGRS codes are MDS. In this paper, we study a general class of TGRS…

    Submitted 21 August, 2024; originally announced August 2024.

  20. arXiv:2408.04326 [pdf, other]

    cs.CV cs.MM

    Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection

    Authors: Shixuan Gao, Pingping Zhang, Tianyu Yan, Huchuan Lu

    Abstract: Salient Object Detection (SOD) aims to identify and segment the most prominent objects in images. Advanced SOD methods often utilize various Convolutional Neural Networks (CNN) or Transformers for deep feature extraction. However, these methods still deliver low performance and poor generalization in complex cases. Recently, Segment Anything Model (SAM) has been proposed as a visual fundamental mo…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: This work is accepted by ACM MM2024

  21. arXiv:2407.02483 [pdf, other]

    cs.CL cs.AI

    MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

    Authors: Binxu Li, Tiankai Yan, Yuanting Pan, Jie Luo, Ruiyang Ji, Jiayuan Ding, Zhe Xu, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang

    Abstract: Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To brid…

    Submitted 5 October, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: EMNLP 2024

  22. arXiv:2406.16074 [pdf, other]

    eess.IV cs.CV

    CAVM: Conditional Autoregressive Vision Model for Contrast-Enhanced Brain Tumor MRI Synthesis

    Authors: Lujun Gui, Chuyang Ye, Tianyi Yan

    Abstract: Contrast-enhanced magnetic resonance imaging (MRI) is pivotal in the pipeline of brain tumor segmentation and analysis. Gadolinium-based contrast agents, as the most commonly used contrast agents, are expensive and may have potential side effects, and it is desirable to obtain contrast-enhanced brain tumor MRI scans without the actual use of contrast agents. Deep learning methods have been applied t…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: The work has been accepted by MICCAI 2024

  23. arXiv:2406.10652 [pdf, ps, other]

    cs.CV

    MDeRainNet: An Efficient Macro-pixel Image Rain Removal Network

    Authors: Tao Yan, Weijiang He, Chenglong Wang, Cihang Wei, Xiangjie Zhu, Yinghui Wang, Rynson W. H. Lau

    Abstract: Since rainy weather always degrades image quality and poses significant challenges to most computer vision-based intelligent systems, image de-raining has been a hot research topic. Fortunately, in a rainy light field (LF) image, background obscured by rain streaks in one sub-view may be visible in the other sub-views, and implicit depth information and recorded 4D structural information may benef…

    Submitted 23 June, 2025; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: 14 pages, 14 figures, 4 tables

  24. arXiv:2406.09411 [pdf, other]

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a…

    Submitted 1 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: typos corrected, references added, Project Page: https://muirbench.github.io/

  25. arXiv:2406.00017 [pdf, other]

    cs.CL cs.AI cs.MM

    PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment

    Authors: Shezheng Song, Shasha Li, Shan Zhao, Chengyu Wang, Xiaopeng Li, Jie Yu, Qian Wan, Jun Ma, Tianwei Yan, Wentao Ma, Xiaoguang Mao

    Abstract: Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that joint models struggle to align relevant text to…

    Submitted 13 June, 2024; v1 submitted 22 May, 2024; originally announced June 2024.

    Comments: Code will be released upon publication

  26. arXiv:2405.18458 [pdf]

    cs.LG physics.optics

    Asymmetrical estimator for training encapsulated deep photonic neural networks

    Authors: Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng

    Abstract: Photonic neural networks (PNNs) are fast in-propagation and high bandwidth paradigms that aim to popularize reproducible NN acceleration with higher efficiency and lower cost. However, the training of PNN is known to be challenging, where the device-to-device and system-to-system variations create imperfect knowledge of the PNN. Despite backpropagation (BP)-based training algorithms being the indu…

    Submitted 13 February, 2025; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 23 pages, 6 figures

    MSC Class: 78-05

    Journal ref: Nat Commun 16, 2143 (2025)

  27. arXiv:2405.03971 [pdf, other]

    cs.CV cs.MA

    Unified End-to-End V2X Cooperative Autonomous Driving

    Authors: Zhiwei Li, Bozhen Zhang, Lei Yang, Tianyu Shen, Nuo Xu, Ruosen Hao, Weiting Li, Tao Yan, Huaping Liu

    Abstract: V2X cooperation, through the integration of sensor data from both vehicles and infrastructure, is considered a pivotal approach to advancing autonomous driving technology. Current research primarily focuses on enhancing perception accuracy, often overlooking the systematic improvement of accident prediction accuracy through end-to-end learning, leading to insufficient attention to the safety issue…

    Submitted 6 May, 2024; originally announced May 2024.

  28. arXiv:2404.15700 [pdf, other]

    cs.CV cs.RO

    MAS-SAM: Segment Any Marine Animal with Aggregated Features

    Authors: Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu

    Abstract: Recently, the Segment Anything Model (SAM) has shown exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained with large-scale natural light images. In underwater scenes, it exhibits substantial performance degradation due to light scattering and absorption. Meanwhile, the simplicity of th…

    Submitted 9 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI2024 as Poster

  29. arXiv:2404.04996 [pdf, other]

    cs.CV cs.MM

    Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM

    Authors: Pingping Zhang, Tianyu Yan, Yang Liu, Huchuan Lu

    Abstract: As an important pillar of underwater intelligence, Marine Animal Segmentation (MAS) involves segmenting animals within marine environments. Previous methods don't excel in extracting long-range contextual features and overlook the connectivity between discrete pixels. Recently, Segment Anything Model (SAM) offers a universal framework for general segmentation tasks. Unfortunately, trained with nat…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 as Poster (Highlight)

  30. arXiv:2404.04818 [pdf, other]

    cs.AI cs.CL cs.CV

    DWE+: Dual-Way Matching Enhanced Framework for Multimodal Entity Linking

    Authors: Shezheng Song, Shasha Li, Shan Zhao, Xiaopeng Li, Chengyu Wang, Jie Yu, Jun Ma, Tianwei Yan, Bin Ji, Xiaoguang Mao

    Abstract: Multimodal entity linking (MEL) aims to utilize multimodal information (usually textual and visual information) to link ambiguous mentions to unambiguous entities in a knowledge base. Current methods face the following main issues: (1) treating the entire image as input may introduce redundant information; (2) the insufficient utilization of entity-related information, such as attributes in images; (3) semantic inco…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: under review at TOIS. arXiv admin note: substantial text overlap with arXiv:2312.11816

  31. arXiv:2404.00340 [pdf, other]

    cs.RO eess.SY

    Deep Reinforcement Learning in Autonomous Car Path Planning and Control: A Survey

    Authors: Yiyang Chen, Chao Ji, Yunrui Cai, Tong Yan, Bo Su

    Abstract: Combining data-driven applications with control systems plays a key role in recent Autonomous Car research. This thesis offers a structured review of the latest literature on Deep Reinforcement Learning (DRL) within the realm of autonomous vehicle Path Planning and Control. It collects a series of DRL methodologies and algorithms and their applications in the field, focusing notably on their roles…

    Submitted 30 March, 2024; originally announced April 2024.

  32. arXiv:2403.16038 [pdf, other]

    cs.CL

    Monotonic Paraphrasing Improves Generalization of Language Model Prompting

    Authors: Qin Liu, Fei Wang, Nan Xu, Tianyi Yan, Tao Meng, Muhao Chen

    Abstract: Performance of large language models (LLMs) may vary with different prompts or instructions of even the same task. One commonly recognized factor for this phenomenon is the model's familiarity with the given prompt or instruction, which is typically estimated by its perplexity. However, finding the prompt with the lowest perplexity is challenging, given the enormous space of possible prompting phr…

    Submitted 2 November, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: EMNLP 2024 Camera Ready

  33. arXiv:2403.15716 [pdf, other]

    cs.RO cs.AI eess.SY

    Distributed Robust Learning based Formation Control of Mobile Robots based on Bioinspired Neural Dynamics

    Authors: Zhe Xu, Tao Yan, Simon X. Yang, S. Andrew Gadsden, Mohammad Biglarbegian

    Abstract: This paper addresses the challenges of distributed formation control in multiple mobile robots, introducing a novel approach that enhances real-world practicability. We first introduce a distributed estimator using a variable structure and cascaded design technique, eliminating the need for derivative information to improve the real-time performance. Then, a kinematic tracking control method is de…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: This paper is accepted by IEEE Transactions on Intelligent Vehicles

  34. arXiv:2403.12109 [pdf, other]

    cs.LG cs.AI cs.CV

    GCAM: Gaussian and causal-attention model of food fine-grained recognition

    Authors: Guohang Zhuang, Yue Hu, Tianxing Yan, JiaZhan Gao

    Abstract: Currently, most food recognition relies on deep learning for category classification. However, these approaches struggle to effectively distinguish between visually similar food samples, highlighting the pressing need to address fine-grained issues in food recognition. To mitigate these challenges, we propose the adoption of a Gaussian and causal-attention model for fine-grained object recognition…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 23 pages, 11 figures

  35. arXiv:2403.10082 [pdf, other]

    cs.CV

    CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner

    Authors: Tingbing Yan, Wenzheng Zeng, Yang Xiao, Xingyu Tong, Bo Tan, Zhiwen Fang, Zhiguo Cao, Joey Tianyi Zhou

    Abstract: Most existing one-shot skeleton-based action recognition focuses on raw low-level information (e.g., joint location), and may suffer from local information loss and low generalization ability. To alleviate these, we propose to leverage text descriptions generated by large language models (LLMs), which contain high-level human knowledge, to guide feature learning, in a global-local-global way. Partic…

    Submitted 15 March, 2024; originally announced March 2024.

  36. arXiv:2402.18577 [pdf, other]

    cs.CV cs.AI

    Motion Guided Token Compression for Efficient Masked Video Modeling

    Authors: Yukun Feng, Yangming Shi, Fengze Liu, Tan Yan

    Abstract: Recent developments in Transformers have achieved notable strides in enhancing video comprehension. Nonetheless, the $O(N^2)$ computation complexity associated with attention mechanisms presents substantial computational hurdles when dealing with the high dimensionality of videos. This challenge becomes particularly pronounced when striving to increase the frames per second (FPS) to enhance the mo…

    Submitted 10 January, 2024; originally announced February 2024.

  37. "It Must Be Gesturing Towards Me": Gesture-Based Interaction between Autonomous Vehicles and Pedestrians

    Authors: Xiang Chang, Zihe Chen, Xiaoyan Dong, Yuxin Cai, Tingmin Yan, Haolin Cai, Zherui Zhou, Guyue Zhou, Jiangtao Gong

    Abstract: Interacting with pedestrians understandably and efficiently is one of the toughest challenges faced by autonomous vehicles (AVs) due to the limitations of current algorithms and external human-machine interfaces (eHMIs). In this paper, we design eHMIs based on gestures inspired by the most popular method of interaction between pedestrians and human drivers. Eight common gestures were selected to c…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 26 pages, 22 figures

    ACM Class: H.5.2

    Journal ref: CHI2024

  38. arXiv:2402.14345 [pdf, other]

    cs.CV

    An Error-Matching Exclusion Method for Accelerating Visual SLAM

    Authors: Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

    Abstract: In Visual SLAM, achieving accurate feature matching consumes a significant amount of time, severely impacting the real-time performance of the system. This paper proposes an accelerated method for Visual SLAM by integrating GMS (Grid-based Motion Statistics) with RANSAC (Random Sample Consensus) for the removal of mismatched features. The approach first utilizes the GMS algorithm to estimate the q…

    Submitted 25 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  39. arXiv:2402.13488 [pdf, other]

    cs.CV

    A Feature Matching Method Based on Multi-Level Refinement Strategy

    Authors: Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

    Abstract: Feature matching is a fundamental and crucial process in visual SLAM, and precision has always been a challenging issue in feature matching. In this paper, based on a multi-level fine matching strategy, we propose a new feature matching method called KTGP-ORB. This method utilizes the similarity of local appearance in the Hamming space generated by feature descriptors to establish initial correspo…

    Submitted 25 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  40. arXiv:2402.11431 [pdf, other]

    cs.CV

    A Robust Error-Resistant View Selection Method for 3D Reconstruction

    Authors: Shaojie Zhang, Yinghui Wang, Bin Nan, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

    Abstract: To address the issue of increased triangulation uncertainty caused by selecting views with small camera baselines in Structure from Motion (SFM) view selection, this paper proposes a robust error-resistant view selection method. The method utilizes a triangulation-based computation to obtain an error-resistant model, which is then used to construct an error-resistant matrix. The sorting results of…

    Submitted 25 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  41. arXiv:2402.11170 [pdf, other]

    econ.GN cs.CR cs.CY cs.DB cs.DC

    Analyzing Reward Dynamics and Decentralization in Ethereum 2.0: An Advanced Data Engineering Workflow and Comprehensive Datasets for Proof-of-Stake Incentives

    Authors: Tao Yan, Shengnan Li, Benjamin Kraner, Luyao Zhang, Claudio J. Tessone

    Abstract: Ethereum 2.0, as the preeminent smart contract blockchain platform, guarantees the precise execution of applications without third-party intervention. At its core, this system leverages the Proof-of-Stake (PoS) consensus mechanism, which utilizes a stochastic process to select validators for block proposal and validation, consequently rewarding them for their contributions. However, the implementa…

    Submitted 16 February, 2024; originally announced February 2024.

  42. arXiv:2402.11138  [pdf, other]

    cs.CL cs.AI cs.LG

    Contrastive Instruction Tuning

    Authors: Tianyi Lorena Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen

    Abstract: Instruction tuning has been used as a promising approach to improve the performance of large language models (LLMs) on unseen tasks. However, current LLMs exhibit limited robustness to unseen instructions, generating inconsistent outputs when the same instruction is phrased with slightly varied forms or language styles. This behavior indicates LLMs' lack of robustness to textual variations and gen…

    Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Findings

  43. arXiv:2402.09724  [pdf, other]

    cs.CV

    Region Feature Descriptor Adapted to High Affine Transformations

    Authors: Shaojie Zhang, Yinghui Wang, Bin Nan, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

    Abstract: To address the issue of feature descriptors being ineffective in representing grayscale feature information when images undergo high affine transformations, leading to a rapid decline in feature matching accuracy, this paper proposes a region feature descriptor based on simulating affine transformations using classification. The proposed method initially categorizes images with different affine de…

    Submitted 25 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  44. arXiv:2402.07083  [pdf, other]

    cs.CV

    A Highlight Removal Method for Capsule Endoscopy Images

    Authors: Shaojie Zhang, Yinghui Wang, Peixuan Liu, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

    Abstract: The images captured by Wireless Capsule Endoscopy (WCE) always exhibit specular reflections, and removing highlights while preserving the color and texture in the region remains a challenge. To address this issue, this paper proposes a highlight removal method for capsule endoscopy images. Firstly, the confidence and feature terms of the highlight region's edges are computed, where confidence is o…

    Submitted 25 February, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

  45. arXiv:2401.16353  [pdf, ps, other]

    cs.CR

    Empirical and Theoretical Analysis of Liquid Staking Protocols

    Authors: Krzysztof Gogol, Benjamin Kraner, Malte Schlosser, Tao Yan, Claudio Tessone, Burkhard Stiller

    Abstract: Liquid staking has become the largest category of decentralized finance protocols in terms of total value locked. However, few studies exist on its implementation designs or underlying risks. The liquid staking protocols allow for earning staking rewards without the disadvantage of locking the capital at the validators. Yet, they are seen by some as a threat to the Proof-of-Stake blockchain securi…

    Submitted 29 January, 2024; originally announced January 2024.

    Report number: ChainScience/2023/21

  46. arXiv:2312.11816  [pdf, other]

    cs.AI cs.CV

    A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking

    Authors: Shezheng Song, Shan Zhao, Chengyu Wang, Tianwei Yan, Shasha Li, Xiaoguang Mao, Meng Wang

    Abstract: Multimodal Entity Linking (MEL) aims at linking ambiguous mentions with multimodal information to entities in a Knowledge Graph (KG) such as Wikipedia, which plays a key role in many applications. However, existing methods suffer from shortcomings, including modality impurity such as noise in raw images and ambiguous textual entity representation, which hinder MEL. We formulate multimodal en…

    Submitted 31 July, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI23

  47. arXiv:2310.14214  [pdf, other]

    cs.CV cs.MM

    TransY-Net:Learning Fully Transformer Networks for Change Detection of Remote Sensing Images

    Authors: Tianyu Yan, Zifu Wan, Pingping Zhang, Gong Cheng, Huchuan Lu

    Abstract: In the remote sensing field, Change Detection (CD) aims to identify and localize the changed regions from dual-phase images captured over the same area. Recently, it has made great progress with the advances of deep learning. However, current methods generally deliver incomplete CD regions and irregular CD boundaries due to the limited representation ability of the extracted visual features. To relie…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: This work is accepted by TGRS2023. It is an extension of our ACCV2022 paper and arXiv:2210.00757

  48. arXiv:2310.06290  [pdf, other]

    cs.AR cs.LG

    Gem5Pred: Predictive Approaches For Gem5 Simulation Time

    Authors: Tian Yan, Xueyang Li, Sifat Ut Taki, Saeid Mehrdad

    Abstract: Gem5, an open-source, flexible, and cost-effective simulator, is widely recognized and utilized in both academia and industry for hardware simulation. However, the typically time-consuming nature of simulating programs on Gem5 underscores the need for a predictive model that can estimate simulation time. As of now, no such dataset or model exists. In response to this gap, this paper makes a…

    Submitted 10 October, 2023; originally announced October 2023.

  49. arXiv:2309.08845  [pdf, other]

    cs.CL cs.CY

    Has Sentiment Returned to the Pre-pandemic Level? A Sentiment Analysis Using U.S. College Subreddit Data from 2019 to 2022

    Authors: Tian Yan, Fang Liu

    Abstract: As the impact of the COVID-19 pandemic winds down, both individuals and society are gradually returning to pre-pandemic activities. This study aims to explore how people's emotions have changed from the pre-pandemic period, through the pandemic, to the post-emergency period, and whether they have returned to the pre-pandemic level. We collected Reddit data in 2019 (pre-pandemic), 2020 (peak pandemic), 2021, and 2022 (late stages of p…

    Submitted 15 September, 2023; originally announced September 2023.

  50. arXiv:2308.16518  [pdf, other]

    cs.CV

    MS23D: A 3D Object Detection Method Using Multi-Scale Semantic Feature Points to Construct 3D Feature Layer

    Authors: Yongxin Shao, Aihong Tan, Binrui Wang, Tianhong Yan, Zhetao Sun, Yiyang Zhang, Jiaxin Liu

    Abstract: LiDAR point clouds can effectively depict the motion and posture of objects in three-dimensional space. Many studies accomplish 3D object detection by voxelizing point clouds. However, in autonomous driving scenarios, the sparsity and hollowness of point clouds create some difficulties for voxel-based methods. The sparsity of point clouds makes it challenging to describe the geometric features…

    Submitted 10 August, 2024; v1 submitted 31 August, 2023; originally announced August 2023.