
Showing 1–31 of 31 results for author: Man, Y

Searching in archive cs.
  1. arXiv:2506.00084

    cs.RO physics.flu-dyn

    Navigation of a Three-Link Microswimmer via Deep Reinforcement Learning

    Authors: Yuyang Lai, Sina Heydari, On Shun Pak, Yi Man

    Abstract: Motile microorganisms develop effective swimming gaits to adapt to complex biological environments. Translating this adaptability to smart microrobots presents significant challenges in motion planning and stroke design. In this work, we explore the use of reinforcement learning (RL) to develop stroke patterns for targeted navigation in a three-link swimmer model at low Reynolds numbers. Specifica…

    Submitted 29 May, 2025; originally announced June 2025.

  2. arXiv:2505.23766

    cs.CV

    Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

    Authors: Yunze Man, De-An Huang, Guilin Liu, Shiwei Sheng, Shilong Liu, Liang-Yan Gui, Jan Kautz, Yu-Xiong Wang, Zhiding Yu

    Abstract: Recent advances in multimodal large language models (MLLMs) have demonstrated remarkable capabilities in vision-language tasks, yet they often struggle with vision-centric scenarios where precise visual focus is needed for accurate reasoning. In this paper, we introduce Argus to address these limitations with a new visual attention grounding mechanism. Our approach employs object-centric grounding…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: CVPR 2025. Project Page: https://yunzeman.github.io/argus/

  3. arXiv:2504.21060

    physics.soc-ph cs.GT cs.SI econ.EM econ.TH math.OC

    Construct to Commitment: The Effect of Narratives on Economic Growth

    Authors: Hanyuan Jiang, Yi Man

    Abstract: We study how government-led narratives through mass media evolve from construct, a mechanism for framing expectations, into commitment, a sustainable pillar for growth. We propose the "Narratives-Construct-Commitment (NCC)" framework outlining the mechanism and institutionalization of narratives, and formalize it as a dynamic Bayesian game. Using the Innovation-Driven Development Strategy (2016)…

    Submitted 29 April, 2025; originally announced April 2025.

    ACM Class: I.2.6; I.2.11; I.6.5; J.4; H.2.8; F.2.2; G.1.6

  4. arXiv:2504.10568

    cs.CV

    AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark

    Authors: Aruna Gauba, Irene Pi, Yunze Man, Ziqi Pang, Vikram S. Adve, Yu-Xiong Wang

    Abstract: We curate a dataset AgMMU for evaluating and developing vision-language models (VLMs) to produce factually accurate answers for knowledge-intensive expert domains. Our AgMMU concentrates on one of the most socially beneficial domains, agriculture, which requires connecting detailed visual observation with precise knowledge to diagnose, e.g., pest identification, management instructions, etc. As a…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Project Website: https://agmmu.github.io/ Huggingface: https://huggingface.co/datasets/AgMMU/AgMMU_v1/

  5. arXiv:2412.04471

    cs.CV cs.AI

    PaintScene4D: Consistent 4D Scene Generation from Text Prompts

    Authors: Vinayak Gupta, Yunze Man, Yu-Xiong Wang

    Abstract: Recent advances in diffusion models have revolutionized 2D and 3D content creation, yet generating photorealistic dynamic 4D scenes remains a significant challenge. Existing dynamic 4D generation methods typically rely on distilling knowledge from pre-trained 3D generative models, often fine-tuned on synthetic object datasets. Consequently, the resulting scenes tend to be object-centric and lack p…

    Submitted 28 March, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: Preprint. Project page: https://paintscene4d.github.io/

  6. arXiv:2412.01827

    cs.CV cs.AI

    RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

    Authors: Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang

    Abstract: We introduce RandAR, a decoder-only visual autoregressive (AR) model capable of generating images in arbitrary token orders. Unlike previous decoder-only AR models that rely on a predefined generation order, RandAR removes this inductive bias, unlocking new capabilities in decoder-only generation. Our essential design enables random order by inserting a "position instruction token" before each ima…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: Project page: https://rand-ar.github.io/

  7. arXiv:2411.12593

    cs.CV cs.AI

    AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction

    Authors: Yuanbin Man, Ying Huang, Chengming Zhang, Bingzhe Li, Wei Niu, Miao Yin

    Abstract: The advancements in large language models (LLMs) have propelled the improvement of video understanding tasks by incorporating LLMs with visual models. However, most existing LLM-based models (e.g., VideoLLaMA, VideoChat) are constrained to processing short-duration videos. Recent attempts have sought to understand long-term videos by extracting and compressing visual features into a fixed memory size. Neverth…

    Submitted 4 April, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

    Comments: CVPR 2025 Highlight

  8. Differentiable architecture search with multi-dimensional attention for spiking neural networks

    Authors: Yilei Man, Linhai Xie, Shushan Qiao, Yumei Zhou, Delong Shang

    Abstract: Spiking Neural Networks (SNNs) have gained enormous popularity in the field of artificial intelligence due to their low power consumption. However, the majority of SNN methods directly inherit the structure of Artificial Neural Networks (ANN), usually leading to sub-optimal model performance in SNNs. To alleviate this problem, we integrate the Neural Architecture Search (NAS) method and propose Multi-…

    Submitted 1 November, 2024; originally announced November 2024.

  9. arXiv:2410.09049

    cs.CV

    SceneCraft: Layout-Guided 3D Scene Generation

    Authors: Xiuyu Yang, Yunze Man, Jun-Kun Chen, Yu-Xiong Wang

    Abstract: The creation of complex 3D scenes tailored to user specifications has been a tedious and challenging task with traditional 3D modeling tools. Although some pioneering methods have achieved automatic text-to-3D generation, they are generally limited to small-scale scenes with restricted control over the shape and texture. We introduce SceneCraft, a novel method for generating detailed indoor scenes…

    Submitted 8 May, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024. Code: https://github.com/OrangeSodahub/SceneCraft Project Page: https://orangesodahub.github.io/SceneCraft

  10. arXiv:2409.03757

    cs.CV cs.AI cs.CL cs.LG cs.RO

    Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

    Authors: Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang

    Abstract: Complex 3D scene understanding has gained increasing attention, with scene encoding strategies playing a crucial role in this success. However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understandi…

    Submitted 8 May, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

    Comments: NeurIPS 2024. Project page: https://yunzeman.github.io/lexicon3d Github: https://github.com/YunzeMan/Lexicon3D

  11. arXiv:2407.18914

    cs.CV

    Floating No More: Object-Ground Reconstruction from a Single Image

    Authors: Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang

    Abstract: Recent advancements in 3D object reconstruction from single images have primarily focused on improving the accuracy of object shapes. Yet, these techniques often fail to accurately capture the inter-relation between the object, ground, and camera. As a result, the reconstructed objects often appear floating or tilted when placed on flat surfaces. This limitation significantly affects 3D-aware imag…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Project Page: https://yunzeman.github.io/ORG/

  12. arXiv:2406.07544

    cs.CV cs.AI cs.CL cs.LG

    Situational Awareness Matters in 3D Vision Language Reasoning

    Authors: Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

    Abstract: Being able to carry out complicated vision language reasoning tasks in 3D space represents a significant milestone in developing household robots and human-centered embodied AI. In this work, we demonstrate that a critical and distinct challenge in 3D vision language reasoning is situational awareness, which incorporates two key components: (1) The autonomous agent grounds its self-location based…

    Submitted 26 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. Project Page: https://yunzeman.github.io/situation3d

  13. arXiv:2404.12208

    cs.CR cs.IT

    The Explicit values of the UBCT, the LBCT and the DBCT of the inverse function

    Authors: Yuying Man, Nian Li, Zhen Liu, Xiangyong Zeng

    Abstract: Substitution boxes (S-boxes) play a significant role in ensuring the resistance of block ciphers against various attacks. The Upper Boomerang Connectivity Table (UBCT), the Lower Boomerang Connectivity Table (LBCT) and the Double Boomerang Connectivity Table (DBCT) of a given S-box are crucial tools to analyze its security concerning specific attacks. However, there are currently no related result…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: This manuscript was submitted to Finite Fields and Their Applications on April 8, 2024. arXiv admin note: text overlap with arXiv:2309.01881

  14. arXiv:2312.16385

    cs.DC

    Analytical Insight of Earth: A Cloud-Platform of Intelligent Computing for Geospatial Big Data

    Authors: Hao Xu, Yuanbin Man, Mingyang Yang, Jichao Wu, Qi Zhang, Jing Wang

    Abstract: The rapid accumulation of Earth observation data presents a formidable challenge for the processing capabilities of traditional remote sensing desktop software, particularly when it comes to analyzing expansive geographical areas and prolonged temporal sequences. Cloud computing has emerged as a transformative solution, surmounting the barriers traditionally associated with the management and comp…

    Submitted 26 December, 2023; originally announced December 2023.

  15. arXiv:2310.18568

    cs.IT

    On the second-order zero differential spectra of some power functions over finite fields

    Authors: Yuying Man, Nian Li, Zejun Xiang, Xiangyong Zeng

    Abstract: Boukerrou et al. (IACR Trans. Symmetric Cryptol. 2020(1), 331-362) introduced the notion of Feistel Boomerang Connectivity Table (FBCT), the Feistel counterpart of the Boomerang Connectivity Table (BCT), and the Feistel boomerang uniformity (which is the same as the second-order zero differential uniformity in even characteristic). FBCT is a crucial table for the analysis of the resistance of bloc…

    Submitted 27 October, 2023; originally announced October 2023.

  16. arXiv:2310.12973

    cs.CV cs.AI cs.CL cs.LG

    Frozen Transformers in Language Models Are Effective Visual Encoder Layers

    Authors: Ziqi Pang, Ziyang Xie, Yunze Man, Yu-Xiong Wang

    Abstract: This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Even more intriguingly, this can be achieved by a simple yet previously overlooked strategy -- employing a frozen transformer block from pre-trained LLMs as a constituent encoder layer to directly process visual tok…

    Submitted 6 May, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Spotlight. 23 pages, 13 figures. Code at https://github.com/ziqipang/LM4VisualEncoding

  17. arXiv:2309.01881

    cs.IT

    In-depth analysis of S-boxes over binary finite fields concerning their differential and Feistel boomerang differential uniformities

    Authors: Yuying Man, Sihem Mesnager, Nian Li, Xiangyong Zeng, Xiaohu Tang

    Abstract: Substitution boxes (S-boxes) play a significant role in ensuring the resistance of block ciphers against various attacks. The Difference Distribution Table (DDT), the Feistel Boomerang Connectivity Table (FBCT), the Feistel Boomerang Difference Table (FBDT) and the Feistel Boomerang Extended Table (FBET) of a given S-box are crucial tools to analyze its security concerning specific attacks. Howeve…

    Submitted 4 September, 2023; originally announced September 2023.

  18. arXiv:2305.03724

    cs.CV cs.AI cs.RO

    DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

    Authors: Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

    Abstract: Closing the domain gap between training and deployment and incorporating multiple sensor modalities are two challenging yet critical topics for self-driving. Existing work focuses on only one of the above topics, overlooking the simultaneous domain and modality shift which pervasively exists in real-world scenarios. A model trained with multi-sensor data collected in Europe may need to run…

    Submitted 11 June, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: IROS 2023. Project website: https://yunzeman.github.io/DualCross

  19. arXiv:2304.03144

    cs.AI cs.SI

    BotTriNet: A Unified and Efficient Embedding for Social Bots Detection via Metric Learning

    Authors: Jun Wu, Xuesong Ye, Yanyuet Man

    Abstract: The rapid and accurate identification of bot accounts in online social networks is an ongoing challenge. In this paper, we propose BOTTRINET, a unified embedding framework that leverages the textual content posted by accounts to detect bots. Our approach is based on the premise that account personalities and habits can be revealed through their contextual content. To achieve this, we designed a tr…

    Submitted 6 May, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    ACM Class: I.2

  20. arXiv:2212.04719

    cs.IT

    Several new infinite classes of 0-APN power functions over $\mathbb{F}_{2^n}$

    Authors: Yuying Man, Shizhu Tian, Nian Li, Xiangyong Zeng

    Abstract: The investigation of partially APN functions has attracted a lot of research interest recently. In this paper, we present several new infinite classes of 0-APN power functions over $\mathbb{F}_{2^n}$ by using the multivariate method and resultant elimination, and show that these 0-APN power functions are CCZ-inequivalent to the known ones.

    Submitted 9 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: text overlap with arXiv:2210.02207, arXiv:2210.15103 by other authors

  21. arXiv:2204.08118

    cs.CR cs.IT

    On the Differential Properties of the Power Mapping $x^{p^m+2}$

    Authors: Yuying Man, Yongbo Xia, Chunlei Li, Tor Helleseth

    Abstract: Let $m$ be a positive integer and $p$ a prime. In this paper, we investigate the differential properties of the power mapping $x^{p^m+2}$ over $\mathbb{F}_{p^n}$, where $n=2m$ or $n=2m-1$. For the case $n=2m$, by transforming the derivative equation of $x^{p^m+2}$ and studying some related equations, we completely determine the differential spectrum of this power mapping. For the case $n=2m-1$, th…

    Submitted 17 April, 2022; originally announced April 2022.

  22. arXiv:2112.02446

    cs.LG

    Fast Graph Neural Tangent Kernel via Kronecker Sketching

    Authors: Shunhua Jiang, Yunze Man, Zhao Song, Zheng Yu, Danyang Zhuo

    Abstract: Many deep learning tasks have to deal with graphs (e.g., protein structures, social networks, source code abstract syntax trees). Due to the importance of these tasks, people turned to Graph Neural Networks (GNNs) as the de facto method for learning on graphs. GNNs have become widely applied due to their convincing performance. Unfortunately, one major barrier to using GNNs is that GNNs require su…

    Submitted 4 December, 2021; originally announced December 2021.

    Comments: AAAI 2022

  23. arXiv:2107.11470

    cs.CV

    Multi-Echo LiDAR for 3D Object Detection

    Authors: Yunze Man, Xinshuo Weng, Prasanna Kumar Sivakuma, Matthew O'Toole, Kris Kitani

    Abstract: LiDAR sensors can be used to obtain a wide range of measurement signals other than a simple 3D point cloud, and those signals can be leveraged to improve perception tasks like 3D object detection. A single laser pulse can be partially reflected by multiple objects along its path, resulting in multiple measurements called echoes. Multi-echo measurement can provide information about object contours…

    Submitted 23 July, 2021; originally announced July 2021.

  24. arXiv:2107.04013

    cs.CV cs.AI cs.LG cs.RO

    Multi-Modality Task Cascade for 3D Object Detection

    Authors: Jinhyung Park, Xinshuo Weng, Yunze Man, Kris Kitani

    Abstract: Point clouds and RGB images are naturally complementary modalities for 3D visual understanding - the former provides sparse but accurate locations of points on objects, while the latter contains dense color and texture information. Despite this potential for close sensor fusion, many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data. This separa…

    Submitted 8 July, 2021; originally announced July 2021.

  25. arXiv:2008.09506

    cs.CV cs.LG cs.MA cs.MM cs.RO

    Graph Neural Networks for 3D Multi-Object Tracking

    Authors: Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani

    Abstract: 3D Multi-object tracking (MOT) is crucial to autonomous systems. Recent work often uses a tracking-by-detection pipeline, where the feature of each object is extracted independently to compute an affinity matrix. Then, the affinity matrix is passed to the Hungarian algorithm for data association. A key process of this pipeline is to learn discriminative features for different objects in order to r…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: ECCV 2020 workshop paper. Project website: http://www.xinshuoweng.com/projects/GNN3DMOT. arXiv admin note: substantial text overlap with arXiv:2006.07327

  26. arXiv:2006.07327

    cs.CV cs.LG eess.IV

    GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning

    Authors: Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani

    Abstract: 3D Multi-object tracking (MOT) is crucial to autonomous systems. Recent work uses a standard tracking-by-detection pipeline, where feature extraction is first performed independently for each object in order to compute an affinity matrix. Then the affinity matrix is passed to the Hungarian algorithm for data association. A key process of this standard pipeline is to learn discriminative features f…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: CVPR 2020. My website for all my research works: http://www.xinshuoweng.com/

  27. arXiv:2001.07792

    cs.CR cs.CV cs.LG eess.IV

    GhostImage: Remote Perception Attacks against Camera-based Image Classification Systems

    Authors: Yanmao Man, Ming Li, Ryan Gerdes

    Abstract: In vision-based object classification systems, imaging sensors perceive the environment, and machine learning is then used to detect and classify objects for decision-making purposes; e.g., to maneuver an automated vehicle around an obstacle or to raise an alarm to indicate the presence of an intruder in surveillance settings. In this work, we demonstrate how the perception domain can be remotely and…

    Submitted 23 June, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

    Comments: Accepted by USENIX RAID 2020. Source code is available at https://github.com/Harry1993/GhostImage

  28. arXiv:1907.01696   

    cs.CV

    A Semi-Supervised Framework for Automatic Pixel-Wise Breast Cancer Grading of Histological Images

    Authors: Yanyuet Man, Xiangyun Ding, Xingcheng Yao, Han Bao

    Abstract: Throughout the world, breast cancer is one of the leading causes of female death. Recently, deep learning methods have been developed to automatically grade breast cancer in histological slides. However, the performance of existing deep learning models is limited due to the lack of large annotated biomedical datasets. One promising way to relieve the annotating burden is to leverage the unannotated data…

    Submitted 8 March, 2022; v1 submitted 2 July, 2019; originally announced July 2019.

    Comments: The author list and contents of this paper are not complete. Other authors have requested that this paper be withdrawn.

  29. Deep Q Learning Driven CT Pancreas Segmentation with Geometry-Aware U-Net

    Authors: Yunze Man, Yangsibo Huang, Junyi Feng, Xi Li, Fei Wu

    Abstract: Segmentation of the pancreas is important for medical image analysis, yet it faces great challenges of class imbalance, background distractions and non-rigid geometrical features. To address these difficulties, we introduce a Deep Q Network (DQN) driven approach with deformable U-Net to accurately segment the pancreas by explicitly interacting with contextual information and extract anisotropic feature…

    Submitted 19 April, 2019; originally announced April 2019.

    Comments: in IEEE Transactions on Medical Imaging (2019)

  30. arXiv:1811.07222

    cs.CV

    GroundNet: Monocular Ground Plane Normal Estimation with Geometric Consistency

    Authors: Yunze Man, Xinshuo Weng, Xi Li, Kris Kitani

    Abstract: We focus on estimating the 3D orientation of the ground plane from a single image. We formulate the problem as an inter-mingled multi-task prediction problem by jointly optimizing for pixel-wise surface normal direction, ground plane segmentation, and depth estimates. Specifically, our proposed model, GroundNet, first estimates the depth and surface normal in two separate streams, from which two g…

    Submitted 9 August, 2019; v1 submitted 17 November, 2018; originally announced November 2018.

    Comments: Camera Ready for ACM MM 2019

  31. arXiv:1607.03575

    cs.CY

    IntelliAd Understanding In-APP Ad Costs From Users Perspective

    Authors: Cuiyun Gao, Hui Xu, Yichuan Man, Yangfan Zhou, Michael R. Lyu

    Abstract: Ads are an important revenue source for mobile app development, especially for free apps, whose expense can be compensated by ad revenue. The ad benefits also come with costs. For example, too many ads can interfere with the user experience, ultimately leading to lower user retention and reduced earnings. In this paper, we aim to understand the ad costs from the users' perspective. We utilize app reviews,…

    Submitted 12 July, 2016; originally announced July 2016.

    Comments: 12 pages