Showing 1–15 of 15 results for author: Fang, I

  1. arXiv:2506.09930  [pdf, ps, other]

    cs.RO cs.CV

    From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models

    Authors: Irving Fang, Juexiao Zhang, Shengbang Tong, Chen Feng

    Abstract: One promise that Vision-Language-Action (VLA) models hold over traditional imitation learning for robotics is to leverage the broad generalization capabilities of large Vision-Language Models (VLMs) to produce versatile, "generalist" robot policies. However, current evaluations of VLAs remain insufficient. Traditional imitation learning benchmarks are unsuitable due to the lack of language instruc…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Under review

  2. arXiv:2506.04676  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.MA

    Gen-n-Val: Agentic Image Data Generation and Validation

    Authors: Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Chih-Yu Wang, Jun-Cheng Chen

    Abstract: Recently, Large Language Models (LLMs) and Vision Large Language Models (VLLMs) have demonstrated impressive performance as agents across various tasks while data scarcity and label noise remain significant challenges in computer vision tasks, such as object detection and instance segmentation. A common solution for resolving these issues is to generate synthetic data. However, current synthetic d…

    Submitted 5 June, 2025; originally announced June 2025.

  3. arXiv:2504.10090  [pdf, other]

    cs.CV cs.CL

    CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography

    Authors: I-Sheng Fang, Jun-Cheng Chen

    Abstract: Large language models (LLMs) and multimodal large language models (MLLMs) have significantly advanced artificial intelligence. However, visual reasoning, reasoning involving both visual and textual inputs, remains underexplored. Recent advancements, including the reasoning models like OpenAI o1 and Gemini 2.0 Flash Thinking, which incorporate image inputs, have opened this capability. In this ongo…

    Submitted 17 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  4. arXiv:2504.05400  [pdf, other]

    cs.CV cs.AI

    GARF: Learning Generalizable 3D Reassembly for Real-World Fractures

    Authors: Sihang Li, Zeyu Jiang, Grace Chen, Chenyang Xu, Siqi Tan, Xue Wang, Irving Fang, Kristof Zyskowski, Shannon P. McPherron, Radu Iovita, Chen Feng, Jing Zhang

    Abstract: 3D reassembly is a challenging spatial intelligence task with broad applications across scientific domains. While large-scale synthetic datasets have fueled promising learning-based approaches, their generalizability to different domains is limited. Critically, it remains uncertain whether models trained on synthetic datasets can generalize to real-world fractures where breakage patterns are more…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: 15 pages, 11 figures. Project Page https://ai4ce.github.io/GARF/

  5. arXiv:2410.08792  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model

    Authors: Beichen Wang, Juexiao Zhang, Shuwen Dong, Irving Fang, Chen Feng

    Abstract: Vision Language Models (VLMs) have recently been adopted in robotics for their capability in common sense reasoning and generalizability. Existing work has applied VLMs to generate task and motion planning from natural language instructions and simulate training data for robot learning. In this work, we explore using VLM to interpret human demonstration videos and generate robot task planning. Our…

    Submitted 11 October, 2024; originally announced October 2024.

  6. arXiv:2410.08282  [pdf, other]

    cs.RO cs.AI cs.CV cs.GR

    FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction

    Authors: Irving Fang, Kairui Shi, Xujin He, Siqi Tan, Yifan Wang, Hanwen Zhao, Hung-Jui Huang, Wenzhen Yuan, Chen Feng, Jing Zhang

    Abstract: Humans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings. Emulating this capability, we introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. FusionSense addresses three key challenges: (i) How can robo…

    Submitted 10 October, 2024; originally announced October 2024.

    ACM Class: I.4.5; I.4.8

  7. arXiv:2403.13171  [pdf, other]

    cs.CV

    LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

    Authors: Jing Zhang, Irving Fang, Juexiao Zhang, Hao Wu, Akshat Kaushik, Alice Rodriguez, Hanwen Zhao, Zhuo Zheng, Radu Iovita, Chen Feng

    Abstract: Lithic Use-Wear Analysis (LUWA) using microscopic images is an underexplored vision-for-science research area. It seeks to distinguish the worked material, which is critical for understanding archaeological artifacts, material interactions, tool functionalities, and dental records. However, this challenging task goes beyond the well-studied image classification problem for common objects. It is af…

    Submitted 27 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: CVPR

  8. arXiv:2403.05046  [pdf, other]

    cs.RO

    EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction

    Authors: Irving Fang, Yuzhong Chen, Yifan Wang, Jianghan Zhang, Qiushi Zhang, Jiali Xu, Xibo He, Weibo Gao, Hao Su, Yiming Li, Chen Feng

    Abstract: A robot's ability to anticipate the 3D action target location of a hand's movement from egocentric videos can greatly improve safety and efficiency in human-robot interaction (HRI). While previous research predominantly focused on semantic action classification or 2D target region prediction, we argue that predicting the action target's 3D coordinate could pave the way for more versatile downstrea…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 6 pages. Accepted at ICRA 2024

  9. arXiv:2304.11354  [pdf]

    cs.CV cs.GR

    Medium. Permeation: SARS-COV-2 Painting Creation by Generative Model

    Authors: Yuan-Fu Yang, Iuan-Kai Fang, Min Sun, Su-Chu Hsu

    Abstract: Airborne particles are the medium for SARS-CoV-2 to invade the human body. Light also reflects through suspended particles in the air, allowing people to see a colorful world. Impressionism is the most prominent art school that explores the spectrum of color created through color reflection of light. We find similarities of color structure and color stacking in the Impressionist paintings and the…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: Keywords: SARS-CoV-2; Generative Art; Graph Neural Network. arXiv admin note: text overlap with arXiv:1706.07068 by other authors

  10. arXiv:2303.09192  [pdf, other]

    cs.RO

    Metric-Free Exploration for Topological Mapping by Task and Motion Imitation in Feature Space

    Authors: Yuhang He, Irving Fang, Yiming Li, Rushi Bhavesh Shah, Chen Feng

    Abstract: We propose DeepExplorer, a simple and lightweight metric-free exploration method for topological mapping of unknown environments. It performs task and motion planning (TAMP) entirely in image feature space. The task planner is a recurrent network using the latest image observation sequence to hallucinate a feature as the next-best exploration goal. The motion planner then utilizes the current and…

    Submitted 16 March, 2023; originally announced March 2023.

  11. arXiv:2111.07552  [pdf, other]

    eess.SY cs.RO

    Dynamic Placement of Rapidly Deployable Mobile Sensor Robots Using Machine Learning and Expected Value of Information

    Authors: Alice Agogino, Hae Young Jang, Vivek Rao, Ritik Batra, Felicity Liao, Rohan Sood, Irving Fang, R. Lily Hu, Emerson Shoichet-Bartus, John Matranga

    Abstract: Although the Industrial Internet of Things has increased the number of sensors permanently installed in industrial plants, there will be gaps in coverage due to broken sensors or sparse density in very large plants, such as in the petrochemical industry. Modern emergency response operations are beginning to use Small Unmanned Aerial Systems (sUAS) that have the ability to drop sensor robots to pre…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: 14 pages, 11 figures, IMECE2021

  12. arXiv:2102.01173  [pdf, ps, other]

    cs.LG cs.AI cs.MM

    Multi-modal Ensemble Models for Predicting Video Memorability

    Authors: Tony Zhao, Irving Fang, Jeffrey Kim, Gerald Friedland

    Abstract: Modeling media memorability has been a consistent challenge in the field of machine learning. The Predicting Media Memorability task in MediaEval2020 is the latest benchmark among similar challenges addressing this topic. Building upon techniques developed in previous iterations of the challenge, we developed ensemble methods with the use of extracted video, image, text, and audio features. Critic…

    Submitted 1 February, 2021; originally announced February 2021.

  13. arXiv:1912.01701  [pdf, other]

    cs.CR

    An Off-Chip Attack on Hardware Enclaves via the Memory Bus

    Authors: Dayeol Lee, Dongha Jung, Ian T. Fang, Chia-Che Tsai, Raluca Ada Popa

    Abstract: This paper shows how an attacker can break the confidentiality of a hardware enclave with Membuster, an off-chip attack based on snooping the memory bus. An attacker with physical access can observe an unencrypted address bus and extract fine-grained memory access patterns of the victim. Membuster is qualitatively different from prior on-chip attacks to enclaves and is more difficult to thwart.…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: In proceedings of the 29th USENIX Security Symposium, 2020, 18 pages

  14. arXiv:1812.03910  [pdf, other]

    cs.CV

    Self-Contained Stylization via Steganography for Reverse and Serial Style Transfer

    Authors: Hung-Yu Chen, I-Sheng Fang, Wei-Chen Chiu

    Abstract: Style transfer has been widely applied to give real-world images a new artistic look. However, given a stylized image, the attempts to use typical style transfer methods for de-stylization or transferring it again into another style usually lead to artifacts or undesired results. We realize that these issues are originated from the content inconsistency between the original image and its stylized…

    Submitted 9 January, 2020; v1 submitted 10 December, 2018; originally announced December 2018.

    Comments: 21 pages, 21 figures

  15. arXiv:1211.7019  [pdf]

    physics.ins-det hep-ex physics.acc-ph

    Mu2e Conceptual Design Report

    Authors: The Mu2e Project, Collaboration, R. J. Abrams, D. Alezander, G. Ambrosio, N. Andreev, C. M. Ankenbrandt, D. M. Asner, D. Arnold, A. Artikov, E. Barnes, L. Bartoszek, R. H. Bernstein, K. Biery, V. Biliyar, R. Bonicalzi, R. Bossert, M. Bowden, J. Brandt, D. N. Brown, J. Budagov, M. Buehler, A. Burov, R. Carcagno, et al. (203 additional authors not shown)

    Abstract: Mu2e at Fermilab will search for charged lepton flavor violation via the coherent conversion process mu- N --> e- N with a sensitivity approximately four orders of magnitude better than the current world's best limits for this process. The experiment's sensitivity offers discovery potential over a wide array of new physics models and probes mass scales well beyond the reach of the LHC. We describe…

    Submitted 29 November, 2012; originally announced November 2012.

    Comments: 562 pages, 339 figures

    Report number: Fermilab-TM-2545