Showing 1–21 of 21 results for author: Simao, C

  1. arXiv:2506.07725  [pdf, ps, other]

    cs.CV cs.AI

    ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models

    Authors: Shadi Hamdan, Chonghao Sima, Zetong Yang, Hongyang Li, Fatma Güney

    Abstract: How can we benefit from large models without sacrificing inference speed, a common dilemma in self-driving systems? A prevalent solution is a dual-system architecture, employing a small model for rapid, reactive decisions and a larger model for slower but more informative analyses. Existing dual-system designs often implement parallel architectures where inference is either directly conducted usin…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: ICCV 2025 submission. For code, see https://github.com/opendrivelab/ETA

  2. arXiv:2503.11650  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Centaur: Robust End-to-End Autonomous Driving with Test-Time Training

    Authors: Chonghao Sima, Kashyap Chitta, Zhiding Yu, Shiyi Lan, Ping Luo, Andreas Geiger, Hongyang Li, Jose M. Alvarez

    Abstract: How can we rely on an end-to-end autonomous vehicle's complex decision-making system during deployment? One common solution is to have a "fallback layer" that checks the planned trajectory for rule violations and replaces it with a pre-defined safe action if necessary. Another approach involves adjusting the planner's decisions to minimize a pre-defined "cost function" using additional system…

    Submitted 14 March, 2025; originally announced March 2025.

  3. arXiv:2503.06669  [pdf, other]

    cs.RO cs.CV cs.LG

    AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

    Authors: AgiBot-World-Contributors, Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xuan Hu, Xu Huang, Shu Jiang, Yuxin Jiang, Cheng Jing, Hongyang Li, Jialu Li, Chiming Liu, Yi Liu, Yuxiang Lu, Jianlan Luo, Ping Luo, Yao Mu, Yuehan Niu, Yixuan Pan, Jiangmiao Pang, et al. (27 additional authors not shown)

    Abstract: We explore how scalable robot data can address real-world challenges for generalized robotic manipulation. Introducing AgiBot World, a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, we achieve an order-of-magnitude increase in data scale compared to existing datasets. Accelerated by a standardized collection pipeline with human-in-the-loo…

    Submitted 30 April, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

    Comments: Project website: https://agibot-world.com/. Github repo: https://github.com/OpenDriveLab/AgiBot-World. The author list is ordered alphabetically by surname, with detailed contributions provided in the appendix

  4. arXiv:2501.04003  [pdf, other]

    cs.CV cs.RO

    Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

    Authors: Shaoyuan Xie, Lingdong Kong, Yuhao Dong, Chonghao Sima, Wenwei Zhang, Qi Alfred Chen, Ziwei Liu, Liang Pan

    Abstract: Recent advancements in Vision-Language Models (VLMs) have sparked interest in their use for autonomous driving, particularly in generating interpretable driving decisions through natural language. However, the assumption that VLMs inherently provide visually grounded, reliable, and interpretable explanations for driving remains largely unexamined. To address this gap, we introduce DriveBench, a be…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: Preprint; 41 pages, 32 figures, 16 tables; Project Page at https://drive-bench.github.io/

  5. arXiv:2410.06062  [pdf, other]

    cs.DB cs.AI cs.IR

    LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs

    Authors: Vincent Emonet, Jerven Bolleman, Severine Duvaud, Tarcisio Mendes de Farias, Ana Claudia Sima

    Abstract: We introduce a Retrieval-Augmented Generation (RAG) system for translating user questions into accurate federated SPARQL queries over bioinformatics knowledge graphs (KGs) leveraging Large Language Models (LLMs). To enhance accuracy and reduce hallucinations in query generation, our system utilises metadata from the KGs, including query examples and schema information, and incorporates a validatio…

    Submitted 10 February, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

  6. arXiv:2410.06010  [pdf]

    cs.DB cs.AI cs.IR

    A large collection of bioinformatics question-query pairs over federated knowledge graphs: methodology and applications

    Authors: Jerven Bolleman, Vincent Emonet, Adrian Altenhoff, Amos Bairoch, Marie-Claude Blatter, Alan Bridge, Severine Duvaud, Elisabeth Gasteiger, Dmitry Kuznetsov, Sebastien Moretti, Pierre-Andre Michel, Anne Morgat, Marco Pagni, Nicole Redaschi, Monique Zahn-Zabal, Tarcisio Mendes de Farias, Ana Claudia Sima

    Abstract: Background. In the last decades, several life science resources have structured data using the same framework and made these accessible using the same query language to facilitate interoperability. Knowledge graphs have seen increased adoption in bioinformatics due to their advantages for representing data in a generic graph format. For example, yummydata.org catalogs more than 60 knowledge graphs…

    Submitted 8 October, 2024; originally announced October 2024.

  7. arXiv:2409.06702  [pdf, other]

    cs.CV cs.AI

    Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving

    Authors: Kairui Ding, Boyuan Chen, Yuchen Su, Huan-ang Gao, Bu Jin, Chonghao Sima, Wuqiang Zhang, Xiaohui Li, Paul Barsch, Hongyang Li, Hao Zhao

    Abstract: End-to-end architectures in autonomous driving (AD) face a significant challenge in interpretability, impeding human-AI trust. Human-friendly natural language has been explored for tasks such as driving explanation and 3D captioning. However, previous works primarily focused on the paradigm of declarative interpretability, where the natural language interpretations are not grounded in the intermed…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: CoRL 2024, Project Page: https://air-discover.github.io/Hint-AD/

  8. arXiv:2402.04627  [pdf, other]

    cs.AI cs.CL cs.DB cs.IR

    SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph

    Authors: Julio C. Rangel, Tarcisio Mendes de Farias, Ana Claudia Sima, Norio Kobayashi

    Abstract: The recent success of Large Language Models (LLM) in a wide range of Natural Language Processing applications opens the path towards novel Question Answering Systems over Knowledge Graphs leveraging LLMs. However, one of the main obstacles preventing their implementation is the scarcity of training data for the task of translating questions into corresponding SPARQL queries, particularly in the ca…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: To appear in Proceedings of SWAT4HCLS 2024: Semantic Web Tools and Applications for Healthcare and Life Sciences

  9. arXiv:2312.14150  [pdf, other]

    cs.CV

    DriveLM: Driving with Graph Visual Question Answering

    Authors: Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, Hongyang Li

    Abstract: We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems to boost generalization and enable interactivity with human users. While recent approaches adapt VLMs to driving via single-round visual question answering (VQA), human drivers reason about decisions in multiple steps. Starting from the localization of key objects, humans estimate…

    Submitted 16 January, 2025; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted to ECCV 2024 as Oral paper

  10. arXiv:2310.15670  [pdf, other]

    cs.CV

    Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

    Authors: Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li

    Abstract: Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert). However, the presence of the domain gap between LiDAR and camera features, coupled with the inherent incompatibility in temporal fusion, significantly hinders the effectiveness of distillation-bas…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  11. arXiv:2306.02851  [pdf, other]

    cs.CV cs.RO

    Scene as Occupancy

    Authors: Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li

    Abstract: Human driver can easily describe the complex traffic scene by visual system. Such an ability of precise perception is essential for driver's planning. To achieve this, a geometry-aware representation that quantizes the physical 3D scene into structured grid map with semantic labels per cell, termed as 3D Occupancy, would be desirable. Compared to the form of bounding box, a key insight behind occu…

    Submitted 26 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Project link: https://github.com/OpenDriveLab/OccNet

  12. arXiv:2304.10440  [pdf, other]

    cs.CV

    OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

    Authors: Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, Feng Wen, Hang Xu, Ping Luo, Junchi Yan, Wei Zhang, Hongyang Li

    Abstract: Accurately depicting the complex traffic scene is a vital component for autonomous vehicles to execute correct judgments. However, existing benchmarks tend to oversimplify the scene by solely focusing on lane perception tasks. Observing that human drivers rely on both lanes and traffic signals to operate their vehicles safely, we present OpenLane-V2, the first dataset on topology reasoning for tra…

    Submitted 28 October, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Accepted by NeurIPS 2023 Track on Datasets and Benchmarks | OpenLane-V2 Dataset: https://github.com/OpenDriveLab/OpenLane-V2

  13. arXiv:2304.04179  [pdf, other]

    cs.CV

    Sparse Dense Fusion for 3D Object Detection

    Authors: Yulu Gao, Chonghao Sima, Shaoshuai Shi, Shangzhe Di, Si Liu, Hongyang Li

    Abstract: With the prevalence of multimodal learning, camera-LiDAR fusion has gained popularity in 3D object detection. Although multiple fusion approaches have been proposed, they can be classified into either sparse-only or dense-only fashion based on the feature representation in the fusion module. In this paper, we analyze them in a common taxonomy and thereafter observe two challenges: 1) sparse-only s…

    Submitted 9 April, 2023; originally announced April 2023.

  14. arXiv:2212.10156  [pdf, other]

    cs.CV cs.RO

    Planning-oriented Autonomous Driving

    Authors: Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li

    Abstract: Modern autonomous driving system is characterized as modular tasks in sequential order, i.e., perception, prediction, and planning. In order to perform a wide diversity of tasks and achieve advanced-level intelligence, contemporary approaches either deploy standalone models for individual tasks, or design a multi-task paradigm with separate heads. However, they might suffer from accumulative error…

    Submitted 23 March, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: CVPR 2023 award candidate. Project page: https://opendrivelab.github.io/UniAD/

  15. arXiv:2209.05324  [pdf, other]

    cs.CV cs.LG cs.RO

    Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

    Authors: Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao

    Abstract: Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia. Conventional approaches for most autonomous driving algorithms perform detection, segmentation, tracking, etc., in a front or perspective view. As sensor configurations get more complex, integrating multi-source information from different sens…

    Submitted 27 September, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: https://github.com/OpenDriveLab/Birds-eye-view-Perception

  16. arXiv:2203.17270  [pdf, other]

    cs.CV

    BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

    Authors: Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai

    Abstract: 3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal in…

    Submitted 13 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022

  17. arXiv:2203.11089  [pdf, other]

    cs.CV

    PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

    Authors: Li Chen, Chonghao Sima, Yang Li, Zehan Zheng, Jiajie Xu, Xiangwei Geng, Hongyang Li, Conghui He, Jianping Shi, Yu Qiao, Junchi Yan

    Abstract: Methods for 3D lane detection have been recently proposed to address the issue of inaccurate lane layouts in many autonomous driving scenarios (uphill/downhill, bump, etc.). Previous work struggled in complex cases due to their simple designs of the spatial transformation between front view and bird's eye view (BEV) and the lack of a realistic dataset. Towards these issues, we present PersFormer:…

    Submitted 19 July, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted by ECCV 2022 (Oral). Project page: https://github.com/OpenPerceptionX/PersFormer_3DLane | OpenLane dataset: https://github.com/OpenPerceptionX/OpenLane

  18. BV equivalence with boundary

    Authors: Francisco Manuel Castela Simão, Alberto S. Cattaneo, Michele Schiavina

    Abstract: An extension of the notion of classical equivalence of equivalence in the Batalin–(Fradkin)–Vilkovisky (BV) and (BFV) framework for local Lagrangian field theory on manifolds possibly with boundary is discussed. Equivalence is phrased in both a strict and a lax sense, distinguished by the compatibility between the BV data for a field theory and its boundary BFV data, necessary for quantisation.…

    Submitted 7 March, 2023; v1 submitted 11 September, 2021; originally announced September 2021.

    Comments: Published version

    MSC Class: 81T70; 83C47; 70S15; 70B05

    Journal ref: Letters in Mathematical Physics volume 113 (25), 2023

  19. arXiv:2104.13744  [pdf, other]

    cs.DB

    Bio-SODA: Enabling Natural Language Question Answering over Knowledge Graphs without Training Data

    Authors: Ana Claudia Sima, Tarcisio Mendes de Farias, Maria Anisimova, Christophe Dessimoz, Marc Robinson-Rechavi, Erich Zbinden, Kurt Stockinger

    Abstract: The problem of natural language processing over structured data has become a growing research field, both within the relational database and the Semantic Web community, with significant efforts involved in question answering over knowledge graphs (KGQA). However, many of these approaches are either specifically targeted at open-domain question answering using DBpedia, or require large training dat…

    Submitted 14 June, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Journal ref: 33rd International Conference on Scientific and Statistical Database Management (SSDBM 2021)

  20. Optical and mechanical properties of nanofibrillated cellulose: towards a robust platform for next-generation green technologies

    Authors: Claudia D. Simao, Juan S. Reparaz, Markus R. Wagner, Bartlomiej Graczykowski, Martin Kreuzer, Yasser B. Ruiz-Blanco, Yamila Garcia, Jani-Markus Malho, Alejandro R. Goni, Jouni Ahopelto, Clivia M. Sotomayor Torres

    Abstract: Nanofibrillated cellulose, a polymer that can be obtained from one of the most abundant biopolymers in Nature, is being increasingly explored due to its outstanding properties for packaging and device applications. Still, open challenges in engineering its intrinsic properties remain to address. The results obtained show the precise determination of significant properties as elastic properties and…

    Submitted 1 April, 2015; originally announced April 2015.

    Comments: in press in Carbohydrate Polymers (2015)

  21. Order quantification of hexagonal periodic arrays fabricated by in situ solvent-assisted nanoimprint lithography of block copolymers

    Authors: Claudia Simao, Worawut Khunsin, Nikolaos Kehagias, Mathieu Salaun, Marc Zelsmann, Michael A. Morris, Clivia M. Sotomayor Torres

    Abstract: Directed self-assembly of block copolymer polystyrene-b-polyethylene oxide (PS-b-PEO) thin film was achieved by one-pot methodology of solvent vapour assisted nanoimprint lithography (SAIL).

    Submitted 10 March, 2014; originally announced March 2014.

    Comments: 12 pages, 4 figures, paper accepted

    Journal ref: Nanotechnology (2014) 25 (7) 175703