-
Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks
Authors:
Mehmet Kerem Turkcan,
Mattia Ballo,
Filippo Filicori,
Zoran Kostic
Abstract:
We introduce specialized diffusion-based generative models that capture the spatiotemporal dynamics of fine-grained robotic surgical sub-stitch actions through supervised learning on annotated laparoscopic surgery footage. The proposed models form a foundation for data-driven world models capable of simulating the biomechanical interactions and procedural dynamics of surgical suturing with high temporal fidelity. Annotating a dataset of ~2K clips extracted from simulation videos, we categorize surgical actions into fine-grained sub-stitch classes, including ideal and non-ideal executions of needle positioning, targeting, driving, and withdrawal. We fine-tune two state-of-the-art video diffusion models, LTX-Video and HunyuanVideo, to generate high-fidelity surgical action sequences at ≥768×512 resolution and ≥49 frames. For training our models, we explore both Low-Rank Adaptation (LoRA) and full-model fine-tuning approaches. Our experimental results demonstrate that these world models can effectively capture the dynamics of suturing, potentially enabling improved training simulators, surgical skill assessment tools, and autonomous surgical systems. The models are also able to differentiate between ideal and non-ideal technique execution, providing a foundation for building surgical training and evaluation systems. We release our models for testing and as a foundation for future research. Project Page: https://mkturkcan.github.io/suturingmodels/
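As a rough, hypothetical illustration of why LoRA is lighter than full-model fine-tuning, the sketch below follows the standard LoRA formulation W' = W + (α/r)·BA, where the pretrained d×k matrix W stays frozen and only B (d×r) and A (r×k) are trained. The layer size (3072×3072) and rank (r = 16) are assumptions chosen for illustration, not the settings used for LTX-Video or HunyuanVideo.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when the entire d x k weight matrix W is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for LoRA: B is d x r, A is r x k, and W stays frozen."""
    return d * r + r * k


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """Compute W x + (alpha / r) * B (A x) without materializing the d x k product B A."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * dl for b, dl in zip(base, delta)]


if __name__ == "__main__":
    d, k, r = 3072, 3072, 16  # hypothetical layer size and LoRA rank
    print(full_finetune_params(d, k))  # 9437184
    print(lora_params(d, k, r))        # 98304
```

Under these assumed sizes LoRA trains about 1% of the layer's parameters. Because B is conventionally initialized to zeros, the adapted layer initially reproduces the frozen layer exactly, and training only has to learn the low-rank correction.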
Submitted 16 March, 2025;
originally announced March 2025.
-
AI-Powered Urban Transportation Digital Twin: Methods and Applications
Authors:
Xuan Di,
Yongjie Fu,
Mehmet K. Turkcan,
Mahshid Ghasemi,
Zhaobin Mo,
Chengbo Zang,
Abhishek Adhikari,
Zoran Kostic,
Gil Zussman
Abstract:
We present a survey paper on methods and applications of digital twins (DTs) for urban traffic management. While most studies on DTs focus on their "eyes," that is, emerging sensing and perception capabilities such as object detection and tracking, what really distinguishes a DT from a traditional simulator lies in its "brain": the prediction and decision-making capabilities that extract patterns and make informed decisions from what has been seen and perceived. To add value to urban transportation management, DTs need to be powered by artificial intelligence and complemented by low-latency, high-bandwidth sensing and networking technologies. We first review the DT pipeline leveraging cyber-physical systems and then propose our DT architecture deployed on a real-world testbed in New York City. This survey paper can serve as a pointer to help researchers and practitioners identify challenges and opportunities for the development of DTs; a bridge to initiate conversations across disciplines; and a road map for exploiting the potential of DTs for diverse urban transportation applications.
Submitted 29 December, 2024;
originally announced January 2025.
-
The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications
Authors:
Navid Salami Pargoo,
Mahshid Ghasemi,
Shuren Xia,
Mehmet Kerem Turkcan,
Taqiya Ehsan,
Chengbo Zang,
Yuan Sun,
Javad Ghaderi,
Gil Zussman,
Zoran Kostic,
Jorge Ortiz
Abstract:
As urban populations grow, cities are becoming more complex, driving the deployment of interconnected sensing systems to realize the vision of smart cities. These systems aim to improve safety, mobility, and quality of life through applications that integrate diverse sensors with real-time decision-making. Streetscape applications, which address challenges such as pedestrian safety and adaptive traffic management, depend on managing distributed, heterogeneous sensor data, aligning information across time and space, and enabling real-time processing. These tasks are inherently complex and often difficult to scale. The Streetscape Application Services Stack (SASS) addresses these challenges with three core services: multimodal data synchronization, spatiotemporal data fusion, and distributed edge computing. By structuring these capabilities as composable abstractions with clear semantics, SASS allows developers to scale streetscape applications efficiently while minimizing the complexity of multimodal integration.
We evaluated SASS in two real-world testbed environments: a controlled parking lot and an urban intersection in a major U.S. city. These testbeds allowed us to test SASS under diverse conditions, demonstrating its practical applicability. The Multimodal Data Synchronization service reduced temporal misalignment errors by 88%, achieving synchronization accuracy within 50 milliseconds. The Spatiotemporal Data Fusion service improved detection accuracy for pedestrians and vehicles by over 10% by leveraging multi-camera integration. The Distributed Edge Computing service increased system throughput by more than an order of magnitude. Together, these results show how SASS provides the abstractions and performance needed to support real-time, scalable urban applications, bridging the gap between sensing infrastructure and actionable streetscape intelligence.
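The abstract does not describe how synchronization is performed. One common baseline, shown here only as a hypothetical sketch, pairs each sample of a reference stream with the nearest-in-time sample of another stream and rejects pairs farther apart than a tolerance. The 50 ms default mirrors the accuracy figure above; the function name, the sample timestamps, and the assumption that both streams already share a common clock are illustrative and not part of SASS.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def match_nearest(
    ref_ts: List[float], other_ts: List[float], tol: float = 0.050
) -> List[Tuple[float, Optional[float]]]:
    """Pair each reference timestamp with the closest timestamp in other_ts.

    Timestamps are in seconds on a shared clock; other_ts must be sorted.
    A pair is kept only if the two timestamps differ by at most tol seconds,
    otherwise the reference timestamp is paired with None.
    """
    pairs: List[Tuple[float, Optional[float]]] = []
    for t in ref_ts:
        i = bisect_left(other_ts, t)
        # The nearest neighbour is either just before or at/after the insertion point.
        candidates = [other_ts[j] for j in (i - 1, i) if 0 <= j < len(other_ts)]
        best = min(candidates, key=lambda c: abs(c - t), default=None)
        pairs.append((t, best if best is not None and abs(best - t) <= tol else None))
    return pairs


if __name__ == "__main__":
    camera = [0.00, 0.10, 0.20]  # hypothetical frame timestamps (s)
    lidar = [0.03, 0.12, 0.29]   # hypothetical sweep timestamps (s)
    print(match_nearest(camera, lidar))
    # [(0.0, 0.03), (0.1, 0.12), (0.2, None)]
```

Binary search keeps each lookup logarithmic in the length of the second stream, which matters when high-rate sensors are aligned continuously; dropping unmatched samples (rather than interpolating) is a design choice of this sketch.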
Submitted 12 January, 2025; v1 submitted 29 November, 2024;
originally announced November 2024.
-
Boundless: Generating Photorealistic Synthetic Data for Object Detection in Urban Streetscapes
Authors:
Mehmet Kerem Turkcan,
Yuyang Li,
Chengbo Zang,
Javad Ghaderi,
Gil Zussman,
Zoran Kostic
Abstract:
We introduce Boundless, a photo-realistic synthetic data generation system for enabling highly accurate object detection in dense urban streetscapes. Boundless can replace massive real-world data collection and manual ground-truth object annotation (labeling) with an automated and configurable process. Boundless is based on the Unreal Engine 5 (UE5) City Sample project with improvements enabling accurate collection of 3D bounding boxes across different lighting and scene variability conditions.
We evaluate the performance of object detection models trained on the dataset generated by Boundless when used for inference on a real-world dataset acquired from medium-altitude cameras. We compare the performance of the Boundless-trained model against the CARLA-trained model and observe an improvement of 7.8 mAP. The results we achieved support the premise that synthetic data generation is a credible methodology for training/fine-tuning scalable object detection models for urban scenes.
Submitted 26 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Data-Driven Traffic Simulation for an Intersection in a Metropolis
Authors:
Chengbo Zang,
Mehmet Kerem Turkcan,
Gil Zussman,
Javad Ghaderi,
Zoran Kostic
Abstract:
We present a novel data-driven simulation environment for modeling traffic in metropolitan street intersections. Using real-world tracking data collected over an extended period of time, we train trajectory forecasting models to learn agent interactions and environmental constraints that are difficult to capture conventionally. Trajectories of new agents are first coarsely generated by sampling from the spatial and temporal generative distributions, then refined using state-of-the-art trajectory forecasting models. The simulation can run either autonomously or under explicit human control conditioned on the generative distributions. We present experiments for a variety of model configurations. Under an iterative prediction scheme, the waypoint-supervised TrajNet++ model achieved a Final Displacement Error (FDE) of 0.36 at 20 FPS on an NVIDIA A100 GPU.
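Final Displacement Error measures the Euclidean distance between a predicted trajectory's last position and the ground-truth last position, averaged over agents. The minimal sketch below assumes 2D (x, y) positions in a common coordinate frame; the abstract does not state the units of the 0.36 figure, and the paper's exact evaluation protocol may differ. The sample trajectories are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def final_displacement_error(
    pred: Sequence[Sequence[Point]], gt: Sequence[Sequence[Point]]
) -> float:
    """Mean Euclidean distance between predicted and ground-truth final positions.

    pred[i] and gt[i] are the predicted and true (x, y) trajectories of agent i;
    only the last point of each trajectory is compared.
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and cover the same agents")
    total = 0.0
    for p_traj, g_traj in zip(pred, gt):
        (px, py), (gx, gy) = p_traj[-1], g_traj[-1]
        total += math.hypot(px - gx, py - gy)
    return total / len(pred)


if __name__ == "__main__":
    predicted = [[(0.0, 0.0), (3.0, 4.0)], [(1.0, 1.0), (2.0, 2.0)]]
    ground_truth = [[(0.0, 0.0), (0.0, 0.0)], [(1.0, 1.0), (2.0, 2.0)]]
    print(final_displacement_error(predicted, ground_truth))  # 2.5
```

Because only the endpoint is compared, FDE rewards getting the destination right even if the intermediate path drifts; it is usually reported alongside Average Displacement Error, which averages over all time steps.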
Submitted 1 August, 2024;
originally announced August 2024.
-
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection
Authors:
Mehmet Kerem Turkcan,
Sanjeev Narasimhan,
Chengbo Zang,
Gyung Hyun Je,
Bo Yu,
Mahshid Ghasemi,
Javad Ghaderi,
Gil Zussman,
Zoran Kostic
Abstract:
We introduce Constellation, a dataset of 13K images suitable for research on detection of objects in dense urban streetscapes observed from high-elevation cameras, collected under a variety of temporal conditions. The dataset addresses the need for curated data to explore problems in small object detection, exemplified by the limited pixel footprint of pedestrians observed from tens of meters above. It enables the testing of object detection models for variations in lighting, building shadows, weather, and scene dynamics. We evaluate contemporary object detection architectures on the dataset, observing that state-of-the-art methods perform worse at detecting small pedestrians than vehicles, corresponding to a 10% difference in average precision (AP). Using structurally similar datasets for pretraining the models results in an increase of 1.8% mean AP (mAP). We further find that incorporating domain-specific data augmentations helps improve model performance. Using pseudo-labeled data, obtained from inference outcomes of the best-performing models, improves the performance of the models. Finally, comparing the models trained using data collected in two different time intervals, we find a performance drift in models due to the changes in intersection conditions over time. The best-performing model achieves a pedestrian AP of 92.0% and an mAP of 95.4%, with an inference time of 11.5 ms on NVIDIA A100 GPUs.
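For readers less familiar with the AP and mAP figures quoted above, the sketch below computes all-point interpolated average precision (the PASCAL VOC 2010+ convention) from detections that have already been matched to ground truth, for example at an IoU threshold, and then averages per-class APs into mAP. This is a generic illustration under those assumptions; the benchmark may use a different protocol, such as COCO-style 101-point interpolation averaged over several IoU thresholds. The toy inputs are not results from the paper.

```python
from typing import Dict, List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP from (confidence, is_true_positive) pairs.

    Detections are assumed to be already matched to ground truth (e.g. at IoU >= 0.5),
    and num_gt is the number of ground-truth objects of this class.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each entry becomes the max of itself and all lower-ranked precisions.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the envelope; only ranks where recall increases contribute.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """mAP as the unweighted mean of per-class APs."""
    return sum(per_class_ap.values()) / len(per_class_ap)


if __name__ == "__main__":
    toy = [(0.9, True), (0.8, False), (0.7, True)]  # toy detections, 2 ground-truth objects
    print(round(average_precision(toy, num_gt=2), 4))  # 0.8333
    print(mean_average_precision({"class_a": 0.5, "class_b": 1.0}))  # 0.75
```

Since mAP is an unweighted mean over classes, a weaker class such as small pedestrians pulls mAP down in proportion to its gap, which is why a per-class AP is often reported alongside the overall figure.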
Submitted 25 April, 2024;
originally announced April 2024.
-
Examining the Influence of Varied Levels of Domain Knowledge Base Inclusion in GPT-based Intelligent Tutors
Authors:
Blake Castleman,
Mehmet Kerem Turkcan
Abstract:
Recent advancements in large language models (LLMs) have facilitated the development of chatbots with sophisticated conversational capabilities. However, LLMs frequently produce inaccurate responses to queries, hindering their application in educational settings. In this paper, we investigate the effectiveness of integrating a knowledge base (KB) with LLM intelligent tutors to increase response reliability. To achieve this, we design a scalable KB that allows educational supervisors to seamlessly integrate lesson curricula, which are automatically processed by the intelligent tutoring system. We then detail an evaluation in which student participants answered questions about the artificial intelligence curriculum. GPT-4 intelligent tutors with varying hierarchies of KB access, as well as human domain experts, then assessed these responses. Lastly, students compared the intelligent tutors' responses with those of the domain experts and ranked their various pedagogical abilities. Results suggest that, although these intelligent tutors still demonstrate lower accuracy than domain experts, their accuracy increases when access to a KB is granted. We also observe that the intelligent tutors with KB access exhibit better pedagogical abilities than domain experts in speaking like a teacher and understanding students, while their ability to help students still lags behind that of domain experts.
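The abstract does not specify how the knowledge base is queried. The general idea of grounding a tutor in curriculum material can be pictured with the minimal, hypothetical sketch below: it retrieves KB passages that share words with the student's question and prepends them to the tutor prompt. The word-overlap scoring, function names, example passages, and prompt wording are illustrative assumptions, and no LLM API is called.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-cased alphanumeric word set (a deliberately crude representation)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, kb: List[str], k: int = 2) -> List[str]:
    """Return up to k KB passages sharing the most words with the question.

    Passages with no word overlap are dropped; ties keep their original KB order
    because Python's sort is stable (also with reverse=True).
    """
    q = tokenize(question)
    ranked = sorted(kb, key=lambda passage: len(q & tokenize(passage)), reverse=True)
    return [p for p in ranked[:k] if q & tokenize(p)]


def build_prompt(question: str, kb: List[str], k: int = 2) -> str:
    """Assemble a tutor prompt that grounds the answer in retrieved curriculum text."""
    context = retrieve(question, kb, k)
    lines = ["You are a course tutor. Answer using the course material below when relevant."]
    lines += [f"[Material {i + 1}] {p}" for i, p in enumerate(context)]
    lines.append(f"Student question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    kb = [
        "Gradient descent updates parameters using the gradient of the loss.",
        "A decision tree splits data on feature thresholds.",
        "Backpropagation computes gradients layer by layer.",
    ]
    print(build_prompt("How does gradient descent use the loss?", kb))
```

Varying the value of `k` or restricting which passages are visible is one simple way to emulate different levels of KB access; practical systems commonly replace word overlap with embedding similarity.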
Submitted 15 July, 2024; v1 submitted 16 September, 2023;
originally announced September 2023.
-
Using an Ancillary Neural Network to Capture Weekends and Holidays in an Adjoint Neural Network Architecture for Intelligent Building Management
Authors:
Zhicheng Ding,
Mehmet Kerem Turkcan,
Albert Boulanger
Abstract:
In 2017, the US EIA estimated that about 39% of total U.S. energy consumption was attributable to the residential and commercial sectors. Therefore, Intelligent Building Management (IBM) solutions that minimize consumption while maintaining tenant comfort are an important component in addressing climate change. A forecasting capability for accurate prediction of indoor temperatures over a planning horizon of 24 hours is essential to IBM. It should accurately predict the indoor temperature over both short-term (e.g., 15 minutes) and long-term (e.g., 24 hours) periods, including weekends, major holidays, and minor holidays. Other requirements include the ability to precisely predict the maximum and minimum indoor temperatures and to provide a confidence estimate for each prediction. To meet these requirements, we propose a novel adjoint neural network architecture for time series prediction that uses an ancillary neural network to capture weekend and holiday information. We studied four long short-term memory (LSTM)-based time series prediction networks within this architecture. We observed that the ancillary neural network improves prediction accuracy, maximum and minimum temperature prediction, and model reliability for all networks tested.
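One way to picture the input to the ancillary network is a small calendar feature vector per forecast step. The sketch below is hypothetical: the holiday sets, the binary [weekend, major holiday, minor holiday] encoding, and the 15-minute step over a 24-hour horizon are assumptions for illustration and are not taken from the paper.

```python
from datetime import date, datetime, timedelta
from typing import List, Set

# Hypothetical holiday calendars, for illustration only.
MAJOR_HOLIDAYS: Set[date] = {date(2018, 7, 4), date(2018, 12, 25)}
MINOR_HOLIDAYS: Set[date] = {date(2018, 10, 8)}


def calendar_features(day: date) -> List[float]:
    """Binary [is_weekend, is_major_holiday, is_minor_holiday] vector for one day."""
    return [
        1.0 if day.weekday() >= 5 else 0.0,  # Monday == 0, so 5 and 6 are Sat/Sun
        1.0 if day in MAJOR_HOLIDAYS else 0.0,
        1.0 if day in MINOR_HOLIDAYS else 0.0,
    ]


def horizon_features(
    start: datetime, steps: int = 96, step_minutes: int = 15
) -> List[List[float]]:
    """Calendar features for every forecast step; 96 x 15 min covers a 24-hour horizon."""
    return [
        calendar_features((start + timedelta(minutes=step_minutes * i)).date())
        for i in range(steps)
    ]


if __name__ == "__main__":
    feats = horizon_features(datetime(2018, 7, 6, 12, 0))  # a Friday at noon
    print(len(feats), feats[47], feats[48])  # 96 [0.0, 0.0, 0.0] [1.0, 0.0, 0.0]
```

In an adjoint arrangement along these lines, such vectors would feed the ancillary network while the main LSTM consumes the temperature history; how the two branches are combined is not specified in the abstract.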
Submitted 26 December, 2018;
originally announced February 2019.