Search | arXiv e-print repository

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (76 additional authors not shown)

Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu… ▽ More The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup. △ Less

Submitted 22 April, 2025; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: Project website: https://droid-dataset.github.io/

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie , et al. (269 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method… ▽ More Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io. △ Less

Submitted 14 May, 2025; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2210.12161 [pdf, other]

Task-Based Assessment for Neural Networks: Evaluating Undersampled MRI Reconstructions based on Human Observer Signal Detection

Authors: Joshua D. Herman, Rachel E. Roca, Alexandra G. O'Neill, Marcus L. Wong, Sajan G. Lingala, Angel R. Pineda

Abstract: Recent research has explored using neural networks to reconstruct undersampled magnetic resonance imaging (MRI) data. Because of the complexity of the artifacts in the reconstructed images, there is a need to develop task-based approaches of image quality. Common metrics for evaluating image quality like the normalized root mean squared error (NRMSE) and structural similarity (SSIM) are global met… ▽ More Recent research has explored using neural networks to reconstruct undersampled magnetic resonance imaging (MRI) data. Because of the complexity of the artifacts in the reconstructed images, there is a need to develop task-based approaches of image quality. Common metrics for evaluating image quality like the normalized root mean squared error (NRMSE) and structural similarity (SSIM) are global metrics which average out impact of subtle features in the images. Using measures of image quality which incorporate a subtle signal for a specific task allow for image quality assessment which locally evaluates the effect of undersampling on a signal. We used a U-Net to reconstruct under-sampled images with 2x, 3x, 4x and 5x fold 1-D undersampling rates. Cross validation was performed for a 500 and a 4000 image training set with both structural similarity (SSIM) and mean squared error (MSE) losses. A two alternative forced choice (2-AFC) observer study was carried out for detecting a subtle signal (small blurred disk) from images with the 4000 image training set. We found that for both loss functions and training set sizes, the human observer performance on the 2-AFC studies led to a choice of a 2x undersampling but the SSIM and NRMSE led to a choice of a 3x undersampling. For this task, SSIM and NRMSE led to an overestimate of the achievable undersampling using a U-Net before a steep loss of image quality when compared to the performance of human observers in the detection of a subtle lesion. △ Less

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2109.09819 [pdf, other]

Seriema: RDMA-based Remote Invocationwith a Case-Study on Monte-Carlo Tree Search

Authors: Hammurabi Mendes, Bryce Wiedenbeck, Aidan O'Neill

Abstract: We introduce Seriema, a middleware that integrates RDMA-based remote invocation, asynchronous data transfer, NUMA-aware automatic management of registered memory, and message aggregation in idiomatic C++1x. Seriema supports the notion that remote invocation and asynchronous data transfer are complementary services that, when tightly-integrated, allow distributed data structures, overlay networks,… ▽ More We introduce Seriema, a middleware that integrates RDMA-based remote invocation, asynchronous data transfer, NUMA-aware automatic management of registered memory, and message aggregation in idiomatic C++1x. Seriema supports the notion that remote invocation and asynchronous data transfer are complementary services that, when tightly-integrated, allow distributed data structures, overlay networks, and Cloud & datacenter service applications to be expressed effectively and naturally, resembling sequential code. In order to evaluate the usability of Seriema, we implement a Monte-Carlo Tree Search (MCTS) application framework, which runs distributed simulations given only a sequential problem specification. Micro-benchmarks show that Seriema provides remote invocations with low overhead, and that our MCTS application framework scales well up to the number of non-hyperthreaded CPU cores while simulating plays of the board game Hex. △ Less

Submitted 20 September, 2021; originally announced September 2021.

arXiv:2108.04996 [pdf]

Cybersecurity Incident Response in Organisations: A Meta-level Framework for Scenario-based Training

Authors: Ashley O'Neill, Atif Ahmad, Sean Maynard

Abstract: Cybersecurity incident response teams mitigate the impact of adverse cyber-related events in organisations. Field studies of IR teams suggest that at present the process of IR is under-developed with a focus on the technological dimension with little consideration of practice capability. To address this gap, we develop a scenario-based training approach to assist organisations to overcome socio-te… ▽ More Cybersecurity incident response teams mitigate the impact of adverse cyber-related events in organisations. Field studies of IR teams suggest that at present the process of IR is under-developed with a focus on the technological dimension with little consideration of practice capability. To address this gap, we develop a scenario-based training approach to assist organisations to overcome socio-technical barriers to incident response. The training approach is informed by a comprehensive list of socio-technical barriers compiled from a comprehensive review of the literature. Our primary contribution is a novel meta-level framework to generate scenarios specifically targeting socio-technical issues. To demonstrate the utility of the framework, a proof-of-concept scenario is presented. △ Less

Submitted 10 August, 2021; originally announced August 2021.

Comments: 11 pages

arXiv:2101.07342 [pdf, other]

doi 10.1039/D1AN00075F

Feature Fusion of Raman Chemical Imaging and Digital Histopathology using Machine Learning for Prostate Cancer Detection

Authors: Trevor Doherty, Susan McKeever, Nebras Al-Attar, Tiarnan Murphy, Claudia Aura, Arman Rahman, Amanda O'Neill, Stephen P Finn, Elaine Kay, William M. Gallagher, R. William G. Watson, Aoife Gowen, Patrick Jackman

Abstract: The diagnosis of prostate cancer is challenging due to the heterogeneity of its presentations, leading to the over diagnosis and treatment of non-clinically important disease. Accurate diagnosis can directly benefit a patient's quality of life and prognosis. Towards addressing this issue, we present a learning model for the automatic identification of prostate cancer. While many prostate cancer st… ▽ More The diagnosis of prostate cancer is challenging due to the heterogeneity of its presentations, leading to the over diagnosis and treatment of non-clinically important disease. Accurate diagnosis can directly benefit a patient's quality of life and prognosis. Towards addressing this issue, we present a learning model for the automatic identification of prostate cancer. While many prostate cancer studies have adopted Raman spectroscopy approaches, none have utilised the combination of Raman Chemical Imaging (RCI) and other imaging modalities. This study uses multimodal images formed from stained Digital Histopathology (DP) and unstained RCI. The approach was developed and tested on a set of 178 clinical samples from 32 patients, containing a range of non-cancerous, Gleason grade 3 (G3) and grade 4 (G4) tissue microarray samples. For each histological sample, there is a pathologist labelled DP - RCI image pair. The hypothesis tested was whether multimodal image models can outperform single modality baseline models in terms of diagnostic accuracy. Binary non-cancer/cancer models and the more challenging G3/G4 differentiation were investigated. Regarding G3/G4 classification, the multimodal approach achieved a sensitivity of 73.8% and specificity of 88.1% while the baseline DP model showed a sensitivity and specificity of 54.1% and 84.7% respectively. The multimodal approach demonstrated a statistically significant 12.7% AUC advantage over the baseline with a value of 85.8% compared to 73.1%, also outperforming models based solely on RCI and median Raman spectra. Feature fusion of DP and RCI does not improve the more trivial task of tumour identification but does deliver an observed advantage in G3/G4 discrimination. Building on these promising findings, future work could include the acquisition of larger datasets for enhanced model generalization. △ Less

Submitted 18 January, 2021; originally announced January 2021.

Comments: 19 pages, 8 tables, 18 figures

arXiv:1706.01552 [pdf, other]

doi 10.1145/3460120.3484786

$\mathcal{E}\text{psolute}$: Efficiently Querying Databases While Providing Differential Privacy

Authors: Dmytro Bogatov, Georgios Kellaris, George Kollios, Kobbi Nissim, Adam O'Neill

Abstract: As organizations struggle with processing vast amounts of information, outsourcing sensitive data to third parties becomes a necessity. To protect the data, various cryptographic techniques are used in outsourced database systems to ensure data privacy, while allowing efficient querying. A rich collection of attacks on such systems has emerged. Even with strong cryptography, just communication vol… ▽ More As organizations struggle with processing vast amounts of information, outsourcing sensitive data to third parties becomes a necessity. To protect the data, various cryptographic techniques are used in outsourced database systems to ensure data privacy, while allowing efficient querying. A rich collection of attacks on such systems has emerged. Even with strong cryptography, just communication volume or access pattern is enough for an adversary to succeed. In this work we present a model for differentially private outsourced database system and a concrete construction, $\mathcal{E}\text{psolute}$, that provably conceals the aforementioned leakages, while remaining efficient and scalable. In our solution, differential privacy is preserved at the record level even against an untrusted server that controls data and queries. $\mathcal{E}\text{psolute}$ combines Oblivious RAM and differentially private sanitizers to create a generic and efficient construction. We go further and present a set of improvements to bring the solution to efficiency and practicality necessary for real-world adoption. We describe the way to parallelize the operations, minimize the amount of noise, and reduce the number of network requests, while preserving the privacy guarantees. We have run an extensive set of experiments, dozens of servers processing up to 10 million records, and compiled a detailed result analysis proving the efficiency and scalability of our solution. While providing strong security and privacy guarantees we are less than an order of magnitude slower than range query execution of a non-secure plain-text optimized RDBMS like MySQL and PostgreSQL. △ Less

Submitted 27 September, 2021; v1 submitted 5 June, 2017; originally announced June 2017.

Comments: Camera-ready version for ACM CCS 2021

Journal ref: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS '21)

arXiv:1201.2056 [pdf, ps, other]

Adaptive Context Tree Weighting

Authors: Alexander O'Neill, Marcus Hutter, Wen Shao, Peter Sunehag

Abstract: We describe an adaptive context tree weighting (ACTW) algorithm, as an extension to the standard context tree weighting (CTW) algorithm. Unlike the standard CTW algorithm, which weights all observations equally regardless of the depth, ACTW gives increasing weight to more recent observations, aiming to improve performance in cases where the input sequence is from a non-stationary distribution. Dat… ▽ More We describe an adaptive context tree weighting (ACTW) algorithm, as an extension to the standard context tree weighting (CTW) algorithm. Unlike the standard CTW algorithm, which weights all observations equally regardless of the depth, ACTW gives increasing weight to more recent observations, aiming to improve performance in cases where the input sequence is from a non-stationary distribution. Data compression results show ACTW variants improving over CTW on merged files from standard compression benchmark tests while never being significantly worse on any individual file. △ Less

Submitted 10 January, 2012; originally announced January 2012.

Comments: 11 LaTeX pages, 7 tables

Showing 1–8 of 8 results for author: O'Neill, A