-
Efficient and Real-Time Motion Planning for Robotics Using Projection-Based Optimization
Authors:
Xuemin Chi,
Hakan Girgin,
Tobias Löw,
Yangyang Xie,
Teng Xue,
Jihao Huang,
Cheng Hu,
Zhitao Liu,
Sylvain Calinon
Abstract:
Generating motions for robots interacting with objects of various shapes is a complex challenge, further complicated by the robot geometry and multiple desired behaviors. While current robot programming tools (such as inverse kinematics, collision avoidance, and manipulation planning) often treat these problems as constrained optimization, many existing solvers focus on specific problem domains or…
▽ More
Generating motions for robots interacting with objects of various shapes is a complex challenge, further complicated by the robot geometry and multiple desired behaviors. While current robot programming tools (such as inverse kinematics, collision avoidance, and manipulation planning) often treat these problems as constrained optimization, many existing solvers focus on specific problem domains or do not exploit geometric constraints effectively. We propose an efficient first-order method, Augmented Lagrangian Spectral Projected Gradient Descent (ALSPG), which leverages geometric projections via Euclidean projections, Minkowski sums, and basis functions. We show that by using geometric constraints rather than full constraints and gradients, ALSPG significantly improves real-time performance. Compared to second-order methods like iLQR, ALSPG remains competitive in the unconstrained case. We validate our method through toy examples and extensive simulations, and demonstrate its effectiveness on a 7-axis Franka robot, a 6-axis P-Rob robot and a 1:10 scale car in real-world experiments. Source codes, experimental data and videos are available on the project webpage: https://sites.google.com/view/alspg-oc
△ Less
Submitted 17 June, 2025;
originally announced June 2025.
-
Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions
Authors:
Keita Teranishi,
Harshitha Menon,
William F. Godoy,
Prasanna Balaprakash,
David Bau,
Tal Ben-Nun,
Abhinav Bhatele,
Franz Franchetti,
Michael Franusich,
Todd Gamblin,
Giorgis Georgakoudis,
Tom Goldstein,
Arjun Guha,
Steven Hahn,
Costin Iancu,
Zheming Jin,
Terry Jones,
Tze Meng Low,
Het Mankad,
Narasinga Rao Miniskar,
Mohammad Alaul Haque Monil,
Daniel Nichols,
Konstantinos Parasyris,
Swaroop Pophale,
Pedro Valero-Lara
, et al. (3 additional authors not shown)
Abstract:
We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software. AI technologies, in particular large language models, have transformed every aspect of software development. For its part, HPC software is recognized as a highly specialized scientific field of its own. We discuss the challenges associated with lever…
▽ More
We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software. AI technologies, in particular large language models, have transformed every aspect of software development. For its part, HPC software is recognized as a highly specialized scientific field of its own. We discuss the challenges associated with leveraging state-of-the-art AI technologies to develop such a unique and niche class of software and outline our research directions in the two US Department of Energy--funded projects for advancing HPC Software via AI: Ellora and Durban.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
Diffusion-based Virtual Fixtures
Authors:
Cem Bilaloglu,
Tobias Löw,
Sylvain Calinon
Abstract:
Virtual fixtures assist human operators in teleoperation settings by constraining their actions. This extended abstract introduces a novel virtual fixture formulation \emph{on surfaces} for tactile robotics tasks. Unlike existing methods, our approach constrains the behavior based on the position on the surface and generalizes it over the surface by considering the distance (metric) on the surface…
▽ More
Virtual fixtures assist human operators in teleoperation settings by constraining their actions. This extended abstract introduces a novel virtual fixture formulation \emph{on surfaces} for tactile robotics tasks. Unlike existing methods, our approach constrains the behavior based on the position on the surface and generalizes it over the surface by considering the distance (metric) on the surface. Our method works directly on possibly noisy and partial point clouds collected via a camera. Given a set of regions on the surface together with their desired behaviors, our method diffuses the behaviors across the entire surface by taking into account the surface geometry. We demonstrate our method's ability in two simulated experiments (i) to regulate contact force magnitude or tangential speed based on surface position and (ii) to guide the robot to targets while avoiding restricted regions defined on the surface. All source codes, experimental data, and videos are available as open access at https://sites.google.com/view/diffusion-virtual-fixtures
△ Less
Submitted 4 November, 2024;
originally announced November 2024.
-
Tactile Ergodic Coverage on Curved Surfaces
Authors:
Cem Bilaloglu,
Tobias Löw,
Sylvain Calinon
Abstract:
In this article, we present a feedback control method for tactile coverage tasks, such as cleaning or surface inspection. These tasks are challenging to plan due to complex continuous physical interactions. In these tasks, the coverage target and progress can be easily measured using a camera and encoded in a point cloud. We propose an ergodic coverage method that operates directly on point clouds…
▽ More
In this article, we present a feedback control method for tactile coverage tasks, such as cleaning or surface inspection. These tasks are challenging to plan due to complex continuous physical interactions. In these tasks, the coverage target and progress can be easily measured using a camera and encoded in a point cloud. We propose an ergodic coverage method that operates directly on point clouds, guiding the robot to spend more time on regions requiring more coverage. For robot control and contact behavior, we use geometric algebra to formulate a task-space impedance controller that tracks a line while simultaneously exerting a desired force along that line. We evaluate the performance of our method in kinematic simulations and demonstrate its applicability in real-world experiments on kitchenware. Our source codes, experimental data, and videos are available as open access at https://sites.google.com/view/tactile-ergodic-control/
△ Less
Submitted 31 March, 2025; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Extending the Cooperative Dual-Task Space in Conformal Geometric Algebra
Authors:
Tobias Löw,
Sylvain Calinon
Abstract:
In this work, we are presenting an extension of the cooperative dual-task space (CDTS) in conformal geometric algebra. The CDTS was first defined using dual quaternion algebra and is a well established framework for the simplified definition of tasks using two manipulators. By integrating conformal geometric algebra, we aim to further enhance the geometric expressiveness and thus simplify the mode…
▽ More
In this work, we are presenting an extension of the cooperative dual-task space (CDTS) in conformal geometric algebra. The CDTS was first defined using dual quaternion algebra and is a well established framework for the simplified definition of tasks using two manipulators. By integrating conformal geometric algebra, we aim to further enhance the geometric expressiveness and thus simplify the modeling of various tasks. We show this formulation by first presenting the CDTS and then its extension that is based around a cooperative pointpair. This extension keeps all the benefits of the original formulation that is based on dual quaternions, but adds more tools for geometric modeling of the dual-arm tasks. We also present how this CGA-CDTS can be seamlessly integrated with an optimal control framework in geometric algebra that was derived in previous work. In the experiments, we demonstrate how to model different objectives and constraints using the CGA-CDTS. Using a setup of two Franka Emika robots we then show the effectiveness of our approach using model predictive control in real world experiments.
△ Less
Submitted 29 October, 2023;
originally announced October 2023.
-
gafro: Geometric Algebra for Robotics
Authors:
Tobias Löw,
Philip Abbet,
Sylvain Calinon
Abstract:
Geometry is a fundamental part of robotics and there have been various frameworks of representation over the years. Recently, geometric algebra has gained attention for its property of unifying many of those previous ideas into one algebra. While there are already efficient open-source implementations of geometric algebra available, none of them is targeted at robotics applications. We want to add…
▽ More
Geometry is a fundamental part of robotics and there have been various frameworks of representation over the years. Recently, geometric algebra has gained attention for its property of unifying many of those previous ideas into one algebra. While there are already efficient open-source implementations of geometric algebra available, none of them is targeted at robotics applications. We want to address this shortcoming with our library gafro. This article presents an overview of the implementation details as well as a tutorial of gafro, an efficient c++ library targeting robotics applications using geometric algebra. The library focuses on using conformal geometric algebra. Hence, various geometric primitives are available for computation as well as rigid body transformations. The modeling of robotic systems is also an important aspect of the library. It implements various algorithms for calculating the kinematics and dynamics of such systems as well as objectives for optimisation problems. The software stack is completed by python bindings in pygafro and a ROS interface in gafro_ros.
△ Less
Submitted 29 October, 2023;
originally announced October 2023.
-
Geometric Projectors: Geometric Constraints based Optimization for Robot Behaviors
Authors:
Xuemin Chi,
Tobias Löw,
Yiming Li,
Zhitao Liu,
Sylvain Calinon
Abstract:
Generating motion for robots that interact with objects of various shapes is a complex challenge, further complicated when the robot's own geometry and multiple desired behaviors are considered. To address this issue, we introduce a new framework based on Geometric Projectors (GeoPro) for constrained optimization. This novel framework allows for the generation of task-agnostic behaviors that are c…
▽ More
Generating motion for robots that interact with objects of various shapes is a complex challenge, further complicated when the robot's own geometry and multiple desired behaviors are considered. To address this issue, we introduce a new framework based on Geometric Projectors (GeoPro) for constrained optimization. This novel framework allows for the generation of task-agnostic behaviors that are compliant with geometric constraints. GeoPro streamlines the design of behaviors in both task and configuration spaces, offering diverse functionalities such as collision avoidance and goal-reaching, while maintaining high computational efficiency. We validate the efficacy of our work through simulations and Franka Emika robotic experiments, comparing its performance against state-of-the-art methodologies. This comprehensive evaluation highlights GeoPro's versatility in accommodating robots with varying dynamics and precise geometric shapes. For additional materials, please visit: https://www.xueminchi.com/publications/geopro
△ Less
Submitted 15 September, 2023;
originally announced September 2023.
-
Projection-based first-order constrained optimization solver for robotics
Authors:
Hakan Girgin,
Tobias Löw,
Teng Xue,
Sylvain Calinon
Abstract:
Robot programming tools ranging from inverse kinematics (IK) to model predictive control (MPC) are most often described as constrained optimization problems. Even though there are currently many commercially-available second-order solvers, robotics literature recently focused on efficient implementations and improvements over these solvers for real-time robotic applications. However, most often, t…
▽ More
Robot programming tools ranging from inverse kinematics (IK) to model predictive control (MPC) are most often described as constrained optimization problems. Even though there are currently many commercially-available second-order solvers, robotics literature recently focused on efficient implementations and improvements over these solvers for real-time robotic applications. However, most often, these implementations stay problem-specific and are not easy to access or implement, or do not exploit the geometric aspect of the robotics problems. In this work, we propose to solve these problems using a fast, easy-to-implement first-order method that fully exploits the geometric constraints via Euclidean projections, called Augmented Lagrangian Spectral Projected Gradient Descent (ALSPG). We show that 1. using projections instead of full constraints and gradients improves the performance of the solver and 2. ALSPG stays competitive to the standard second-order methods such as iLQR in the unconstrained case. We showcase these results with IK and motion planning problems on simulated examples and with an MPC problem on a 7-axis manipulator experiment.
△ Less
Submitted 30 June, 2023;
originally announced June 2023.
-
Whole-Body Ergodic Exploration with a Manipulator Using Diffusion
Authors:
Cem Bilaloglu,
Tobias Löw,
Sylvain Calinon
Abstract:
This paper presents a whole-body robot control method for exploring and probing a given region of interest. The ergodic control formalism behind such an exploration behavior consists of matching the time-averaged statistics of a robot trajectory with the spatial statistics of the target distribution. Most existing ergodic control approaches assume the robots/sensors as individual point agents movi…
▽ More
This paper presents a whole-body robot control method for exploring and probing a given region of interest. The ergodic control formalism behind such an exploration behavior consists of matching the time-averaged statistics of a robot trajectory with the spatial statistics of the target distribution. Most existing ergodic control approaches assume the robots/sensors as individual point agents moving in space. We introduce an approach that decomposes the whole-body of a robotic manipulator into multiple kinematically constrained agents. Then, we generate control actions by calculating a consensus among the agents. To do so, we use an ergodic control formulation called heat equation-driven area coverage (HEDAC) and slow the diffusion using the non-stationary heat equation. Our approach extends HEDAC to applications where robots have multiple sensors on the whole-body (such as tactile skin) and use all sensors to optimally explore the given region. We show that our approach increases the exploration performance in terms of ergodicity and scales well to real-world problems. We compare our method in kinematic simulations with the state-of-the-art and demonstrate the applicability of an online exploration task with a 7-axis Franka Emika robot. Additional material available at https://sites.google.com/view/w-ee-d/
△ Less
Submitted 3 November, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
SMaLL: A Software Framework for portable Machine Learning Libraries
Authors:
Upasana Sridhar,
Nicholai Tukanov,
Elliott Binder,
Tze Meng Low,
Scott McMillan,
Martin D. Schatz
Abstract:
Interest in deploying Deep Neural Network (DNN) inference on edge devices has resulted in an explosion of the number and types of hardware platforms to use. While the high-level programming interface, such as TensorFlow, can be readily ported across different devices, high-performance inference implementations rely on a good mapping of the high-level interface to the target hardware platform. Comm…
▽ More
Interest in deploying Deep Neural Network (DNN) inference on edge devices has resulted in an explosion of the number and types of hardware platforms to use. While the high-level programming interface, such as TensorFlow, can be readily ported across different devices, high-performance inference implementations rely on a good mapping of the high-level interface to the target hardware platform. Commonly, this mapping may use optimizing compilers to generate code at compile time or high-performance vendor libraries that have been specialized to the target platform. Both approaches rely on expert knowledge to produce the mapping, which may be time-consuming and difficult to extend to new architectures.
In this work, we present a DNN library framework, SMaLL, that is easily extensible to new architectures. The framework uses a unified loop structure and shared, cache-friendly data format across all intermediate layers, eliminating the time and memory overheads incurred by data transformation between layers. Layers are implemented by simply specifying the layer's dimensions and a kernel -- the key computing operations of each layer. The unified loop structure and kernel abstraction allows us to reuse code across layers and computing platforms. New architectures only require the 100s of lines in the kernel to be redesigned. To show the benefits of our approach, we have developed software that supports a range of layer types and computing platforms, which is easily extensible for rapidly instantiating high performance DNN libraries.
We evaluate our software by instantiating networks from the TinyMLPerf benchmark suite on 5 ARM platforms and 1 x86 platform ( an AMD Zen 2). Our framework shows end-to-end performance that is comparable to or better than ML Frameworks such as TensorFlow, TVM and LibTorch.
△ Less
Submitted 8 March, 2023;
originally announced March 2023.
-
Geometric Algebra for Optimal Control with Applications in Manipulation Tasks
Authors:
Tobias Löw,
Sylvain Calinon
Abstract:
Many problems in robotics are fundamentally problems of geometry, which lead to an increased research effort in geometric methods for robotics in recent years. The results were algorithms using the various frameworks of screw theory, Lie algebra and dual quaternions. A unification and generalization of these popular formalisms can be found in geometric algebra. The aim of this paper is to showcase…
▽ More
Many problems in robotics are fundamentally problems of geometry, which lead to an increased research effort in geometric methods for robotics in recent years. The results were algorithms using the various frameworks of screw theory, Lie algebra and dual quaternions. A unification and generalization of these popular formalisms can be found in geometric algebra. The aim of this paper is to showcase the capabilities of geometric algebra when applied to robot manipulation tasks. In particular the modelling of cost functions for optimal control can be done uniformly across different geometric primitives leading to a low symbolic complexity of the resulting expressions and a geometric intuitiveness. We demonstrate the usefulness, simplicity and computational efficiency of geometric algebra in several experiments using a Franka Emika robot. The presented algorithms were implemented in c++20 and resulted in the publicly available library \textit{gafro}. The benchmark shows faster computation of the kinematics than state-of-the-art robotics libraries.
△ Less
Submitted 5 June, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Automating Vascular Shunt Insertion with the dVRK Surgical Robot
Authors:
Karthik Dharmarajan,
Will Panitch,
Muyan Jiang,
Kishore Srinivas,
Baiyu Shi,
Yahav Avigal,
Huang Huang,
Thomas Low,
Danyal Fer,
Ken Goldberg
Abstract:
Vascular shunt insertion is a fundamental surgical procedure used to temporarily restore blood flow to tissues. It is often performed in the field after major trauma. We formulate a problem of automated vascular shunt insertion and propose a pipeline to perform Automated Vascular Shunt Insertion (AVSI) using a da Vinci Research Kit. The pipeline uses a learned visual model to estimate the locus of…
▽ More
Vascular shunt insertion is a fundamental surgical procedure used to temporarily restore blood flow to tissues. It is often performed in the field after major trauma. We formulate a problem of automated vascular shunt insertion and propose a pipeline to perform Automated Vascular Shunt Insertion (AVSI) using a da Vinci Research Kit. The pipeline uses a learned visual model to estimate the locus of the vessel rim, plans a grasp on the rim, and moves to grasp at that point. The first robot gripper then pulls the rim to stretch open the vessel with a dilation motion. The second robot gripper then proceeds to insert a shunt into the vessel phantom (a model of the blood vessel) with a chamfer tilt followed by a screw motion. Results suggest that AVSI achieves a high success rate even with tight tolerances and varying vessel orientations up to 30°. Supplementary material, dataset, videos, and visualizations can be found at https://sites.google.com/berkeley.edu/autolab-avsi.
△ Less
Submitted 8 March, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Tensor Train for Global Optimization Problems in Robotics
Authors:
Suhan Shetty,
Teguh Lembono,
Tobias Loew,
Sylvain Calinon
Abstract:
The convergence of many numerical optimization techniques is highly dependent on the initial guess given to the solver. To address this issue, we propose a novel approach that utilizes tensor methods to initialize existing optimization solvers near global optima. Our method does not require access to a database of good solutions. We first transform the cost function, which depends on both task par…
▽ More
The convergence of many numerical optimization techniques is highly dependent on the initial guess given to the solver. To address this issue, we propose a novel approach that utilizes tensor methods to initialize existing optimization solvers near global optima. Our method does not require access to a database of good solutions. We first transform the cost function, which depends on both task parameters and optimization variables, into a probability density function. Unlike existing approaches, the joint probability distribution of the task parameters and optimization variables is approximated using the Tensor Train model, which enables efficient conditioning and sampling. We treat the task parameters as random variables, and for a given task, we generate samples for decision variables from the conditional distribution to initialize the optimization solver. Our method can produce multiple solutions (when they exist) faster than existing methods. We first evaluate the approach on benchmark functions for numerical optimization that are hard to solve using gradient-based optimization solvers with a naive initialization. The results show that the proposed method can generate samples close to global optima and from multiple modes. We then demonstrate the generality and relevance of our framework to robotics by applying it to inverse kinematics with obstacles and motion planning problems with a 7-DoF manipulator.
△ Less
Submitted 22 November, 2023; v1 submitted 10 June, 2022;
originally announced June 2022.
-
Delayed Asynchronous Iterative Graph Algorithms
Authors:
Mark P. Blanco,
Scott McMillan,
Tze Meng Low
Abstract:
Iterative graph algorithms often compute intermediate values and update them as computation progresses. Updated output values are used as inputs for computations in current or subsequent iterations; hence the number of iterations required for values to converge can potentially reduce if the newest values are asynchronously made available to other updates computed in the same iteration. In a multi-…
▽ More
Iterative graph algorithms often compute intermediate values and update them as computation progresses. Updated output values are used as inputs for computations in current or subsequent iterations; hence the number of iterations required for values to converge can potentially reduce if the newest values are asynchronously made available to other updates computed in the same iteration. In a multi-threaded shared memory system, the immediate propagation of updated values can cause memory contention that may offset the benefit of propagating updates sooner. In some cases, the benefit of a smaller number of iterations may be diminished by each iteration taking longer. Our key idea is to combine the low memory contention that synchronous approaches have with the faster information sharing of asynchronous approaches. Our hybrid approach buffers updates from threads locally before committing them to the global store to control how often threads may cause conflicts for others while still sharing data within one iteration and hence speeding convergence. On a 112-thread CPU system, our hybrid approach attains up to 4.5% - 19.4% speedup over an asynchronous approach for Pagerank and up to 1.9% - 17% speedup over asynchronous Bellman Ford SSSP. Further, our hybrid approach attains 2.56x better performance than the synchronous approach. Finally, we provide insights as to why delaying updates is not helpful on certain graphs where connectivity is clustered on the main diagonal of the adjacency matrix.
△ Less
Submitted 29 September, 2021;
originally announced October 2021.
-
Automating Surgical Peg Transfer: Calibration with Deep Learning Can Exceed Speed, Accuracy, and Consistency of Humans
Authors:
Minho Hwang,
Jeffrey Ichnowski,
Brijen Thananjeyan,
Daniel Seita,
Samuel Paradis,
Danyal Fer,
Thomas Low,
Ken Goldberg
Abstract:
Peg transfer is a well-known surgical training task in the Fundamentals of Laparoscopic Surgery (FLS). While human sur-geons teleoperate robots such as the da Vinci to perform this task with high speed and accuracy, it is challenging to automate. This paper presents a novel system and control method using a da Vinci Research Kit (dVRK) surgical robot and a Zivid depth sensor, and a human subjects…
▽ More
Peg transfer is a well-known surgical training task in the Fundamentals of Laparoscopic Surgery (FLS). While human sur-geons teleoperate robots such as the da Vinci to perform this task with high speed and accuracy, it is challenging to automate. This paper presents a novel system and control method using a da Vinci Research Kit (dVRK) surgical robot and a Zivid depth sensor, and a human subjects study comparing performance on three variants of the peg-transfer task: unilateral, bilateral without handovers, and bilateral with handovers. The system combines 3D printing, depth sensing, and deep learning for calibration with a new analytic inverse kinematics model and a time-minimized motion controller. In a controlled study of 3384 peg transfer trials performed by the system, an expert surgical resident, and 9 volunteers, results suggest that the system achieves accuracy on par with the experienced surgical resident and is significantly faster and more consistent than the surgical resident and volunteers. The system also exhibits the highest consistency and lowest collision rate. To our knowledge, this is the first autonomous system to achieve superhuman performance on a standardized surgical task.
△ Less
Submitted 15 May, 2022; v1 submitted 23 December, 2020;
originally announced December 2020.
-
Intermittent Visual Servoing: Efficiently Learning Policies Robust to Instrument Changes for High-precision Surgical Manipulation
Authors:
Samuel Paradis,
Minho Hwang,
Brijen Thananjeyan,
Jeffrey Ichnowski,
Daniel Seita,
Danyal Fer,
Thomas Low,
Joseph E. Gonzalez,
Ken Goldberg
Abstract:
Automation of surgical tasks using cable-driven robots is challenging due to backlash, hysteresis, and cable tension, and these issues are exacerbated as surgical instruments must often be changed during an operation. In this work, we propose a framework for automation of high-precision surgical tasks by learning sample efficient, accurate, closed-loop policies that operate directly on visual feed…
▽ More
Automation of surgical tasks using cable-driven robots is challenging due to backlash, hysteresis, and cable tension, and these issues are exacerbated as surgical instruments must often be changed during an operation. In this work, we propose a framework for automation of high-precision surgical tasks by learning sample efficient, accurate, closed-loop policies that operate directly on visual feedback instead of robot encoder estimates. This framework, which we call intermittent visual servoing (IVS), intermittently switches to a learned visual servo policy for high-precision segments of repetitive surgical tasks while relying on a coarse open-loop policy for the segments where precision is not necessary. To compensate for cable-related effects, we apply imitation learning to rapidly train a policy that maps images of the workspace and instrument from a top-down RGB camera to small corrective motions. We train the policy using only 180 human demonstrations that are roughly 2 seconds each. Results on a da Vinci Research Kit suggest that combining the coarse policy with half a second of corrections from the learned policy during each high-precision segment improves the success rate on the Fundamentals of Laparoscopic Surgery peg transfer task from 72.9% to 99.2%, 31.3% to 99.2%, and 47.2% to 100.0% for 3 instruments with differing cable-related effects. In the contexts we studied, IVS attains the highest published success rates for automated surgical peg transfer and is significantly more reliable than previous techniques when instruments are changed. Supplementary material is available at https://tinyurl.com/ivs-icra.
△ Less
Submitted 11 November, 2020;
originally announced November 2020.
-
Towards an Objective Metric for the Performance of Exact Triangle Count
Authors:
Mark P. Blanco,
Scott McMillan,
Tze Meng Low
Abstract:
The performance of graph algorithms is often measured in terms of the number of traversed edges per second (TEPS). However, this performance metric is inadequate for a graph operation such as exact triangle counting. In triangle counting, execution times on graphs with a similar number of edges can be distinctly different as demonstrated by results from the past Graph Challenge entries. We discuss…
▽ More
The performance of graph algorithms is often measured in terms of the number of traversed edges per second (TEPS). However, this performance metric is inadequate for a graph operation such as exact triangle counting. In triangle counting, execution times on graphs with a similar number of edges can be distinctly different as demonstrated by results from the past Graph Challenge entries. We discuss the need for an objective performance metric for graph operations and the desired characteristics of such a metric such that it more accurately captures the interactions between the amount of work performed and the capabilities of the hardware on which the code is executed. Using exact triangle counting as an example, we derive a metric that captures how certain techniques employed in many implementations improve performance. We demonstrate that our proposed metric can be used to evaluate and compare multiple approaches for triangle counting, using a SIMD approach as a case study against a scalar baseline.
△ Less
Submitted 29 September, 2021; v1 submitted 16 September, 2020;
originally announced September 2020.
-
Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU
Authors:
Mark Blanco,
Tze Meng Low,
Kyungjoo Kim
Abstract:
In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU…
▽ More
In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU execution. We demonstrate our fine-grained parallel approach using implementations in Kokkos and evaluate them on an Intel Skylake CPU and an Nvidia Tesla V100 GPU. Overall, we observe between a 1.261. 48x improvement on the CPU and a 9.97-16.92x improvement on the GPU due to our fine-grained parallel formulation.
△ Less
Submitted 16 September, 2020;
originally announced September 2020.
-
Efficiently Calibrating Cable-Driven Surgical Robots with RGBD Fiducial Sensing and Recurrent Neural Networks
Authors:
Minho Hwang,
Brijen Thananjeyan,
Samuel Paradis,
Daniel Seita,
Jeffrey Ichnowski,
Danyal Fer,
Thomas Low,
Ken Goldberg
Abstract:
Automation of surgical subtasks using cable-driven robotic surgical assistants (RSAs) such as Intuitive Surgical's da Vinci Research Kit (dVRK) is challenging due to imprecision in control from cable-related effects such as cable stretching and hysteresis. We propose a novel approach to efficiently calibrate such robots by placing a 3D printed fiducial coordinate frames on the arm and end-effector…
▽ More
Automation of surgical subtasks using cable-driven robotic surgical assistants (RSAs) such as Intuitive Surgical's da Vinci Research Kit (dVRK) is challenging due to imprecision in control from cable-related effects such as cable stretching and hysteresis. We propose a novel approach to efficiently calibrate such robots by placing a 3D printed fiducial coordinate frames on the arm and end-effector that is tracked using RGBD sensing. To measure the coupling and history-dependent effects between joints, we analyze data from sampled trajectories and consider 13 approaches to modeling. These models include linear regression and LSTM recurrent neural networks, each with varying temporal window length to provide compensatory feedback. With the proposed method, data collection of 1800 samples takes 31 minutes and model training takes under 1 minute. Results on a test set of reference trajectories suggest that the trained model can reduce the mean tracking error of the physical robot from 2.96 mm to 0.65 mm. Results on the execution of open-loop trajectories of the FLS peg transfer surgeon training task suggest that the best model increases success rate from 39.4 % to 96.7 %, producing performance comparable to that of an expert surgical resident. Supplementary materials, including code and 3D-printable models, are available at https://sites.google.com/berkeley.edu/surgical-calibration
△ Less
Submitted 31 July, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Applying Depth-Sensing to Automated Surgical Manipulation with a da Vinci Robot
Authors:
Minho Hwang,
Daniel Seita,
Brijen Thananjeyan,
Jeffrey Ichnowski,
Samuel Paradis,
Danyal Fer,
Thomas Low,
Ken Goldberg
Abstract:
Recent advances in depth-sensing have significantly increased accuracy, resolution, and frame rate, as shown in the 1920x1200 resolution and 13 frames per second Zivid RGBD camera. In this study, we explore the potential of depth sensing for efficient and reliable automation of surgical subtasks. We consider a monochrome (all red) version of the peg transfer task from the Fundamentals of Laparosco…
▽ More
Recent advances in depth-sensing have significantly increased accuracy, resolution, and frame rate, as shown in the 1920x1200 resolution and 13 frames per second Zivid RGBD camera. In this study, we explore the potential of depth sensing for efficient and reliable automation of surgical subtasks. We consider a monochrome (all red) version of the peg transfer task from the Fundamentals of Laparoscopic Surgery training suite implemented with the da Vinci Research Kit (dVRK). We use calibration techniques that allow the imprecise, cable-driven da Vinci to reduce error from 4-5 mm to 1-2 mm in the task space. We report experimental results for a handover-free version of the peg transfer task, performing 20 and 5 physical episodes with single- and bilateral-arm setups, respectively. Results over 236 and 49 total block transfer attempts for the single- and bilateral-arm peg transfer cases suggest that reliability can be attained with 86.9 % and 78.0 % for each individual block, with respective block transfer speeds of 10.02 and 5.72 seconds. Supplementary material is available at https://sites.google.com/view/peg-transfer.
△ Less
Submitted 14 February, 2020;
originally announced February 2020.
-
Delta-stepping SSSP: from Vertices and Edges to GraphBLAS Implementations
Authors:
Upasana Sridhar,
Mark Blanco,
Rahul Mayuranath,
Daniele G. Spampinato,
Tze Meng Low,
Scott McMillan
Abstract:
GraphBLAS is an interface for implementing graph algorithms. Algorithms implemented using the GraphBLAS interface are cast in terms of linear algebra-like operations. However, many graph algorithms are canonically described in terms of operations on vertices and/or edges. Despite the known duality between these two representations, the differences in the way algorithms are described using the two…
▽ More
GraphBLAS is an interface for implementing graph algorithms. Algorithms implemented using the GraphBLAS interface are cast in terms of linear algebra-like operations. However, many graph algorithms are canonically described in terms of operations on vertices and/or edges. Despite the known duality between these two representations, the differences in the way algorithms are described using the two approaches can pose considerable difficulties in the adoption of the GraphBLAS as standard interface for development. This paper investigates a systematic approach for translating a graph algorithm described in the canonical vertex and edge representation into an implementation that leverages the GraphBLAS interface. We present a two-step approach to this problem. First, we express common vertex- and edge-centric design patterns using a linear algebraic language. Second, we map this intermediate representation to the GraphBLAS interface. We illustrate our approach by translating the delta-stepping single source shortest path algorithm from its canonical description to a GraphBLAS implementation, and highlight lessons learned when implementing using GraphBLAS.
△ Less
Submitted 16 September, 2020; v1 submitted 15 November, 2019;
originally announced November 2019.
-
A Flexible Framework for Parallel Multi-Dimensional DFTs
Authors:
Doru Thom Popovici,
Martin D. Schatz,
Franz Franchetti,
Tze Meng Low
Abstract:
Multi-dimensional discrete Fourier transforms (DFT) are typically decomposed into multiple 1D transforms. Hence, parallel implementations of any multi-dimensional DFT focus on parallelizing within or across the 1D DFT. Existing DFT packages exploit the inherent parallelism across the 1D DFTs and offer rigid frameworks, that cannot be extended to incorporate both forms of parallelism and various da…
▽ More
Multi-dimensional discrete Fourier transforms (DFT) are typically decomposed into multiple 1D transforms. Hence, parallel implementations of any multi-dimensional DFT focus on parallelizing within or across the 1D DFT. Existing DFT packages exploit the inherent parallelism across the 1D DFTs and offer rigid frameworks, that cannot be extended to incorporate both forms of parallelism and various data layouts to enable some of the parallelism. However, in the era of exascale, where systems have thousand of nodes and intricate network topologies, flexibility and parallel efficiency are key aspects all multi-dimensional DFT frameworks need to have in order to map and scale the computation appropriately. In this work, we present a flexible framework, built on the Redistribution Operations and Tensor Expressions (ROTE) framework, that facilitates the development of a family of parallel multi-dimensional DFT algorithms by 1) unifying the two parallelization schemes within a single framework, 2) exploiting the two different parallelization schemes to different degrees and 3) using different data layouts to distribute the data across the compute nodes. We demonstrate the need of a versatile framework and thus a need for a family of parallel multi-dimensional DFT algorithms on the K-Computer, where we show almost linear strong scaling results for problem sizes of 1024^3 on 32k compute nodes.
△ Less
Submitted 22 December, 2019; v1 submitted 22 April, 2019;
originally announced April 2019.
-
CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors
Authors:
Sanghamitra Dutta,
Ziqian Bai,
Tze Meng Low,
Pulkit Grover
Abstract:
This work proposes the first strategy to make distributed training of neural networks resilient to computing errors, a problem that has remained unsolved despite being first posed in 1956 by von Neumann. He also speculated that the efficiency and reliability of the human brain is obtained by allowing for low power but error-prone components with redundancy for error-resilience. It is surprising th…
▽ More
This work proposes the first strategy to make distributed training of neural networks resilient to computing errors, a problem that has remained unsolved despite being first posed in 1956 by von Neumann. He also speculated that the efficiency and reliability of the human brain is obtained by allowing for low power but error-prone components with redundancy for error-resilience. It is surprising that this problem remains open, even as massive artificial neural networks are being trained on increasingly low-cost and unreliable processing units. Our coding-theory-inspired strategy, "CodeNet," solves this problem by addressing three challenges in the science of reliable computing: (i) Providing the first strategy for error-resilient neural network training by encoding each layer separately; (ii) Keeping the overheads of coding (encoding/error-detection/decoding) low by obviating the need to re-encode the updated parameter matrices after each iteration from scratch. (iii) Providing a completely decentralized implementation with no central node (which is a single point of failure), allowing all primary computational steps to be error-prone. We theoretically demonstrate that CodeNet has higher error tolerance than replication, which we leverage to speed up computation time. Simultaneously, CodeNet requires lower redundancy than replication, and equal computational and communication costs in scaling sense. We first demonstrate the benefits of CodeNet in reducing expected computation time over replication when accounting for checkpointing. Our experiments show that CodeNet achieves the best accuracy-runtime tradeoff compared to both replication and uncoded strategies. CodeNet is a significant step towards biologically plausible neural network training, that could hold the key to orders of magnitude efficiency improvements.
△ Less
Submitted 3 March, 2019;
originally announced March 2019.
-
DESK: A Robotic Activity Dataset for Dexterous Surgical Skills Transfer to Medical Robots
Authors:
Naveen Madapana,
Md Masudur Rahman,
Natalia Sanchez-Tamayo,
Mythra V. Balakuntala,
Glebys Gonzalez,
Jyothsna Padmakumar Bindu,
L. N. Vishnunandan Venkatesh,
Xingguang Zhang,
Juan Barragan Noguera,
Thomas Low,
Richard Voyles,
Yexiang Xue,
Juan Wachs
Abstract:
Datasets are an essential component for training effective machine learning models. In particular, surgical robotic datasets have been key to many advances in semi-autonomous surgeries, skill assessment, and training. Simulated surgical environments can enhance the data collection process by making it faster, simpler and cheaper than real systems. In addition, combining data from multiple robotic…
▽ More
Datasets are an essential component for training effective machine learning models. In particular, surgical robotic datasets have been key to many advances in semi-autonomous surgeries, skill assessment, and training. Simulated surgical environments can enhance the data collection process by making it faster, simpler and cheaper than real systems. In addition, combining data from multiple robotic domains can provide rich and diverse training data for transfer learning algorithms. In this paper, we present the DESK (Dexterous Surgical Skill) dataset. It comprises a set of surgical robotic skills collected during a surgical training task using three robotic platforms: the Taurus II robot, Taurus II simulated robot, and the YuMi robot. This dataset was used to test the idea of transferring knowledge across different domains (e.g. from Taurus to YuMi robot) for a surgical gesture classification task with seven gestures. We explored three different scenarios: 1) No transfer, 2) Transfer from simulated Taurus to real Taurus and 3) Transfer from Simulated Taurus to the YuMi robot. We conducted extensive experiments with three supervised learning models and provided baselines in each of these scenarios. Results show that using simulation data during training enhances the performance on the real robot where limited real data is available. In particular, we obtained an accuracy of 55% on the real Taurus data using a model that is trained only on the simulator data. Furthermore, we achieved an accuracy improvement of 34% when 3% of the real data is added into the training process.
△ Less
Submitted 3 March, 2019;
originally announced March 2019.
-
A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication
Authors:
Sanghamitra Dutta,
Ziqian Bai,
Haewon Jeong,
Tze Meng Low,
Pulkit Grover
Abstract:
This paper has two contributions. First, we propose a novel coded matrix multiplication technique called Generalized PolyDot codes that advances on existing methods for coded matrix multiplication under storage and communication constraints. This technique uses "garbage alignment," i.e., aligning computations in coded computing that are not a part of the desired output. Generalized PolyDot codes b…
▽ More
This paper has two contributions. First, we propose a novel coded matrix multiplication technique called Generalized PolyDot codes that advances on existing methods for coded matrix multiplication under storage and communication constraints. This technique uses "garbage alignment," i.e., aligning computations in coded computing that are not a part of the desired output. Generalized PolyDot codes bridge between Polynomial codes and MatDot codes, trading off between recovery threshold and communication costs. Second, we demonstrate that Generalized PolyDot can be used for training large Deep Neural Networks (DNNs) on unreliable nodes prone to soft-errors. This requires us to address three additional challenges: (i) prohibitively large overhead of coding the weight matrices in each layer of the DNN at each iteration; (ii) nonlinear operations during training, which are incompatible with linear coding; and (iii) not assuming presence of an error-free master node, requiring us to architect a fully decentralized implementation without any "single point of failure." We allow all primary DNN training steps, namely, matrix multiplication, nonlinear activation, Hadamard product, and update steps as well as the encoding/decoding to be error-prone. We consider the case of mini-batch size $B=1$, as well as $B>1$, leveraging coded matrix-vector products, and matrix-matrix products respectively. The problem of DNN training under soft-errors also motivates an interesting, probabilistic error model under which a real number $(P,Q)$ MDS code is shown to correct $P-Q-1$ errors with probability $1$ as compared to $\lfloor \frac{P-Q}{2} \rfloor$ for the more conventional, adversarial error model. We also demonstrate that our proposed strategy can provide unbounded gains in error tolerance over a competing replication strategy and a preliminary MDS-code-based strategy for both these error models.
△ Less
Submitted 26 November, 2018;
originally announced November 2018.
-
High Performance Zero-Memory Overhead Direct Convolutions
Authors:
Jiyuan Zhang,
Franz Franchetti,
Tze Meng Low
Abstract:
The computation of convolution layers in deep neural networks typically rely on high performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve performance. The problems with such an approach are two-fold. First, these routines incur additional memory overhead which reduces the overall size of the network…
▽ More
The computation of convolution layers in deep neural networks typically rely on high performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve performance. The problems with such an approach are two-fold. First, these routines incur additional memory overhead which reduces the overall size of the network that can fit on embedded devices with limited memory capacity. Second, these high performance routines were not optimized for performing convolution, which means that the performance obtained is usually less than conventionally expected. In this paper, we demonstrate that direct convolution, when implemented correctly, eliminates all memory overhead, and yields performance that is between 10% to 400% times better than existing high performance implementations of convolution layers on conventional and embedded CPU architectures. We also show that a high performance direct convolution exhibits better scaling performance, i.e. suffers less performance drop, when increasing the number of threads.
△ Less
Submitted 19 September, 2018;
originally announced September 2018.
-
Coded FFT and Its Communication Overhead
Authors:
Haewon Jeong,
Tze Meng Low,
Pulkit Grover
Abstract:
We propose a coded computing strategy and examine communication costs of coded computing algorithms to make distributed Fast Fourier Transform (FFT) resilient to errors during the computation. We apply maximum distance separable (MDS) codes to a widely used "Transpose" algorithm for parallel FFT. In the uncoded distributed FFT algorithm, the most expensive step is a single "all-to-all" communicati…
▽ More
We propose a coded computing strategy and examine communication costs of coded computing algorithms to make distributed Fast Fourier Transform (FFT) resilient to errors during the computation. We apply maximum distance separable (MDS) codes to a widely used "Transpose" algorithm for parallel FFT. In the uncoded distributed FFT algorithm, the most expensive step is a single "all-to-all" communication. We compare this with communication overhead of coding in our coded FFT algorithm. We show that by using a systematic MDS code, the communication overhead of coding is negligible in comparison with the communication costs inherent in an uncoded FFT implementation if the number of parity nodes is at most $o(\log K)$, where $K$ is the number of systematic nodes.
△ Less
Submitted 24 May, 2018;
originally announced May 2018.
-
Automating the Last-Mile for High Performance Dense Linear Algebra
Authors:
Richard Michael Veras,
Tze Meng Low,
Tyler Michael Smith,
Robert van de Geijn,
Franz Franchetti
Abstract:
High performance dense linear algebra (DLA) libraries often rely on a general matrix multiply (Gemm) kernel that is implemented using assembly or with vector intrinsics. In particular, the real-valued Gemm kernels provide the overwhelming fraction of performance for the complex-valued Gemm kernels, along with the entire level-3 BLAS and many of the real and complex LAPACK routines. Thus,achieving…
▽ More
High performance dense linear algebra (DLA) libraries often rely on a general matrix multiply (Gemm) kernel that is implemented using assembly or with vector intrinsics. In particular, the real-valued Gemm kernels provide the overwhelming fraction of performance for the complex-valued Gemm kernels, along with the entire level-3 BLAS and many of the real and complex LAPACK routines. Thus,achieving high performance for the Gemm kernel translates into a high performance linear algebra stack above this kernel. However, it is a monumental task for a domain expert to manually implement the kernel for every library-supported architecture. This leads to the belief that the craft of a Gemm kernel is more dark art than science. It is this premise that drives the popularity of autotuning with code generation in the domain of DLA.
This paper, instead, focuses on an analytical approach to code generation of the Gemm kernel for different architecture, in order to shed light on the details or voo-doo required for implementing a high performance Gemm kernel. We distill the implementation of the kernel into an even smaller kernel, an outer-product, and analytically determine how available SIMD instructions can be used to compute the outer-product efficiently. We codify this approach into a system to automatically generate a high performance SIMD implementation of the Gemm kernel. Experimental results demonstrate that our approach yields generated kernels with performance that is competitive with kernels implemented manually or using empirical search.
△ Less
Submitted 28 April, 2017; v1 submitted 23 November, 2016;
originally announced November 2016.
-
Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors
Authors:
Martin D. Schatz,
Tze Meng Low,
Robert A. van de Geijn,
Tamara G. Kolda
Abstract:
Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation is in conflict with a desire to simplify memory access patterns. In this paper, we propose a blocked data structure (Blocked Compact Symmetric Storage) wherein we consider the tensor by blocks and store only the unique blocks of a symmetric te…
▽ More
Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation is in conflict with a desire to simplify memory access patterns. In this paper, we propose a blocked data structure (Blocked Compact Symmetric Storage) wherein we consider the tensor by blocks and store only the unique blocks of a symmetric tensor. We propose an algorithm-by-blocks, already shown of benefit for matrix computations, that exploits this storage format by utilizing a series of temporary tensors to avoid redundant computation. Further, partial symmetry within temporaries is exploited to further avoid redundant storage and redundant computation. A detailed analysis shows that, relative to storing and computing with tensors without taking advantage of symmetry and partial symmetry, storage requirements are reduced by a factor of $ O\left( m! \right)$ and computational requirements by a factor of $O\left( (m+1)!/2^m \right)$, where $ m $ is the order of the tensor. However, as the analysis shows, care must be taken in choosing the correct block size to ensure these storage and computational benefits are achieved (particularly for low-order tensors). An implementation demonstrates that storage is greatly reduced and the complexity introduced by storing and computing with tensors by blocks is manageable. Preliminary results demonstrate that computational time is also reduced. The paper concludes with a discussion of how insights in this paper point to opportunities for generalizing recent advances in the domain of linear algebra libraries to the field of multi-linear computation.
△ Less
Submitted 9 April, 2014; v1 submitted 31 January, 2013;
originally announced January 2013.
-
A literature review: What exactly should we preserve? How scholars address this question and where is the gap
Authors:
Jyue Tyan Low
Abstract:
This review addresses the question of what exactly should we preserve, and how the digital preservation community and scholars address this question. The paper first introduces the much-abused-term "significant properties," before revealing how some scholars are of the opinion that characteristics of digital objects to be preserved (i.e., significant properties) can be identified and should be exp…
▽ More
This review addresses the question of what exactly should we preserve, and how the digital preservation community and scholars address this question. The paper first introduces the much-abused-term "significant properties," before revealing how some scholars are of the opinion that characteristics of digital objects to be preserved (i.e., significant properties) can be identified and should be expressed formally, while others are not of that opinion. The digital preservation community's attempt to expound on the general characteristics of digital objects and significant properties will then be discussed. Finally, the review shows that while there may be ways to identify the technical makeup or general characteristics of a digital object, there is currently no formal and objective methodology to help stakeholders identify and decide what the significant properties of the objects are. This review thus helps open questions and generates a formative recommendation based on expert opinion that expressing an object's functions in an explicit and formal way (using didactic guides from the archives community) could be the solution to help stakeholders decide what characteristics/ elements exactly we should preserve.
△ Less
Submitted 7 December, 2011;
originally announced December 2011.
-
Inferring 3D Articulated Models for Box Packaging Robot
Authors:
Heran Yang,
Tiffany Low,
Matthew Cong,
Ashutosh Saxena
Abstract:
Given a point cloud, we consider inferring kinematic models of 3D articulated objects such as boxes for the purpose of manipulating them. While previous work has shown how to extract a planar kinematic model (often represented as a linear chain), such planar models do not apply to 3D objects that are composed of segments often linked to the other segments in cyclic configurations. We present an ap…
▽ More
Given a point cloud, we consider inferring kinematic models of 3D articulated objects such as boxes for the purpose of manipulating them. While previous work has shown how to extract a planar kinematic model (often represented as a linear chain), such planar models do not apply to 3D objects that are composed of segments often linked to the other segments in cyclic configurations. We present an approach for building a model that captures the relation between the input point cloud features and the object segment as well as the relation between the neighboring object segments. We use a conditional random field that allows us to model the dependencies between different segments of the object. We test our approach on inferring the kinematic structure from partial and noisy point cloud data for a wide variety of boxes including cake boxes, pizza boxes, and cardboard cartons of several sizes. The inferred structure enables our robot to successfully close these boxes by manipulating the flaps.
△ Less
Submitted 23 June, 2011;
originally announced June 2011.