-
Steering Large Agent Populations using Mean-Field Schrodinger Bridges with Gaussian Mixture Models
Authors:
George Rapakoulias,
Ali Reza Pedram,
Panagiotis Tsiotras
Abstract:
The Mean-Field Schrodinger Bridge (MFSB) problem is an optimization problem aiming to find the minimum effort control policy to drive a McKean-Vlasov stochastic differential equation from one probability measure to another. In the context of multiagent control, the objective is to control the configuration of a swarm of identical, interacting cooperative agents, as captured by the time-varying probability measure of their state. Available methods for solving this problem for distributions with continuous support rely either on spatial discretizations of the problem's domain or on approximating optimal solutions using neural networks trained through stochastic optimization schemes. For agents following Linear Time-Varying dynamics, and for Gaussian Mixture Model boundary distributions, we propose a highly efficient parameterization to approximate the solutions of the corresponding MFSB in closed form, without any learning steps. Our proposed approach consists of a mixture of elementary policies, each solving a Gaussian-to-Gaussian Covariance Steering problem from the components of the initial to the components of the terminal mixture. Leveraging the semidefinite formulation of the Covariance Steering problem, our proposed solver can handle probabilistic hard constraints on the system's state, while maintaining numerical tractability. We illustrate our approach on a variety of numerical examples.
Submitted 3 April, 2025; v1 submitted 31 March, 2025;
originally announced March 2025.
-
Communication-Aware Iterative Map Compression for Online Path-Planning
Authors:
Evangelos Psomiadis,
Ali Reza Pedram,
Dipankar Maity,
Panagiotis Tsiotras
Abstract:
This paper addresses the problem of optimizing communicated information among heterogeneous, resource-aware robot teams to facilitate their navigation. In such operations, a mobile robot compresses its local map to assist another robot in reaching a target within an uncharted environment. The primary challenge lies in ensuring that the map compression step balances network load while transmitting only the most essential information for effective navigation. We propose a communication framework that sequentially selects the optimal map compression in a task-driven, communication-aware manner. It introduces a decoder capable of iterative map estimation, handling noise through Kalman filter techniques. The computational speed of our decoder allows for a larger compression template set compared to previous methods, and enables applications in more challenging environments. Specifically, our simulations demonstrate a remarkable 98% reduction in communicated information, compared to a framework that transmits the raw data, on a large Mars inclination map and an Earth map, all while maintaining similar planning costs. Furthermore, our method significantly reduces computational time compared to the state-of-the-art approach.
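As a minimal sketch of the kind of Kalman-filter step the abstract alludes to, the following shows a textbook scalar measurement update applied repeatedly to one map cell. It is not the paper's decoder: the function name `kalman_update`, the single-cell model, and all numeric values are assumptions made for illustration.

```python
def kalman_update(mean, var, measurement, noise_var):
    """Scalar Kalman measurement update for a static quantity.

    mean, var    : prior estimate of the cell value and its variance
    measurement  : noisy observation of the cell value
    noise_var    : variance of the observation noise (assumed known)
    """
    gain = var / (var + noise_var)                 # Kalman gain
    new_mean = mean + gain * (measurement - mean)  # correct the estimate
    new_var = (1.0 - gain) * var                   # uncertainty shrinks
    return new_mean, new_var


# Hypothetical example: refine one cell estimate from successive noisy readings.
estimate, variance = 0.0, 1.0
for reading in [2.0, 1.8, 2.1]:
    estimate, variance = kalman_update(estimate, variance, reading, noise_var=1.0)
```

Each additional reading reduces the variance, which is the sense in which an estimate can be refined iteratively as more compressed information arrives.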
Submitted 13 March, 2025;
originally announced March 2025.
-
DFModel: Design Space Optimization of Large-Scale Systems Exploiting Dataflow Mappings
Authors:
Sho Ko,
Nathan Zhang,
Olivia Hsu,
Ardavan Pedram,
Kunle Olukotun
Abstract:
We propose DFModel, a modeling framework for mapping dataflow computation graphs onto large-scale systems. Mapping a workload to a system requires optimizing dataflow mappings at various levels, including the inter-chip (between chips) level and the intra-chip (within a chip) level. DFModel is, to the best of our knowledge, the first framework to perform the optimization at multiple levels of the memory hierarchy and the interconnection network hierarchy. We use DFModel to explore a wide range of workloads on a variety of systems. Evaluated workloads include two state-of-the-art machine learning applications (Large Language Models and Deep Learning Recommendation Models) and two high-performance computing applications (High Performance LINPACK and Fast Fourier Transform). System parameters investigated span the combination of dataflow and traditional accelerator architectures, memory technologies (DDR, HBM), interconnect technologies (PCIe, NVLink), and interconnection network topologies (torus, DGX, dragonfly). For a variety of workloads on a wide range of systems, DFModel provided a mapping that predicts an average of 1.25X better performance compared to the ones measured on real systems. DFModel shows that for large language model training, dataflow architectures achieve 1.52X higher performance, 1.59X better cost efficiency, and 1.6X better power efficiency compared to non-dataflow architectures. On an industrial system with dataflow architectures, the DFModel-optimized dataflow mapping achieves a speedup of 6.13X compared to non-dataflow mappings from previous performance models such as Calculon, and 1.52X compared to a vendor-provided dataflow mapping.
Submitted 20 December, 2024;
originally announced December 2024.
-
Go With the Flow: Fast Diffusion for Gaussian Mixture Models
Authors:
George Rapakoulias,
Ali Reza Pedram,
Fengjiao Liu,
Lingjiong Zhu,
Panagiotis Tsiotras
Abstract:
Schrodinger Bridges (SBs) are diffusion processes that steer, in finite time, a given initial distribution to another final one while minimizing a suitable cost functional. Although various methods for computing SBs have recently been proposed in the literature, most of these approaches require computationally expensive training schemes, even for solving low-dimensional problems. In this work, we propose an analytic parametrization of a set of feasible policies for steering the distribution of a dynamical system from one Gaussian Mixture Model (GMM) to another. Instead of relying on standard non-convex optimization techniques, the optimal policy within the set can be approximated as the solution of a low-dimensional linear program whose dimension scales linearly with the number of components in each mixture. The proposed method generalizes naturally to more general classes of dynamical systems, such as controllable linear time-varying systems, enabling efficient solutions to multi-marginal momentum SB between GMMs, a challenging distribution interpolation problem. We showcase the potential of this approach in low-to-moderate dimensional problems such as image-to-image translation in the latent space of an autoencoder, learning of cellular dynamics using multi-marginal momentum SB problems, and various other examples. We also test our approach on an Entropic Optimal Transport (EOT) benchmark problem and show that it outperforms state-of-the-art methods in cases where the boundary distributions are mixture models while requiring virtually no training.
Submitted 30 May, 2025; v1 submitted 12 December, 2024;
originally announced December 2024.
-
Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network
Authors:
Song Han,
Xingyu Liu,
Huizi Mao,
Jing Pu,
Ardavan Pedram,
Mark A. Horowitz,
William J. Dally
Abstract:
EIE proposed to accelerate pruned and compressed neural networks, exploiting weight sparsity, activation sparsity, and 4-bit weight-sharing in neural network accelerators. Since its publication at ISCA'16, it has opened a new design space to accelerate pruned and sparse neural networks and spawned many algorithm-hardware co-designs for model compression and acceleration, both in academia and commercial AI chips. In retrospect, we review the background of this project, summarize the pros and cons, and discuss new opportunities where pruning, sparsity, and low precision can accelerate emerging deep learning workloads.
Submitted 15 June, 2023;
originally announced June 2023.
-
Optimal Sampling-based Motion Planning in Gaussian Belief Space for Minimum Sensing Navigation
Authors:
Vrushabh Zinage,
Ali Reza Pedram,
Takashi Tanaka
Abstract:
In this paper, we consider the motion planning problem in Gaussian belief space for minimum sensing navigation. Despite the extensive use of sampling-based algorithms and their rigorous analysis in the deterministic setting, there has been little formal analysis of the quality of the solutions returned by sampling algorithms in Gaussian belief space. This paper aims to address this lack of research by examining the asymptotic behavior of the cost of solutions obtained from Gaussian belief space based sampling algorithms as the number of samples increases. To that end, we propose a sampling-based motion planning algorithm termed Information Geometric PRM* (IG-PRM*) for generating feasible paths that minimize a weighted sum of the Euclidean and an information-theoretic cost and show that the cost of the solution that is returned is guaranteed to approach the global optimum in the limit of a large number of samples. Finally, we consider an obstacle-free scenario and compute the optimal solution using the "move and sense" strategy in the literature. We then verify that the cost returned by our proposed algorithm converges to this optimal solution as the number of samples increases.
Submitted 31 May, 2023;
originally announced June 2023.
-
A Smoothing Algorithm for Minimum Sensing Path Plans in Gaussian Belief Space
Authors:
Ali Reza Pedram,
Takashi Tanaka
Abstract:
This paper explores minimum sensing navigation of robots in environments cluttered with obstacles. The general objective is to find a path plan to a goal region that requires minimal sensing effort. In [1], the information-geometric RRT* (IG-RRT*) algorithm was proposed to efficiently find such a path. However, like any stochastic sampling-based planner, the computational complexity of IG-RRT* grows quickly, impeding its use with a large number of nodes. To remedy this limitation, we suggest running IG-RRT* with a moderate number of nodes, and then using a smoothing algorithm to adjust the path obtained. To develop a smoothing algorithm, we explicitly formulate the minimum sensing path planning problem as an optimization problem. For this formulation, we introduce a new safety constraint to impose a bound on the probability of collision with obstacles in continuous-time, in contrast to the common discrete-time approach. The problem is amenable to solution via the convex-concave procedure (CCP). We develop a CCP algorithm for the formulated optimization and use this algorithm for path smoothing. We demonstrate the efficacy of the proposed approach through numerical simulations.
Submitted 13 March, 2023;
originally announced March 2023.
-
Gaussian Belief Space Path Planning for Minimum Sensing Navigation
Authors:
Ali Reza Pedram,
Riku Funada,
Takashi Tanaka
Abstract:
We propose a path planning methodology for a mobile robot navigating through an obstacle-filled environment to generate a reference path that is traceable with moderate sensing efforts. The desired reference path is characterized as the shortest path in an obstacle-filled Gaussian belief manifold equipped with a novel information-geometric distance function. The distance function we introduce is shown to be an asymmetric quasi-pseudometric and can be interpreted as the minimum information gain required to steer the Gaussian belief. An RRT*-based numerical solution algorithm is presented to solve the formulated shortest-path problem. To gain insight into the asymptotic optimality of the proposed algorithm, we show that the considered path length function is continuous with respect to the topology of total variation. Simulation results demonstrate that the proposed method is effective in various robot navigation scenarios to reduce sensing costs, such as the required frequency of sensor measurements and the number of sensors that must be operated simultaneously.
Submitted 7 December, 2022; v1 submitted 28 September, 2021;
originally announced September 2021.
-
Dynamic Allocation of Visual Attention for Vision-based Autonomous Navigation under Data Rate Constraints
Authors:
Ali Reza Pedram,
Riku Funada,
Takashi Tanaka
Abstract:
This paper considers the problem of task-dependent (top-down) attention allocation for vision-based autonomous navigation using known landmarks. Unlike the existing paradigm in which landmark selection is formulated as a combinatorial optimization problem, we model it as a resource allocation problem where the decision-maker (DM) is granted extra freedom to control the degree of attention to each landmark. The total resource available to DM is expressed in terms of the capacity limit of the in-take information flow, which is quantified by the directed information from the state of the environment to the DM's observation. We consider a receding horizon implementation of such a controlled sensing scheme in the Linear-Quadratic-Gaussian (LQG) regime. The convex-concave procedure is applied in each time step, whose time complexity is shown to be linear in the horizon length if the alternating direction method of multipliers (ADMM) is used. Numerical studies show that the proposed formulation is sparsity-promoting in the sense that it tends to allocate zero data rate to uninformative landmarks.
Submitted 27 September, 2021;
originally announced September 2021.
-
Griffin: Rethinking Sparse Optimization for Deep Learning Architectures
Authors:
Jong Hoon Shin,
Ali Shafiee,
Ardavan Pedram,
Hamzah Abdel-Aziz,
Ling Li,
Joseph Hassoun
Abstract:
This paper examines the design space trade-offs of DNN accelerators aiming to achieve competitive performance and efficiency metrics for all four combinations of dense or sparse activation/weight tensors. To do so, we systematically examine the overheads of supporting sparsity on top of an optimized dense core. These overheads are modeled based on parameters that indicate how a multiplier can borrow a nonzero operation from the neighboring multipliers or future cycles. As a result of this exploration, we identify a few promising designs that perform better than prior work. Our findings suggest that even the best design targeting dual sparsity yields a 20%-30% drop in power efficiency when performing on single sparse models, i.e., those with only sparse weight or sparse activation tensors. We found that one can reuse resources of the same core to maintain high performance and efficiency when running single sparsity or dense models. We call this hybrid architecture Griffin. Griffin is 1.2X, 3.0X, 3.1X, and 1.4X more power-efficient than state-of-the-art sparse architectures, for dense, weight-only sparse, activation-only sparse, and dual sparse models, respectively.
Submitted 1 November, 2021; v1 submitted 27 July, 2021;
originally announced July 2021.
-
Optimized Data Rate Allocation for Dynamic Sensor Fusion over Resource Constrained Communication Networks
Authors:
Hyunho Jung,
Ali Reza Pedram,
Travis Craig Cuvelier,
Takashi Tanaka
Abstract:
This paper presents a new method to solve a dynamic sensor fusion problem. We consider a large number of remote sensors which measure a common Gauss-Markov process and encoders that transmit the measurements to a data fusion center through a resource-restricted communication network. The proposed approach heuristically minimizes a weighted sum of communication costs subject to a constraint on the state estimation error at the fusion center. The communication costs are quantified as the expected bitrates from the sensors to the fusion center. We show that the problem as formulated is a difference-of-convex program and apply the convex-concave procedure (CCP) to obtain a heuristic solution. We consider a 1D heat transfer model and 2D target tracking by a drone swarm model for numerical studies. Through these simulations, we observe that our proposed approach tends to assign zero data rate to unnecessary sensors, indicating that our approach is sparsity-promoting and an effective sensor selection heuristic.
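To make the convex-concave procedure concrete, here is a minimal sketch on a toy scalar difference-of-convex problem, minimizing f(x) - g(x) with f(x) = x^4 and g(x) = x^2. The toy objective, the function name `ccp_minimize`, and the starting point are assumptions chosen for illustration and are unrelated to the paper's sensor fusion formulation. Each CCP step replaces g by its linearization at the current iterate x_k and minimizes the resulting convex surrogate x^4 - 2 x_k x, whose minimizer for x_k > 0 is (x_k / 2)^(1/3).

```python
def ccp_minimize(x0, iterations=50):
    """Convex-concave procedure for min x**4 - x**2, started from x0 > 0.

    Returns the final iterate and the objective value at every iterate.
    """
    x = x0
    history = [x**4 - x**2]
    for _ in range(iterations):
        # Linearize the concave part -x**2 at x and solve the convex
        # subproblem min_y y**4 - 2*x*y in closed form: 4*y**3 = 2*x.
        x = (x / 2.0) ** (1.0 / 3.0)
        history.append(x**4 - x**2)
    return x, history


x_star, objective_values = ccp_minimize(1.0)
```

In this toy case the iterates approach the stationary point x = 1/sqrt(2) and the objective never increases from one iterate to the next, which is the standard descent property of CCP. In general the procedure only guarantees a stationary point, hence the "heuristic" qualifier.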
Submitted 18 October, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators
Authors:
Hamzah Abdel-Aziz,
Ali Shafiee,
Jong Hoon Shin,
Ardavan Pedram,
Joseph H. Hassoun
Abstract:
In this paper, we propose a mixed-precision convolution unit architecture which supports different integer and floating point (FP) precisions. The proposed architecture is based on low-bit inner product units and realizes higher precision based on temporal decomposition. We illustrate how to integrate FP computations on integer-based architecture and evaluate overheads incurred by FP arithmetic support. We argue that alignment and addition overhead for FP inner product can be significant since the maximum exponent difference could be up to 58 bits, which results in large alignment logic. To address this issue, we illustrate empirically that no more than 26-bit product bits are required and up to 8 bits of alignment is sufficient in most inference cases. We present novel optimizations based on the above observations to reduce the FP arithmetic hardware overheads. Our empirical results, based on simulation and hardware implementation, show significant reduction in FP16 overhead. Over typical mixed precision implementation, the proposed architecture achieves area improvements of up to 25% in TFLOPS/mm² and up to 46% in TOPS/mm² with power efficiency improvements of up to 40% in TFLOPS/W and up to 63% in TOPS/W.
Submitted 27 January, 2021;
originally announced January 2021.
-
Closed-loop Parameter Identification of Linear Dynamical Systems through the Lens of Feedback Channel Coding Theory
Authors:
Ali Reza Pedram,
Takashi Tanaka
Abstract:
This paper considers the problem of closed-loop identification of linear scalar systems with Gaussian process noise, where the system input is determined by a deterministic state feedback policy. The regularized least-square estimate (LSE) algorithm is adopted, seeking to find the best estimate of unknown model parameters based on noiseless measurements of the state. We are interested in the fundamental limitation of the rate at which unknown parameters can be learned, in the sense of the D-optimality scalarization criterion subject to a quadratic control cost. We first establish a novel connection between a closed-loop identification problem of interest and a channel coding problem involving an additive white Gaussian noise (AWGN) channel with feedback and a certain structural constraint. Based on this connection, we show that the learning rate is fundamentally upper bounded by the capacity of the corresponding AWGN channel. Although the optimal design of the feedback policy remains challenging, we derive conditions under which the upper bound is achieved. Finally, we show that the obtained upper bound implies that super-linear convergence is unattainable for any choice of the policy.
Submitted 27 March, 2020;
originally announced March 2020.
-
Rationally Inattentive Path-Planning via RRT*
Authors:
Jeb Stefan,
Ali Reza Pedram,
Riku Funada,
Takashi Tanaka
Abstract:
We consider a path-planning scenario for a mobile robot traveling in a configuration space with obstacles under the presence of stochastic disturbances. A novel path length metric is proposed on the uncertain configuration space and then integrated with the existing RRT* algorithm. The metric is a weighted sum of two terms which capture both the Euclidean distance traveled by the robot and the perception cost, i.e., the amount of information the robot must perceive about the environment to follow the path safely. The continuity of the path length function with respect to the topology of the total variation metric is shown and the optimality of the Rationally Inattentive RRT* algorithm is discussed. Three numerical studies are presented which display the utility of the new algorithm.
Submitted 27 February, 2020;
originally announced February 2020.
-
Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources
Authors:
Yidan Qin,
Sahba Aghajani Pedram,
Seyedshams Feyzabadi,
Max Allan,
A. Jonathan McLeod,
Joel W. Burdick,
Mahdi Azizian
Abstract:
Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events that occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources including the Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, which improves on the state-of-the-art surgical state estimation models in both the JIGSAWS suturing dataset and our RIOUS dataset.
Submitted 7 February, 2020;
originally announced February 2020.
-
Campfire: Compressible, Regularization-Free, Structured Sparse Training for Hardware Accelerators
Authors:
Noah Gamboa,
Kais Kudrolli,
Anand Dhoot,
Ardavan Pedram
Abstract:
This paper studies structured sparse training of CNNs with a gradual pruning technique that leads to fixed, sparse weight matrices after a set number of epochs. We simplify the structure of the enforced sparsity so that it reduces overhead caused by regularization. The proposed training methodology Campfire explores pruning at granularities within a convolutional kernel and filter.
We study various tradeoffs with respect to pruning duration, level of sparsity, and learning rate configuration. We show that our method creates a sparse version of ResNet-50 and ResNet-50 v1.5 on full ImageNet while remaining within a negligible <1% margin of accuracy loss. To ensure that this type of sparse training does not harm the robustness of the network, we also demonstrate how the network behaves in the presence of adversarial attacks. Our results show that with 70% target sparsity, over 75% top-1 accuracy is achievable.
Submitted 12 January, 2020; v1 submitted 9 January, 2020;
originally announced January 2020.
-
Toward Synergic Learning for Autonomous Manipulation of Deformable Tissues via Surgical Robots: An Approximate Q-Learning Approach
Authors:
Sahba Aghajani Pedram,
Peter Walker Ferguson,
Changyeob Shin,
Ankur Mehta,
Erik P. Dutson,
Farshid Alambeigi,
Jacob Rosen
Abstract:
In this paper, we present a synergic learning algorithm to address the task of indirect manipulation of an unknown deformable tissue. Tissue manipulation is a common yet challenging task in various surgical interventions, which makes it a good candidate for robotic automation. We propose using a linear approximate Q-learning method in which human knowledge contributes to selecting useful yet simple features of tissue manipulation while the algorithm learns to take optimal actions and accomplish the task. The algorithm is implemented and evaluated on a simulation using the OpenCV and CHAI3D libraries. Successful simulation results for four different configurations which are based on realistic tissue manipulation scenarios are presented. Results indicate that with a careful selection of relatively simple and intuitive features, the developed Q-learning algorithm can successfully learn an optimal policy without any prior knowledge of tissue dynamics or camera intrinsic/extrinsic calibration parameters.
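For readers unfamiliar with linear approximate Q-learning, the following is a minimal sketch of the standard weight update, in which Q(s, a) is a weighted sum of hand-picked features. It is the generic textbook rule, not the authors' implementation: the function names, the feature vectors, and the values of `alpha` and `gamma` are all hypothetical.

```python
def q_value(weights, features):
    """Linear Q estimate: dot product of weights and a feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def q_learning_update(weights, features, reward, next_features_per_action,
                      alpha=0.1, gamma=0.9):
    """One temporal-difference update of the weight vector.

    features                 : feature vector of the (state, action) just taken
    next_features_per_action : one feature vector per action available in the
                               next state (empty if the next state is terminal)
    """
    best_next = max(
        (q_value(weights, f) for f in next_features_per_action),
        default=0.0,
    )
    td_error = reward + gamma * best_next - q_value(weights, features)
    # Each weight moves in proportion to its feature's activation.
    return [w + alpha * td_error * f for w, f in zip(weights, features)]


# Hypothetical usage: two features, a terminal transition with reward 1.
updated = q_learning_update([0.0, 0.0], [1.0, 2.0], 1.0, [])
```

Because only the weights are learned, the quality of the policy depends heavily on the chosen features, which is where the abstract says human knowledge enters.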
Submitted 11 October, 2019; v1 submitted 8 October, 2019;
originally announced October 2019.
-
Bidirectional Information Flow and the Roles of Privacy Masks in Cloud-Based Control
Authors:
Ali Reza Pedram,
Takashi Tanaka,
Matthew Hale
Abstract:
We consider a cloud-based control architecture for a linear plant with Gaussian process noise, where the state of the plant contains a client's sensitive information. We assume that the cloud tries to estimate the state while executing a designated control algorithm. The mutual information between the client's actual state and the cloud's estimate is adopted as a measure of privacy loss. We discuss the necessity of uplink and downlink privacy masks. After observing that privacy is not necessarily a monotone function of the noise levels of privacy masks, we discuss the joint design procedure for uplink and downlink privacy masks. Finally, the trade-off between privacy and control performance is explored.
Submitted 17 May, 2019;
originally announced May 2019.
-
Autonomous Tissue Manipulation via Surgical Robot Using Learning Based Model Predictive Control
Authors:
Changyeob Shin,
Peter Walker Ferguson,
Sahba Aghajani Pedram,
Ji Ma,
Erik P. Dutson,
Jacob Rosen
Abstract:
Tissue manipulation is a frequently used fundamental subtask of any surgical procedure, and in some cases it may require the involvement of a surgeon's assistant. The complex dynamics of soft tissue as an unstructured environment is one of the main challenges in any attempt to automate its manipulation via a surgical robotic system. Two AI learning-based model predictive control algorithms using vision strategies are proposed and studied: (1) reinforcement learning and (2) learning from demonstration. Comparison of the performance of these AI algorithms in a simulation setting indicated that the learning from demonstration algorithm can boost the learning policy by initializing the predicted dynamics with given demonstrations. Furthermore, the learning from demonstration algorithm is implemented on a Raven IV surgical robotic system, and its feasibility is successfully demonstrated experimentally. This study is part of a profound vision in which the role of a surgeon will be redefined as a pure decision maker whereas the vast majority of the manipulation will be conducted autonomously by a surgical robotic system. A supplementary video can be found at: http://bionics.seas.ucla.edu/research/surgeryproject17.html
Submitted 2 March, 2019; v1 submitted 4 February, 2019;
originally announced February 2019.
-
Optimal Needle Diameter, Shape, and Path in Autonomous Suturing
Authors:
S. Aghajani Pedram,
P. Ferguson,
J. Ma,
E. Dutson,
J. Rosen
Abstract:
Needle shape, diameter, and path are critical parameters that directly affect suture depth and tissue trauma in autonomous suturing. This paper presents an optimization-based approach to specify these parameters. Given clinical suturing guidelines, a kinematic model of needle-tissue interaction was developed to quantify suture parameters and constraints. The model was further used to formulate constant curvature needle path planning as a nonlinear optimization problem. The optimization results were confirmed experimentally with the Raven II surgical system. The proposed needle path planning algorithm guarantees minimal tissue trauma and complies with a wide range of suturing requirements.
Submitted 14 January, 2019;
originally announced January 2019.
-
CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks
Authors:
Yuanfang Li,
Ardavan Pedram
Abstract:
Accelerating the inference of a trained DNN is a well studied subject. In this paper we switch the focus to the training of DNNs. The training phase is compute intensive, demands complicated data communication, and contains multiple levels of data dependencies and parallelism. This paper presents an algorithm/architecture space exploration of efficient accelerators to achieve better network convergence rates and higher energy efficiency for training DNNs. We further demonstrate that an architecture with hierarchical support for collective communication semantics provides flexibility in training various networks performing both stochastic and batched gradient descent based techniques. Our results suggest that smaller networks favor non-batched techniques while performance for larger networks is higher using batched operations. At 45nm technology, CATERPILLAR achieves performance efficiencies of 177 GFLOPS/W at over 80% utilization for SGD training on small networks and 211 GFLOPS/W at over 90% utilization for pipelined SGD/CP training on larger networks using a total area of 103.2 mm$^2$ and 178.9 mm$^2$ respectively.
Submitted 8 June, 2017; v1 submitted 1 June, 2017;
originally announced June 2017.
-
A Systematic Approach to Blocking Convolutional Neural Networks
Authors:
Xuan Yang,
Jing Pu,
Blaine Burton Rister,
Nikhil Bhagdikar,
Stephen Richardson,
Shahar Kvatinsky,
Jonathan Ragan-Kelley,
Ardavan Pedram,
Mark Horowitz
Abstract:
Convolutional Neural Networks (CNNs) are the state of the art solution for many computer vision problems, and many researchers have explored optimized implementations. Most implementations heuristically block the computation to deal with the large data sizes and high data reuse of CNNs. This paper explores how to block CNN computations for memory locality by creating an analytical model for CNN-like loop nests. Using this model we automatically derive optimized blockings for common networks that improve the energy efficiency of custom hardware implementations by up to an order of magnitude. Compared to traditional CNN CPU implementations based on highly-tuned, hand-optimized BLAS libraries, our x86 programs implementing the optimal blocking reduce the number of memory accesses by up to 90%.
Submitted 14 June, 2016;
originally announced June 2016.
-
Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era
Authors:
Ardavan Pedram,
Stephen Richardson,
Sameh Galal,
Shahar Kvatinsky,
Mark A. Horowitz
Abstract:
The key challenge to improving performance in the age of Dark Silicon is how to leverage transistors when they cannot all be used at the same time. In modern SOCs, these transistors are often used to create specialized accelerators which improve energy efficiency for some applications by 10-1000X. While this might seem like the magic bullet we need, for most CPU applications more energy is dissipated in the memory system than in the processor: these large gains in efficiency are only possible if the DRAM and memory hierarchy are mostly idle. We refer to this desirable state as Dark Memory, and it only occurs for applications with an extreme form of locality.
To show our findings, we introduce Pareto curves in the energy/op and mm$^2$/(ops/s) metric space for compute units, accelerators, and on-chip memory/interconnect. These Pareto curves allow us to solve the power, performance, area constrained optimization problem to determine which accelerators should be used, and how to set their design parameters to optimize the system. This analysis shows that memory accesses create a floor to the achievable energy-per-op. Thus high performance requires Dark Memory, which in turn requires co-design of the algorithm for parallelism and locality, with the hardware.
Submitted 26 April, 2016; v1 submitted 12 February, 2016;
originally announced February 2016.
-
EIE: Efficient Inference Engine on Compressed Deep Neural Network
Authors:
Song Han,
Xingyu Liu,
Huizi Mao,
Jing Pu,
Ardavan Pedram,
Mark A. Horowitz,
William J. Dally
Abstract:
State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations, and dominates the required power.
Previously proposed 'Deep Compression' makes it possible to fit large DNNs (AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by pruning the redundant connections and having multiple connections share the same weight. We propose an energy efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. Going from DRAM to SRAM gives EIE 120x energy saving; exploiting sparsity saves 10x; weight sharing gives 8x; skipping zero activations from ReLU saves another 3x. Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster when compared to CPU and GPU implementations of the same DNN without compression. EIE has a processing power of 102GOPS/s working directly on a compressed network, corresponding to 3TOPS/s on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600mW. It is 24,000x and 3,400x more energy efficient than a CPU and GPU respectively. Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy efficiency and area efficiency.
△ Less
Submitted 3 May, 2016; v1 submitted 3 February, 2016;
originally announced February 2016.