-
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Authors:
Danny Driess,
Jost Tobias Springenberg,
Brian Ichter,
Lili Yu,
Adrian Li-Bell,
Karl Pertsch,
Allen Z. Ren,
Homer Walke,
Quan Vuong,
Lucy Xiaoyang Shi,
Sergey Levine
Abstract:
Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model (VLM) training. However, the constraints of real-time control are often at odds with the design of VLMs: the most powerful VLMs have tens or hundreds of billions o…
▽ More
Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model (VLM) training. However, the constraints of real-time control are often at odds with the design of VLMs: the most powerful VLMs have tens or hundreds of billions of parameters, presenting an obstacle to real-time inference, and operate on discrete tokens rather than the continuous-valued outputs that are required for controlling robots. To address this challenge, recent VLA models have used specialized modules for efficient continuous control, such as action experts or continuous output heads, which typically require adding new untrained parameters to the pretrained VLM backbone. While these modules improve real-time and control capabilities, it remains an open question whether they preserve or degrade the semantic knowledge contained in the pretrained VLM, and what effect they have on the VLA training dynamics. In this paper, we study this question in the context of VLAs that include a continuous diffusion or flow matching action expert, showing that naively including such experts significantly harms both training speed and knowledge transfer. We provide an extensive analysis of various design choices, their impact on performance and knowledge transfer, and propose a technique for insulating the VLM backbone during VLA training that mitigates this issue. Videos are available at https://pi.website/research/knowledge_insulation.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition
Authors:
Yijie Zheng,
Weijie Wu,
Qingyun Li,
Xuehui Wang,
Xu Zhou,
Aiai Ren,
Jun Shen,
Long Zhao,
Guoqing Li,
Xue Yang
Abstract:
Language-Guided object recognition in remote sensing imagery is crucial for large-scale mapping and automated data annotation. However, existing open-vocabulary and visual grounding methods rely on explicit category cues, limiting their ability to handle complex or implicit queries that require advanced reasoning. To address this issue, we introduce a new suite of tasks, including Instruction-Orie…
▽ More
Language-Guided object recognition in remote sensing imagery is crucial for large-scale mapping and automated data annotation. However, existing open-vocabulary and visual grounding methods rely on explicit category cues, limiting their ability to handle complex or implicit queries that require advanced reasoning. To address this issue, we introduce a new suite of tasks, including Instruction-Oriented Object Counting, Detection, and Segmentation (InstructCDS), covering open-vocabulary, open-ended, and open-subclass scenarios. We further present EarthInstruct, the first InstructCDS benchmark for earth observation. It is constructed from two diverse remote sensing datasets with varying spatial resolutions and annotation rules across 20 categories, necessitating models to interpret dataset-specific instructions. Given the scarcity of semantically rich labeled data in remote sensing, we propose InstructSAM, a training-free framework for instruction-driven object recognition. InstructSAM leverages large vision-language models to interpret user instructions and estimate object counts, employs SAM2 for mask proposal, and formulates mask-label assignment as a binary integer programming problem. By integrating semantic similarity with counting constraints, InstructSAM efficiently assigns categories to predicted masks without relying on confidence thresholds. Experiments demonstrate that InstructSAM matches or surpasses specialized baselines across multiple tasks while maintaining near-constant inference time regardless of object count, reducing output tokens by 89% and overall runtime by over 32% compared to direct generation approaches. We believe the contributions of the proposed tasks, benchmark, and effective approach will advance future research in developing versatile object recognition systems.
△ Less
Submitted 21 May, 2025;
originally announced May 2025.
-
Guiding Data Collection via Factored Scaling Curves
Authors:
Lihan Zha,
Apurva Badithela,
Michael Zhang,
Justin Lidard,
Jeremy Bao,
Emily Zhou,
David Snyder,
Allen Z. Ren,
Dhruv Shah,
Anirudha Majumdar
Abstract:
Generalist imitation learning policies trained on large datasets show great promise for solving diverse manipulation tasks. However, to ensure generalization to different conditions, policies need to be trained with data collected across a large set of environmental factor variations (e.g., camera pose, table height, distractors) $-$ a prohibitively expensive undertaking, if done exhaustively. We…
▽ More
Generalist imitation learning policies trained on large datasets show great promise for solving diverse manipulation tasks. However, to ensure generalization to different conditions, policies need to be trained with data collected across a large set of environmental factor variations (e.g., camera pose, table height, distractors) $-$ a prohibitively expensive undertaking, if done exhaustively. We introduce a principled method for deciding what data to collect and how much to collect for each factor by constructing factored scaling curves (FSC), which quantify how policy performance varies as data scales along individual or paired factors. These curves enable targeted data acquisition for the most influential factor combinations within a given budget. We evaluate the proposed method through extensive simulated and real-world experiments, across both training-from-scratch and fine-tuning settings, and show that it boosts success rates in real-world tasks in new environments by up to 26% over existing data-collection strategies. We further demonstrate how factored scaling curves can effectively guide data collection using an offline metric, without requiring real-world evaluation at scale.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
Optimization over Trained (and Sparse) Neural Networks: A Surrogate within a Surrogate
Authors:
Hung Pham,
Aiden Ren,
Ibrahim Tahir,
Jiatai Tong,
Thiago Serra
Abstract:
We can approximate a constraint or an objective function that is uncertain or nonlinear with a neural network that we embed in the optimization model. This approach, which is known as constraint learning, faces the challenge that optimization models with neural network surrogates are harder to solve. Such difficulties have motivated studies on model reformulation, specialized optimization algorith…
▽ More
We can approximate a constraint or an objective function that is uncertain or nonlinear with a neural network that we embed in the optimization model. This approach, which is known as constraint learning, faces the challenge that optimization models with neural network surrogates are harder to solve. Such difficulties have motivated studies on model reformulation, specialized optimization algorithms, and - to a lesser extent - pruning of the embedded networks. In this work, we double down on the use of surrogates by applying network pruning to produce a surrogate of the neural network itself. In the context of using a Mixed-Integer Linear Programming (MILP) solver to verify neural networks, we obtained faster adversarial perturbations for dense neural networks by using sparse surrogates, especially - and surprisingly - if not taking the time to finetune the sparse network to make up for the loss in accuracy. In other words, we show that a pruned network with bad classification performance can still be a good - and more efficient - surrogate.
△ Less
Submitted 4 May, 2025;
originally announced May 2025.
-
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Authors:
Physical Intelligence,
Kevin Black,
Noah Brown,
James Darpinian,
Karan Dhabalia,
Danny Driess,
Adnan Esmail,
Michael Equi,
Chelsea Finn,
Niccolo Fusai,
Manuel Y. Galliker,
Dibya Ghosh,
Lachy Groom,
Karol Hausman,
Brian Ichter,
Szymon Jakubczak,
Tim Jones,
Liyiming Ke,
Devin LeBlanc,
Sergey Levine,
Adrian Li-Bell,
Mohith Mothukuri,
Suraj Nair,
Karl Pertsch,
Allen Z. Ren
, et al. (11 additional authors not shown)
Abstract:
In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $π_{0.5}$, a new model based on $π_{0}$ that uses co-training on heterogeneous tasks…
▽ More
In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $π_{0.5}$, a new model based on $π_{0}$ that uses co-training on heterogeneous tasks to enable broad generalization. $π_{0.5}$\ uses data from multiple robots, high-level semantic prediction, web data, and other sources to enable broadly generalizable real-world robotic manipulation. Our system uses a combination of co-training and hybrid multi-modal examples that combine image observations, language commands, object detections, semantic subtask prediction, and low-level actions. Our experiments show that this kind of knowledge transfer is essential for effective generalization, and we demonstrate for the first time that an end-to-end learning-enabled robotic system can perform long-horizon and dexterous manipulation skills, such as cleaning a kitchen or bedroom, in entirely new homes.
△ Less
Submitted 22 April, 2025;
originally announced April 2025.
-
Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision
Authors:
Rulin Zhou,
Wenlong He,
An Wang,
Qiqi Yao,
Haijun Hu,
Jiankun Wang,
Xi Zhang an Hongliang Ren
Abstract:
Accurate tissue point tracking in endoscopic videos is critical for robotic-assisted surgical navigation and scene understanding, but remains challenging due to complex deformations, instrument occlusion, and the scarcity of dense trajectory annotations. Existing methods struggle with long-term tracking under these conditions due to limited feature utilization and annotation dependence. We present…
▽ More
Accurate tissue point tracking in endoscopic videos is critical for robotic-assisted surgical navigation and scene understanding, but remains challenging due to complex deformations, instrument occlusion, and the scarcity of dense trajectory annotations. Existing methods struggle with long-term tracking under these conditions due to limited feature utilization and annotation dependence. We present Endo-TTAP, a novel framework addressing these challenges through: (1) A Multi-Facet Guided Attention (MFGA) module that synergizes multi-scale flow dynamics, DINOv2 semantic embeddings, and explicit motion patterns to jointly predict point positions with uncertainty and occlusion awareness; (2) A two-stage curriculum learning strategy employing an Auxiliary Curriculum Adapter (ACA) for progressive initialization and hybrid supervision. Stage I utilizes synthetic data with optical flow ground truth for uncertainty-occlusion regularization, while Stage II combines unsupervised flow consistency and semi-supervised learning with refined pseudo-labels from off-the-shelf trackers. Extensive validation on two MICCAI Challenge datasets and our collected dataset demonstrates that Endo-TTAP achieves state-of-the-art performance in tissue point tracking, particularly in scenarios characterized by complex endoscopic conditions. The source code and dataset will be available at https://anonymous.4open.science/r/Endo-TTAP-36E5.
△ Less
Submitted 28 March, 2025;
originally announced March 2025.
-
A Cutting Mechanics-based Machine Learning Modeling Method to Discover Governing Equations of Machining Dynamics
Authors:
Alisa Ren,
Mason Ma,
Jiajie Wu,
Jaydeep Karandikar,
Chris Tyler,
Tony Shi,
Tony Schmitz
Abstract:
This paper proposes a cutting mechanics-based machine learning (CMML) modeling method to discover governing equations of machining dynamics. The main idea of CMML design is to integrate existing physics in cutting mechanics and unknown physics in data to achieve automated model discovery, with the potential to advance machining modeling. Based on existing physics in cutting mechanics, CMML first e…
▽ More
This paper proposes a cutting mechanics-based machine learning (CMML) modeling method to discover governing equations of machining dynamics. The main idea of CMML design is to integrate existing physics in cutting mechanics and unknown physics in data to achieve automated model discovery, with the potential to advance machining modeling. Based on existing physics in cutting mechanics, CMML first establishes a general modeling structure governing machining dynamics, that is represented by a set of unknown differential algebraic equations. CMML can therefore achieve data-driven discovery of these unknown equations through effective cutting mechanics-based nonlinear learning function space design and discrete optimization-based learning algorithm. Experimentally verified time domain simulation of milling is used to validate the proposed modeling method. Numerical results show CMML can discover the exact milling dynamics models with process damping and edge force from noisy data. This indicates that CMML has the potential to be used for advancing machining modeling in practice with the development of effective metrology systems.
△ Less
Submitted 20 January, 2025;
originally announced January 2025.
-
A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions
Authors:
Ola Shorinwa,
Zhiting Mei,
Justin Lidard,
Allen Z. Ren,
Anirudha Majumdar
Abstract:
The remarkable performance of large language models (LLMs) in content generation, coding, and common-sense reasoning has spurred widespread integration into many facets of society. However, integration of LLMs raises valid questions on their reliability and trustworthiness, given their propensity to generate hallucinations: plausible, factually-incorrect responses, which are expressed with strikin…
▽ More
The remarkable performance of large language models (LLMs) in content generation, coding, and common-sense reasoning has spurred widespread integration into many facets of society. However, integration of LLMs raises valid questions on their reliability and trustworthiness, given their propensity to generate hallucinations: plausible, factually-incorrect responses, which are expressed with striking confidence. Previous work has shown that hallucinations and other non-factual responses generated by LLMs can be detected by examining the uncertainty of the LLM in its response to the pertinent prompt, driving significant research efforts devoted to quantifying the uncertainty of LLMs. This survey seeks to provide an extensive review of existing uncertainty quantification methods for LLMs, identifying their salient features, along with their strengths and weaknesses. We present existing methods within a relevant taxonomy, unifying ostensibly disparate methods to aid understanding of the state of the art. Furthermore, we highlight applications of uncertainty quantification methods for LLMs, spanning chatbot and textual applications to embodied artificial intelligence applications in robotics. We conclude with open research challenges in uncertainty quantification of LLMs, seeking to motivate future research.
△ Less
Submitted 1 July, 2025; v1 submitted 7 December, 2024;
originally announced December 2024.
-
Thinking Forward and Backward: Effective Backward Planning with Large Language Models
Authors:
Allen Z. Ren,
Brian Ichter,
Anirudha Majumdar
Abstract:
Large language models (LLMs) have exhibited remarkable reasoning and planning capabilities. Most prior work in this area has used LLMs to reason through steps from an initial to a goal state or criterion, thereby effectively reasoning in a forward direction. Nonetheless, many planning problems exhibit an inherent asymmetry such that planning backward from the goal is significantly easier -- for ex…
▽ More
Large language models (LLMs) have exhibited remarkable reasoning and planning capabilities. Most prior work in this area has used LLMs to reason through steps from an initial to a goal state or criterion, thereby effectively reasoning in a forward direction. Nonetheless, many planning problems exhibit an inherent asymmetry such that planning backward from the goal is significantly easier -- for example, if there are bottlenecks close to the goal. We take inspiration from this observation and demonstrate that this bias holds for LLM planning as well: planning performance in one direction correlates with the planning complexity of the problem in that direction. However, our experiments also reveal systematic biases which lead to poor planning in the backward direction. With this knowledge, we propose a backward planning algorithm for LLMs that first flips the problem and then plans forward in the flipped problem. This helps avoid the backward bias, generate more diverse candidate plans, and exploit asymmetries between the forward and backward directions in planning problems -- we find that combining planning in both directions with self-verification improves the overall planning success rates by 4-24% in three planning domains.
△ Less
Submitted 3 November, 2024;
originally announced November 2024.
-
Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust
Authors:
Asher J. Hancock,
Allen Z. Ren,
Anirudha Majumdar
Abstract:
Vision-language-action (VLA) models trained on large-scale internet data and robot demonstrations have the potential to serve as generalist robot policies. However, despite their large-scale training, VLAs are often brittle to task-irrelevant visual details such as distractor objects or background colors. We introduce Bring Your Own VLA (BYOVLA): a run-time intervention scheme that (1) dynamically…
▽ More
Vision-language-action (VLA) models trained on large-scale internet data and robot demonstrations have the potential to serve as generalist robot policies. However, despite their large-scale training, VLAs are often brittle to task-irrelevant visual details such as distractor objects or background colors. We introduce Bring Your Own VLA (BYOVLA): a run-time intervention scheme that (1) dynamically identifies regions of the input image that the model is sensitive to, and (2) minimally alters task-irrelevant regions to reduce the model's sensitivity using automated image editing tools. Our approach is compatible with any off the shelf VLA without model fine-tuning or access to the model's weights. Hardware experiments on language-instructed manipulation tasks demonstrate that BYOVLA enables state-of-the-art VLA models to nearly retain their nominal performance in the presence of distractor objects and backgrounds, which otherwise degrade task success rates by up to 40%. Website with additional information, videos, and code: https://aasherh.github.io/byovla/ .
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
Diffusion Policy Policy Optimization
Authors:
Allen Z. Ren,
Justin Lidard,
Lars L. Ankile,
Anthony Simeonov,
Pulkit Agrawal,
Anirudha Majumdar,
Benjamin Burchfiel,
Hongkai Dai,
Max Simchowitz
Abstract:
We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy gradient (PG) method from reinforcement learning (RL). PG methods are ubiquitous in training RL policies with other policy parameterizations; nevertheless, they had…
▽ More
We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy gradient (PG) method from reinforcement learning (RL). PG methods are ubiquitous in training RL policies with other policy parameterizations; nevertheless, they had been conjectured to be less efficient for diffusion-based policies. Surprisingly, we show that DPPO achieves the strongest overall performance and efficiency for fine-tuning in common benchmarks compared to other RL methods for diffusion-based policies and also compared to PG fine-tuning of other policy parameterizations. Through experimental investigation, we find that DPPO takes advantage of unique synergies between RL fine-tuning and the diffusion parameterization, leading to structured and on-manifold exploration, stable training, and strong policy robustness. We further demonstrate the strengths of DPPO in a range of realistic settings, including simulated robotic tasks with pixel observations, and via zero-shot deployment of simulation-trained policies on robot hardware in a long-horizon, multi-stage manipulation task. Website with code: diffusion-ppo.github.io
△ Less
Submitted 9 December, 2024; v1 submitted 31 August, 2024;
originally announced September 2024.
-
A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case
Authors:
Sonia Meyer,
Shreya Singh,
Bertha Tam,
Christopher Ton,
Angel Ren
Abstract:
This research compares large language model (LLM) fine-tuning methods, including Quantized Low Rank Adapter (QLoRA), Retrieval Augmented fine-tuning (RAFT), and Reinforcement Learning from Human Feedback (RLHF), and additionally compared LLM evaluation methods including End to End (E2E) benchmark method of "Golden Answers", traditional natural language processing (NLP) metrics, RAG Assessment (Rag…
▽ More
This research compares large language model (LLM) fine-tuning methods, including Quantized Low Rank Adapter (QLoRA), Retrieval Augmented fine-tuning (RAFT), and Reinforcement Learning from Human Feedback (RLHF), and additionally compared LLM evaluation methods including End to End (E2E) benchmark method of "Golden Answers", traditional natural language processing (NLP) metrics, RAG Assessment (Ragas), OpenAI GPT-4 evaluation metrics, and human evaluation, using the travel chatbot use case. The travel dataset was sourced from the the Reddit API by requesting posts from travel-related subreddits to get travel-related conversation prompts and personalized travel experiences, and augmented for each fine-tuning method. We used two pretrained LLMs utilized for fine-tuning research: LLaMa 2 7B, and Mistral 7B. QLoRA and RAFT are applied to the two pretrained models. The inferences from these models are extensively evaluated against the aforementioned metrics. The best model according to human evaluation and some GPT-4 metrics was Mistral RAFT, so this underwent a Reinforcement Learning from Human Feedback (RLHF) training pipeline, and ultimately was evaluated as the best model. Our main findings are that: 1) quantitative and Ragas metrics do not align with human evaluation, 2) Open AI GPT-4 evaluation most aligns with human evaluation, 3) it is essential to keep humans in the loop for evaluation because, 4) traditional NLP metrics insufficient, 5) Mistral generally outperformed LLaMa, 6) RAFT outperforms QLoRA, but still needs postprocessing, 7) RLHF improves model performance significantly. Next steps include improving data quality, increasing data quantity, exploring RAG methods, and focusing data collection on a specific city, which would improve data quality by narrowing the focus, while creating a useful product.
△ Less
Submitted 7 August, 2024;
originally announced August 2024.
-
Requirements Elicitation in Government Projects: A Preliminary Empirical Study
Authors:
Anqi Ren,
Lin Liu,
Yi Wang,
Xiao Liu,
Hailong Wang,
Kaijia Xu,
Xishuo Zhang,
Chetan Arora
Abstract:
Government development projects vary significantly from private sector initiatives in scope, stakeholder complexity, and regulatory requirements. There is a lack of empirical studies focusing on requirements engineering (RE) activities specifically for government projects. We addressed this gap by conducting a series of semi-structured interviews with 12 professional software practitioners working…
▽ More
Government development projects vary significantly from private sector initiatives in scope, stakeholder complexity, and regulatory requirements. There is a lack of empirical studies focusing on requirements engineering (RE) activities specifically for government projects. We addressed this gap by conducting a series of semi-structured interviews with 12 professional software practitioners working on government projects. These interviewees are employed by two types of companies, each serving different government departments. Our findings uncover differences in the requirements elicitation phase between government projects, particularly for data visualization aspects, and other software projects, such as stakeholders and policy requirements. Additionally, we explore the coverage of human and social aspects in requirements elicitation, finding that culture, team dynamics, and policy implications are critical considerations. Our findings also pinpoint the main challenges encountered during the requirements elicitation phase for government projects. Our findings highlight future research work that is important to bridge the gap in RE activities for government software projects.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Explore until Confident: Efficient Exploration for Embodied Question Answering
Authors:
Allen Z. Ren,
Jaden Clark,
Anushri Dixit,
Masha Itkina,
Anirudha Majumdar,
Dorsa Sadigh
Abstract:
We consider the problem of Embodied Question Answering (EQA), which refers to settings where an embodied agent such as a robot needs to actively explore an environment to gather information until it is confident about the answer to a question. In this work, we leverage the strong semantic reasoning capabilities of large vision-language models (VLMs) to efficiently explore and answer such questions…
▽ More
We consider the problem of Embodied Question Answering (EQA), which refers to settings where an embodied agent such as a robot needs to actively explore an environment to gather information until it is confident about the answer to a question. In this work, we leverage the strong semantic reasoning capabilities of large vision-language models (VLMs) to efficiently explore and answer such questions. However, there are two main challenges when using VLMs in EQA: they do not have an internal memory for mapping the scene to be able to plan how to explore over time, and their confidence can be miscalibrated and can cause the robot to prematurely stop exploration or over-explore. We propose a method that first builds a semantic map of the scene based on depth information and via visual prompting of a VLM - leveraging its vast knowledge of relevant regions of the scene for exploration. Next, we use conformal prediction to calibrate the VLM's question answering confidence, allowing the robot to know when to stop exploration - leading to a more calibrated and efficient exploration strategy. To test our framework in simulation, we also contribute a new EQA dataset with diverse, realistic human-robot scenarios and scenes built upon the Habitat-Matterport 3D Research Dataset (HM3D). Both simulated and real robot experiments show our proposed approach improves the performance and efficiency over baselines that do no leverage VLM for exploration or do not calibrate its confidence. Webpage with experiment videos and code: https://explore-eqa.github.io/
△ Less
Submitted 7 July, 2024; v1 submitted 23 March, 2024;
originally announced March 2024.
-
Perceive With Confidence: Statistical Safety Assurances for Navigation with Learning-Based Perception
Authors:
Zhiting Mei,
Anushri Dixit,
Meghan Booker,
Emily Zhou,
Mariko Storey-Matsutani,
Allen Z. Ren,
Ola Shorinwa,
Anirudha Majumdar
Abstract:
Rapid advances in perception have enabled large pre-trained models to be used out of the box for transforming high-dimensional, noisy, and partial observations of the world into rich occupancy representations. However, the reliability of these models and consequently their safe integration onto robots remains unknown when deployed in environments unseen during training. To provide safety guarantee…
▽ More
Rapid advances in perception have enabled large pre-trained models to be used out of the box for transforming high-dimensional, noisy, and partial observations of the world into rich occupancy representations. However, the reliability of these models and consequently their safe integration onto robots remains unknown when deployed in environments unseen during training. To provide safety guarantees, we rigorously quantify the uncertainty of pre-trained perception systems for object detection and scene completion via a novel calibration technique based on conformal prediction. Crucially, this procedure guarantees robustness to distribution shifts in states when perception outputs are used in conjunction with a planner. As a result, the calibrated perception system can be used in combination with any safe planner to provide an end-to-end statistical assurance on safety in unseen environments. We evaluate the resulting approach, Perceive with Confidence (PwC), in simulation and on hardware where a quadruped robot navigates through previously unseen indoor, static environments. These experiments validate the safety assurances for obstacle avoidance provided by PwC. In simulation, our method reduces obstacle misdetection by $70\%$ compared to uncalibrated perception models. While misdetections lead to collisions for baseline methods, our approach consistently achieves $100\%$ safety. We further demonstrate reducing the conservatism of our method without sacrificing safety, achieving a $46\%$ increase in success rates in challenging environments while maintaining $100\%$ safety. In hardware experiments, our method improves empirical safety by $40\%$ over baselines and reduces obstacle misdetection by $93.3\%$. The safety gap widens to $46.7\%$ when navigation speed increases, highlighting our approach's robustness under more demanding conditions.
△ Less
Submitted 17 April, 2025; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Authors:
Jacky Liang,
Fei Xia,
Wenhao Yu,
Andy Zeng,
Montserrat Gonzalez Arenas,
Maria Attarian,
Maria Bauza,
Matthew Bennice,
Alex Bewley,
Adil Dostmohamed,
Chuyuan Kelly Fu,
Nimrod Gileadi,
Marissa Giustina,
Keerthana Gopalakrishnan,
Leonard Hasenclever,
Jan Humplik,
Jasmine Hsu,
Nikhil Joshi,
Ben Jyenis,
Chase Kew,
Sean Kirmani,
Tsang-Wei Edward Lee,
Kuang-Huei Lee,
Assaf Hurwitz Michaely,
Joss Moore
, et al. (25 additional authors not shown)
Abstract:
Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for o…
▽ More
Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for only as long as it fits within the context size of the LLM, and can be forgotten over longer interactions. In this work, we investigate fine-tuning the robot code-writing LLMs, to remember their in-context interactions and improve their teachability i.e., how efficiently they adapt to human inputs (measured by average number of corrections before the user considers the task successful). Our key observation is that when human-robot interactions are viewed as a partially observable Markov decision process (in which human language inputs are observations, and robot code outputs are actions), then training an LLM to complete previous interactions is training a transition dynamics model -- that can be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success. This gives rise to Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments -- improving non-expert teaching success rates of unseen tasks by 26.9% while reducing the average number of human corrections from 2.4 to 1.9. Experiments show that LMPC also produces strong meta-learners, improving the success rate of in-context learning new tasks on unseen robot embodiments and APIs by 31.5%. See videos, code, and demos at: https://robot-teaching.github.io/.
△ Less
Submitted 31 May, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Authors:
Allen Z. Ren,
Anushri Dixit,
Alexandra Bodrova,
Sumeet Singh,
Stephen Tu,
Noah Brown,
Peng Xu,
Leila Takayama,
Fei Xia,
Jake Varley,
Zhenjia Xu,
Dorsa Sadigh,
Andy Zeng,
Anirudha Majumdar
Abstract:
Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for…
▽ More
Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in terms of improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model-finetuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models. Website: https://robot-help.github.io
△ Less
Submitted 4 September, 2023; v1 submitted 4 July, 2023;
originally announced July 2023.
-
PersonaGen: A Tool for Generating Personas from User Feedback
Authors:
Xishuo Zhang,
Lin Liu,
Yi Wang,
Xiao Liu,
Hailong Wang,
Anqi Ren,
Chetan Arora
Abstract:
Personas are crucial in software development processes, particularly in agile settings. However, no effective tools are available for generating personas from user feedback in agile software development processes. To fill this gap, we propose a novel tool that uses the GPT-4 model and knowledge graph to generate persona templates from well-processed user feedback, facilitating requirement analysis…
▽ More
Personas are crucial in software development processes, particularly in agile settings. However, no effective tools are available for generating personas from user feedback in agile software development processes. To fill this gap, we propose a novel tool that uses the GPT-4 model and knowledge graph to generate persona templates from well-processed user feedback, facilitating requirement analysis in agile software development processes. We developed a tool called PersonaGen. We evaluated PersonaGen using qualitative feedback from a small-scale user study involving student software projects. The results were mixed, highlighting challenges in persona-based educational practice and addressing non-functional requirements.
△ Less
Submitted 6 July, 2023; v1 submitted 1 July, 2023;
originally announced July 2023.
-
Learning a Universal Human Prior for Dexterous Manipulation from Human Preference
Authors:
Zihan Ding,
Yuanpei Chen,
Allen Z. Ren,
Shixiang Shane Gu,
Qianxu Wang,
Hao Dong,
Chi Jin
Abstract:
Generating human-like behavior on robots is a great challenge especially in dexterous manipulation tasks with robotic hands. Scripting policies from scratch is intractable due to the high-dimensional control space, and training policies with reinforcement learning (RL) and manual reward engineering can also be hard and lead to unnatural motions. Leveraging the recent progress on RL from Human Feed…
▽ More
Generating human-like behavior on robots is a great challenge especially in dexterous manipulation tasks with robotic hands. Scripting policies from scratch is intractable due to the high-dimensional control space, and training policies with reinforcement learning (RL) and manual reward engineering can also be hard and lead to unnatural motions. Leveraging the recent progress on RL from Human Feedback, we propose a framework that learns a universal human prior using direct human preference feedback over videos, for efficiently tuning the RL policies on 20 dual-hand robot manipulation tasks in simulation, without a single human demonstration. A task-agnostic reward model is trained through iteratively generating diverse polices and collecting human preference over the trajectories; it is then applied for regularizing the behavior of polices in the fine-tuning stage. Our method empirically demonstrates more human-like behaviors on robot hands in diverse tasks including even unseen tasks, indicating its generalization capability.
△ Less
Submitted 13 September, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer
Authors:
Allen Z. Ren,
Hongkai Dai,
Benjamin Burchfiel,
Anirudha Majumdar
Abstract:
Simulation parameter settings such as contact models and object geometry approximations are critical to training robust robotic policies capable of transferring from simulation to real-world deployment. Previous approaches typically handcraft distributions over such parameters (domain randomization), or identify parameters that best match the dynamics of the real environment (system identification…
▽ More
Simulation parameter settings such as contact models and object geometry approximations are critical to training robust robotic policies capable of transferring from simulation to real-world deployment. Previous approaches typically handcraft distributions over such parameters (domain randomization), or identify parameters that best match the dynamics of the real environment (system identification). However, there is often an irreducible gap between simulation and reality: attempting to match the dynamics between simulation and reality across all states and tasks may be infeasible and may not lead to policies that perform well in reality for a specific task. Addressing this issue, we propose AdaptSim, a new task-driven adaptation framework for sim-to-real transfer that aims to optimize task performance in target (real) environments -- instead of matching dynamics between simulation and reality. First, we meta-learn an adaptation policy in simulation using reinforcement learning for adjusting the simulation parameter distribution based on the current policy's performance in a target environment. We then perform iterative real-world adaptation by inferring new simulation parameter distributions for policy training, using a small amount of real data. We perform experiments in three robotic tasks: (1) swing-up of linearized double pendulum, (2) dynamic table-top pushing of a bottle, and (3) dynamic scooping of food pieces with a spatula. Our extensive simulation and hardware experiments demonstrate AdaptSim achieving 1-3x asymptotic performance and $\sim$2x real data efficiency when adapting to different environments, compared to methods based on Sys-ID and directly training the task policy in target environments. Website: https://irom-lab.github.io/AdaptSim/
△ Less
Submitted 30 September, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
FlowDrone: Wind Estimation and Gust Rejection on UAVs Using Fast-Response Hot-Wire Flow Sensors
Authors:
Nathaniel Simon,
Allen Z. Ren,
Alexander Piqué,
David Snyder,
Daphne Barretto,
Marcus Hultmark,
Anirudha Majumdar
Abstract:
Unmanned aerial vehicles (UAVs) are finding use in applications that place increasing emphasis on robustness to external disturbances including extreme wind. However, traditional multirotor UAV platforms do not directly sense wind; conventional flow sensors are too slow, insensitive, or bulky for widespread integration on UAVs. Instead, drones typically observe the effects of wind indirectly throu…
▽ More
Unmanned aerial vehicles (UAVs) are finding use in applications that place increasing emphasis on robustness to external disturbances including extreme wind. However, traditional multirotor UAV platforms do not directly sense wind; conventional flow sensors are too slow, insensitive, or bulky for widespread integration on UAVs. Instead, drones typically observe the effects of wind indirectly through accumulated errors in position or trajectory tracking. In this work, we integrate a novel flow sensor based on micro-electro-mechanical systems (MEMS) hot-wire technology developed in our prior work onto a multirotor UAV for wind estimation. These sensors are omnidirectional, lightweight, fast, and accurate. In order to achieve superior tracking performance in windy conditions, we train a `wind-aware' residual-based controller via reinforcement learning using simulated wind gusts and their aerodynamic effects on the drone. In extensive hardware experiments, we demonstrate the wind-aware controller outperforming two strong `wind-unaware' baseline controllers in challenging windy conditions. See: https://youtu.be/KWqkH9Z-338.
△ Less
Submitted 24 October, 2022; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Leveraging Language for Accelerated Learning of Tool Manipulation
Authors:
Allen Z. Ren,
Bharat Govil,
Tsung-Yen Yang,
Karthik Narasimhan,
Anirudha Majumdar
Abstract:
Robust and generalized tool manipulation requires an understanding of the properties and affordances of different tools. We investigate whether linguistic information about a tool (e.g., its geometry, common uses) can help control policies adapt faster to new tools for a given task. We obtain diverse descriptions of various tools in natural language and use pre-trained language models to generate…
▽ More
Robust and generalized tool manipulation requires an understanding of the properties and affordances of different tools. We investigate whether linguistic information about a tool (e.g., its geometry, common uses) can help control policies adapt faster to new tools for a given task. We obtain diverse descriptions of various tools in natural language and use pre-trained language models to generate their feature representations. We then perform language-conditioned meta-learning to learn policies that can efficiently adapt to new tools given their corresponding text descriptions. Our results demonstrate that combining linguistic information and meta-learning significantly accelerates tool learning in several manipulation tasks including pushing, lifting, sweeping, and hammering.
△ Less
Submitted 27 June, 2022;
originally announced June 2022.
-
Failure Prediction with Statistical Guarantees for Vision-Based Robot Control
Authors:
Alec Farid,
David Snyder,
Allen Z. Ren,
Anirudha Majumdar
Abstract:
We are motivated by the problem of performing failure prediction for safety-critical robotic systems with high-dimensional sensor observations (e.g., vision). Given access to a black-box control policy (e.g., in the form of a neural network) and a dataset of training environments, we present an approach for synthesizing a failure predictor with guaranteed bounds on false-positive and false-negativ…
▽ More
We are motivated by the problem of performing failure prediction for safety-critical robotic systems with high-dimensional sensor observations (e.g., vision). Given access to a black-box control policy (e.g., in the form of a neural network) and a dataset of training environments, we present an approach for synthesizing a failure predictor with guaranteed bounds on false-positive and false-negative errors. In order to achieve this, we utilize techniques from Probably Approximately Correct (PAC)-Bayes generalization theory. In addition, we present novel class-conditional bounds that allow us to trade-off the relative rates of false-positive vs. false-negative errors. We propose algorithms that train failure predictors (that take as input the history of sensor observations) by minimizing our theoretical error bounds. We demonstrate the resulting approach using extensive simulation and hardware experiments for vision-based navigation with a drone and grasping objects with a robotic manipulator equipped with a wrist-mounted RGB-D camera. These experiments illustrate the ability of our approach to (1) provide strong bounds on failure prediction error rates (that closely match empirical error rates), and (2) improve safety by predicting failures.
△ Less
Submitted 5 May, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees
Authors:
Kai-Chieh Hsu,
Allen Z. Ren,
Duy Phuong Nguyen,
Anirudha Majumdar,
Jaime F. Fisac
Abstract:
Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy di…
▽ More
Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy distribution. To improve safety, we apply a dual policy setup where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the Safety Bellman Equation based on Hamilton-Jacobi (HJ) reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. Additionally, inheriting from the HJ reachability analysis, the bound accounts for the expectation over the worst-case safety in each environment. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments with varying degrees of photorealism. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
△ Less
Submitted 1 April, 2023; v1 submitted 20 January, 2022;
originally announced January 2022.
-
Stronger Generalization Guarantees for Robot Learning by Combining Generative Models and Real-World Data
Authors:
Abhinav Agarwal,
Sushant Veer,
Allen Z. Ren,
Anirudha Majumdar
Abstract:
We are motivated by the problem of learning policies for robotic systems with rich sensory inputs (e.g., vision) in a manner that allows us to guarantee generalization to environments unseen during training. We provide a framework for providing such generalization guarantees by leveraging a finite dataset of real-world environments in combination with a (potentially inaccurate) generative model of…
▽ More
We are motivated by the problem of learning policies for robotic systems with rich sensory inputs (e.g., vision) in a manner that allows us to guarantee generalization to environments unseen during training. We provide a framework for providing such generalization guarantees by leveraging a finite dataset of real-world environments in combination with a (potentially inaccurate) generative model of environments. The key idea behind our approach is to utilize the generative model in order to implicitly specify a prior over policies. This prior is updated using the real-world dataset of environments by minimizing an upper bound on the expected cost across novel environments derived via Probably Approximately Correct (PAC)-Bayes generalization theory. We demonstrate our approach on two simulated systems with nonlinear/hybrid dynamics and rich sensing modalities: (i) quadrotor navigation with an onboard vision sensor, and (ii) grasping objects using a depth sensor. Comparisons with prior work demonstrate the ability of our approach to obtain stronger generalization guarantees by utilizing generative models. We also present hardware experiments for validating our bounds for the grasping task.
△ Less
Submitted 22 July, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Distributionally Robust Policy Learning via Adversarial Environment Generation
Authors:
Allen Z. Ren,
Anirudha Majumdar
Abstract:
Our goal is to train control policies that generalize well to unseen environments. Inspired by the Distributionally Robust Optimization (DRO) framework, we propose DRAGEN - Distributionally Robust policy learning via Adversarial Generation of ENvironments - for iteratively improving robustness of policies to realistic distribution shifts by generating adversarial environments. The key idea is to l…
▽ More
Our goal is to train control policies that generalize well to unseen environments. Inspired by the Distributionally Robust Optimization (DRO) framework, we propose DRAGEN - Distributionally Robust policy learning via Adversarial Generation of ENvironments - for iteratively improving robustness of policies to realistic distribution shifts by generating adversarial environments. The key idea is to learn a generative model for environments whose latent variables capture cost-predictive and realistic variations in environments. We perform DRO with respect to a Wasserstein ball around the empirical distribution of environments by generating realistic adversarial environments via gradient ascent on the latent space. We demonstrate strong Out-of-Distribution (OoD) generalization in simulation for (i) swinging up a pendulum with onboard vision and (ii) grasping realistic 3D objects. Grasping experiments on hardware demonstrate better sim2real performance compared to domain randomization.
△ Less
Submitted 7 July, 2022; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Improving DNN Fault Tolerance using Weight Pruning and Differential Crossbar Mapping for ReRAM-based Edge AI
Authors:
Geng Yuan,
Zhiheng Liao,
Xiaolong Ma,
Yuxuan Cai,
Zhenglun Kong,
Xuan Shen,
Jingyan Fu,
Zhengang Li,
Chengming Zhang,
Hongwu Peng,
Ning Liu,
Ao Ren,
Jinhui Wang,
Yanzhi Wang
Abstract:
Recent research demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog domain in-situ matrix-vector multiplication -- the intensive and key computation in deep neural networks (DNNs). However, hardware failure, such as stuck-at-fault defects, is one of the main concerns that impedes the ReRAM devices to be a feasible…
▽ More
Recent research demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog domain in-situ matrix-vector multiplication -- the intensive and key computation in deep neural networks (DNNs). However, hardware failure, such as stuck-at-fault defects, is one of the main concerns that impedes the ReRAM devices to be a feasible solution for real implementations. The existing solutions to address this issue usually require an optimization to be conducted for each individual device, which is impractical for mass-produced products (e.g., IoT devices). In this paper, we rethink the value of weight pruning in ReRAM-based DNN design from the perspective of model fault tolerance. And a differential mapping scheme is proposed to improve the fault tolerance under a high stuck-on fault rate. Our method can tolerate almost an order of magnitude higher failure rate than the traditional two-column method in representative DNN tasks. More importantly, our method does not require extra hardware cost compared to the traditional two-column mapping scheme. The improvement is universal and does not require the optimization process for each individual device.
△ Less
Submitted 18 June, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
CSAFL: A Clustered Semi-Asynchronous Federated Learning Framework
Authors:
Yu Zhang,
Moming Duan,
Duo Liu,
Li Li,
Ao Ren,
Xianzhang Chen,
Yujuan Tan,
Chengliang Wang
Abstract:
Federated learning (FL) is an emerging distributed machine learning paradigm that protects privacy and tackles the problem of isolated data islands. At present, there are two main communication strategies of FL: synchronous FL and asynchronous FL. The advantages of synchronous FL are that the model has high precision and fast convergence speed. However, this synchronous communication strategy has…
▽ More
Federated learning (FL) is an emerging distributed machine learning paradigm that protects privacy and tackles the problem of isolated data islands. At present, there are two main communication strategies of FL: synchronous FL and asynchronous FL. The advantages of synchronous FL are that the model has high precision and fast convergence speed. However, this synchronous communication strategy has the risk that the central server waits too long for the devices, namely, the straggler effect which has a negative impact on some time-critical applications. Asynchronous FL has a natural advantage in mitigating the straggler effect, but there are threats of model quality degradation and server crash. Therefore, we combine the advantages of these two strategies to propose a clustered semi-asynchronous federated learning (CSAFL) framework. We evaluate CSAFL based on four imbalanced federated datasets in a non-IID setting and compare CSAFL to the baseline methods. The experimental results show that CSAFL significantly improves test accuracy by more than +5% on the four datasets compared to TA-FedAvg. In particular, CSAFL improves absolute test accuracy by +34.4% on non-IID FEMNIST compared to TA-FedAvg.
△ Less
Submitted 16 April, 2021;
originally announced April 2021.
-
FedSAE: A Novel Self-Adaptive Federated Learning Framework in Heterogeneous Systems
Authors:
Li Li,
Moming Duan,
Duo Liu,
Yu Zhang,
Ao Ren,
Xianzhang Chen,
Yujuan Tan,
Chengliang Wang
Abstract:
Federated Learning (FL) is a novel distributed machine learning which allows thousands of edge devices to train model locally without uploading data concentrically to the server. But since real federated settings are resource-constrained, FL is encountered with systems heterogeneity which causes a lot of stragglers directly and then leads to significantly accuracy reduction indirectly. To solve th…
▽ More
Federated Learning (FL) is a novel distributed machine learning which allows thousands of edge devices to train model locally without uploading data concentrically to the server. But since real federated settings are resource-constrained, FL is encountered with systems heterogeneity which causes a lot of stragglers directly and then leads to significantly accuracy reduction indirectly. To solve the problems caused by systems heterogeneity, we introduce a novel self-adaptive federated framework FedSAE which adjusts the training task of devices automatically and selects participants actively to alleviate the performance degradation. In this work, we 1) propose FedSAE which leverages the complete information of devices' historical training tasks to predict the affordable training workloads for each device. In this way, FedSAE can estimate the reliability of each device and self-adaptively adjust the amount of training load per client in each round. 2) combine our framework with Active Learning to self-adaptively select participants. Then the framework accelerates the convergence of the global model. In our framework, the server evaluates devices' value of training based on their training loss. Then the server selects those clients with bigger value for the global model to reduce communication overhead. The experimental result indicates that in a highly heterogeneous system, FedSAE converges faster than FedAvg, the vanilla FL framework. Furthermore, FedSAE outperforms than FedAvg on several federated datasets - FedSAE improves test accuracy by 26.7% and reduces stragglers by 90.3% on average.
△ Less
Submitted 15 April, 2021;
originally announced April 2021.
-
Multivariate Time Series Classification with Hierarchical Variational Graph Pooling
Authors:
Ziheng Duan,
Haoyan Xu,
Yueyang Wang,
Yida Huang,
Anni Ren,
Zhongbin Xu,
Yizhou Sun,
Wei Wang
Abstract:
With the advancement of sensing technology, multivariate time series classification (MTSC) has recently received considerable attention. Existing deep learning-based MTSC techniques, which mostly rely on convolutional or recurrent neural networks, are primarily concerned with the temporal dependency of single time series. As a result, they struggle to express pairwise dependencies among multivaria…
▽ More
With the advancement of sensing technology, multivariate time series classification (MTSC) has recently received considerable attention. Existing deep learning-based MTSC techniques, which mostly rely on convolutional or recurrent neural networks, are primarily concerned with the temporal dependency of single time series. As a result, they struggle to express pairwise dependencies among multivariate variables directly. Furthermore, current spatial-temporal modeling (e.g., graph classification) methodologies based on Graph Neural Networks (GNNs) are inherently flat and cannot aggregate hub data in a hierarchical manner. To address these limitations, we propose a novel graph pooling-based framework MTPool to obtain the expressive global representation of MTS. We first convert MTS slices to graphs by utilizing interactions of variables via graph structure learning module and attain the spatial-temporal graph node features via temporal convolutional module. To get global graph-level representation, we design an "encoder-decoder" based variational graph pooling module for creating adaptive centroids for cluster assignments. Then we combine GNNs and our proposed variational graph pooling layers for joint graph representation learning and graph coarsening, after which the graph is progressively coarsened to one node. At last, a differentiable classifier takes this coarsened representation to get the final predicted class. Experiments on ten benchmark datasets exhibit MTPool outperforms state-of-the-art strategies in the MTSC task.
△ Less
Submitted 5 November, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.
-
MTHetGNN: A Heterogeneous Graph Embedding Framework for Multivariate Time Series Forecasting
Authors:
Yueyang Wang,
Ziheng Duan,
Yida Huang,
Haoyan Xu,
Jie Feng,
Anni Ren
Abstract:
Multivariate time series forecasting, which analyzes historical time series to predict future trends, can effectively help decision-making. Complex relations among variables in MTS, including static, dynamic, predictable, and latent relations, have made it possible to mining more features of MTS. Modeling complex relations are not only essential in characterizing latent dependency as well as model…
▽ More
Multivariate time series forecasting, which analyzes historical time series to predict future trends, can effectively help decision-making. Complex relations among variables in MTS, including static, dynamic, predictable, and latent relations, have made it possible to mining more features of MTS. Modeling complex relations are not only essential in characterizing latent dependency as well as modeling temporal dependence but also brings great challenges in the MTS forecasting task. However, existing methods mainly focus on modeling certain relations among MTS variables. In this paper, we propose a novel end-to-end deep learning model, termed Multivariate Time Series Forecasting via Heterogeneous Graph Neural Networks (MTHetGNN). To characterize complex relations among variables, a relation embedding module is designed in MTHetGNN, where each variable is regarded as a graph node, and each type of edge represents a specific static or dynamic relationship. Meanwhile, a temporal embedding module is introduced for time series features extraction, where involving convolutional neural network (CNN) filters with different perception scales. Finally, a heterogeneous graph embedding module is adopted to handle the complex structural information generated by the two modules. Three benchmark datasets from the real world are used to evaluate the proposed MTHetGNN. The comprehensive experiments show that MTHetGNN achieves state-of-the-art results in the MTS forecasting task.
△ Less
Submitted 14 December, 2021; v1 submitted 19 August, 2020;
originally announced August 2020.
-
Parallel Extraction of Long-term Trends and Short-term Fluctuation Framework for Multivariate Time Series Forecasting
Authors:
Yifu Zhou,
Ziheng Duan,
Haoyan Xu,
Jie Feng,
Anni Ren,
Yueyang Wang,
Xiaoqian Wang
Abstract:
Multivariate time series forecasting is widely used in various fields. Reasonable prediction results can assist people in planning and decision-making, generate benefits and avoid risks. Normally, there are two characteristics of time series, that is, long-term trend and short-term fluctuation. For example, stock prices will have a long-term upward trend with the market, but there may be a small d…
▽ More
Multivariate time series forecasting is widely used in various fields. Reasonable prediction results can assist people in planning and decision-making, generate benefits and avoid risks. Normally, there are two characteristics of time series, that is, long-term trend and short-term fluctuation. For example, stock prices will have a long-term upward trend with the market, but there may be a small decline in the short term. These two characteristics are often relatively independent of each other. However, the existing prediction methods often do not distinguish between them, which reduces the accuracy of the prediction model. In this paper, a MTS forecasting framework that can capture the long-term trends and short-term fluctuations of time series in parallel is proposed. This method uses the original time series and its first difference to characterize long-term trends and short-term fluctuations. Three prediction sub-networks are constructed to predict long-term trends, short-term fluctuations and the final value to be predicted. In the overall optimization goal, the idea of multi-task learning is used for reference, which is to make the prediction results of long-term trends and short-term fluctuations as close to the real values as possible while requiring to approximate the values to be predicted. In this way, the proposed method uses more supervision information and can more accurately capture the changing trend of the time series, thereby improving the forecasting performance.
△ Less
Submitted 22 March, 2021; v1 submitted 17 August, 2020;
originally announced August 2020.
-
Generalization Guarantees for Imitation Learning
Authors:
Allen Z. Ren,
Sushant Veer,
Anirudha Majumdar
Abstract:
Control policies from imitation learning can often fail to generalize to novel environments due to imperfect demonstrations or the inability of imitation learning algorithms to accurately infer the expert's policies. In this paper, we present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework to provide upper bounds on t…
▽ More
Control policies from imitation learning can often fail to generalize to novel environments due to imperfect demonstrations or the inability of imitation learning algorithms to accurately infer the expert's policies. In this paper, we present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework to provide upper bounds on the expected cost of policies in novel environments. We propose a two-stage training method where a latent policy distribution is first embedded with multi-modal expert behavior using a conditional variational autoencoder, and then "fine-tuned" in new training environments to explicitly optimize the generalization bound. We demonstrate strong generalization bounds and their tightness relative to empirical performance in simulation for (i) grasping diverse mugs, (ii) planar pushing with visual feedback, and (iii) vision-based indoor navigation, as well as through hardware experiments for the two manipulation tasks.
△ Less
Submitted 3 December, 2020; v1 submitted 4 August, 2020;
originally announced August 2020.
-
DARB: A Density-Aware Regular-Block Pruning for Deep Neural Networks
Authors:
Ao Ren,
Tao Zhang,
Yuhao Wang,
Sheng Lin,
Peiyan Dong,
Yen-kuang Chen,
Yuan Xie,
Yanzhi Wang
Abstract:
The rapidly growing parameter volume of deep neural networks (DNNs) hinders the artificial intelligence applications on resource constrained devices, such as mobile and wearable devices. Neural network pruning, as one of the mainstream model compression techniques, is under extensive study to reduce the number of parameters and computations. In contrast to irregular pruning that incurs high index…
▽ More
The rapidly growing parameter volume of deep neural networks (DNNs) hinders the artificial intelligence applications on resource constrained devices, such as mobile and wearable devices. Neural network pruning, as one of the mainstream model compression techniques, is under extensive study to reduce the number of parameters and computations. In contrast to irregular pruning that incurs high index storage and decoding overhead, structured pruning techniques have been proposed as the promising solutions. However, prior studies on structured pruning tackle the problem mainly from the perspective of facilitating hardware implementation, without analyzing the characteristics of sparse neural networks. The neglect on the study of sparse neural networks causes inefficient trade-off between regularity and pruning ratio. Consequently, the potential of structurally pruning neural networks is not sufficiently mined.
In this work, we examine the structural characteristics of the irregularly pruned weight matrices, such as the diverse redundancy of different rows, the sensitivity of different rows to pruning, and the positional characteristics of retained weights. By leveraging the gained insights as a guidance, we first propose the novel block-max weight masking (BMWM) method, which can effectively retain the salient weights while imposing high regularity to the weight matrix. As a further optimization, we propose a density-adaptive regular-block (DARB) pruning that outperforms prior structured pruning work with high pruning ratio and decoding efficiency. Our experimental results show that DARB can achieve 13$\times$ to 25$\times$ pruning ratio, which are 2.8$\times$ to 4.3$\times$ improvements than the state-of-the-art counterparts on multiple neural network models and tasks. Moreover, DARB can achieve 14.3$\times$ decoding efficiency than block pruning with higher pruning ratio.
△ Less
Submitted 20 November, 2019; v1 submitted 18 November, 2019;
originally announced November 2019.
-
Interpretable Models for Understanding Immersive Simulations
Authors:
Nicholas Hoernle,
Kobi Gal,
Barbara Grosz,
Leilah Lyons,
Ada Ren,
Andee Rubin
Abstract:
This paper describes methods for comparative evaluation of the interpretability of models of high dimensional time series data inferred by unsupervised machine learning algorithms. The time series data used in this investigation were logs from an immersive simulation like those commonly used in education and healthcare training. The structures learnt by the models provide representations of partic…
▽ More
This paper describes methods for comparative evaluation of the interpretability of models of high dimensional time series data inferred by unsupervised machine learning algorithms. The time series data used in this investigation were logs from an immersive simulation like those commonly used in education and healthcare training. The structures learnt by the models provide representations of participants' activities in the simulation which are intended to be meaningful to people's interpretation. To choose the model that induces the best representation, we designed two interpretability tests, each of which evaluates the extent to which a model's output aligns with people's expectations or intuitions of what has occurred in the simulation. We compared the performance of the models on these interpretability tests to their performance on statistical information criteria. We show that the models that optimize interpretability quality differ from those that optimize (statistical) information theoretic criteria. Furthermore, we found that a model using a fully Bayesian approach performed well on both the statistical and human-interpretability measures. The Bayesian approach is a good candidate for fully automated model selection, i.e., when direct empirical investigations of interpretability are costly or infeasible.
△ Less
Submitted 4 May, 2020; v1 submitted 24 September, 2019;
originally announced September 2019.
-
A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron SuperconductingTechnology
Authors:
Ruizhe Cai,
Ao Ren,
Olivia Chen,
Ning Liu,
Caiwen Ding,
Xuehai Qian,
Jie Han,
Wenhui Luo,
Nobuyuki Yoshikawa,
Yanzhi Wang
Abstract:
The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has been recently developed, which achieves the highest energy efficiency among superconducting logic families, potentially huge gain compared with state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits with the scale of 83,000 JJs have demonstrated the scalability and potential of implem…
▽ More
The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has been recently developed, which achieves the highest energy efficiency among superconducting logic families, potentially huge gain compared with state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits with the scale of 83,000 JJs have demonstrated the scalability and potential of implementing large-scale systems using AQFP. As a result, it will be promising for AQFP in high-performance computing and deep space applications, with Deep Neural Network (DNN) inference acceleration as an important example. Besides ultra-high energy efficiency, AQFP exhibits two unique characteristics: the deep pipelining nature since each AQFP logic gate is connected with an AC clock signal, which increases the difficulty to avoid RAW hazards; the second is the unique opportunity of true random number generation (RNG) using a single AQFP buffer, far more efficient than RNG in CMOS. We point out that these two characteristics make AQFP especially compatible with the \emph{stochastic computing} (SC) technique, which uses a time-independent bit sequence for value representation, and is compatible with the deep pipelining nature. Further, the application of SC has been investigated in DNNs in prior work, and the suitability has been illustrated as SC is more compatible with approximate computations. This work is the first to develop an SC-based DNN acceleration framework using AQFP technology.
△ Less
Submitted 21 July, 2019;
originally announced July 2019.
-
A Novel Secure Authentication Scheme for Heterogeneous Internet of Thing
Authors:
Jingwei Liu,
Ailian Ren,
Lihuan Zhang,
Rong Sun,
Xiaojiang Du,
Mohsen Guizani
Abstract:
Today, Internet of Things (IoT) technology is being increasingly popular which is applied in a wide range of industry sectors such as healthcare, transportation and some critical infrastructures. With the widespread applications of IoT technology, people's lives have changed dramatically. Due to its capabilities of sensitive data-aware, information collection, communication and processing, it rais…
▽ More
Today, Internet of Things (IoT) technology is being increasingly popular which is applied in a wide range of industry sectors such as healthcare, transportation and some critical infrastructures. With the widespread applications of IoT technology, people's lives have changed dramatically. Due to its capabilities of sensitive data-aware, information collection, communication and processing, it raises security and privacy concerns. Moreover, a malicious attacker may impersonate a legitimate user, which may cause security threat and violation privacy. In allusion to the above problems, we propose a novel and lightweight anonymous authentication and key agreement scheme for heterogeneous IoT, which is innovatively designed to shift between the public key infrastructure (PKI) and certificateless cryptography (CLC) environment. The proposed scheme not only achieves secure communication among the legal authorized users, but also possesses more attributes with user anonymity, non-repudiation and key agreement fairness. Through the security analysis, it is proved that the proposed scheme can resist replay attacks and denial of service (DOS) attacks. Finally, the performance evaluation demonstrates that our scheme is more lightweight and innovative.
△ Less
Submitted 10 February, 2019;
originally announced February 2019.
-
ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers
Authors:
Ao Ren,
Tianyun Zhang,
Shaokai Ye,
Jiayu Li,
Wenyao Xu,
Xuehai Qian,
Xue Lin,
Yanzhi Wang
Abstract:
To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), two important categories of DNN model compression techniques: weight pruning and weight quantization are investigated. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in bit representation of weights. However, there lacks a systematic framework of…
▽ More
To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), two important categories of DNN model compression techniques: weight pruning and weight quantization are investigated. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in bit representation of weights. However, there lacks a systematic framework of joint weight pruning and quantization of DNNs, thereby limiting the available model compression ratio. Moreover, the computation reduction, energy efficiency improvement, and hardware performance overhead need to be accounted for besides simply model size reduction.
To address these limitations, we present ADMM-NN, the first algorithm-hardware co-optimization framework of DNNs using Alternating Direction Method of Multipliers (ADMM), a powerful technique to deal with non-convex optimization problems with possibly combinatorial constraints. The first part of ADMM-NN is a systematic, joint framework of DNN weight pruning and quantization using ADMM. It can be understood as a smart regularization technique with regularization target dynamically updated in each ADMM iteration, thereby resulting in higher performance in model compression than prior work. The second part is hardware-aware DNN optimizations to facilitate hardware-level implementations.
Without accuracy loss, we can achieve 85$\times$ and 24$\times$ pruning on LeNet-5 and AlexNet models, respectively, significantly higher than prior work. The improvement becomes more significant when focusing on computation reductions. Combining weight pruning and quantization, we achieve 1,910$\times$ and 231$\times$ reductions in overall model size on these two benchmarks, when focusing on data storage. Highly promising results are also observed on other representative DNNs such as VGGNet and ResNet-50.
△ Less
Submitted 30 December, 2018;
originally announced December 2018.
-
Towards Budget-Driven Hardware Optimization for Deep Convolutional Neural Networks using Stochastic Computing
Authors:
Zhe Li,
Ji Li,
Ao Ren,
Caiwen Ding,
Jeffrey Draper,
Qinru Qiu,
Bo Yuan,
Yanzhi Wang
Abstract:
Recently, Deep Convolutional Neural Network (DCNN) has achieved tremendous success in many machine learning applications. Nevertheless, the deep structure has brought significant increases in computation complexity. Largescale deep learning systems mainly operate in high-performance server clusters, thus restricting the application extensions to personal or mobile devices. Previous works on GPU an…
▽ More
Recently, Deep Convolutional Neural Network (DCNN) has achieved tremendous success in many machine learning applications. Nevertheless, the deep structure has brought significant increases in computation complexity. Largescale deep learning systems mainly operate in high-performance server clusters, thus restricting the application extensions to personal or mobile devices. Previous works on GPU and/or FPGA acceleration for DCNNs show increasing speedup, but ignore other constraints, such as area, power, and energy. Stochastic Computing (SC), as a unique data representation and processing technique, has the potential to enable the design of fully parallel and scalable hardware implementations of large-scale deep learning systems. This paper proposed an automatic design allocation algorithm driven by budget requirement considering overall accuracy performance. This systematic method enables the automatic design of a DCNN where all design parameters are jointly optimized. Experimental results demonstrate that proposed algorithm can achieve a joint optimization of all design parameters given the comprehensive budget of a DCNN.
△ Less
Submitted 10 May, 2018;
originally announced May 2018.
-
Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs
Authors:
Caiwen Ding,
Ao Ren,
Geng Yuan,
Xiaolong Ma,
Jiayu Li,
Ning Liu,
Bo Yuan,
Yanzhi Wang
Abstract:
Both industry and academia have extensively investigated hardware accelerations. In this work, to address the increasing demands in computational capability and memory requirement, we propose structured weight matrices (SWM)-based compression techniques for both \emph{field programmable gate array} (FPGA) and \emph{application-specific integrated circuit} (ASIC) implementations. In algorithm part,…
▽ More
Both industry and academia have extensively investigated hardware accelerations. In this work, to address the increasing demands in computational capability and memory requirement, we propose structured weight matrices (SWM)-based compression techniques for both \emph{field programmable gate array} (FPGA) and \emph{application-specific integrated circuit} (ASIC) implementations. In algorithm part, SWM-based framework adopts block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. The SWM-based technique can reduce computational complexity from O($n^2$) to O($n\log n$) and storage complexity from O($n^2$) to O($n$) for each layer and both training and inference phases. For FPGA implementations on deep convolutional neural networks (DCNNs), we achieve at least 152X and 72X improvement in performance and energy efficiency, respectively using the SWM-based framework, compared with the baseline of IBM TrueNorth processor under same accuracy constraints using the data set of MNIST, SVHN, and CIFAR-10. For FPGA implementations on long short term memory (LSTM) networks, the proposed SWM-based LSTM can achieve up to 21X enhancement in performance and 33.5X gains in energy efficiency compared with the baseline accelerator. For ASIC implementations, the SWM-based ASIC design exhibits impressive advantages in terms of power, throughput, and energy efficiency. Experimental results indicate that this method is greatly suitable for applying DNNs onto both FPGAs and mobile/IoT devices.
△ Less
Submitted 28 March, 2018;
originally announced April 2018.
-
An Area and Energy Efficient Design of Domain-Wall Memory-Based Deep Convolutional Neural Networks using Stochastic Computing
Authors:
Xiaolong Ma,
Yipeng Zhang,
Geng Yuan,
Ao Ren,
Zhe Li,
Jie Han,
Jingtong Hu,
Yanzhi Wang
Abstract:
With recent trend of wearable devices and Internet of Things (IoTs), it becomes attractive to develop hardware-based deep convolutional neural networks (DCNNs) for embedded applications, which require low power/energy consumptions and small hardware footprints. Recent works demonstrated that the Stochastic Computing (SC) technique can radically simplify the hardware implementation of arithmetic un…
▽ More
With recent trend of wearable devices and Internet of Things (IoTs), it becomes attractive to develop hardware-based deep convolutional neural networks (DCNNs) for embedded applications, which require low power/energy consumptions and small hardware footprints. Recent works demonstrated that the Stochastic Computing (SC) technique can radically simplify the hardware implementation of arithmetic units and has the potential to satisfy the stringent power requirements in embedded devices. However, in these works, the memory design optimization is neglected for weight storage, which will inevitably result in large hardware cost. Moreover, if conventional volatile SRAM or DRAM cells are utilized for weight storage, the weights need to be re-initialized whenever the DCNN platform is re-started.
In order to overcome these limitations, in this work we adopt an emerging non-volatile Domain-Wall Memory (DWM), which can achieve ultra-high density, to replace SRAM for weight storage in SC-based DCNNs. We propose DW-CNN, the first comprehensive design optimization framework of DWM-based weight storage method. We derive the optimal memory type, precision, and organization, as well as whether to store binary or stochastic numbers. We present effective resource sharing scheme for DWM-based weight storage in the convolutional and fully-connected layers of SC-based DCNNs to achieve a desirable balance among area, power (energy) consumption, and application-level accuracy.
△ Less
Submitted 3 February, 2018;
originally announced February 2018.
-
Algorithm-Hardware Co-Optimization of the Memristor-Based Framework for Solving SOCP and Homogeneous QCQP Problems
Authors:
Ao Ren,
Sijia Liu,
Ruizhe Cai,
Wujie Wen,
Pramod K Varshney,
Yanzhi Wang
Abstract:
A memristor crossbar, which is constructed with memristor devices, has the unique ability to change and memorize the state of each of its memristor elements. It also has other highly desirable features such as high density, low power operation and excellent scalability. Hence the memristor crossbar technology can potentially be utilized for developing low-complexity and high-scalability solution f…
▽ More
A memristor crossbar, which is constructed with memristor devices, has the unique ability to change and memorize the state of each of its memristor elements. It also has other highly desirable features such as high density, low power operation and excellent scalability. Hence the memristor crossbar technology can potentially be utilized for developing low-complexity and high-scalability solution frameworks for solving a large class of convex optimization problems, which involve extensive matrix operations and have critical applications in multiple disciplines. This paper, as the first attempt towards this direction, proposes a novel memristor crossbar-based framework for solving two important convex optimization problems, i.e., second-order cone programming (SOCP) and homogeneous quadratically constrained quadratic programming (QCQP) problems. In this paper, the alternating direction method of multipliers (ADMM) is adopted. It splits the SOCP and homogeneous QCQP problems into sub-problems that involve the solution of linear systems, which could be effectively solved using the memristor crossbar in O(1) time complexity. The proposed algorithm is an iterative procedure that iterates a constant number of times. Therefore, algorithms to solve SOCP and homogeneous QCQP problems have pseudo-O(N) complexity, which is a significant reduction compared to the state-of-the-art software solvers (O(N^3.5) - O(N^4)).
△ Less
Submitted 2 February, 2018;
originally announced February 2018.
-
VIBNN: Hardware Acceleration of Bayesian Neural Networks
Authors:
Ruizhe Cai,
Ao Ren,
Ning Liu,
Caiwen Ding,
Luhao Wang,
Xuehai Qian,
Massoud Pedram,
Yanzhi Wang
Abstract:
Bayesian Neural Networks (BNNs) have been proposed to address the problem of model uncertainty in training and inference. By introducing weights associated with conditioned probability distributions, BNNs are capable of resolving the overfitting issue commonly seen in conventional neural networks and allow for small-data training, through the variational inference process. Frequent usage of Gaussi…
▽ More
Bayesian Neural Networks (BNNs) have been proposed to address the problem of model uncertainty in training and inference. By introducing weights associated with conditioned probability distributions, BNNs are capable of resolving the overfitting issue commonly seen in conventional neural networks and allow for small-data training, through the variational inference process. Frequent usage of Gaussian random variables in this process requires a properly optimized Gaussian Random Number Generator (GRNG). The high hardware cost of conventional GRNG makes the hardware implementation of BNNs challenging.
In this paper, we propose VIBNN, an FPGA-based hardware accelerator design for variational inference on BNNs. We explore the design space for massive amount of Gaussian variable sampling tasks in BNNs. Specifically, we introduce two high performance Gaussian (pseudo) random number generators: the RAM-based Linear Feedback Gaussian Random Number Generator (RLF-GRNG), which is inspired by the properties of binomial distribution and linear feedback logics; and the Bayesian Neural Network-oriented Wallace Gaussian Random Number Generator. To achieve high scalability and efficient memory access, we propose a deep pipelined accelerator architecture with fast execution and good hardware utilization. Experimental results demonstrate that the proposed VIBNN implementations on an FPGA can achieve throughput of 321,543.4 Images/s and energy efficiency upto 52,694.8 Images/J while maintaining similar accuracy as its software counterpart.
△ Less
Submitted 2 February, 2018;
originally announced February 2018.
-
Deep Reinforcement Learning: Framework, Applications, and Embedded Implementations
Authors:
Hongjia Li,
Tianshu Wei,
Ao Ren,
Qi Zhu,
Yanzhi Wang
Abstract:
The recent breakthroughs of deep reinforcement learning (DRL) technique in Alpha Go and playing Atari have set a good example in handling large state and actions spaces of complicated control problems. The DRL technique is comprised of (i) an offline deep neural network (DNN) construction phase, which derives the correlation between each state-action pair of the system and its value function, and…
▽ More
The recent breakthroughs of deep reinforcement learning (DRL) technique in Alpha Go and playing Atari have set a good example in handling large state and actions spaces of complicated control problems. The DRL technique is comprised of (i) an offline deep neural network (DNN) construction phase, which derives the correlation between each state-action pair of the system and its value function, and (ii) an online deep Q-learning phase, which adaptively derives the optimal action and updates value estimates. In this paper, we first present the general DRL framework, which can be widely utilized in many applications with different optimization objectives. This is followed by the introduction of three specific applications: the cloud computing resource allocation problem, the residential smart grid task scheduling problem, and building HVAC system optimal control problem. The effectiveness of the DRL technique in these three cyber-physical applications have been validated. Finally, this paper investigates the stochastic computing-based hardware implementations of the DRL framework, which consumes a significant improvement in area efficiency and power consumption compared with binary-based implementation counterparts.
△ Less
Submitted 10 October, 2017;
originally announced October 2017.
-
Hardware-Driven Nonlinear Activation for Stochastic Computing Based Deep Convolutional Neural Networks
Authors:
Ji Li,
Zihao Yuan,
Zhe Li,
Caiwen Ding,
Ao Ren,
Qinru Qiu,
Jeffrey Draper,
Yanzhi Wang
Abstract:
Recently, Deep Convolutional Neural Networks (DCNNs) have made unprecedented progress, achieving the accuracy close to, or even better than human-level perception in various tasks. There is a timely need to map the latest software DCNNs to application-specific hardware, in order to achieve orders of magnitude improvement in performance, energy efficiency and compactness. Stochastic Computing (SC),…
▽ More
Recently, Deep Convolutional Neural Networks (DCNNs) have made unprecedented progress, achieving the accuracy close to, or even better than human-level perception in various tasks. There is a timely need to map the latest software DCNNs to application-specific hardware, in order to achieve orders of magnitude improvement in performance, energy efficiency and compactness. Stochastic Computing (SC), as a low-cost alternative to the conventional binary computing paradigm, has the potential to enable massively parallel and highly scalable hardware implementation of DCNNs. One major challenge in SC based DCNNs is designing accurate nonlinear activation functions, which have a significant impact on the network-level accuracy but cannot be implemented accurately by existing SC computing blocks. In this paper, we design and optimize SC based neurons, and we propose highly accurate activation designs for the three most frequently used activation functions in software DCNNs, i.e, hyperbolic tangent, logistic, and rectified linear units. Experimental results on LeNet-5 using MNIST dataset demonstrate that compared with a binary ASIC hardware DCNN, the DCNN with the proposed SC neurons can achieve up to 61X, 151X, and 2X improvement in terms of area, power, and energy, respectively, at the cost of small precision degradation.In addition, the SC approach achieves up to 21X and 41X of the area, 41X and 72X of the power, and 198200X and 96443X of the energy, compared with CPU and GPU approaches, respectively, while the error is increased by less than 3.07%. ReLU activation is suggested for future SC based DCNNs considering its superior performance under a small bit stream length.
△ Less
Submitted 12 March, 2017;
originally announced March 2017.
-
SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing
Authors:
Ao Ren,
Ji Li,
Zhe Li,
Caiwen Ding,
Xuehai Qian,
Qinru Qiu,
Bo Yuan,
Yanzhi Wang
Abstract:
With recent advancing of Internet of Things (IoTs), it becomes very attractive to implement the deep convolutional neural networks (DCNNs) onto embedded/portable systems. Presently, executing the software-based DCNNs requires high-performance server clusters in practice, restricting their widespread deployment on the mobile devices. To overcome this issue, considerable research efforts have been c…
▽ More
With recent advancing of Internet of Things (IoTs), it becomes very attractive to implement the deep convolutional neural networks (DCNNs) onto embedded/portable systems. Presently, executing the software-based DCNNs requires high-performance server clusters in practice, restricting their widespread deployment on the mobile devices. To overcome this issue, considerable research efforts have been conducted in the context of developing highly-parallel and specific DCNN hardware, utilizing GPGPUs, FPGAs, and ASICs. Stochastic Computing (SC), which uses bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has a high potential for implementing DCNNs with high scalability and ultra-low hardware footprint. Since multiplications and additions can be calculated using AND gates and multiplexers in SC, significant reductions in power/energy and hardware footprint can be achieved compared to the conventional binary arithmetic implementations. The tremendous savings in power (energy) and hardware resources bring about immense design space for enhancing scalability and robustness for hardware DCNNs. This paper presents the first comprehensive design and optimization framework of SC-based DCNNs (SC-DCNNs). We first present the optimal designs of function blocks that perform the basic operations, i.e., inner product, pooling, and activation function. Then we propose the optimal design of four types of combinations of basic function blocks, named feature extraction blocks, which are in charge of extracting features from input feature maps. Besides, weight storage methods are investigated to reduce the area and power/energy consumption for storing weights. Finally, the whole SC-DCNN implementation is optimized, with feature extraction blocks carefully selected, to minimize area and power/energy consumption while maintaining a high network accuracy level.
△ Less
Submitted 31 January, 2017; v1 submitted 17 November, 2016;
originally announced November 2016.
-
Toward Practical Differential Privacy in Smart Grid with Capacity-Limited Rechargeable Batteries
Authors:
Zijian Zhang,
Zhan Qin,
Liehuang Zhu,
Wei Jiang,
Chen Xu,
and Kui Ren
Abstract:
The technology of differential privacy, adding a noise drawn from the Laplace distribution, successfully overcomes a difficulty of keeping both the privacy of individual data and the utility of the statistical result simultaneously. Therefore, it is prevalent to use a rechargeable battery as the noise for achieving differential privacy in the application of smart grid. Unfortunately, to the best o…
▽ More
The technology of differential privacy, adding a noise drawn from the Laplace distribution, successfully overcomes a difficulty of keeping both the privacy of individual data and the utility of the statistical result simultaneously. Therefore, it is prevalent to use a rechargeable battery as the noise for achieving differential privacy in the application of smart grid. Unfortunately, to the best of our knowledge, we observe that the existing privacy protection mechanisms cannot satisfy differential privacy, when considering physical restrictions of the battery in practice. In this paper, we first classify two types of challenges caused by two physical restrictions, the maximum charging/discharging rate and the capacity of battery. We then propose a stateless privacy protection scheme by exploring a boundary-changeable distribution for noise, and prove this scheme satisfies differential privacy, with regard to the first type of challenge. We further explain the difficulty to achieve differential privacy under the second type of challenge, and formalize the definition of a relaxed differential privacy. Finally, we present a stateful privacy protection scheme that satisfies the relaxed differential privacy. Experimental analysis shows that the maximum privacy leakage of our privacy protection schemes at each time point stably outperforms that of the existing work.
△ Less
Submitted 22 July, 2015; v1 submitted 10 July, 2015;
originally announced July 2015.