Search | arXiv e-print repository

doi 10.1109/IROS55552.2023.10341472

Skill Generalization with Verbs

Authors: Rachel Ma, Lyndon Lam, Benjamin A. Spiegel, Aditya Ganeshan, Roma Patel, Ben Abbatematteo, David Paulius, Stefanie Tellex, George Konidaris

Abstract: It is imperative that robots can understand natural language commands issued by humans. Such commands typically contain verbs that signify what action should be performed on a given object and that are applicable to many objects. We propose a method for generalizing manipulation skills to novel objects using verbs. Our method learns a probabilistic classifier that determines whether a given object trajectory can be described by a specific verb. We show that this classifier accurately generalizes to novel object categories with an average accuracy of 76.69% across 13 object categories and 14 verbs. We then perform policy search over the object kinematics to find an object trajectory that maximizes classifier prediction for a given verb. Our method allows a robot to generate a trajectory for a novel object based on a verb, which can then be used as input to a motion planner. We show that our model can generate trajectories that are usable for executing five verb commands applied to novel instances of two different object categories on a real robot.

Submitted 17 October, 2024; originally announced October 2024.

Comments: 7 pages + 2 pages (references), 6 figures. Accepted at IROS 2023. Code, dataset info and demo videos can be found at: https://rachelma80000.github.io/SkillGenVerbs/

arXiv:2409.12262 [pdf, other]

Bootstrapping Object-level Planning with Large Language Models

Authors: David Paulius, Alejandro Agostini, Benedict Quartey, George Konidaris

Abstract: We introduce a new method that extracts knowledge from a large language model (LLM) to produce object-level plans, which describe high-level changes to object state, and uses them to bootstrap task and motion planning (TAMP). Existing work uses LLMs to directly output task plans or generate goals in representations like PDDL. However, these methods fall short because they rely on the LLM to do the actual planning or output a hard-to-satisfy goal. Our approach instead extracts knowledge from an LLM in the form of plan schemas as an object-level representation called functional object-oriented networks (FOON), from which we automatically generate PDDL subgoals. Our method markedly outperforms alternative planning strategies in completing several pick-and-place tasks in simulation.

Submitted 21 March, 2025; v1 submitted 18 September, 2024; originally announced September 2024.

Comments: Accepted to ICRA 2025; 11 pages (6 pages + 1 page references + 4 pages appendix); for demo videos, please see https://davidpaulius.github.io/olp_llm/

arXiv:2211.09935 [pdf, other]

CAPE: Corrective Actions from Precondition Errors using Large Language Models

Authors: Shreyas Sundara Raman, Vanya Cohen, Ifrah Idrees, Eric Rosen, Ray Mooney, Stefanie Tellex, David Paulius

Abstract: Extracting commonsense knowledge from a large language model (LLM) offers a path to designing intelligent robots. Existing approaches that leverage LLMs for planning are unable to recover when an action fails and often resort to retrying failed actions, without resolving the error's underlying cause. We propose a novel approach (CAPE) that attempts to propose corrective actions to resolve precondition errors during planning. CAPE improves the quality of generated plans by leveraging few-shot reasoning from action preconditions. Our approach enables embodied agents to execute more tasks than baseline methods while ensuring semantic correctness and minimizing re-prompting. In VirtualHome, CAPE generates executable plans while improving a human-annotated plan correctness metric from 28.89% to 49.63% over SayCan. Our improvements transfer to a Boston Dynamics Spot robot initialized with a set of skills (specified in language) and associated preconditions, where CAPE improves the correctness metric of the executed task plans by 76.49% compared to SayCan. Our approach enables the robot to follow natural language commands and robustly recover from failures, which baseline approaches largely cannot resolve or address inefficiently.

Submitted 9 March, 2024; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: 17 pages, 6 figures, accepted at ICRA 2024

MSC Class: 68T20; 68T50 ACM Class: I.2.7; I.2.8; I.2.2; I.2.4

arXiv:2207.05800 [pdf, other]

Long-Horizon Planning and Execution with Functional Object-Oriented Networks

Authors: David Paulius, Alejandro Agostini, Dongheui Lee

Abstract: Following work on joint object-action representations, functional object-oriented networks (FOON) were introduced as a knowledge graph representation for robots. A FOON contains symbolic concepts useful to a robot's understanding of tasks and its environment for object-level planning. Prior to this work, little has been done to show how plans acquired from FOON can be executed by a robot, as the concepts in a FOON are too abstract for execution. We thereby introduce the idea of exploiting object-level knowledge as a FOON for task planning and execution. Our approach automatically transforms FOON into PDDL and leverages off-the-shelf planners, action contexts, and robot skills in a hierarchical planning pipeline to generate executable task plans. We demonstrate our entire approach on long-horizon tasks in CoppeliaSim and show how learned action contexts can be extended to never-before-seen scenarios.

Submitted 2 June, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

Comments: To be published in RA-L, 8 pages, Joint First Authors (Alejandro and David). For project website, see https://davidpaulius.github.io/foon-lhpe

arXiv:2207.03693 [pdf, other]

Approximate Task Tree Retrieval in a Knowledge Network for Robotic Cooking

Authors: Md. Sadman Sakib, David Paulius, Yu Sun

Abstract: Flexible task planning continues to pose a difficult challenge for robots, where a robot is unable to creatively adapt their task plans to new or unseen problems, which is mainly due to the limited knowledge it has about its actions and world. Motivated by a human's ability to adapt, we explore how task plans from a knowledge graph, known as the Functional Object-Oriented Network (FOON), can be generated for novel problems requiring concepts that are not readily available to the robot in its knowledge base. Knowledge from 140 cooking recipes are structured in a FOON knowledge graph, which is used for acquiring task plan sequences known as task trees. Task trees can be modified to replicate recipes in a FOON knowledge graph format, which can be useful for enriching FOON with new recipes containing unknown object and state combinations, by relying upon semantic similarity. We demonstrate the power of task tree generation to create task trees with never-before-seen ingredient and state combinations as seen in recipes from the Recipe1M+ dataset, with which we evaluate the quality of the trees based on how accurately they depict newly added ingredients. Our experimental results show that our system is able to provide task sequences with 76% correctness.

Submitted 8 July, 2022; originally announced July 2022.

arXiv:2204.02274 [pdf, other]

Grounding of the Functional Object-Oriented Network in Industrial Tasks

Authors: Rafik Ayari, Matteo Pantano, David Paulius

Abstract: In this preliminary work, we propose to design an activity recognition system that is suitable for Industrie 4.0 (I4.0) applications, especially focusing on Learning from Demonstration (LfD) in collaborative robot tasks. More precisely, we focus on the issue of data exchange between an activity recognition system and a collaborative robotic system. We propose an activity recognition system with linked data using functional object-oriented network (FOON) to facilitate industrial use cases. Initially, we drafted a FOON for our use case. Afterwards, an action is estimated by using object and hand recognition systems coupled with a recurrent neural network, which refers to FOON objects and states. Finally, the detected action is shared via a context broker using an existing linked data model, thus enabling the robotic system to interpret the action and execute it afterwards. Our initial results show that FOON can be used for an industrial use case and that we can use existing linked data models in LfD applications.

Submitted 5 April, 2022; originally announced April 2022.

arXiv:2112.02433 [pdf, other]

Functional Task Tree Generation from a Knowledge Graph to Solve Unseen Problems

Authors: Md. Sadman Sakib, David Paulius, Yu Sun

Abstract: A major component for developing intelligent and autonomous robots is a suitable knowledge representation, from which a robot can acquire knowledge about its actions or world. However, unlike humans, robots cannot creatively adapt to novel scenarios, as their knowledge and environment are rigidly defined. To address the problem of producing novel and flexible task plans called task trees, we explore how we can derive plans with concepts not originally in the robot's knowledge base. Existing knowledge in the form of a knowledge graph is used as a base of reference to create task trees that are modified with new object or state combinations. To demonstrate the flexibility of our method, we randomly selected recipes from the Recipe1M+ dataset and generated their task trees. The task trees were then thoroughly checked with a visualization tool that portrays how each ingredient changes with each action to produce the desired meal. Our results indicate that the proposed method can produce task plans with high accuracy even for never-before-seen ingredient combinations.

Submitted 4 December, 2021; originally announced December 2021.

arXiv:2106.00728 [pdf, other]

Evaluating Recipes Generated from Functional Object-Oriented Network

Authors: Md Sadman Sakib, Hailey Baez, David Paulius, Yu Sun

Abstract: The functional object-oriented network (FOON) has been introduced as a knowledge representation, which takes the form of a graph, for symbolic task planning. To get a sequential plan for a manipulation task, a robot can obtain a task tree through a knowledge retrieval process from the FOON. To evaluate the quality of an acquired task tree, we compare it with a conventional form of task knowledge, such as recipes or manuals. We first automatically convert task trees to recipes, and we then compare them with the human-created recipes in the Recipe1M+ dataset via a survey. Our preliminary study finds no significant difference between the recipes in Recipe1M+ and the recipes generated from FOON task trees in terms of correctness, completeness, and clarity.

Submitted 1 June, 2021; originally announced June 2021.

Comments: This manuscript has been accepted at Ubiquitous Robots 2021

arXiv:2106.00158 [pdf, other]

A Road-map to Robot Task Execution with the Functional Object-Oriented Network

Authors: David Paulius, Alejandro Agostini, Yu Sun, Dongheui Lee

Abstract: Following work on joint object-action representations, the functional object-oriented network (FOON) was introduced as a knowledge graph representation for robots. Taking the form of a bipartite graph, a FOON contains symbolic or high-level information that would be pertinent to a robot's understanding of its environment and tasks in a way that mirrors human understanding of actions. In this work, we outline a road-map for future development of FOON and its application in robotic systems for task planning as well as knowledge acquisition from demonstration. We propose preliminary ideas to show how a FOON can be created in a real-world scenario with a robot and human teacher in a way that can jointly augment existing knowledge in a FOON and teach a robot the skills it needs to replicate the demonstrated actions and solve a given manipulation problem.

Submitted 31 May, 2021; originally announced June 2021.

Comments: Ubiquitous Robots 2021 Submission -- 4 pages

arXiv:2012.05438 [pdf, other]

doi 10.1109/ICPR48806.2021.9413030

Developing Motion Code Embedding for Action Recognition in Videos

Authors: Maxat Alibayev, David Paulius, Yu Sun

Abstract: In this work, we propose a motion embedding strategy known as motion codes, which is a vectorized representation of motions based on a manipulation's salient mechanical attributes. These motion codes provide a robust motion representation, and they are obtained using a hierarchy of features called the motion taxonomy. We developed and trained a deep neural network model that combines visual and semantic features to identify the features found in our motion taxonomy to embed or annotate videos with motion codes. To demonstrate the potential of motion codes as features for machine learning tasks, we integrated the extracted features from the motion embedding model into the current state-of-the-art action recognition model. The obtained model achieved higher accuracy than the baseline model for the verb classification task on egocentric videos from the EPIC-KITCHENS dataset.

Submitted 9 December, 2020; originally announced December 2020.

Comments: Accepted by 25th International Conference on Pattern Recognition (ICPR2020)

arXiv:2007.15841 [pdf, other]

doi 10.1109/IROS45743.2020.9341065

Estimating Motion Codes from Demonstration Videos

Authors: Maxat Alibayev, David Paulius, Yu Sun

Abstract: A motion taxonomy can encode manipulations as a binary-encoded representation, which we refer to as motion codes. These motion codes innately represent a manipulation action in an embedded space that describes the motion's mechanical features, including contact and trajectory type. The key advantage of using motion codes for embedding is that motions can be more appropriately defined with robotic-relevant features, and their distances can be more reasonably measured using these motion features. In this paper, we develop a deep learning pipeline to extract motion codes from demonstration videos in an unsupervised manner so that knowledge from these videos can be properly represented and used for robots. Our evaluations show that motion codes can be extracted from demonstrations of action in the EPIC-KITCHENS dataset.

Submitted 31 July, 2020; originally announced July 2020.

Comments: IROS 2020 Submission -- 6 pages; initial upload (Last updated July 31st 2020)

arXiv:2007.06695 [pdf, other]

doi 10.15607/RSS.2020.XVI.045

A Motion Taxonomy for Manipulation Embedding

Authors: David Paulius, Nicholas Eales, Yu Sun

Abstract: To represent motions from a mechanical point of view, this paper explores motion embedding using the motion taxonomy. With this taxonomy, manipulations can be described and represented as binary strings called motion codes. Motion codes capture mechanical properties, such as contact type and trajectory, that should be used to define suitable distance metrics between motions or loss functions for deep learning and reinforcement learning. Motion codes can also be used to consolidate aliases or cluster motion types that share similar properties. Using existing data sets as a reference, we discuss how motion codes can be created and assigned to actions that are commonly seen in activities of daily living based on intuition as well as real data. Motion codes are compared to vectors from pre-trained Word2Vec models, and we show that motion codes maintain distances that closely match the reality of manipulation.

Submitted 13 July, 2020; originally announced July 2020.

Comments: RSS 2020 Submission -- Corrected Several Errors in Paper (last updated July 13th, 2020)

Journal ref: Proceedings of Robotics: Science and Systems 2020

arXiv:1910.00532 [pdf, other]

doi 10.1109/IROS40897.2019.8967754

Manipulation Motion Taxonomy and Coding for Robots

Authors: David Paulius, Yongqiang Huang, Jason Meloncon, Yu Sun

Abstract: This paper introduces a taxonomy of manipulations as seen especially in cooking for 1) grouping manipulations from the robotics point of view, 2) consolidating aliases and removing ambiguity for motion types, and 3) provide a path to transferring learned manipulations to new unlearned manipulations. Using instructional videos as a reference, we selected a list of common manipulation motions seen in cooking activities grouped into similar motions based on several trajectory and contact attributes. Manipulation codes are then developed based on the taxonomy attributes to represent the manipulation motions. The manipulation taxonomy is then used for comparing motion data in the Daily Interactive Manipulation (DIM) data set to reveal their motion similarities.

Submitted 31 July, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: IROS 2019 Submission -- 6 pages

arXiv:1905.00502 [pdf, other]

Task Planning with a Weighted Functional Object-Oriented Network

Authors: David Paulius, Kelvin Sheng Pei Dong, Yu Sun

Abstract: In reality, there is still much to be done for robots to be able to perform manipulation actions with full autonomy. Complicated manipulation tasks, such as cooking, may still require a person to perform some actions that are very risky for a robot to perform. On the other hand, some other actions may be very risky for a human with physical disabilities to perform. Therefore, it is necessary to balance the workload of a robot and a human based on their limitations while minimizing the effort needed from a human in a collaborative robot (cobot) set-up. This paper proposes a new version of our functional object-oriented network (FOON) that integrates weights in its functional units to reflect a robot's chance of successfully executing an action of that functional unit. The paper also presents a task planning algorithm for the weighted FOON to allocate manipulation action load to the robot and human to achieve optimal performance while minimizing human effort. Through a number of experiments, this paper shows several successful cases in which using the proposed weighted FOON and the task planning algorithm allow a robot and a human to successfully complete complicated tasks together with higher success rates than a robot doing them alone.

Submitted 25 March, 2021; v1 submitted 1 May, 2019; originally announced May 2019.

Comments: ICRA 2021 Submission -- 7 Pages, Accepted to Conference

arXiv:1902.01537 [pdf, other]

doi 10.1109/IROS.2016.7759413

Functional Object-Oriented Network for Manipulation Learning

Authors: David Paulius, Yongqiang Huang, Roger Milton, William D. Buchanan, Jeanine Sam, Yu Sun

Abstract: This paper presents a novel structured knowledge representation called the functional object-oriented network (FOON) to model the connectivity of the functional-related objects and their motions in manipulation tasks. The graphical model FOON is learned by observing object state change and human manipulations with the objects. Using a well-trained FOON, robots can decipher a task goal, seek the correct objects at the desired states on which to operate, and generate a sequence of proper manipulation motions. The paper describes FOON's structure and an approach to form a universal FOON with extracted knowledge from online instructional videos. A graph retrieval approach is presented to generate manipulation motion sequences from the FOON to achieve a desired goal, demonstrating the flexibility of FOON in creating a novel and adaptive means of solving a problem using knowledge gathered from multiple sources. The results are demonstrated in a simulated environment to illustrate the motion sequences generated from the FOON to carry out the desired tasks.

Submitted 28 November, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

Comments: IROS 2016 Submission -- Corrected several errors from the published version (last updated November 28th, 2020)

Journal ref: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pgs. 2655-2662

arXiv:1807.02192 [pdf, other]

doi 10.1016/j.robot.2019.03.005

A Survey of Knowledge Representation in Service Robotics

Authors: David Paulius, Yu Sun

Abstract: Within the realm of service robotics, researchers have placed a great amount of effort into learning, understanding, and representing motions as manipulations for task execution by robots. The task of robot learning and problem-solving is very broad, as it integrates a variety of tasks such as object detection, activity recognition, task/motion planning, localization, knowledge representation and retrieval, and the intertwining of perception/vision and machine learning techniques. In this paper, we solely focus on knowledge representations and notably how knowledge is typically gathered, represented, and reproduced to solve problems as done by researchers in the past decades. In accordance with the definition of knowledge representations, we discuss the key distinction between such representations and useful learning models that have extensively been introduced and studied in recent years, such as machine learning, deep learning, probabilistic modelling, and semantic graphical structures. Along with an overview of such tools, we discuss the problems which have existed in robot learning and how they have been built and used as solutions, technologies or developments (if any) which have contributed to solving them. Finally, we discuss key principles that should be considered when designing an effective knowledge representation.

Submitted 21 June, 2023; v1 submitted 5 July, 2018; originally announced July 2018.

Comments: Featured in Special Issue on Semantic Policy and Action Representations for Autonomous Robots, 22 Pages, Elsevier Format

Journal ref: Robotics and Autonomous Systems 118 (2019) 13-30

arXiv:1807.02189 [pdf, other]

doi 10.1109/ICRA.2018.8460200

Functional Object-Oriented Network: Construction & Expansion

Authors: David Paulius, Ahmad Babaeian Jelodar, Yu Sun

Abstract: We build upon the functional object-oriented network (FOON), a structured knowledge representation which is constructed from observations of human activities and manipulations. A FOON can be used for representing object-motion affordances. Knowledge retrieval through graph search allows us to obtain novel manipulation sequences using knowledge spanning across many video sources, hence the novelty in our approach. However, we are limited to the sources collected. To further improve the performance of knowledge retrieval as a follow up to our previous work, we discuss generalizing knowledge to be applied to objects which are similar to what we have in FOON without manually annotating new sources of knowledge. We discuss two means of generalization: 1) expanding our network through the use of object similarity to create new functional units from those we already have, and 2) compressing the functional units by object categories rather than specific objects. We discuss experiments which compare the performance of our knowledge retrieval algorithm with both expansion and compression by categories.

Submitted 31 July, 2020; v1 submitted 5 July, 2018; originally announced July 2018.

Comments: 7 pages, 3 figures, presented at ICRA 2018

Journal ref: ICRA 2018 Submission -- 7 pages

arXiv:1807.00983 [pdf, other]

Long Activity Video Understanding using Functional Object-Oriented Network

Authors: Ahmad Babaeian Jelodar, David Paulius, Yu Sun

Abstract: Video understanding is one of the most challenging topics in computer vision. In this paper, a four-stage video understanding pipeline is presented to simultaneously recognize all atomic actions and the single on-going activity in a video. This pipeline uses objects and motions from the video and a graph-based knowledge representation network as prior reference. Two deep networks are trained to identify objects and motions in each video sequence associated with an action. Low Level image features are then used to identify objects of interest in that video sequence. Confidence scores are assigned to objects of interest based on their involvement in the action and to motion classes based on results from a deep neural network that classifies the on-going action in video into motion classes. Confidence scores are computed for each candidate functional unit associated with an action using a knowledge representation network, object confidences, and motion confidences. Each action is therefore associated with a functional unit and the sequence of actions is further evaluated to identify the single on-going activity in the video. The knowledge representation used in the pipeline is called the functional object-oriented network which is a graph-based network useful for encoding knowledge about manipulation tasks. Experiments are performed on a dataset of cooking videos to test the proposed algorithm with action inference and activity classification. Experiments show that using functional object oriented network improves video understanding significantly.

Submitted 3 July, 2018; originally announced July 2018.

Comments: 12 pages, 12 figures

Showing 1–18 of 18 results for author: Paulius, D