Search | arXiv e-print repository

InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians

Authors: Kefan Chen, Sergiu Oprea, Justin Theiss, Sreyas Mohan, Srinath Sridhar, Aayush Prakash

Abstract: With the rising interest from the community in digital avatars, coupled with the importance of expressions and gestures in communication, modeling natural avatar behavior remains an important challenge across many industries such as teleconferencing, gaming, and AR/VR. Human hands are the primary tool for interacting with the environment and are essential for realistic human behavior modeling, yet existing 3D hand and head avatar models often overlook the crucial aspect of hand-body interactions, such as those between hand and face. We present InteractAvatar, the first model to faithfully capture the photorealistic appearance of dynamic hands and non-rigid hand-face interactions. Our novel Dynamic Gaussian Hand model, combining a template model and 3D Gaussian Splatting as well as a dynamic refinement module, captures pose-dependent changes, e.g. the fine wrinkles and complex shadows that occur during articulation. Importantly, our hand-face interaction module models the subtle geometry and appearance dynamics that underlie common gestures. Through experiments on novel view synthesis, self-reenactment and cross-identity reenactment, we demonstrate that InteractAvatar can reconstruct hands and hand-face interactions from monocular or multiview videos with high-fidelity details and can be animated with novel poses.

Submitted 10 April, 2025; originally announced April 2025.
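The abstract builds on 3D Gaussian Splatting, which renders each pixel by alpha-compositing depth-sorted Gaussians from front to back. The Python sketch below shows only that generic compositing rule for a single pixel. It assumes the colors and opacities have already been evaluated for that pixel and sorted near-to-far; the function name and the early-termination threshold are illustrative assumptions, and it does not reproduce InteractAvatar's Dynamic Gaussian Hand model, refinement module, or hand-face interaction module.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def composite_front_to_back(colors: Sequence[Color], alphas: Sequence[float],
                            min_transmittance: float = 1e-4) -> Color:
    """Blend depth-sorted splats: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j).

    Hypothetical helper; the threshold value is an assumption.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # share of light not yet absorbed by nearer splats
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:  # splats behind can no longer contribute visibly
            break
    return (out[0], out[1], out[2])
```

Because each contribution is scaled by the remaining transmittance, a nearly opaque splat in front hides everything behind it, which is why the loop can stop early.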

arXiv:2503.18155 [pdf, other]

Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes

Authors: Kelly O. Marshall, Omid Poursaeed, Sergiu Oprea, Amit Kumar, Anushrut Jignasu, Chinmay Hegde, Yilei Li, Rakesh Ranjan

Abstract: 3D indoor scene generation is an important problem for the design of digital and real-world environments. To automate this process, a scene generation model should be able to not only generate plausible scene layouts, but also take into consideration visual features and style preferences. Existing methods for this task exhibit very limited control over these attributes, only allowing text inputs in the form of simple object-level descriptions or pairwise spatial relationships. Our proposed method Decorum enables users to control the scene generation process with natural language by adopting language-based representations at each stage. This enables us to harness recent advancements in Large Language Models (LLMs) to model language-to-language mappings. In addition, we show that using a text-based representation allows us to select furniture for our scenes using a novel object retrieval method based on multimodal LLMs. Evaluations on the benchmark 3D-FRONT dataset show that our methods achieve improvements over existing work in text-conditioned scene synthesis and object retrieval.

Submitted 25 March, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

arXiv:2407.02987 [pdf, other]

LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models

Authors: Hayder Elesedy, Pedro M. Esperança, Silviu Vlad Oprea, Mete Ozay

Abstract: Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs). Existing model-based guardrails have not been designed for resource-constrained portable devices, such as mobile phones, an increasing number of which run LLM-based applications locally. We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models. LoRA-Guard extracts language features from the LLMs and adapts them for the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task. We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.

Submitted 18 December, 2024; v1 submitted 3 July, 2024; originally announced July 2024.
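LoRA-Guard's parameter efficiency rests on low-rank adapters. As a rough illustration of the generic LoRA update, and not of LoRA-Guard's dual-path architecture, a frozen weight W is augmented with a trainable product B A of rank r, so that y = W x + scale * B A x. The helper names, the `scale` argument, and the example sizes are assumptions for illustration; the paper's 100-1000x figure compares against existing guardrail models and is not derived from this per-matrix count.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, scale: float = 1.0) -> Vector:
    """y = W x + scale * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.
    """
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))  # low-rank path: project down to r, then back up
    return [y0 + scale * dy for y0, dy in zip(base, delta)]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full fine-tuning parameters, LoRA parameters) for one weight matrix."""
    return d_out * d_in, rank * (d_out + d_in)
```

For a hypothetical 4096 x 4096 projection with rank 8, `lora_param_counts(4096, 4096, 8)` gives 16,777,216 parameters for full fine-tuning versus 65,536 for the adapter, a 256x reduction for that single matrix.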

arXiv:2104.11776 [pdf, other]

UnrealROX+: An Improved Tool for Acquiring Synthetic Data from Virtual 3D Environments

Authors: Pablo Martinez-Gonzalez, Sergiu Oprea, John Alejandro Castro-Vargas, Alberto Garcia-Garcia, Sergio Orts-Escolano, Jose Garcia-Rodriguez, Markus Vincze

Abstract: Synthetic data generation has become essential in recent years for feeding data-driven algorithms, which have surpassed the performance of traditional techniques in almost every computer vision problem. Gathering and labelling the amount of data needed for these data-hungry models in the real world may be infeasible and error-prone, while synthetic data give us the possibility of generating huge amounts of data with pixel-perfect annotations. However, most synthetic datasets lack sufficient realism in their rendered images. In that context, the UnrealROX generation tool was presented in 2019, allowing the generation of highly realistic data, at high resolutions and framerates, with an efficient pipeline based on Unreal Engine, a cutting-edge videogame engine. UnrealROX enabled robotic vision researchers to generate realistic and visually plausible data with full ground truth for a wide variety of problems such as class and instance semantic segmentation, object detection, depth estimation, visual grasping, and navigation. Nevertheless, its workflow was tightly tied to generating image sequences from a robotic on-board camera, making it hard to generate data for other purposes. In this work, we present UnrealROX+, an improved version of UnrealROX whose decoupled and easy-to-use data acquisition system allows users to quickly design and generate data in a much more flexible and customizable way. Moreover, it is packaged as an Unreal plug-in, which makes it easier to use with existing Unreal projects, and it also includes new features such as generating albedo or a Python API for interacting with the virtual environment from Deep Learning frameworks.

Submitted 23 April, 2021; originally announced April 2021.

Comments: Accepted at International Joint Conference on Neural Networks (IJCNN) 2021

arXiv:2103.15017 [pdf, other]

H-GAN: the power of GANs in your Hands

Authors: Sergiu Oprea, Giorgos Karvounas, Pablo Martinez-Gonzalez, Nikolaos Kyriazis, Sergio Orts-Escolano, Iason Oikonomidis, Alberto Garcia-Garcia, Aggeliki Tsoli, Jose Garcia-Rodriguez, Antonis Argyros

Abstract: We present HandGAN (H-GAN), a cycle-consistent adversarial learning approach implementing multi-scale perceptual discriminators. It is designed to translate synthetic images of hands to the real domain. Synthetic hands provide complete ground-truth annotations, yet they are not representative of the target distribution of real-world data. We strive to provide the perfect blend of a realistic hand appearance with synthetic annotations. Relying on image-to-image translation, we improve the appearance of synthetic hands to approximate the statistical distribution underlying a collection of real images of hands. H-GAN tackles not only the cross-domain tone mapping but also structural differences in localized areas such as shading discontinuities. Results are evaluated qualitatively and quantitatively, improving on previous works. Furthermore, we rely on a hand classification task to support the claim that our generated hands are statistically similar to the real domain of hands.

Submitted 21 April, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

Comments: Paper accepted at The International Joint Conference on Neural Networks (IJCNN) 2021
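For reference, the standard cycle-consistency objective introduced with CycleGAN is written below, with $G$ mapping synthetic hands to the real domain, $F$ mapping real hands back to the synthetic domain, and $D_{\text{real}}$, $D_{\text{syn}}$ the corresponding discriminators. The abstract does not give H-GAN's exact losses or weights, so treat this as the generic form a cycle-consistent approach builds on; $\lambda$ and the domain subscripts are notation chosen here.

```latex
$$\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{syn}}}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{real}}}\big[\lVert G(F(y)) - y \rVert_1\big]$$

$$\mathcal{L}(G, F, D_{\text{real}}, D_{\text{syn}}) = \mathcal{L}_{\text{GAN}}(G, D_{\text{real}}) + \mathcal{L}_{\text{GAN}}(F, D_{\text{syn}}) + \lambda\, \mathcal{L}_{\text{cyc}}(G, F)$$
```

The cycle term penalizes translations that cannot be undone, which tends to keep the hand's geometry, and therefore the synthetic annotations, aligned with the translated image.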

arXiv:2011.13200 [pdf, other]

Unsupervised Word Translation Pairing using Refinement based Point Set Registration

Authors: Silviu Oprea, Sourav Dutta, Haytham Assem

Abstract: Cross-lingual alignment of word embeddings plays an important role in knowledge transfer across languages, for improving machine translation and other multi-lingual applications. Current unsupervised approaches rely on similarities in the geometric structure of word embedding spaces across languages to learn structure-preserving linear transformations using adversarial networks and refinement strategies. However, such techniques, in practice, tend to suffer from instability and convergence issues, requiring tedious fine-tuning for precise parameter setting. This paper proposes BioSpere, a novel framework for unsupervised mapping of bi-lingual word embeddings onto a shared vector space, by combining adversarial initialization and a refinement procedure with a point set registration algorithm used in image processing. We show that our framework alleviates the shortcomings of existing methodologies and is relatively invariant to variable adversarial learning performance, demonstrating robustness in terms of parameter choices and training losses. Experimental evaluation on the parallel dictionary induction task demonstrates state-of-the-art results for our framework on diverse language pairs.

Submitted 26 November, 2020; originally announced November 2020.
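The abstract evaluates on bilingual dictionary induction. A common way to score that task, sketched below in Python under stated assumptions, retrieves the nearest target word for each mapped source word by cosine similarity and reports precision@1 against a gold dictionary. The function names and toy vectors are hypothetical; the mapping itself (BioSpere's adversarial initialization, refinement and point set registration) is assumed to have been applied already, and many works retrieve with CSLS rather than plain cosine.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def precision_at_1(mapped_src: Dict[str, List[float]],
                   tgt: Dict[str, List[float]],
                   gold: Dict[str, str]) -> float:
    """Fraction of gold source words whose nearest target word is the gold translation."""
    hits = 0
    for src_word, gold_tgt in gold.items():
        vec = mapped_src[src_word]  # assumed already mapped into the shared space
        best = max(tgt, key=lambda w: cosine(vec, tgt[w]))
        hits += best == gold_tgt
    return hits / len(gold)
```

With toy vectors such as `{"gato": [1.0, 0.1], "perro": [0.1, 1.0]}` for the mapped source words and `{"cat": [0.9, 0.0], "dog": [0.0, 0.8]}` for the targets, both words retrieve their gold translations and precision@1 is 1.0.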

arXiv:2004.05214 [pdf, other]

doi 10.1109/TPAMI.2020.3045007

A Review on Deep Learning Techniques for Video Prediction

Authors: Sergiu Oprea, Pablo Martinez-Gonzalez, Alberto Garcia-Garcia, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez, Antonis Argyros

Abstract: The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable framework for representation learning, as it has demonstrated potential for extracting meaningful representations of the underlying patterns in natural videos. Motivated by the increasing interest in this task, we review the deep learning methods for prediction in video sequences. We first define the video prediction fundamentals, as well as mandatory background concepts and the most used datasets. Next, we carefully analyze existing video prediction models organized according to a proposed taxonomy, highlighting their contributions and their significance in the field. The summary of the datasets and methods is accompanied by experimental results that facilitate the assessment of the state of the art on a quantitative basis. The paper closes by drawing some general conclusions, identifying open research challenges and pointing out future research directions.

Submitted 14 April, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Submitted to TPAMI
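Video prediction is self-supervised because the targets are simply later frames of the same video. The sketch below, with a hypothetical function name and window sizes, turns one frame sequence into (past frames, future frames) training pairs; the frames can be any objects, such as image arrays.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_prediction_pairs(frames: Sequence[Frame], context: int, horizon: int
                          ) -> List[Tuple[List[Frame], List[Frame]]]:
    """Slide a window over a video: the first `context` frames are the input and
    the next `horizon` frames are the target, so no manual labels are needed."""
    pairs = []
    for start in range(len(frames) - context - horizon + 1):
        past = list(frames[start:start + context])
        future = list(frames[start + context:start + context + horizon])
        pairs.append((past, future))
    return pairs
```

For example, five frames with `context=2` and `horizon=1` yield three pairs, and a video shorter than `context + horizon` yields none.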

arXiv:2004.04945 [pdf, ps, other]

The Effect of Sociocultural Variables on Sarcasm Communication Online

Authors: Silviu Vlad Oprea, Walid Magdy

Abstract: Online social networks (OSN) play an essential role in connecting people and allowing them to communicate online. OSN users share their thoughts, moments, and news with their network. The messages they share online can include sarcastic posts, where the intended meaning expressed by the written text is different from the literal one. This could result in miscommunication. Previous research in psycholinguistics has studied the sociocultural factors that might lead to sarcasm misunderstanding between speakers and listeners. However, there is a lack of such studies in the context of OSN. In this paper, we fill this gap by performing a quantitative analysis of the influence of sociocultural variables, including gender, age, country, and English language nativeness, on the effectiveness of sarcastic communication online. We collect examples of sarcastic tweets directly from the authors who posted them. Further, we ask third-party annotators of different sociocultural backgrounds to label these tweets for sarcasm. Our analysis indicates that age, English language nativeness, and country are significantly influential and should be considered in the design of future social analysis tools that either study sarcasm directly, or look at related phenomena where sarcasm may have an influence. We also make observations about the social ecology surrounding sarcastic exchanges on OSNs. We conclude by suggesting ways in which our findings can be included in future work.

Submitted 10 April, 2020; originally announced April 2020.

Comments: Accepted as a full paper at CSCW 2020. Please cite the CSCW version

arXiv:2003.03167 [pdf, other]

doi 10.3390/app10217524

When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs)

Authors: Victor Villena-Martinez, Sergiu Oprea, Marcelo Saval-Calvo, Jorge Azorin-Lopez, Andres Fuster-Guillo, Robert B. Fisher

Abstract: Registration is the process that computes the transformation that aligns sets of data. Commonly, a registration process can be divided into four main steps: target selection, feature extraction, feature matching, and transform computation for the alignment. The accuracy of the result depends on multiple factors, the most significant being the quantity of input data, the presence of noise, outliers and occlusions, the quality of the extracted features, real-time requirements and the type of transformation, especially those defined by multiple parameters, like non-rigid deformations. Recent advancements in machine learning could be a turning point in these issues, particularly with the development of deep learning (DL) techniques, which are helping to improve multiple computer vision problems through an abstract understanding of the input data. In this paper, a review of deep learning-based registration methods is presented. We classify the different papers using a framework derived from the traditional registration pipeline in order to analyse the strengths of the new learning-based proposals. Deep Registration Networks (DRNs) try to solve the alignment task either by replacing part of the traditional pipeline with a network or by fully solving the registration problem. The main conclusions are: 1) learning-based registration techniques cannot always be clearly classified in the traditional pipeline; 2) these approaches allow more complex inputs, like conceptual models, as well as the traditional 3D datasets; 3) in spite of the generality of learning, the current proposals are still ad hoc solutions; and 4) this is a young topic that still requires a large effort to reach general solutions able to cope with the problems that affect traditional approaches.

Submitted 28 October, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: Published in Applied Sciences

Journal ref: Appl. Sci. 2020, 10(21), 7524
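The four-step pipeline named in the abstract can be illustrated with classic iterative closest point (ICP) on 2D points, a traditional, non-learned method. In this sketch, which is an assumption for illustration rather than any reviewed method, every point is selected as a target, raw coordinates serve as the features, matching is by nearest neighbour, and the transform is the least-squares rigid rotation plus translation. A DRN would replace one or more of these steps with a network.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(p: Point, cloud: List[Point]) -> Point:
    """Feature matching: closest point in the target cloud (raw coordinates as features)."""
    return min(cloud, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def rigid_2d(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Transform computation: least-squares rotation angle and translation for matched pairs."""
    n = len(src)
    src_cx = sum(p[0] for p in src) / n
    src_cy = sum(p[1] for p in src) / n
    dst_cx = sum(q[0] for q in dst) / n
    dst_cy = sum(q[1] for q in dst) / n
    num = den = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - src_cx, ay - src_cy, bx - dst_cx, by - dst_cy
        num += ax * by - ay * bx  # cross terms
        den += ax * bx + ay * by  # dot terms
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    tx = dst_cx - (c * src_cx - s * src_cy)
    ty = dst_cy - (s * src_cx + c * src_cy)
    return theta, tx, ty


def icp_2d(src: List[Point], dst: List[Point], iterations: int = 20) -> List[Point]:
    """Target selection: use every source point; then alternate matching and alignment."""
    current = list(src)
    for _ in range(iterations):
        matches = [nearest(p, dst) for p in current]
        theta, tx, ty = rigid_2d(current, matches)
        c, s = math.cos(theta), math.sin(theta)
        current = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in current]
    return current
```

ICP only converges reliably from a reasonable initial alignment, since wrong nearest-neighbour matches pull the estimate toward a local minimum; this sensitivity is one of the issues the abstract lists among the factors affecting accuracy.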

arXiv:1911.03123 [pdf, other]

iSarcasm: A Dataset of Intended Sarcasm

Authors: Silviu Oprea, Walid Magdy

Abstract: We consider the distinction between intended and perceived sarcasm in the context of textual sarcasm detection. The former occurs when an utterance is sarcastic from the perspective of its author, while the latter occurs when the utterance is interpreted as sarcastic by the audience. We show the limitations of previous labelling methods in capturing intended sarcasm and introduce the iSarcasm dataset of tweets labeled for sarcasm directly by their authors. State-of-the-art sarcasm detection models evaluated on our dataset show low performance compared to previously studied datasets, which indicates that these datasets might be biased or obvious and that sarcasm could be a phenomenon under-studied computationally thus far. By providing the iSarcasm dataset, we aim to encourage future NLP research to develop methods for detecting sarcasm in text as intended by the authors of the text, not as labeled under assumptions that we demonstrate to be sub-optimal.

Submitted 1 May, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: 9 pages

arXiv:1910.11932 [pdf, other]

doi 10.18653/v1/P19-1275

Exploring Author Context for Detecting Intended vs Perceived Sarcasm

Authors: Silviu Oprea, Walid Magdy

Abstract: We investigate the impact of using author context on textual sarcasm detection. We define author context as the embedded representation of an author's historical posts on Twitter and propose neural models that extract these representations. We experiment with two tweet datasets, one labelled manually for sarcasm, and the other via tag-based distant supervision. We achieve state-of-the-art performance on the second dataset, but not on the one labelled manually, indicating a difference between intended sarcasm, captured by distant supervision, and perceived sarcasm, captured by manual labelling.

Submitted 25 October, 2019; originally announced October 2019.

Comments: 6 pages, 1 figure, ACL 2019

Journal ref: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pages 2854-2859
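The paper learns author context with neural models. As a much simpler stand-in, which is an assumption and not the paper's architecture, one can average the embeddings of an author's past posts and concatenate the result with the embedding of the tweet being classified. The function names are hypothetical, and the post embeddings are assumed to come from some upstream encoder.

```python
from typing import List, Sequence


def author_context(history: Sequence[Sequence[float]]) -> List[float]:
    """Mean-pool the embeddings of an author's past posts into one context vector."""
    if not history:
        raise ValueError("need at least one historical post")
    dim = len(history[0])
    return [sum(vec[i] for vec in history) / len(history) for i in range(dim)]


def classifier_input(tweet_vec: Sequence[float],
                     history: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the tweet embedding with the author context vector."""
    return list(tweet_vec) + author_context(history)
```

Mean pooling ignores the order and relevance of past posts; a learned encoder, as the abstract describes, can instead weight the posts that are most informative about an author's style.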

arXiv:1910.03019 [pdf, other]

Flood Detection On Low Cost Orbital Hardware

Authors: Gonzalo Mateo-Garcia, Silviu Oprea, Lewis Smith, Josh Veitch-Michaelis, Guy Schumann, Yarin Gal, Atılım Güneş Baydin, Dietmar Backes

Abstract: Satellite imaging is a critical technology for monitoring and responding to natural disasters such as flooding. Despite the capabilities of modern satellites, there is still much to be desired from the perspective of first response organisations like UNICEF. Two main challenges are rapid access to data, and the ability to automatically identify flooded regions in images. We describe a prototypical flood segmentation system, identifying cloud, water and land, that could be deployed on a constellation of small satellites, performing processing on board to reduce downlink bandwidth by 2 orders of magnitude. We target PhiSat-1, part of the FSSCAT mission, which is planned to be launched by the European Space Agency (ESA) near the start of 2020 as a proof of concept for this new technology.

Submitted 15 January, 2020; v1 submitted 4 October, 2019; originally announced October 2019.

Journal ref: Artificial Intelligence for Humanitarian Assistance and Disaster Response Workshop, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1903.05238 [pdf, other]

doi 10.1016/j.cag.2019.07.003

A Visually Plausible Grasping System for Object Manipulation and Interaction in Virtual Reality Environments

Authors: Sergiu Oprea, Pablo Martinez-Gonzalez, Alberto Garcia-Garcia, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez

Abstract: Interaction in virtual reality (VR) environments is essential to achieve a pleasant and immersive experience. Most currently existing VR applications lack robust object grasping and manipulation, which are the cornerstone of interactive systems. Therefore, we propose a realistic, flexible and robust grasping system that enables rich and real-time interactions in virtual environments. It is visually realistic because it is completely user-controlled, flexible because it can be used for different hand configurations, and robust because it allows the manipulation of objects regardless of their geometry, i.e. the hand is automatically fitted to the object shape. In order to validate our proposal, an exhaustive qualitative and quantitative performance analysis has been carried out. On the one hand, qualitative evaluation was used in the assessment of abstract aspects such as hand movement realism, interaction realism and motor control. On the other hand, for the quantitative evaluation, a novel error metric has been proposed to visually analyze the performed grips. This metric is based on the computation of the distance from the finger phalanges to the nearest contact point on the object surface. These contact points can be used for different application purposes, mainly in the field of robotics. In conclusion, the system evaluation reports similar performance between users with previous experience in virtual reality applications and inexperienced users, indicating a steep learning curve.

Submitted 12 March, 2019; originally announced March 2019.
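The grip error metric is described as the distance from each finger phalanx to the nearest contact point on the object surface. A minimal Python sketch follows, assuming the object surface is approximated by sampled 3D points and that the per-phalanx distances are averaged into one score; the phalanx names, the sampling, and the averaging are assumptions, since the abstract does not specify them.

```python
import math
from typing import Dict, Sequence, Tuple

Point3 = Tuple[float, float, float]


def phalanx_distances(phalanges: Dict[str, Point3],
                      surface: Sequence[Point3]) -> Dict[str, float]:
    """For each phalanx, the distance to the closest sampled point on the object surface."""
    return {name: min(math.dist(p, q) for q in surface) for name, p in phalanges.items()}


def grip_error(phalanges: Dict[str, Point3], surface: Sequence[Point3]) -> float:
    """Hypothetical aggregate: mean phalanx-to-surface distance (0 means every phalanx touches)."""
    d = phalanx_distances(phalanges, surface)
    return sum(d.values()) / len(d)
```

Keeping the per-phalanx dictionary, rather than only the mean, makes it easy to see which finger segment fails to make contact in a given grip.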

arXiv:1901.06514 [pdf, other]

The RobotriX: An eXtremely Photorealistic and Very-Large-Scale Indoor Dataset of Sequences with Robot Trajectories and Interactions

Authors: Alberto Garcia-Garcia, Pablo Martinez-Gonzalez, Sergiu Oprea, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez, Alvaro Jover-Alvarez

Abstract: Enter the RobotriX, an extremely photorealistic indoor dataset designed to enable the application of deep learning techniques to a wide variety of robotic vision problems. The RobotriX consists of hyperrealistic indoor scenes which are explored by robot agents which also interact with objects in a visually realistic manner in that simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset which captures gaze so that a human operator can move the robot and use controllers for the robotic hands; scene information is dumped on a per-frame basis so that it can be reproduced offline to generate raw data and ground truth labels. By taking this approach, we were able to generate a dataset of 38 semantic classes totaling 8M stills recorded at 60+ frames per second with full HD resolution. For each frame, RGB-D and 3D information is provided with full annotations in both spaces. Thanks to the high quality and quantity of both raw information and annotations, the RobotriX will serve as a new milestone for investigating 2D and 3D robotic vision tasks with large-scale data-driven techniques.

Submitted 19 January, 2019; originally announced January 2019.

arXiv:1810.06936 [pdf, other]

UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation

Authors: Pablo Martinez-Gonzalez, Sergiu Oprea, Alberto Garcia-Garcia, Alvaro Jover-Alvarez, Sergio Orts-Escolano, Jose Garcia-Rodriguez

Abstract: Data-driven algorithms have surpassed traditional techniques in almost every aspect of robotic vision problems. Such algorithms need vast amounts of quality data to be able to work properly after their training process. Gathering and annotating that sheer amount of data in the real world is a time-consuming and error-prone task. Those problems limit scale and quality. Synthetic data generation has become increasingly popular since it is faster to generate and automatic to annotate. However, most of the current datasets and environments lack realism, interactions, and details from the real world. UnrealROX is an environment built over Unreal Engine 4 which aims to reduce that reality gap by leveraging hyperrealistic indoor scenes that are explored by robot agents which also interact with objects in a visually realistic manner in that simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset which captures gaze so that a human operator can move the robot and use controllers for the robotic hands; scene information is dumped on a per-frame basis so that it can be reproduced offline to generate raw data and ground truth annotations. This virtual reality environment enables robotic vision researchers to generate realistic and visually plausible data with full ground truth for a wide variety of problems such as class and instance semantic segmentation, object detection, depth estimation, visual grasping, and navigation.

Submitted 8 November, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

Comments: Published in Virtual Reality journal

arXiv:1704.06857 [pdf, other]

A Review on Deep Learning Techniques Applied to Semantic Segmentation

Authors: Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez

Abstract: Image semantic segmentation is of increasing interest to computer vision and machine learning researchers. Many emerging applications need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems, to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review of deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are presented to help researchers decide which best suit their needs and targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets on which they were evaluated, followed by a discussion of the results. We conclude by pointing out a set of promising future works and drawing our own conclusions about the state of the art of semantic segmentation using deep learning techniques.

Submitted 22 April, 2017; originally announced April 2017.

Comments: Submitted to TPAMI on Apr. 22, 2017
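The review reports quantitative results for the methods it covers; mean intersection-over-union (mIoU) is one of the most widely used segmentation metrics. The sketch below computes it from flat lists of per-pixel class labels. Skipping classes that appear in neither the prediction nor the ground truth is one common convention and is assumed here; the function name is hypothetical.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes present in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # class absent from both maps: skipped by convention
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class occurs in prediction or ground truth")
    return sum(ious) / len(ious)
```

For `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 scores 1/2 and class 1 scores 2/3, giving a mean of 7/12.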

arXiv:1404.6547 [pdf, ps, other]

E-books and Graphics with LaTeXML

Authors: Deyan Ginev, Bruce R. Miller, Silviu Oprea

Abstract: Marked by the highlights of native generation of EPUB E-books and TikZ support for creating SVG images, we present an annual report of LaTeXML development in 2013. LaTeXML provides a reimplementation of the TeX parser, geared towards preserving macro semantics; it supports an array of output formats, notably HTML5, EPUB, XHTML and its own LaTeX-near XML. Other highlights include enhanced performance when used inside high-throughput build systems, via a native ZIP archive workflow, as well as a simplified installation procedure that now allows LaTeXML to be deployed as a cloud service. To this end, we also introduce an official plugin-based scheme for publishing new features that go beyond the core scope of LaTeXML, such as web services or unconventional post-processors. The software suite has now migrated to GitHub and we welcome forks and patches from the wider FLOSS community.

Submitted 25 April, 2014; originally announced April 2014.

Comments: 4 pages, accepted at Conferences on Intelligent Computer Mathematics CICM2014
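Native EPUB generation means producing a ZIP container laid out according to the EPUB Open Container Format: a `mimetype` entry stored first and uncompressed, plus `META-INF/container.xml` pointing to the package document. LaTeXML itself is written in Perl; the Python sketch below only illustrates that container layout with the standard `zipfile` module. The function name and the `OEBPS/content.opf` path are assumptions, and a real e-book also needs a complete package document and content files.

```python
import io
import zipfile

# Minimal OCF container descriptor; the package document path is an assumption.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def package_epub(files: dict) -> bytes:
    """Zip `files` (archive name -> str or bytes) into an EPUB-style container in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF requires the 'mimetype' entry to come first and to be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Storing `mimetype` uncompressed at the start of the archive lets reading systems identify the file type from a fixed byte offset without unpacking it.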

Showing 1–17 of 17 results for author: Oprea, S