-
Graph-based Path Planning with Dynamic Obstacle Avoidance for Autonomous Parking
Authors:
Farhad Nawaz,
Minjun Sung,
Darshan Gadginmath,
Jovin D'sa,
Sangjae Bae,
David Isele,
Nadia Figueroa,
Nikolai Matni,
Faizan M. Tariq
Abstract:
Safe and efficient path planning in parking scenarios presents a significant challenge due to the presence of cluttered environments filled with static and dynamic obstacles. To address this, we propose a novel and computationally efficient planning strategy that seamlessly integrates the predictions of dynamic obstacles into the planning process, ensuring the generation of collision-free paths. Our approach builds upon the conventional Hybrid A* algorithm by introducing a time-indexed variant that explicitly accounts for the predictions of dynamic obstacles during node exploration in the graph, thus enabling dynamic obstacle avoidance. We integrate the time-indexed Hybrid A* algorithm within an online planning framework to compute local paths at each planning step, guided by an adaptively chosen intermediate goal. The proposed method is validated in diverse parking scenarios, including perpendicular, angled, and parallel parking. Through simulations, we show our approach's potential to greatly improve efficiency and safety compared with the state-of-the-art spline-based planning method for parking situations.
Submitted 7 May, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
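The time-indexed idea can be illustrated on a much simpler search problem. The sketch below is a plain grid A* whose states are (cell, time step) pairs, not Hybrid A*: it assumes unit-cost 4-connected moves plus a wait action, and it assumes obstacle predictions arrive as a hypothetical `predicted` dictionary mapping each time step to the set of cells occupied at that step. A successor is rejected when a static obstacle or a predicted dynamic obstacle occupies it at the time it would be reached.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
State = Tuple[Cell, int]  # (cell, time step)


def time_indexed_astar(
    start: Cell,
    goal: Cell,
    width: int,
    height: int,
    static_obstacles: Set[Cell],
    predicted: Dict[int, Set[Cell]],
    horizon: int = 50,
) -> Optional[List[Cell]]:
    """A* over (cell, time) states on a 4-connected grid with unit-time steps.

    predicted[t] is the (hypothetical) set of cells a dynamic obstacle is
    predicted to occupy at step t. Swap conflicts between steps are ignored.
    """

    def heuristic(c: Cell) -> int:
        # Manhattan distance: consistent because every move or wait costs 1.
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    moves = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]  # (0, 0) means wait
    # Each step costs 1, so the cost-to-come of (cell, t) is exactly t and the
    # first parent recorded for a state is as good as any other.
    parent: Dict[State, Optional[State]] = {(start, 0): None}
    open_heap = [(heuristic(start), 0, start)]  # entries: (t + h, t, cell)
    closed: Set[State] = set()

    while open_heap:
        _, t, cell = heapq.heappop(open_heap)
        if cell == goal:
            path: List[Cell] = []
            state: Optional[State] = (cell, t)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        if (cell, t) in closed or t >= horizon:
            continue
        closed.add((cell, t))
        for dx, dy in moves:
            nxt, nt = (cell[0] + dx, cell[1] + dy), t + 1
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in static_obstacles or nxt in predicted.get(nt, set()):
                continue  # occupied by a static or predicted dynamic obstacle
            if (nxt, nt) not in parent:
                parent[(nxt, nt)] = (cell, t)
                heapq.heappush(open_heap, (nt + heuristic(nxt), nt, nxt))
    return None


# A 3x1 corridor; an obstacle is predicted to occupy (1, 0) at step 1.
path = time_indexed_astar((0, 0), (2, 0), 3, 1, set(), {1: {(1, 0)}})
print(path)  # [(0, 0), (0, 0), (1, 0), (2, 0)]
```

In the example call, the planner waits one step so that the predicted obstacle has left cell (1, 0) before passing. A Hybrid A* variant would instead expand continuous vehicle poses with kinematic motion primitives and check the vehicle footprint against the predictions; those parts are not modelled here.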
-
Addressing Behavior Model Inaccuracies for Safe Motion Control in Uncertain Dynamic Environments
Authors:
Minjun Sung,
Hunmin Kim,
Naira Hovakimyan
Abstract:
Uncertainties in the environment and behavior model inaccuracies compromise the state estimation of a dynamic obstacle and its trajectory predictions, introducing biases in estimation and shifts in predictive distributions. Addressing these challenges is crucial to safely control an autonomous system. In this paper, we propose a novel algorithm, SIED-MPC, which synergistically integrates Simultaneous State and Input Estimation (SSIE) and Distributionally Robust Model Predictive Control (DR-MPC) using model confidence evaluation. The SSIE process produces unbiased state estimates and optimal input gap estimates to assess the confidence of the behavior model, defining the ambiguity radius for DR-MPC to handle predictive distribution shifts. This systematic confidence evaluation yields safe inputs with an adequate level of conservatism. In autonomous driving simulations, our algorithm achieved a reduced collision rate through improved state estimation, with a 54% shorter average computation time.
Submitted 26 July, 2024;
originally announced July 2024.
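The abstract states that model confidence defines the ambiguity radius used by DR-MPC. As a hedged illustration of what such a radius does, the sketch below checks one linear constraint in its worst-case expectation over a 1-Wasserstein ball around the empirical distribution of predicted samples. For a linear function with a Euclidean ground metric and unbounded support, that worst case equals the empirical mean plus the radius times the Euclidean norm of the coefficient vector. The mapping from the estimated input gap to the radius (`ambiguity_radius`, with its `base` and `gain` parameters) is hypothetical, and the paper's actual constraint may use a different risk measure.

```python
import math
from typing import List, Sequence


def worst_case_expectation(a: Sequence[float], samples: List[Sequence[float]], radius: float) -> float:
    """sup of E_Q[a^T xi] over a 1-Wasserstein ball (Euclidean ground metric,
    unbounded support) centred on the empirical distribution of `samples`.
    For a linear loss this is the empirical mean plus radius * ||a||_2."""
    empirical = sum(sum(ai * xi for ai, xi in zip(a, s)) for s in samples) / len(samples)
    return empirical + radius * math.sqrt(sum(ai * ai for ai in a))


def ambiguity_radius(input_gap: Sequence[float], base: float = 0.05, gain: float = 1.0) -> float:
    # Hypothetical mapping: a larger estimated input gap (lower confidence in
    # the behavior model) gives a larger ambiguity radius.
    return base + gain * math.sqrt(sum(g * g for g in input_gap))


def dr_constraint_ok(a: Sequence[float], samples: List[Sequence[float]], b: float,
                     input_gap: Sequence[float]) -> bool:
    """Check the distributionally robust constraint sup_Q E_Q[a^T xi] <= b."""
    return worst_case_expectation(a, samples, ambiguity_radius(input_gap)) <= b


samples = [[0.9, 0.0], [1.1, 0.0]]  # toy predicted samples of xi
print(dr_constraint_ok([1.0, 0.0], samples, 1.2, [0.0, 0.0]))  # True
print(dr_constraint_ok([1.0, 0.0], samples, 1.2, [0.3, 0.4]))  # False
```

With a zero input gap the constraint holds with margin; a larger estimated gap enlarges the ball and the same constraint no longer holds, which is the sense in which lower model confidence produces more conservative inputs.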
-
Robust Model Based Reinforcement Learning Using $\mathcal{L}_1$ Adaptive Control
Authors:
Minjun Sung,
Sambhu H. Karumanchi,
Aditya Gahlawat,
Naira Hovakimyan
Abstract:
We introduce $\mathcal{L}_1$-MBRL, a control-theoretic augmentation scheme for Model-Based Reinforcement Learning (MBRL) algorithms. Unlike model-free approaches, MBRL algorithms learn a model of the transition function from data and use it to design a control input. Our approach generates a series of approximate control-affine models of the learned transition function according to a proposed switching law. Using the approximate model, the control input produced by the underlying MBRL algorithm is perturbed by an $\mathcal{L}_1$ adaptive controller, which is designed to enhance the robustness of the system against uncertainties. Importantly, this approach is agnostic to the choice of MBRL algorithm, so the scheme can be used with various MBRL algorithms. MBRL algorithms with $\mathcal{L}_1$ augmentation exhibit enhanced performance and sample efficiency across multiple MuJoCo environments, outperforming the original MBRL algorithms both with and without system noise.
Submitted 21 March, 2024;
originally announced March 2024.
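The abstract describes generating approximate control-affine models of a learned transition function. One simple way to obtain such a model, assumed here for illustration and not taken from the paper, is a first-order expansion in the input around a reference input `u_bar`, with the input derivative estimated by central finite differences. `learned_model` is a hypothetical stand-in for a learned network, and the input is scalar to keep the example short.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
Model = Callable[[Vec, float], Vec]


def control_affine_approx(f: Model, x: Vec, u_bar: float, eps: float = 1e-5) -> Tuple[Vec, Vec]:
    """Return (g, G) such that f(x, u) ~= g + G * u for u near u_bar.

    G is a central finite-difference estimate of df/du at (x, u_bar), and g is
    chosen so the approximation is exact at u = u_bar (first-order Taylor).
    """
    f_plus, f_minus = f(x, u_bar + eps), f(x, u_bar - eps)
    G = [(fp - fm) / (2.0 * eps) for fp, fm in zip(f_plus, f_minus)]
    f0 = f(x, u_bar)
    g = [fi - Gi * u_bar for fi, Gi in zip(f0, G)]
    return g, G


def learned_model(x: Vec, u: float) -> Vec:
    # Hypothetical stand-in for a learned transition model: pendulum-like
    # dynamics with a term that is mildly nonlinear in the input u.
    return [x[1], -math.sin(x[0]) + 2.0 * u + 0.1 * u ** 2]


g, G = control_affine_approx(learned_model, [0.3, 0.0], u_bar=1.0)
# Near u_bar, learned_model(x, u) is approximated by g + G * u componentwise.
```

The paper's switching law decides when a new approximate model is generated; that rule is not specified in the abstract and is not sketched here.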
-
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization
Authors:
Gangwoo Kim,
Hajung Kim,
Lei Ji,
Seongsu Bae,
Chanhwi Kim,
Mujeen Sung,
Hyunjae Kim,
Kun Yan,
Eric Chang,
Jaewoo Kang
Abstract:
In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is first pre-trained on various multimodal datasets in the general domain and then transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema, which enables the model to effectively learn the required knowledge and skills from the limited resources available in the domain. Benefiting from its training across multiple tasks and domains, our model demonstrates superior performance on the benchmark datasets provided by the BioNLP shared task. With additional techniques, including ensembling and factual calibration, our system achieves first place on the RadSum23 leaderboard for the hidden test set.
Submitted 10 July, 2023;
originally announced July 2023.
-
Protective Mission against a Highly Maneuverable Rogue Drone Using Defense Margin Strategy
Authors:
Minjun Sung,
Christophe Johannes Hiltebrandt-McIntosh,
Hunmin Kim,
Naira Hovakimyan
Abstract:
This paper studies a protective mission to defend a domain, called the safe zone, from a rogue drone invasion. We consider a scenario with one attacker drone and one defender drone in which the defender has access only to a noisy observation of the attacker at every time step. Directly applying strategies from existing problems such as pursuit-evasion games is shown to be insufficient for our mission. We introduce a new concept, the defense margin, to complement an existing strategy and construct a control strategy that successfully solves our problem. We provide analytical proofs that point out the limitations of the existing strategy and show how our defense margin strategy can be used to enhance performance. Simulation results show that our strategy outperforms the existing strategy by at least 36.0 percentage points in terms of mission success.
Submitted 12 September, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition
Authors:
Yi-Chang Chen,
Chun-Yen Cheng,
Chien-An Chen,
Ming-Chieh Sung,
Yi-Ren Yeh
Abstract:
Owing to recent advances in natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models consider only semantic correction, while the phonetic features of words are neglected. Semantic-only post-correction consequently degrades performance, since homophonic errors are fairly common in Chinese ASR. In this paper, we propose a novel approach that jointly exploits the contextualized representation and the phonetic information between the error and its replacement candidates to reduce the error rate of Chinese ASR. Experimental results on real-world speech recognition datasets show that our proposed method achieves a clearly lower character error rate (CER) than the baseline model, which uses a pre-trained BERT MLM as the corrector.
Submitted 16 November, 2021;
originally announced November 2021.
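To see why phonetic information helps, consider a toy reranker that combines a masked-language-model probability with a phonetic similarity between the misrecognized character and each candidate. This is a hypothetical weighted-sum illustration, not the paper's model, which learns to use contextual and phonetic representations jointly. The tone-numbered pinyin strings, the probabilities, and the `weight` parameter are invented for the example; phonetic similarity is taken as one minus the normalized Levenshtein distance between pinyin strings.

```python
import math
from typing import List, Tuple

Candidate = Tuple[str, float, str]  # (character, MLM probability, pinyin)


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def phonetic_similarity(p1: str, p2: str) -> float:
    longest = max(len(p1), len(p2)) or 1
    return 1.0 - levenshtein(p1, p2) / longest


def rerank(error_pinyin: str, candidates: List[Candidate], weight: float = 0.5) -> List[str]:
    """Rank candidates by weight * log P_MLM + (1 - weight) * phonetic similarity."""
    def score(c: Candidate) -> float:
        _, prob, pinyin = c
        return weight * math.log(prob) + (1 - weight) * phonetic_similarity(error_pinyin, pinyin)
    return [c[0] for c in sorted(candidates, key=score, reverse=True)]


candidates = [("是", 0.30, "shi4"), ("在", 0.40, "zai4")]
print(rerank("shi4", candidates))  # ['是', '在']
```

A semantic-only ranking (`weight=1.0`) prefers 在 because of its higher language-model probability, while adding phonetic similarity promotes 是, which shares the pronunciation shi4 with the misrecognized character.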
-
Unsupervised Spoken Term Discovery Based on Re-clustering of Hypothesized Speech Segments with Siamese and Triplet Networks
Authors:
Man-Ling Sung,
Tan Lee
Abstract:
Spoken term discovery from untranscribed speech audio can be achieved via a two-stage process. In the first stage, the unlabelled speech is decoded into a sequence of subword units that are learned and modelled in an unsupervised manner. In the second stage, partial sequence matching and clustering are performed on the decoded subword sequences, resulting in a set of discovered words or phrases. A limitation of this approach is that subword decoding can be erroneous, and the errors propagate to the subsequent steps. While Siamese and Triplet networks are one way to learn segment representations that can improve the discovery process, the challenge in spoken term discovery under a completely unsupervised scenario is that training examples are unavailable. In this paper, we propose to generate training examples from initial hypothesized sequence clusters. The Siamese/Triplet network is trained on the hypothesized examples to measure the similarity between two speech segments and thereby re-cluster all hypothesized subword sequences to achieve spoken term discovery. Experimental results show that the proposed approach is effective in obtaining training examples for Siamese and Triplet networks, improving the efficacy of spoken term discovery compared with the original two-stage method.
Submitted 2 June, 2021; v1 submitted 27 November, 2020;
originally announced November 2020.
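The key step in the abstract is turning initial hypothesized clusters into training examples. A minimal sketch, assuming each speech segment is already represented by a fixed-length vector: the anchor and positive are drawn from the same hypothesized cluster, the negative from a different one, and each triplet is scored with the standard triplet margin loss on Euclidean distances. Enumerating every pair is an illustrative choice; the paper's sampling scheme and network are not reproduced.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Vec = List[float]


def euclidean(u: Vec, v: Vec) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vec, positive: Vec, negative: Vec, margin: float = 1.0) -> float:
    """max(0, d(a, p) - d(a, n) + margin): pulls same-cluster segments together."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def triplets_from_clusters(clusters: Dict[str, List[Vec]]) -> List[Tuple[Vec, Vec, Vec]]:
    """Anchor/positive share a hypothesized cluster; the negative comes from another."""
    triplets = []
    for cid, members in clusters.items():
        others = [m for oid, ms in clusters.items() if oid != cid for m in ms]
        for anchor, positive in combinations(members, 2):
            for negative in others:
                triplets.append((anchor, positive, negative))
    return triplets


# Toy embeddings for two hypothesized clusters (invented data).
clusters = {"A": [[0.0, 0.0], [0.0, 1.0]], "B": [[5.0, 5.0]]}
losses = [triplet_loss(*t) for t in triplets_from_clusters(clusters)]
```

In a real system the loss would be backpropagated through the segment encoder, and the trained similarity would then drive re-clustering of the hypothesized subword sequences.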
-
Unsupervised Spoken Term Discovery on Untranscribed Speech
Authors:
Man-Ling Sung
Abstract:
(Part of the abstract) In this thesis, we investigate the use of unsupervised spoken term discovery in tackling this problem. Unsupervised spoken term discovery aims to discover topic-related terminology in speech without knowledge of the phonetic properties of the language or the content. It can be further divided into two parts: acoustic segment modelling (ASM) and unsupervised pattern discovery. ASM learns the phonetic structure of zero-resource language audio with no phonetic knowledge available, generating self-derived "phonemes". The audio is labelled with these "phonemes" to obtain "phoneme" sequences. Unsupervised pattern discovery searches for repetitive patterns in the "phoneme" sequences. The discovered patterns can be grouped to determine the keywords of the audio. A multilingual neural network with a bottleneck layer is used for feature extraction. Experiments show that bottleneck features facilitate the training of ASM compared to conventional features such as MFCC. The unsupervised spoken term discovery system is evaluated on online lectures covering different topics by different speakers. It is shown that the system learns the phonetic information of the language and can discover frequent spoken terms that align with the text transcription. Using information retrieval techniques such as word embedding and TF-IDF, it is shown that the discovered keywords can further be used for topic comparison.
Submitted 27 November, 2020;
originally announced November 2020.
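The abstract mentions TF-IDF for topic comparison. A minimal sketch, assuming each lecture is represented by the list of keywords discovered in it (written here as hypothetical "phoneme"-sequence labels such as `a-b-c`), uses term frequency normalized by document length, idf = log(N / df), and cosine similarity between lectures. The exact weighting variant used in the thesis is not specified.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """One sparse TF-IDF vector per document (tf = count / doc length, idf = log(N / df))."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Discovered keywords per lecture (invented labels for illustration).
lectures = [["a-b-c", "d-e", "a-b-c"], ["a-b-c", "f-g"], ["h-i", "j-k"]]
vecs = tfidf_vectors(lectures)
# Lectures 0 and 1 share the keyword a-b-c; lecture 2 shares nothing with lecture 0.
```

Keywords that occur in every lecture receive zero idf, so the comparison is driven by terms that distinguish topics.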
-
Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features
Authors:
Man-Ling Sung,
Siyuan Feng,
Tan Lee
Abstract:
The present study tackles the problem of automatically discovering spoken keywords from untranscribed audio archives without requiring word-by-word speech transcription by automatic speech recognition (ASR) technology. The problem is of practical significance in many applications of speech analytics, including those concerning low-resource languages and large amounts of multilingual and multi-genre data. We propose a two-stage approach, which comprises unsupervised acoustic modeling and decoding, followed by pattern mining in acoustic unit sequences. The whole process starts by deriving and modeling a set of subword-level speech units with untranscribed data. With the acoustic models trained in an unsupervised manner, a given audio archive is represented by a pseudo transcription, from which spoken keywords can be discovered by string mining algorithms. For unsupervised acoustic modeling, a deep neural network trained on multilingual speech corpora is used to generate speech segmentation and compute bottleneck features for segment clustering. Experimental results show that the proposed system is able to effectively extract topic-related words and phrases from the lecture recordings on MIT OpenCourseWare.
Submitted 3 November, 2020;
originally announced November 2020.
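String mining on the pseudo transcription can be illustrated with the simplest possible miner: count every n-gram of acoustic-unit labels across the archive and keep those that recur. This is a stand-in for the string mining algorithms the abstract refers to, which it does not name; the unit labels and the thresholds `n_min`, `n_max`, and `min_count` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def frequent_unit_ngrams(
    transcripts: List[List[str]], n_min: int = 3, n_max: int = 5, min_count: int = 2
) -> Dict[Tuple[str, ...], int]:
    """Count n-grams of acoustic-unit labels (n_min..n_max) across all pseudo
    transcriptions and keep those occurring at least min_count times."""
    counts: Counter = Counter()
    for seq in transcripts:
        for n in range(n_min, n_max + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


# Two pseudo transcriptions made of invented unit labels.
archive = [["u3", "u7", "u1", "u9", "u2"], ["u5", "u3", "u7", "u1", "u4"]]
print(frequent_unit_ngrams(archive))  # {('u3', 'u7', 'u1'): 2}
```

Because decoded unit sequences are noisy, practical miners usually allow approximate matches rather than the exact matches counted here.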
-
Deformation-Aware 3D Model Embedding and Retrieval
Authors:
Mikaela Angelina Uy,
Jingwei Huang,
Minhyuk Sung,
Tolga Birdal,
Leonidas Guibas
Abstract:
We introduce a new problem of retrieving 3D models that are deformable to a given query shape and present a novel deep deformation-aware embedding to solve this retrieval task. 3D model retrieval is a fundamental operation for recovering a clean and complete 3D model from a noisy and partial 3D scan. However, given a finite collection of 3D shapes, even the closest model to a query may not be satisfactory. This motivates us to apply 3D model deformation techniques to adapt the retrieved model so that it better fits the query. Yet most 3D deformation techniques enforce certain restrictions to preserve important features of the original model, and these restrictions prevent the deformed model from fitting the query perfectly. This gap between the deformed model and the query induces asymmetric relationships among the models, which cannot be handled by typical metric learning techniques. Thus, to retrieve the best models for fitting, we propose a novel deep embedding approach that learns the asymmetric relationships by leveraging location-dependent egocentric distance fields. We also propose two strategies for training the embedding network. We demonstrate that both of these approaches outperform other baselines in our experiments with both synthetic and real data. Our project page can be found at https://deformscan2cad.github.io/.
Submitted 31 July, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
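Why fitting relationships between shapes are asymmetric can be seen with a one-sided Chamfer distance. This is a simpler stand-in, not the learned egocentric distance fields the abstract describes: a partial scan can be covered perfectly by a complete model while the complete model is far from covered by the scan. The point sets below are toy data, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def one_sided_chamfer(src: List[Point], dst: List[Point]) -> float:
    """Mean distance from each point in src to its nearest point in dst.

    Not symmetric: one_sided_chamfer(a, b) != one_sided_chamfer(b, a) in general.
    """
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


partial_scan = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
full_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(one_sided_chamfer(partial_scan, full_model), one_sided_chamfer(full_model, partial_scan))  # 0.0 0.75
```

A symmetric metric would have to assign a single distance to this pair, which is why the abstract argues that typical metric learning cannot capture such relationships.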