-
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Authors:
Jinyoung Park,
Jeehye Na,
Jinyoung Kim,
Hyunwoo J. Kim
Abstract:
Recent works have demonstrated the effectiveness of reinforcement learning (RL)-based post-training in enhancing the reasoning capabilities of large language models (LLMs). In particular, Group Relative Policy Optimization (GRPO) has shown impressive success by employing a PPO-style reinforcement algorithm with group-based normalized rewards. However, the application of GRPO to Video Large Language Models (Video LLMs) has been less studied. In this paper, we explore GRPO for video LLMs and identify two primary issues that impede its effective learning: (1) reliance on safeguards, and (2) the vanishing advantage problem. To mitigate these challenges, we propose DeepVideo-R1, a video large language model trained with our proposed Reg-GRPO (Regressive GRPO) and difficulty-aware data augmentation strategy. Reg-GRPO reformulates the GRPO objective as a regression task, directly predicting the advantage in GRPO. This design eliminates the need for safeguards like clipping and min functions, thereby facilitating more direct policy guidance by aligning the model with the advantage values. We also design the difficulty-aware data augmentation strategy that dynamically augments training samples at solvable difficulty levels, fostering diverse and informative reward signals. Our comprehensive experiments show that DeepVideo-R1 significantly improves video reasoning performance across multiple video reasoning benchmarks.
Submitted 12 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
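Note on the group-based normalized rewards and the vanishing advantage problem mentioned in the abstract above: the following is a minimal sketch, not the authors' implementation. It assumes the common formulation in which each reward is standardized against the mean and standard deviation of its own group; the function name `group_normalized_advantages`, the use of the population standard deviation, and the `eps` stabilizer are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own group (hypothetical helper).

    Assumption: population standard deviation plus a small eps in the
    denominator; actual implementations may differ on both points.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed outcomes yields informative, non-zero advantages.
mixed = group_normalized_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every response gets the same reward (a sample that is too
# easy or too hard) yields all-zero advantages, so there is no learning signal.
uniform = group_normalized_advantages([1.0, 1.0, 1.0, 1.0])
```

When every response in a group receives the same reward, all advantages are zero. That is one plausible reading of why the abstract pairs the method with augmentation toward solvable difficulty levels.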
-
Ranked Entropy Minimization for Continual Test-Time Adaptation
Authors:
Jisu Han,
Jaemin Na,
Wonjun Hwang
Abstract:
Test-time adaptation aims to adapt to realistic environments in an online manner by learning during test time. Entropy minimization has emerged as a principal strategy for test-time adaptation due to its efficiency and adaptability. Nevertheless, it remains underexplored in continual test-time adaptation, where stability is more important. We observe that the entropy minimization method often suffers from model collapse, where the model converges to predicting a single class for all images due to a trivial solution. We propose ranked entropy minimization to mitigate the stability problem of the entropy minimization method and extend its applicability to continuous scenarios. Our approach explicitly structures the prediction difficulty through a progressive masking strategy. Specifically, it gradually aligns the model's probability distributions across different levels of prediction difficulty while preserving the rank order of entropy. The proposed method is extensively evaluated across various benchmarks, demonstrating its effectiveness through empirical results. Our code is available at https://github.com/pilsHan/rem
Submitted 22 May, 2025;
originally announced May 2025.
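Note on the entropy minimization objective and the collapse described in the abstract above: the sketch below only illustrates plain Shannon entropy of a softmax prediction. It does not implement the ranked entropy or progressive masking strategy from the paper, and the helper names `softmax` and `entropy` are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A uniform prediction over three classes has the maximum entropy, log(3).
uncertain = entropy(softmax([0.0, 0.0, 0.0]))

# A near one-hot prediction has entropy close to zero. Minimizing entropy
# alone rewards this regardless of whether the predicted class is correct,
# which is why always predicting a single class is a trivial solution.
confident = entropy(softmax([20.0, 0.0, 0.0]))
```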
-
Control-Oriented Modelling and Adaptive Parameter Estimation for Hybrid Wind-Wave Energy Systems
Authors:
Yingbo Huang,
Bozhong Yuan,
Haoran He,
Jing Na,
Yu Feng,
Guang Li,
Jing Zhao,
Pak Kin Wong,
Lin Cui
Abstract:
Hybrid wind-wave energy systems, which integrate floating offshore wind turbines and wave energy converters, have received much attention in recent years due to their potential to increase power harvest density and reduce the levelized cost of electricity. Apart from the design complexities of hybrid wind-wave energy systems, their energy conversion efficiency, power output smoothness, and safe operation introduce new challenges for control system design. Recent studies show that advanced model-based control strategies have great potential to significantly improve overall control performance. However, the performance of these advanced control strategies relies on computationally efficient control-oriented models with sufficient fidelity, which are normally difficult to derive due to the complexity of the hydrodynamic and aerodynamic effects and their couplings. In most available results, hybrid wind-wave energy system models are established using the Boundary Element Method and are devoted to understanding hydrodynamic responses and performance analysis. However, such models are complex and involve a relatively heavy computational burden, so they cannot be directly used in the advanced model-based control methods that are essential for improving power capture efficiency in practice. To overcome this issue, this paper proposes a control-oriented model of the hybrid wind-wave energy system with six degrees of freedom. First, ...
Submitted 8 April, 2025;
originally announced April 2025.
-
Denoising, segmentation and volumetric rendering of optical coherence tomography angiography (OCTA) image using deep learning techniques: a review
Authors:
Kejie Chen,
Xiaochun Yang,
Jing Na,
Wenbo Wang
Abstract:
Optical coherence tomography angiography (OCTA) is a non-invasive imaging technique widely used to study vascular structures and micro-circulation dynamics in the retina and choroid. OCTA has been widely used in clinics for diagnosing ocular disease and monitoring its progression, because OCTA is safer and faster than dye-based angiography while retaining the ability to characterize micro-scale structures. However, OCTA data contain substantial inherent noise from the devices and acquisition protocols and suffer from various types of artifacts, which impair diagnostic accuracy and repeatability. Deep learning (DL) based image analysis models are able to automatically detect and remove artifacts and noise, and enhance the quality of image data. They are also a powerful tool for segmentation and identification of normal and pathological structures in the images. Thus, the value of OCTA imaging can be significantly enhanced by DL-based approaches for interpreting and performing measurements and predictions on the OCTA data. In this study, we reviewed the literature on DL models for OCTA images from the last five years. In particular, we focused on discussing the current problems in the OCTA data and the corresponding design principles of the DL models. We also reviewed the state-of-the-art DL models for 3D volumetric reconstruction of the vascular networks and pathological structures such as edema and distorted optic discs. In addition, publicly available datasets of OCTA images are summarized at the end of this review. Overall, this review can provide valuable insights for engineers to develop novel DL models by utilizing the characteristics of OCTA signals and images. The pros and cons of each DL method and their applications discussed in this review can help technicians and clinicians choose appropriate DL models for fundamental research and disease screening.
Submitted 20 February, 2025;
originally announced February 2025.
-
Echo-Teddy: Preliminary Design and Development of Large Language Model-based Social Robot for Autistic Students
Authors:
Unggi Lee,
Hansung Kim,
Juhong Eom,
Hyeonseo Jeong,
Seungyeon Lee,
Gyuri Byun,
Yunseo Lee,
Minji Kang,
Gospel Kim,
Jihoi Na,
Jewoong Moon,
Hyeoncheol Kim
Abstract:
Autistic students often face challenges in social interaction, which can hinder their educational and personal development. This study introduces Echo-Teddy, a Large Language Model (LLM)-based social robot designed to support autistic students in developing social and communication skills. Unlike previous chatbot-based solutions, Echo-Teddy leverages advanced LLM capabilities to provide more natural and adaptive interactions. The research addresses two key questions: (1) What are the design principles and initial prototype characteristics of an effective LLM-based social robot for autistic students? (2) What improvements can be made based on developer reflection-on-action and expert interviews? The study employed a mixed-methods approach, combining prototype development with qualitative analysis of developer reflections and expert interviews. Key design principles identified include customizability, ethical considerations, and age-appropriate interactions. The initial prototype, built on a Raspberry Pi platform, features custom speech components and basic motor functions. Evaluation of the prototype revealed potential improvements in areas such as user interface, educational value, and practical implementation in educational settings. This research contributes to the growing field of AI-assisted special education by demonstrating the potential of LLM-based social robots in supporting autistic students. The findings provide valuable insights for future developments in accessible and effective social support tools for special education.
Submitted 6 February, 2025;
originally announced February 2025.
-
Spin-Weighted Spherical Harmonics for Polarized Light Transport
Authors:
Shinyoung Yi,
Donggun Kim,
Jiwoong Na,
Xin Tong,
Min H. Kim
Abstract:
The objective of polarization rendering is to simulate the interaction of light with materials exhibiting polarization-dependent behavior. However, integrating polarization into rendering is challenging and increases computational costs significantly. The primary difficulty lies in efficiently modeling and computing the complex reflection phenomena associated with polarized light. Specifically, frequency-domain analysis, essential for efficient environment lighting and storage of complex light interactions, is lacking. To efficiently simulate and reproduce polarized light interactions using frequency-domain techniques, we address the challenge of maintaining continuity in polarized light transport represented by Stokes vectors within angular domains. The conventional spherical harmonics method cannot effectively handle continuity and rotation invariance for Stokes vectors. To overcome this, we develop a new method called polarized spherical harmonics (PSH) based on the spin-weighted spherical harmonics theory. Our method provides a rotation-invariant representation of Stokes vector fields. Furthermore, we introduce frequency domain formulations of polarized rendering equations and spherical convolution based on PSH. We first define spherical convolution on Stokes vector fields in the angular domain, and it also provides efficient computation of polarized light transport, nearly on an entry-wise product in the frequency domain. Our frequency domain formulation, including spherical convolution, led to the development of the first real-time polarization rendering technique under polarized environmental illumination, named precomputed polarized radiance transfer, using our polarized spherical harmonics. Results demonstrate that our method can effectively and accurately simulate and reproduce polarized light interactions in complex reflection phenomena.
Submitted 29 December, 2024;
originally announced January 2025.
-
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
Authors:
Ji Soo Lee,
Jongha Kim,
Jeehye Na,
Jinyoung Park,
Hyunwoo J. Kim
Abstract:
Despite the advancements of Video Large Language Models (VideoLLMs) in various tasks, they struggle with fine-grained temporal understanding, such as Dense Video Captioning (DVC). DVC is a complicated task of describing all events within a video while also temporally localizing them, which integrates multiple fine-grained tasks, including video segmentation, video captioning, and temporal video grounding. Previous VideoLLMs attempt to solve DVC in a single step, failing to utilize their reasoning capability. Moreover, previous training objectives for VideoLLMs do not fully reflect the evaluation metrics, therefore not providing supervision directly aligned to target tasks. To address such a problem, we propose a novel framework named VidChain comprised of Chain-of-Tasks (CoTasks) and Metric-based Direct Preference Optimization (M-DPO). CoTasks decompose a complex task into a sequence of sub-tasks, allowing VideoLLMs to leverage their reasoning capabilities more effectively. M-DPO aligns a VideoLLM with evaluation metrics, providing fine-grained supervision to each task that is well-aligned with metrics. Applied to two different VideoLLMs, VidChain consistently improves their fine-grained video understanding, thereby outperforming previous VideoLLMs on two different DVC benchmarks and also on the temporal video grounding task. Code is available at https://github.com/mlvlab/VidChain.
Submitted 12 January, 2025;
originally announced January 2025.
-
Global self-similar solutions for the 3D Muskat equation
Authors:
Jungkyoung Na
Abstract:
In this paper, we establish the existence of global self-similar solutions to the 3D Muskat equation when the two fluids have the same viscosity but different densities. These self-similar solutions are globally defined in both space and time, with exact cones as their initial data. Furthermore we estimate the difference between our self-similar solutions and solutions of the linearized equation around the flat interface in terms of critical spaces and some weighted $\dot{W}^{k,\infty}(\mathbb{R}^2)$ spaces for $k=1,2$. The main ingredients of the proof are new estimates in the sense of $\dot{H}^{s_1}(\mathbb{R}^2) \cap \dot{H}^{s_2}(\mathbb{R}^2)$ with $3/2<s_1<2<s_2<3$, which is continuously embedded in critical spaces for the 3D Muskat problem: $\dot{H}^2(\mathbb{R}^2)$ and $\dot{W}^{1,\infty}(\mathbb{R}^2)$.
Submitted 3 November, 2024;
originally announced November 2024.
-
Statistical Analysis by Semiparametric Additive Regression and LSTM-FCN Based Hierarchical Classification for Computer Vision Quantification of Parkinsonian Bradykinesia
Authors:
Youngseo Cho,
In Hee Kwak,
Dohyeon Kim,
Jinhee Na,
Hanjoo Sung,
Jeongjae Lee,
Young Eun Kim,
Hyeo-il Ma
Abstract:
Bradykinesia, characterized by involuntary slowing or decrement of movement, is a fundamental symptom of Parkinson's Disease (PD) and is vital for its clinical diagnosis. Despite various methodologies explored to quantify bradykinesia, computer vision-based approaches have shown promising results. However, these methods often fall short in adequately addressing key bradykinesia characteristics in repetitive limb movements: "occasional arrest" and "decrement in amplitude."
This research advances vision-based quantification of bradykinesia by introducing nuanced numerical analysis to capture decrement in amplitudes and employing a simple deep learning technique, LSTM-FCN, for precise classification of occasional arrests. Our approach structures the classification process hierarchically, tailoring it to the unique dynamics of bradykinesia in PD.
Statistical analysis of the extracted features, including those representing arrest and fatigue, has demonstrated their statistical significance in most cases. This finding underscores the importance of considering "occasional arrest" and "decrement in amplitude" in bradykinesia quantification of limb movement. Our enhanced diagnostic tool has been rigorously tested on an extensive dataset comprising 1396 motion videos from 310 PD patients, achieving an accuracy of 80.3%. The results confirm the robustness and reliability of our method.
Submitted 31 March, 2024;
originally announced April 2024.
-
Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
Authors:
Jae-hee So,
Joonhwan Chang,
Eunji Kim,
Junho Na,
JiYeon Choi,
Jy-yong Sohn,
Byung-Hoon Kim,
Sang Hui Chu
Abstract:
Background: Advancements in large language models (LLMs) have opened new possibilities in psychiatric interviews, an underexplored area where LLMs could be valuable. This study focuses on enhancing psychiatric interviews by analyzing counseling data from North Korean defectors who have experienced trauma and mental health issues.
Objective: The study investigates whether LLMs can (1) identify parts of conversations that suggest psychiatric symptoms and recognize those symptoms, and (2) summarize stressors and symptoms based on interview transcripts.
Methods: LLMs are tasked with (1) extracting stressors from transcripts, (2) identifying symptoms and their corresponding sections, and (3) generating interview summaries using the extracted data. The transcripts were labeled by mental health experts for training and evaluation.
Results: In the zero-shot inference setting using GPT-4 Turbo, 73 out of 102 segments demonstrated a recall mid-token distance d < 20 in identifying symptom-related sections. For recognizing specific symptoms, fine-tuning outperformed zero-shot inference, achieving an accuracy, precision, recall, and F1-score of 0.82. For the generative summarization task, LLMs using symptom and stressor information scored highly on G-Eval metrics: coherence (4.66), consistency (4.73), fluency (2.16), and relevance (4.67). Retrieval-augmented generation showed no notable performance improvement.
Conclusions: LLMs, with fine-tuning or appropriate prompting, demonstrated strong accuracy (over 0.8) for symptom delineation and achieved high coherence (4.6+) in summarization. This study highlights their potential to assist mental health practitioners in analyzing psychiatric interviews.
Submitted 10 February, 2025; v1 submitted 26 March, 2024;
originally announced March 2024.
-
OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation
Authors:
Seungbeom Woo,
Geonwoo Baek,
Taehoon Kim,
Jaemin Na,
Joong-won Hwang,
Wonjun Hwang
Abstract:
Multi-target domain adaptation (MTDA) for semantic segmentation poses a significant challenge, as it involves multiple target domains with varying distributions. The goal of MTDA is to minimize the domain discrepancies among a single source and multi-target domains, aiming to train a single model that excels across all target domains. Previous MTDA approaches typically employ multiple teacher architectures, where each teacher specializes in one target domain to simplify the task. However, these architectures hinder the student model from fully assimilating comprehensive knowledge from all target-specific teachers and escalate training costs with increasing target domains. In this paper, we propose an ouroboric domain bridging (OurDB) framework, offering an efficient solution to the MTDA problem using a single teacher architecture. This framework dynamically cycles through multiple target domains, aligning each domain individually to restrain the biased alignment problem, and utilizes Fisher information to minimize the forgetting of knowledge from previous target domains. We also propose a context-guided class-wise mixup (CGMix) that leverages contextual information tailored to diverse target contexts in MTDA. Experimental evaluations conducted on four urban driving datasets (i.e., GTA5, Cityscapes, IDD, and Mapillary) demonstrate the superiority of our method over existing state-of-the-art approaches.
Submitted 18 March, 2024;
originally announced March 2024.
-
Semantic Prompting with Image-Token for Continual Learning
Authors:
Jisu Han,
Jaemin Na,
Wonjun Hwang
Abstract:
Continual learning aims to refine model parameters for new tasks while retaining knowledge from previous tasks. Recently, prompt-based learning has emerged as a way to prompt pre-trained models to learn subsequent tasks without relying on a rehearsal buffer. Although this approach has demonstrated outstanding results, existing methods depend on a preceding task-selection process to choose appropriate prompts. However, imperfect task selection may degrade performance, particularly in scenarios where the number of tasks is large or task distributions are imbalanced. To address this issue, we introduce I-Prompt, a task-agnostic approach that focuses on the visual semantic information of image tokens to eliminate task prediction. Our method consists of semantic prompt matching, which determines prompts based on similarities between tokens, and image token-level prompting, which applies prompts directly to image tokens in the intermediate layers. Consequently, our method achieves competitive performance on four benchmarks while significantly reducing training time compared to state-of-the-art methods. Moreover, we demonstrate the superiority of our method across various scenarios through extensive experiments.
Submitted 18 March, 2024;
originally announced March 2024.
-
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Authors:
Dinh Phat Do,
Taehoon Kim,
Jaemin Na,
Jiwon Kim,
Keonho Lee,
Kyunghwan Cho,
Wonjun Hwang
Abstract:
Domain adaptation for object detection typically entails transferring knowledge from one visible domain to another visible domain. However, there are limited studies on adapting from the visible to the thermal domain, because the domain gap between the visible and thermal domains is much larger than expected, and traditional domain adaptation cannot successfully facilitate learning in this situation. To overcome this challenge, we propose a Distinctive Dual-Domain Teacher (D3T) framework that employs distinct training paradigms for each domain. Specifically, we segregate the source and target training sets for building dual teachers and successively apply an exponential moving average of the student model to the individual teacher of each domain. The framework further incorporates a zigzag learning method between the dual teachers, facilitating a gradual transition from the visible to the thermal domain during training. We validate the superiority of our method through newly designed experimental protocols with well-known thermal datasets, i.e., FLIR and KAIST. Source code is available at https://github.com/EdwardDo69/D3T.
Submitted 14 March, 2024;
originally announced March 2024.
-
Player Pressure Map -- A Novel Representation of Pressure in Soccer for Evaluating Player Performance in Different Game Contexts
Authors:
Chaoyi Gu,
Jiaming Na,
Yisheng Pei,
Varuna De Silva
Abstract:
In soccer, contextual player performance metrics are invaluable to coaches. For example, the ability to perform under pressure during matches distinguishes the elite from the average. An appropriate pressure metric enables teams to assess players' performance accurately under pressure and design targeted training scenarios to address their weaknesses. The primary objective of this paper is to leverage tracking data, event data, and game footage to capture the pressure experienced by the possession team in a soccer game scene. We propose a player pressure map to represent a given game scene, which lowers the dimension of the raw data while still containing rich contextual information. Not only does it serve as an effective tool for visualizing and evaluating the pressure on the team and each individual, but it can also be utilized as a backbone for assessing players' performance. Overall, our model provides coaches and analysts with a deeper understanding of players' performance under pressure so that they can make data-oriented tactical decisions.
Submitted 7 March, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
Authors:
Jeonghwan Kim,
Jisoo Kim,
Jeonghyeon Na,
Hanbyul Joo
Abstract:
To enable machines to understand the way humans interact with the physical world in daily life, 3D interaction signals should be captured in natural settings, allowing people to engage with multiple objects in a range of sequential and casual manipulations. To achieve this goal, we introduce our ParaHome system designed to capture dynamic 3D movements of humans and objects within a common home environment. Our system features a multi-view setup with 70 synchronized RGB cameras, along with wearable motion capture devices including an IMU-based body suit and hand motion capture gloves. By leveraging the ParaHome system, we collect a new human-object interaction dataset, including 486 minutes of sequences across 207 captures with 38 participants, offering advancements with three key aspects: (1) capturing body motion and dexterous hand manipulation motion alongside multiple objects within a contextual home environment; (2) encompassing sequential and concurrent manipulations paired with text descriptions; and (3) including articulated objects with multiple parts represented by 3D parameterized models. We present detailed design justifications for our system, and perform key generative modeling experiments to demonstrate the potential of our dataset.
Submitted 22 January, 2025; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Universal Time-Series Representation Learning: A Survey
Authors:
Patara Trirat,
Yooju Shin,
Junhyeok Kang,
Youngeun Nam,
Jihye Na,
Minyoung Bae,
Joeun Kim,
Byunghyun Kim,
Jae-Gil Lee
Abstract:
Time-series data exists in every corner of real-world systems and services, ranging from satellites in the sky to wearable devices on human bodies. Learning representations by extracting and inferring valuable information from these time series is crucial for understanding the complex dynamics of particular phenomena and enabling informed decisions. With the learned representations, we can perform numerous downstream analyses more effectively. Among several approaches, deep learning has demonstrated remarkable performance in extracting hidden patterns and features from time-series data without manual feature engineering. This survey first presents a novel taxonomy based on three fundamental elements in designing state-of-the-art universal representation learning methods for time series. According to the proposed taxonomy, we comprehensively review existing studies and discuss their intuitions and insights into how these methods enhance the quality of learned representations. Finally, as a guideline for future studies, we summarize commonly used experimental setups and datasets and discuss several promising research directions. An up-to-date corresponding resource is available at https://github.com/itouchz/awesome-deep-time-series-representations.
Submitted 27 August, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Switching Temporary Teachers for Semi-Supervised Semantic Segmentation
Authors:
Jaemin Na,
Jung-Woo Ha,
Hyung Jin Chang,
Dongyoon Han,
Wonjun Hwang
Abstract:
The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's. However, EMA updates raise a problem in that the weights of the teacher and student become coupled, causing a potential performance bottleneck. Furthermore, this problem may become more severe when training with more complicated labels such as segmentation masks but with few annotated data. This paper introduces Dual Teacher, a simple yet effective approach that employs dual temporary teachers aiming to alleviate the coupling problem for the student. The temporary teachers work in shifts and are progressively improved, so they consistently prevent the teacher and student from becoming excessively close. Specifically, the temporary teachers periodically take turns generating pseudo-labels to train a student model and maintain the distinct characteristics of the student model for each epoch. Consequently, Dual Teacher achieves competitive performance on the PASCAL VOC, Cityscapes, and ADE20K benchmarks with remarkably shorter training times than state-of-the-art methods. Moreover, we demonstrate that our approach is model-agnostic and compatible with both CNN- and Transformer-based models. Code is available at https://github.com/naver-ai/dual-teacher.
Submitted 28 October, 2023;
originally announced October 2023.
-
Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation
Authors:
Heeseung Yun,
Joonil Na,
Gunhee Kim
Abstract:
Sound can convey significant information for spatial reasoning in our daily lives. To endow deep networks with such ability, we address the challenge of dense indoor prediction with sound in both 2D and 3D via cross-modal knowledge distillation. In this work, we propose a Spatial Alignment via Matching (SAM) distillation framework that elicits local correspondence between the two modalities in vision-to-audio knowledge transfer. SAM integrates audio features with visually coherent learnable spatial embeddings to resolve inconsistencies in multiple layers of a student model. Our approach does not rely on a specific input representation, allowing for flexibility in the input shapes or dimensions without performance degradation. With a newly curated benchmark named Dense Auditory Prediction of Surroundings (DAPS), we are the first to tackle dense indoor prediction of omnidirectional surroundings in both 2D and 3D with audio observations. Specifically, for audio-based depth estimation, semantic segmentation, and challenging 3D scene reconstruction, the proposed distillation framework consistently achieves state-of-the-art performance across various metrics and backbone architectures.
Submitted 20 September, 2023;
originally announced September 2023.
-
Weakened vortex stretching effect in three scale hierarchy for the 3D Euler equations
Authors:
In-Jee Jeong,
Jungkyoung Na,
Tsuyoshi Yoneda
Abstract:
We consider the 3D incompressible Euler equations in the following three-scale hierarchical situation: the large-scale vortex stretches the middle-scale one while, at the same time, the middle-scale one stretches the small-scale one. In this situation, we show that the stretching effect of the middle-scale flow is weakened by the large-scale one. In other words, the vortices being stretched could have the corresponding stress tensor weakened.
Submitted 12 September, 2023;
originally announced September 2023.
-
Well-posedness for Ohkitani model and long-time existence for surface quasi-geostrophic equations
Authors:
Dongho Chae,
In-Jee Jeong,
Jungkyoung Na,
Sung-Jin Oh
Abstract:
We consider the Cauchy problem for the logarithmically singular surface quasi-geostrophic (SQG) equation, introduced by Ohkitani, $$\partial_t \theta - \nabla^\perp \log(10+(-\Delta)^{\frac12})\theta \cdot \nabla \theta = 0,$$ and establish local existence and uniqueness of smooth solutions in the scale of Sobolev spaces with exponent decreasing with time. Such a decrease of the Sobolev exponent is necessary, as we have shown in the companion paper that the problem is strongly ill-posed in any fixed Sobolev space. The time dependence of the Sobolev exponent can be removed when there is a dissipation term strictly stronger than log. These results improve well-posedness statements by Chae, Constantin, Córdoba, Gancedo, and Wu in \cite{CCCGW}.
This well-posedness result can be applied to describe the long-time dynamics of the $\delta$-SQG equations, defined by $$\partial_t \theta + \nabla^\perp (10+(-\Delta)^{\frac12})^{-\delta}\theta \cdot \nabla \theta = 0,$$ for all sufficiently small $\delta>0$ depending on the size of the initial data. For the same range of $\delta$, we establish global well-posedness of smooth solutions to the logarithmically dissipative counterpart: $$\partial_t \theta + \nabla^\perp (10+(-\Delta)^{\frac12})^{-\delta}\theta \cdot \nabla \theta + \log(10+(-\Delta)^{\frac12})\theta = 0.$$
Submitted 16 February, 2025; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG
Authors:
Dae-Yeol Kim,
Eunsu Goh,
KwangKee Lee,
JongEui Chae,
JongHyeon Mun,
Junyeong Na,
Chae-bong Sohn,
Do-Yup Kim
Abstract:
rPPG (Remote photoplethysmography) is a technology that measures and analyzes BVP (Blood Volume Pulse) by using the light absorption characteristics of hemoglobin captured through a camera. Analyzing the measured BVP can derive various physiological signals such as heart rate, stress level, and blood pressure, which can be applied to various applications such as telemedicine, remote patient monitoring, and early prediction of cardiovascular disease. rPPG is rapidly evolving and attracting great attention from both academia and industry by providing great usability and convenience as it can measure biosignals using a camera-equipped device without medical or wearable devices. Despite extensive efforts and advances in this field, serious challenges remain, including issues related to skin color, camera characteristics, ambient lighting, and other sources of noise and artifacts, which degrade accuracy performance. We argue that fair and evaluable benchmarking is urgently required to overcome these challenges and make meaningful progress from both academic and commercial perspectives. In most existing work, models are trained, tested, and validated only on limited datasets. Even worse, some studies lack available code or reproducibility, making it difficult to fairly evaluate and compare performance. Therefore, the purpose of this study is to provide a benchmarking framework to evaluate various rPPG techniques across a wide range of datasets for fair evaluation and comparison, including both conventional non-deep neural network (non-DNN) and deep neural network (DNN) methods. GitHub URL: https://github.com/remotebiosensing/rppg
Submitted 18 August, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Adaptive Parameter Estimation under Finite Excitation
Authors:
Siyu Chen,
Jing Na,
Yingbo Huang
Abstract:
Although persistent excitation is often acknowledged as a sufficient condition for exponential convergence in adaptive parameter estimation, it may not be guaranteed in practical applications. Recently, more attention has turned to a relaxed condition, i.e., finite excitation. In this paper, for a class of nominal nonlinear systems with unknown constant parameters, a novel method that combines the Newton algorithm and a time-varying factor is proposed, which can achieve exponential convergence under finite excitation. First, by introducing pre-filtering, the nominal system is transformed to a linearly parameterized form. Then the detailed mathematical derivation is outlined, starting from a cost function of the accumulated estimation error. A theoretical analysis of the stability and robustness of the proposed method is also given. Finally, comparative numerical simulations are given to illustrate the superiority of the proposed method.
Submitted 17 March, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
SRIL: Selective Regularization for Class-Incremental Learning
Authors:
Jisu Han,
Jaemin Na,
Wonjun Hwang
Abstract:
Human intelligence gradually accepts new information and accumulates knowledge throughout the lifespan. However, deep learning models suffer from a catastrophic forgetting phenomenon, where they forget previous knowledge when acquiring new information. Class-Incremental Learning aims to create an integrated model that balances plasticity and stability to overcome this challenge. In this paper, we propose a selective regularization method that accepts new knowledge while maintaining previous knowledge. We first introduce an asymmetric feature distillation method for old and new classes inspired by cognitive science, using the gradient of classification and knowledge distillation losses to determine whether to perform pattern completion or pattern separation. We also propose a method to selectively interpolate the weight of the previous model for a balance between stability and plasticity, and we adjust whether to transfer through model confidence to ensure the performance of the previous class and enable exploratory learning. We validate the effectiveness of the proposed method, which surpasses the performance of existing methods through extensive experimental protocols using CIFAR-100, ImageNet-Subset, and ImageNet-Full.
Submitted 9 May, 2023;
originally announced May 2023.
-
Uncertainty Aware Active Learning for Reconfiguration of Pre-trained Deep Object-Detection Networks for New Target Domains
Authors:
Jiaming Na,
Varuna De-Silva
Abstract:
Object detection is one of the most important and fundamental computer vision tasks and has been broadly utilized in pose estimation, object tracking, and instance segmentation models. To obtain training data for object detection models efficiently, many datasets collect their unannotated data in video format, and the annotator needs to draw a bounding box around each object in the images. Annotating every frame of a video is costly and inefficient, since many frames contain very similar information for the model to learn from. Selecting the most informative frames of a video to annotate has become a highly practical task but has attracted little attention in research. In this paper, we propose a novel active learning algorithm for object detection models to tackle this problem. In the proposed active learning algorithm, both the classification and localization informativeness of unlabelled data are measured and aggregated. Utilizing the temporal information from video frames, two novel localization informativeness measurements are proposed. Furthermore, a weight curve is proposed to avoid querying adjacent frames. The proposed active learning algorithm was evaluated with multiple configurations on the MuPoTS and FootballPD datasets.
Submitted 22 March, 2023;
originally announced March 2023.
-
Materials Discovery with Extreme Properties via Reinforcement Learning-Guided Combinatorial Chemistry
Authors:
Hyunseung Kim,
Haeyeon Choi,
Dongju Kang,
Won Bo Lee,
Jonggeol Na
Abstract:
The goal of most materials discovery is to discover materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop reinforcement learning-guided combinatorial chemistry, which is a rule-based molecular designer driven by trained policy for selecting subsequent molecular fragments to get a target molecule. Since our model has the potential to generate all possible molecular structures that can be obtained from combinations of molecular fragments, unknown molecules with superior properties can be discovered. We theoretically and empirically demonstrate that our model is more suitable for discovering better compounds than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven extreme target properties, our model discovered 1,315 of all target-hitting molecules and 7,629 of five target-hitting molecules out of 100,000 trials, whereas the probability distribution-learning models failed. Moreover, it has been confirmed that every molecule generated under the binding rules of molecular fragments is 100% chemically valid. To illustrate the performance in actual problems, we also demonstrate that our models work well on two practical applications: discovering protein docking molecules and HIV inhibitors.
Submitted 7 May, 2024; v1 submitted 21 March, 2023;
originally announced March 2023.
-
Active Semi-Supervised Learning by Exploring Per-Sample Uncertainty and Consistency
Authors:
Jaeseung Lim,
Jongkeun Na,
Nojun Kwak
Abstract:
Active Learning (AL) and Semi-supervised Learning (SSL) are two techniques that have been studied to reduce the high cost of deep learning by using a small amount of labeled data and a large amount of unlabeled data. To improve the accuracy of models at a lower cost, we propose a method called Active Semi-supervised Learning (ASSL), which combines AL and SSL. To maximize the synergy between AL and SSL, we focused on the differences between ASSL and AL. ASSL involves more dynamic model updates than AL due to the use of unlabeled data in the training process, resulting in the temporal instability of the predicted probabilities of the unlabeled data. This makes it difficult to determine the true uncertainty of the unlabeled data in ASSL. To address this, we adopted techniques such as exponential moving average (EMA) and upper confidence bound (UCB) used in reinforcement learning. Additionally, we analyzed the effect of label noise on unsupervised learning by using weak and strong augmentation pairs to address data inconsistency. By considering both uncertainty and data inconsistency, we acquired data samples that were used in the proposed ASSL method. Our experiments showed that ASSL achieved about 5.3 times higher computational efficiency than SSL while achieving the same performance, and it outperformed the state-of-the-art AL method.
Submitted 15 March, 2023;
originally announced March 2023.
-
Optimal Planning of Hybrid Energy Storage Systems using Curtailed Renewable Energy through Deep Reinforcement Learning
Authors:
Dongju Kang,
Doeun Kang,
Sumin Hwangbo,
Haider Niaz,
Won Bo Lee,
J. Jay Liu,
Jonggeol Na
Abstract:
Energy management systems (EMS) are becoming increasingly important for utilizing the continuously growing amount of curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen, should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning how to balance different strategies, is confronted with the complexity and uncertainties of large-scale problems. Here, we propose a deep reinforcement learning (DRL) methodology with a policy-based algorithm to realize real-time optimal ESS planning under curtailed renewable energy uncertainty. A quantitative performance comparison showed that the DRL agent outperforms the scenario-based stochastic optimization (SO) algorithm, even with a wide action and observation space. Owing to the uncertainty rejection capability of the DRL, we could confirm robust performance under a large uncertainty of the curtailed renewable energy, with maximized net profit and a stable system. Action-mapping was performed to visually assess the action taken by the DRL agent according to the state. The corresponding results confirmed that the DRL agent learns to act as a human expert would, suggesting reliable application of the proposed methodology.
Submitted 11 December, 2022;
originally announced December 2022.
-
Global well-posedness for a two-dimensional Keller-Segel-Euler system of consumption type
Authors:
Jungkyoung Na
Abstract:
We consider the Cauchy problem for the Keller-Segel system of consumption type coupled with the incompressible Euler equations in $\mathbb{R}^2$. This coupled system describes a biological phenomenon in which aerobic bacteria living in slightly viscous fluids (such as water) move towards a higher oxygen concentration to survive. We first prove the local existence of smooth solutions for arbitrary smooth initial data. Then we show that these smooth solutions can be extended globally if the initial density of oxygen is sufficiently small. The main ingredient in the proof is the $W^{1,q}$-energy estimate $(q>2)$ motivated by the partially inviscid two-dimensional Boussinesq system in \cite{C06}. Our result improves the well-known global well-posedness of the two-dimensional Keller-Segel system of consumption type coupled with the incompressible Navier-Stokes equations.
Submitted 17 January, 2024; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Finite-time blow-up to hyperbolic Keller-Segel system of consumption type with logarithmic sensitivity
Authors:
Jungkyoung Na
Abstract:
This paper deals with a hyperbolic Keller-Segel system of consumption type with logarithmic sensitivity \begin{equation*} \partial_{t} \rho = - \chi \nabla \cdot \left(\rho \nabla \log c\right),\quad \partial_{t} c = - \mu c \rho \quad (\chi,\,\mu>0) \end{equation*} in $\mathbb{R}^d\; (d \ge1)$ for nonvanishing initial data. This system is closely related to tumor angiogenesis, an important example of chemotaxis. We first show the local existence of smooth solutions corresponding to nonvanishing smooth initial data. Next, through Riemann invariants, we present some sufficient conditions on the initial data for finite-time singularity formation when $d=1$. We then prove that for any $d\ge1$, some nonvanishing $C^\infty$-data can become singular in finite time. Moreover, we derive detailed information about the behavior of solutions when the singularity occurs. In particular, this information shows that singularity formation from some initial data occurs not because $c$ touches zero (which makes $\log c$ diverge) but due to the blowup of the $C^1\times C^2$-norm of $(\rho,c)$. As a corollary, we also construct initial data near any constant equilibrium state which blows up in finite time for any $d\ge1$. Our results extend the finite-time blow-up results in \cite{IJ21}, where the initial data is required to satisfy some vanishing conditions. Furthermore, we interpret our results as showing that some kind of damping or dissipation of $\rho$ is necessarily required to ensure the global existence of smooth solutions, even when the initial data are small perturbations around constant equilibrium states.
Submitted 7 March, 2024; v1 submitted 7 December, 2022;
originally announced December 2022.
-
ELF22: A Context-based Counter Trolling Dataset to Combat Internet Trolls
Authors:
Huije Lee,
Young Ju NA,
Hoyun Song,
Jisu Shin,
Jong C. Park
Abstract:
Online trolls increase social costs and cause psychological damage to individuals. With the proliferation of automated accounts making use of bots for trolling, it is difficult for targeted individual users to handle the situation both quantitatively and qualitatively. To address this issue, we focus on automating the method to counter trolls, as counter responses to combat trolls encourage community users to maintain ongoing discussion without compromising freedom of expression. For this purpose, we propose a novel dataset for automatic counter response generation. In particular, we constructed a pair-wise dataset that includes troll comments and counter responses with labeled response strategies, which enables models fine-tuned on our dataset to generate responses by varying counter responses according to the specified strategy. We conducted three tasks to assess the effectiveness of our dataset and evaluated the results through both automatic and human evaluation. In human evaluation, we demonstrate that the model fine-tuned on our dataset shows a significantly improved performance in strategy-controlled sentence generation.
Submitted 7 September, 2022; v1 submitted 30 July, 2022;
originally announced August 2022.
-
A Generalized Hamming Distance of Sequence Patterns
Authors:
Pengyu Liu,
Jingzhou Na
Abstract:
We define sequence patterns of length $n$ and level $\ell$ to be equivalence classes of sequences that have $n$ elements from the set of $\ell$ integer symbols $\{1,2,\ldots,\ell\}$ with no restriction on repetition, where the equivalence relation is induced by symbol relabeling without swapping positions of symbols. We define a distance for a set of $k$ sequence patterns of length $n$ and level $\ell$ by generalizing the Hamming distance between sequences. We compute the maximal distance for $k$ sequence patterns of length $n$ and level $\ell$ and demonstrate how to calculate the exact distance between a pair of length-$n$ level-$\ell$ sequence patterns.
Submitted 6 January, 2023; v1 submitted 4 April, 2022;
originally announced April 2022.
-
Pose-MUM : Reinforcing Key Points Relationship for Semi-Supervised Human Pose Estimation
Authors:
JongMok Kim,
Hwijun Lee,
Jaeseung Lim,
Jongkeun Na,
Nojun Kwak,
Jin Young Choi
Abstract:
A well-designed strong-weak augmentation strategy and a stable teacher to generate reliable pseudo labels are essential in the teacher-student framework of semi-supervised learning (SSL). With these in mind, to suit the semi-supervised human pose estimation (SSHPE) task, we propose a novel approach referred to as Pose-MUM that modifies Mix/UnMix (MUM) augmentation. Like MUM in the dense prediction task, the proposed Pose-MUM makes strong-weak augmentation for pose estimation and leads the network to learn the relationship between human key points much better than conventional methods by adding the mixing process in intermediate layers in a stochastic manner. In addition, we employ the exponential-moving-average-normalization (EMAN) teacher, which is stable and well-suited to the SSL framework and further boosts the performance. Extensive experiments on the MS-COCO dataset show the superiority of our proposed method, which consistently improves the performance over previous methods following the SSHPE benchmark.
Submitted 15 March, 2022;
originally announced March 2022.
-
A group-based structure for perfect sequence covering arrays
Authors:
Jingzhou Na,
Jonathan Jedwab,
Shuxing Li
Abstract:
An $(n,k)$-perfect sequence covering array with multiplicity $λ$, denoted PSCA$(n,k,λ)$, is a multiset whose elements are permutations of the sequence $(1,2, \dots, n)$ and which collectively contain each ordered length $k$ subsequence exactly $λ$ times. The primary objective is to determine for each pair $(n,k)$ the smallest value of $λ$, denoted $g(n,k)$, for which a PSCA$(n,k,λ)$ exists; and more generally, the complete set of values $λ$ for which a PSCA$(n,k,λ)$ exists. Yuster recently determined the first known value of $g(n,k)$ greater than 1, namely $g(5,3)=2$, and suggested that finding other such values would be challenging. We show that $g(6,3)=g(7,3)=2$, using a recursive search method inspired by an old algorithm due to Mathon. We then impose a group-based structure on a perfect sequence covering array by restricting it to be a union of distinct cosets of a prescribed nontrivial subgroup of the symmetric group $S_n$. This allows us to determine the new results that $g(7,4)=2$ and $g(7,5) \in \{2,3,4\}$ and $g(8,3) \in \{2,3\}$ and $g(9,3) \in \{2,3,4\}$. We also show that, for each $(n,k) \in \{ (5,3), (6,3), (7,3), (7,4) \}$, there exists a PSCA$(n,k,λ)$ if and only if $λ\ge 2$; and that there exists a PSCA$(8,3,λ)$ if and only if $λ\ge g(8,3)$.
Submitted 3 February, 2022;
originally announced February 2022.
-
Contrastive Vicinal Space for Unsupervised Domain Adaptation
Authors:
Jaemin Na,
Dongyoon Han,
Hyung Jin Chang,
Wonjun Hwang
Abstract:
Recent unsupervised domain adaptation methods have utilized vicinal space between the source and target domains. However, the equilibrium collapse of labels, a problem where the source labels are dominant over the target labels in the predictions of vicinal instances, has never been addressed. In this paper, we propose an instance-wise minimax strategy that minimizes the entropy of high uncertainty instances in the vicinal space to tackle the stated problem. We divide the vicinal space into two subspaces through the solution of the minimax problem: contrastive space and consensus space. In the contrastive space, inter-domain discrepancy is mitigated by constraining instances to have contrastive views and labels, and the consensus space reduces the confusion between intra-domain categories. The effectiveness of our method is demonstrated on public benchmarks, including Office-31, Office-Home, and VisDA-C, achieving state-of-the-art performances. We further show that our method outperforms the current state-of-the-art methods on PACS, which indicates that our instance-wise approach works well for multi-source domain adaptation as well. Code is available at https://github.com/NaJaeMin92/CoVi.
Submitted 18 July, 2022; v1 submitted 26 November, 2021;
originally announced November 2021.
-
MUM : Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
Authors:
JongMok Kim,
Jooyoung Jang,
Seunghyeon Seo,
Jisoo Jeong,
Jongkeun Na,
Nojun Kwak
Abstract:
Many recent semi-supervised learning (SSL) studies build teacher-student architecture and train the student network by the generated supervisory signal from the teacher. Data augmentation strategy plays a significant role in the SSL framework since it is hard to create a weak-strong augmented input pair without losing label information. Especially when extending SSL to semi-supervised object detection (SSOD), many strong augmentation methodologies related to image geometry and interpolation-regularization are hard to utilize since they possibly hurt the location information of the bounding box in the object detection task. To address this, we introduce a simple yet effective data augmentation method, Mix/UnMix (MUM), which unmixes feature tiles for the mixed image tiles for the SSOD framework. Our proposed method makes mixed input image tiles and reconstructs them in the feature space. Thus, MUM can enjoy the interpolation-regularization effect from non-interpolated pseudo-labels and successfully generate a meaningful weak-strong pair. Furthermore, MUM can be easily equipped on top of various SSOD methods. Extensive experiments on MS-COCO and PASCAL VOC datasets demonstrate the superiority of MUM by consistently improving the mAP performance over the baseline in all the tested SSOD benchmark protocols.
Submitted 15 March, 2022; v1 submitted 21 November, 2021;
originally announced November 2021.
-
Collaborative Cloud and Edge Mobile Computing in C-RAN Systems with Minimal End-to-End Latency
Authors:
Seok-Hwan Park,
Seongah Jeong,
Jinyeop Na,
Osvaldo Simeone,
Shlomo Shamai
Abstract:
Mobile cloud and edge computing protocols make it possible to offer computationally heavy applications to mobile devices via computational offloading from devices to nearby edge servers or more powerful, but remote, cloud servers. Previous work assumed that computational tasks can be fractionally offloaded at both cloud processor (CP) and at a local edge node (EN) within a conventional Distributed Radio Access Network (D-RAN) that relies on non-cooperative ENs equipped with one-way uplink fronthaul connection to the cloud. In this paper, we propose to integrate collaborative fractional computing across CP and ENs within a Cloud RAN (C-RAN) architecture with finite-capacity two-way fronthaul links. Accordingly, tasks offloaded by a mobile device can be partially carried out at an EN and the CP, with multiple ENs communicating with a common CP to exchange data and computational outcomes while allowing for centralized precoding and decoding. Unlike prior work, we investigate joint optimization of computing and communication resources, including wireless and fronthaul segments, to minimize the end-to-end latency by accounting for a two-way uplink and downlink transmission. The problem is tackled by using fractional programming (FP) and matrix FP. Extensive numerical results validate the performance gain of the proposed architecture as compared to the previously studied D-RAN solution.
Submitted 30 March, 2021;
originally announced March 2021.
-
Generative Chemical Transformer: Neural Machine Learning of Molecular Geometric Structures from Chemical Language via Attention
Authors:
Hyunseung Kim,
Jonggeol Na,
Won Bo Lee
Abstract:
Discovering new materials better suited to specific purposes is an important issue in improving the quality of human life. Here, a neural network that creates molecules that meet some desired conditions based on a deep understanding of chemical language is proposed (Generative Chemical Transformer, GCT). By paying attention to characters sparsely, the attention mechanism in GCT allows a deeper understanding of molecular structures beyond the limitations of chemical language itself, which cause semantic discontinuity. The significance of language models for inverse molecular design problems is investigated by quantitatively evaluating the quality of the generated molecules. GCT generates highly realistic chemical strings that satisfy both chemical and linguistic grammar rules. Molecules parsed from generated strings simultaneously satisfy the multiple target properties and vary for a single condition set. These advances will contribute to improving the quality of human life by accelerating the process of desired material discovery.
Submitted 3 December, 2021; v1 submitted 27 February, 2021;
originally announced March 2021.
-
FixBi: Bridging Domain Spaces for Unsupervised Domain Adaptation
Authors:
Jaemin Na,
Heechul Jung,
Hyung Jin Chang,
Wonjun Hwang
Abstract:
Unsupervised domain adaptation (UDA) methods for learning domain-invariant representations have achieved remarkable progress. However, most studies have been based on direct adaptation from the source domain to the target domain and have suffered from large domain discrepancies. In this paper, we propose a UDA method that effectively handles such large domain discrepancies. We introduce a fixed ratio-based mixup to augment multiple intermediate domains between the source and target domains. From the augmented domains, we train a source-dominant model and a target-dominant model that have complementary characteristics. Using our confidence-based learning methodologies, e.g., bidirectional matching with high-confidence predictions and self-penalization using low-confidence predictions, the models can learn from each other or from their own results. Through our proposed methods, the models gradually transfer domain knowledge from the source to the target domain. Extensive experiments demonstrate the superiority of our proposed method on three public benchmarks: Office-31, Office-Home, and VisDA-2017.
Submitted 25 March, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.
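The fixed ratio-based mixup mentioned in this abstract can be illustrated with a plain convex combination of a source sample and a target sample. This is only a sketch under stated assumptions: the function name `fixed_ratio_mixup`, the flat-list representation of a sample, and the ratio value used below are illustrative and are not taken from the abstract.

```python
def fixed_ratio_mixup(x_src, x_tgt, ratio):
    """Blend one source sample and one target sample with a fixed ratio.

    `ratio` is the weight on the source sample; (1 - ratio) goes to the
    target sample. Samples are flat lists of floats of equal length.
    """
    if len(x_src) != len(x_tgt):
        raise ValueError("source and target samples must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [ratio * s + (1.0 - ratio) * t for s, t in zip(x_src, x_tgt)]


# A ratio above 0.5 yields a sample closer to the source domain
# (the value 0.75 is an illustrative assumption).
source_dominant = fixed_ratio_mixup([1.0, 0.0], [0.0, 1.0], 0.75)
print(source_dominant)
```

Using one ratio above 0.5 and one below 0.5 would give two intermediate domains, one nearer the source and one nearer the target, which matches the source-dominant and target-dominant pairing described in the abstract.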
-
Densely Guided Knowledge Distillation using Multiple Teacher Assistants
Authors:
Wonchul Son,
Jaemin Na,
Junyong Choi,
Wonjun Hwang
Abstract:
With the success of deep neural networks, knowledge distillation, which guides the learning of a small student network from a large teacher network, is being actively studied for model compression and transfer learning. However, few studies have been performed to resolve the poor learning issue of the student network when the student and teacher model sizes significantly differ. In this paper, we propose a densely guided knowledge distillation using multiple teacher assistants that gradually decrease the model size to efficiently bridge the large gap between the teacher and student networks. To stimulate more efficient learning of the student network, we iteratively use each teacher assistant to guide every smaller teacher assistant. Specifically, when teaching a smaller teacher assistant at the next step, the existing larger teacher assistants from the previous step are used as well as the teacher network. Moreover, we design stochastic teaching where, for each mini-batch, a teacher or teacher assistants are randomly dropped. This acts as a regularizer to improve the efficiency of teaching of the student network. Thus, the student can always learn salient distilled knowledge from the multiple sources. We verified the effectiveness of the proposed method for a classification task using CIFAR-10, CIFAR-100, and ImageNet. We also achieved significant performance improvements with various backbone architectures such as ResNet, WideResNet, and VGG.
Submitted 9 August, 2021; v1 submitted 18 September, 2020;
originally announced September 2020.
-
Positional Attention-based Frame Identification with BERT: A Deep Learning Approach to Target Disambiguation and Semantic Frame Selection
Authors:
Sang-Sang Tan,
Jin-Cheon Na
Abstract:
Semantic parsing is the task of transforming sentences from natural language into formal representations of predicate-argument structures. Under this research area, frame-semantic parsing has attracted much interest. This parsing approach leverages the lexical information defined in FrameNet to associate marked predicates or targets with semantic frames, thereby assigning semantic roles to sentence components based on pre-specified frame elements in FrameNet. In this paper, a deep neural network architecture known as Positional Attention-based Frame Identification with BERT (PAFIBERT) is presented as a solution to the frame identification subtask in frame-semantic parsing. Although the importance of this subtask is well-established, prior research has yet to find a robust solution that works satisfactorily for both in-domain and out-of-domain data. This study thus set out to improve frame identification in light of recent advancements of language modeling and transfer learning in natural language processing. The proposed method is partially empowered by BERT, a pre-trained language model that excels at capturing contextual information in texts. By combining the language representation power of BERT with a position-based attention mechanism, PAFIBERT is able to attend to target-specific contexts in sentences for disambiguating targets and associating them with the most suitable semantic frames. Under various experimental settings, PAFIBERT outperformed existing solutions by a significant margin, achieving new state-of-the-art results for both in-domain and out-of-domain benchmark test sets.
Submitted 31 October, 2019;
originally announced October 2019.
-
Energy-Efficient Task Offloading for Vehicular Edge Computing: Joint Optimization of Offloading and Bit Allocation
Authors:
Youngsu Jang,
Jinyeop Na,
Seongah Jeong,
Joonhyuk Kang
Abstract:
With the rapid development of vehicular networks, various applications that require high computation resources have emerged. To efficiently execute these applications, vehicular edge computing (VEC) can be employed. VEC offloads the computation tasks to the VEC node, i.e., the road side unit (RSU), which improves vehicular service and reduces the energy consumption of the vehicle. However, the communication environment is time-varying due to the movement of the vehicle, so finding the optimal offloading parameters is still an open problem. Therefore, it is necessary to investigate an optimal offloading strategy for effective energy savings in energy-limited vehicles. In this paper, we consider the changes in the communication environment due to the various speeds of vehicles, which were not considered in previous studies. Then, we jointly optimize the offloading proportion and the uplink/computation/downlink bit allocation of multiple vehicles in order to minimize the total energy consumption of the vehicles under the delay constraint. Numerical results demonstrate that the proposed energy-efficient offloading strategy significantly reduces the total energy consumption.
Submitted 15 October, 2019;
originally announced October 2019.
-
A Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360 Video
Authors:
Youngjae Yu,
Sangho Lee,
Joonil Na,
Jaeyun Kang,
Gunhee Kim
Abstract:
We address the problem of highlight detection from a 360 degree video by summarizing it both spatially and temporally. Given a long 360 degree video, we spatially select pleasant-looking normal field-of-view (NFOV) segments from the unlimited fields of view (FOV) of the 360 degree video, and temporally summarize it into a concise and informative highlight as a selected subset of subshots. We propose a novel deep ranking model named the Composition View Score (CVS) model, which produces a spherical score map of composition per video segment, and determines which view is suitable for the highlight via a sliding window kernel at inference. To evaluate the proposed framework, we perform experiments on the Pano2Vid benchmark dataset and our newly collected 360 degree video highlight dataset from YouTube and Vimeo. Through evaluation using both quantitative summarization metrics and user studies via Amazon Mechanical Turk, we demonstrate that our approach outperforms several state-of-the-art highlight detection methods. We also show that our model is 16 times faster at inference than AutoCam, which is one of the first summarization algorithms for 360 degree videos.
Submitted 31 January, 2018;
originally announced January 2018.
-
Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction
Authors:
Xiaolei Ma,
Zhuang Dai,
Zhengbing He,
Jihui Na,
Yong Wang,
Yunpeng Wang
Abstract:
This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with a high accuracy. Spatiotemporal traffic dynamics are converted to images describing the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image following two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated by taking two real-world transportation networks, the second ring road and north-east transportation network in Beijing, as examples, and comparing the method with four prevailing algorithms, namely, ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and three deep learning architectures, namely, stacked autoencoder, recurrent neural network, and long-short-term memory network. The results show that the proposed method outperforms other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and, thus, is suitable for large-scale transportation networks.
Submitted 10 April, 2017; v1 submitted 16 January, 2017;
originally announced January 2017.
-
FM-index of Alignment with Gaps
Authors:
Joong Chae Na,
Hyunjoon Kim,
Seunghwan Min,
Heejin Park,
Thierry Lecroq,
Martine Leonard,
Laurent Mouchard,
Kunsoo Park
Abstract:
Recently, a compressed index for similar strings, called the FM-index of alignment (FMA), has been proposed with the functionalities of pattern search and random access. The FMA is quite efficient in space requirement and pattern search time, but it is applicable only for an alignment of similar strings without gaps. In this paper we propose the FM-index of alignment with gaps, a realistic index for similar strings, which allows gaps in their alignment. For this, we design a new version of the suffix array of alignment by using alignment transformation and a new definition of the alignment-suffix. The new suffix array of alignment enables us to support the LF-mapping and backward search, the key functionalities of the FM-index, regardless of gap existence in the alignment. We experimentally compared our index with RLCSA due to Makinen et al. on 100 genome sequences from the 1000 Genomes Project. The index size of our index is less than one third of that of RLCSA.
Submitted 13 June, 2016;
originally announced June 2016.
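The LF-mapping and backward search named in this abstract as the key functionalities of the FM-index can be sketched for a single plain string. This is a minimal illustration of the classical FM-index only, not the FM-index of alignment with gaps; it assumes the text does not contain the sentinel `$`, builds the Burrows-Wheeler transform naively by sorting rotations, and computes occurrence counts by scanning rather than with a compressed rank structure. The function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (naive, for small inputs)."""
    text += "$"  # sentinel, assumed smaller than every other character
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def backward_search(bwt_str, pattern):
    """Count occurrences of `pattern` using LF-mapping on the BWT."""
    # first[c]: index of the first row starting with c in the sorted rotations.
    first = {}
    for i, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, i)
    lo, hi = 0, len(bwt_str)  # half-open range of matching rows
    for c in reversed(pattern):
        if c not in first:
            return 0
        # LF-mapping step: rank of c in the BWT prefix, offset by first[c].
        lo = first[c] + bwt_str[:lo].count(c)
        hi = first[c] + bwt_str[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)
print(backward_search(transformed, "ana"))
```

The pattern is consumed from its last character to its first, and each step narrows the row range, so the number of steps depends on the pattern length rather than on the text length.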
-
On the total variation distance between the binomial random graph and the random intersection graph
Authors:
Jeong Han Kim,
Sang June Lee,
Joohan Na
Abstract:
When each vertex is assigned a set, the intersection graph generated by the sets is the graph in which two distinct vertices are joined by an edge if and only if their assigned sets have a nonempty intersection. An interval graph is an intersection graph generated by intervals in the real line. A chordal graph can be considered as an intersection graph generated by subtrees of a tree. In 1999, Karoński, Scheinerman and Singer-Cohen [Combin Probab Comput 8 (1999), 131--159] introduced a random intersection graph by taking randomly assigned sets. The random intersection graph $G(n,m;p)$ has $n$ vertices, and the sets assigned to the vertices are chosen to be i.i.d. random subsets of a fixed set $M$ of size $m$, where each element of $M$ belongs to each random subset with probability $p$, independently of all other elements in $M$. Fill, Scheinerman and Singer-Cohen [Random Struct Algorithms 16 (2000), 156--176] showed that the total variation distance between the random graph $G(n,m;p)$ and the Erdős-Rényi graph $G(n,\hat{p})$ tends to $0$ for any $0 \leq p=p(n) \leq 1$ if $m=n^α$, $α>6$, where $\hat{p}$ is chosen so that the expected numbers of edges in the two graphs are the same. In this paper, it is proved that the total variation distance still tends to $0$ for any $0 \leq p=p(n) \leq 1$ whenever $m \gg n^4$.
Submitted 9 February, 2017; v1 submitted 10 June, 2015;
originally announced June 2015.
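The construction of $G(n,m;p)$ described in this abstract can be sampled in a few lines. The sketch below assumes vertices are labelled $0,\dots,n-1$ and the ground set $M$ is $\{0,\dots,m-1\}$; the function names are hypothetical. The helper `matched_edge_probability` uses the fact that two vertices are adjacent exactly when at least one of the $m$ elements lies in both sets, which happens with probability $1-(1-p^2)^m$; this is the value of $\hat{p}$ that equalizes the expected numbers of edges.

```python
import random
from itertools import combinations


def random_intersection_graph(n, m, p, seed=None):
    """Sample G(n, m; p): return the vertex sets and the edge set."""
    rng = random.Random(seed)
    # Each vertex independently keeps each of the m elements with probability p.
    sets = [{e for e in range(m) if rng.random() < p} for _ in range(n)]
    # Two distinct vertices are adjacent iff their sets intersect.
    edges = {(u, v) for u, v in combinations(range(n), 2) if sets[u] & sets[v]}
    return sets, edges


def matched_edge_probability(m, p):
    """Edge probability of G(n, m; p), i.e. the matching p-hat for G(n, p-hat)."""
    return 1.0 - (1.0 - p * p) ** m


_, sample_edges = random_intersection_graph(n=6, m=20, p=0.1, seed=0)
print(len(sample_edges), matched_edge_probability(20, 0.1))
```

Edges in $G(n,m;p)$ are not independent (two edges sharing a vertex both depend on that vertex's set), which is why closeness to the binomial random graph needs $m$ to be large relative to $n$.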
-
Product vectors in the ranges of multi-partite states with positive partial transposes and permanents of matrices
Authors:
Young-Hoon Kiem,
Seung-Hyeok Kye,
Joohan Na
Abstract:
In this paper, we consider a system of homogeneous algebraic equations in complex variables and their conjugates, which arises naturally from the range criterion for separability of PPT states. We systematically examine these equations to get sufficient conditions for the existence of nontrivial solutions. This gives us possible upper bounds on the ranks of PPT entangled edge states and their partial transposes. We focus on the multi-partite cases, which are much more delicate than the bi-partite cases. We use the notion of permanents of matrices as well as techniques from algebraic geometry throughout the discussion.
Submitted 31 May, 2015; v1 submitted 14 January, 2014;
originally announced January 2014.
-
The number of product vectors and their partial conjugates in a pair of spaces
Authors:
Joohan Na
Abstract:
Let $D$ and $E$ be subspaces of the tensor product of the finite-dimensional Hilbert spaces $\mathbb{C}^m \otimes \mathbb{C}^n$. We show that the number of product vectors in $D$ with their partial conjugates in $E$ is uniformly bounded depending only on $m$ and $n$ whenever it is finite. We also give an upper bound in the qubit-qunit case, which we expect to be sharp.
Submitted 24 September, 2013; v1 submitted 17 September, 2013;
originally announced September 2013.
-
Suffix Tree of Alignment: An Efficient Index for Similar Data
Authors:
Joong Chae Na,
Heejin Park,
Maxime Crochemore,
Jan Holub,
Costas S. Iliopoulos,
Laurent Mouchard,
Kunsoo Park
Abstract:
We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings $A$ and $B$ is a compacted trie representing all suffixes in $A$ and $B$. It has $|A|+|B|$ leaves and can be constructed in $O(|A|+|B|)$ time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of $A$ and $B$.
In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of $A$ and $B$ has $|A| + l_d + l_1$ leaves where $l_d$ is the sum of the lengths of all parts of $B$ different from $A$ and $l_1$ is the sum of the lengths of some common parts of $A$ and $B$. We did not compromise the pattern search to reduce the space. Our suffix tree can be searched for a pattern $P$ in $O(|P|+occ)$ time where $occ$ is the number of occurrences of $P$ in $A$ and $B$. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires $O(|A| + l_d + l_1 + l_2)$ time where $l_2$ is the sum of the lengths of other common substrings of $A$ and $B$. When the suffix tree of $A$ is already given, it requires $O(l_d + l_1 + l_2)$ time.
Submitted 8 May, 2013;
originally announced May 2013.