-
A Distributed Consensus Algorithm for Autonomous Vehicles Deciding Entering Orders on Intesections without Traffic Signals
Authors:
Younjeong Lee,
Young Yoon
Abstract:
We propose a methodology for connected autonomous vehicles (CAVs) to determine their passing priority at unsignalized intersections where they coexist with human-driven vehicles (HVs). Assuming that CAVs can perceive the entry order of surrounding vehicles using computer vision technology and are capable of avoiding collisions, we introduce a voting-based distributed consensus algorithm inspired b…
▽ More
We propose a methodology for connected autonomous vehicles (CAVs) to determine their passing priority at unsignalized intersections where they coexist with human-driven vehicles (HVs). Assuming that CAVs can perceive the entry order of surrounding vehicles using computer vision technology and are capable of avoiding collisions, we introduce a voting-based distributed consensus algorithm inspired by Raft to resolve tie-breaking among simultaneously arriving CAVs. The algorithm is structured around the candidate and leader election processes and incorporates a minimal consensus quorum to ensure both safety and liveness among CAVs under typical asynchronous communication conditions. Assuming CAVs to be SAE (Society of Automotive Engineers) Level-4 or higher autonomous vehicles, we implemented the proposed distributed consensus algorithm using gRPC. By adjusting variables such as the CAV-to-HV ratio, intersection scale, and the processing time of computer vision modules, we demonstrated that stable consensus can be achieved even under mixed-traffic conditions involving HVs without adequate functionalities to interact with CAVs. Experimental results show that the proposed algorithm reached consensus at a typical unsignalized four-way, two-lane intersection in approximately 30-40 ms on average. A secondary vision-based system is employed to complete the crossing priorities based on the recognized lexicographical order of the license plate numbers in case the consensus procedure times out on an unreliable vehicle-to-vehicle communication network. The significance of this study lies in its ability to improve traffic flow at unsignalized intersections by enabling rapid determination of passing priority through distributed consensus even under mixed traffic with faulty vehicles.
△ Less
Submitted 4 July, 2025;
originally announced July 2025.
-
Advanced approach for Agile/Scrum Process: RetroAI++
Authors:
Maria Spichkova,
Kevin Iwan,
Madeleine Zwart,
Hina Lee,
Yuwon Yoon,
Xiaohan Qin
Abstract:
In Agile/Scrum software development, sprint planning and retrospective analysis are the key elements of project management. The aim of our work is to support software developers in these activities. In this paper, we present our prototype tool RetroAI++, based on emerging intelligent technologies. In our RetroAI++ prototype, we aim to automate and refine the practical application of Agile/Scrum pr…
▽ More
In Agile/Scrum software development, sprint planning and retrospective analysis are the key elements of project management. The aim of our work is to support software developers in these activities. In this paper, we present our prototype tool RetroAI++, based on emerging intelligent technologies. In our RetroAI++ prototype, we aim to automate and refine the practical application of Agile/Scrum processes within Sprint Planning and Retrospectives. Leveraging AI insights, our prototype aims to automate and refine the many processes involved in the Sprint Planning, Development and Retrospective stages of Agile/Scrum development projects, offering intelligent suggestions for sprint organisation as well as meaningful insights for retrospective reflection.
△ Less
Submitted 18 June, 2025;
originally announced June 2025.
-
Don't Just Follow MLLM Plans: Robust and Efficient Planning for Open-world Agents
Authors:
Seungjoon Lee,
Suhwan Kim,
Minhyeon Oh,
Youngsik Yoon,
Jungseul Ok
Abstract:
Developing autonomous agents capable of mastering complex, multi-step tasks in unpredictable, interactive environments presents a significant challenge. While Large Language Models (LLMs) offer promise for planning, existing approaches often rely on problematic internal knowledge or make unrealistic environmental assumptions. Although recent work explores learning planning knowledge, they still re…
▽ More
Developing autonomous agents capable of mastering complex, multi-step tasks in unpredictable, interactive environments presents a significant challenge. While Large Language Models (LLMs) offer promise for planning, existing approaches often rely on problematic internal knowledge or make unrealistic environmental assumptions. Although recent work explores learning planning knowledge, they still retain limitations due to partial reliance on external knowledge or impractical setups. Indeed, prior research has largely overlooked developing agents capable of acquiring planning knowledge from scratch, directly in realistic settings. While realizing this capability is necessary, it presents significant challenges, primarily achieving robustness given the substantial risk of incorporating LLMs' inaccurate knowledge. Moreover, efficiency is crucial for practicality as learning can demand prohibitive exploration. In response, we introduce Robust and Efficient Planning for Open-world Agents (REPOA), a novel framework designed to tackle these issues. REPOA features three key components: adaptive dependency learning and fine-grained failure-aware operation memory to enhance robustness to knowledge inaccuracies, and difficulty-based exploration to improve learning efficiency. Our evaluation in two established open-world testbeds demonstrates REPOA's robust and efficient planning, showcasing its capability to successfully obtain challenging late-game items that were beyond the reach of prior approaches.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion
Authors:
Yejun Yoon,
Jaeyoon Jung,
Seunghyun Yoon,
Kunwoo Park
Abstract:
Query expansion methods powered by large language models (LLMs) have demonstrated effectiveness in zero-shot retrieval tasks. These methods assume that LLMs can generate hypothetical documents that, when incorporated into a query vector, enhance the retrieval of real evidence. However, we challenge this assumption by investigating whether knowledge leakage in benchmarks contributes to the observed…
▽ More
Query expansion methods powered by large language models (LLMs) have demonstrated effectiveness in zero-shot retrieval tasks. These methods assume that LLMs can generate hypothetical documents that, when incorporated into a query vector, enhance the retrieval of real evidence. However, we challenge this assumption by investigating whether knowledge leakage in benchmarks contributes to the observed performance gains. Using fact verification as a testbed, we analyze whether the generated documents contain information entailed by ground-truth evidence and assess their impact on performance. Our findings indicate that, on average, performance improvements consistently occurred for claims whose generated documents included sentences entailed by gold evidence. This suggests that knowledge leakage may be present in fact-verification benchmarks, potentially inflating the perceived performance of LLM-based query expansion methods.
△ Less
Submitted 4 June, 2025; v1 submitted 19 April, 2025;
originally announced April 2025.
-
Agile Retrospectives: What went well? What didn't go well? What should we do?
Authors:
Maria Spichkova,
Hina Lee,
Kevin Iwan,
Madeleine Zwart,
Yuwon Yoon,
Xiaohan Qin
Abstract:
In Agile/Scrum software development, the idea of retrospective meetings (retros) is one of the core elements of the project process. In this paper, we present our work in progress focusing on two aspects: analysis of potential usage of generative AI for information interaction within retrospective meetings, and visualisation of retros' information to software development teams. We also present our…
▽ More
In Agile/Scrum software development, the idea of retrospective meetings (retros) is one of the core elements of the project process. In this paper, we present our work in progress focusing on two aspects: analysis of potential usage of generative AI for information interaction within retrospective meetings, and visualisation of retros' information to software development teams. We also present our prototype tool RetroAI++, focusing on retros-related functionalities.
△ Less
Submitted 16 April, 2025;
originally announced April 2025.
-
Traversal Learning Coordination For Lossless And Efficient Distributed Learning
Authors:
Erdenebileg Batbaatar,
Jeonggeol Kim,
Yongcheol Kim,
Young Yoon
Abstract:
In this paper, we introduce Traversal Learning (TL), a novel approach designed to address the problem of decreased quality encountered in popular distributed learning (DL) paradigms such as Federated Learning (FL), Split Learning (SL), and SplitFed Learning (SFL). Traditional FL experiences from an accuracy drop during aggregation due to its averaging function, while SL and SFL face increased loss…
▽ More
In this paper, we introduce Traversal Learning (TL), a novel approach designed to address the problem of decreased quality encountered in popular distributed learning (DL) paradigms such as Federated Learning (FL), Split Learning (SL), and SplitFed Learning (SFL). Traditional FL experiences from an accuracy drop during aggregation due to its averaging function, while SL and SFL face increased loss due to the independent gradient updates on each split network. TL adopts a unique strategy where the model traverses the nodes during forward propagation (FP) and performs backward propagation (BP) on the orchestrator, effectively implementing centralized learning (CL) principles within a distributed environment. The orchestrator is tasked with generating virtual batches and planning the sequential node visits of the model during FP, aligning them with the ordered index of the data within these batches. We conducted experiments on six datasets representing diverse characteristics across various domains. Our evaluation demonstrates that TL is on par with classic CL approaches in terms of accurate inference, thereby offering a viable and robust solution for DL tasks. TL outperformed other DL methods and improved accuracy by 7.85% for independent and identically distributed (IID) datasets, macro F1-score by 1.06% for non-IID datasets, accuracy by 2.60% for text classification, and AUC by 3.88% and 4.54% for medical and financial datasets, respectively. By effectively preserving data privacy while maintaining performance, TL represents a significant advancement in DL methodologies.
△ Less
Submitted 10 April, 2025;
originally announced April 2025.
-
Payload-Aware Intrusion Detection with CMAE and Large Language Models
Authors:
Yongcheol Kim,
Chanjae Lee,
Young Yoon
Abstract:
Intrusion Detection Systems (IDS) are crucial for identifying malicious traffic, yet traditional signature-based methods struggle with zero-day attacks and high false positive rates. AI-driven packet-capture analysis offers a promising alternative. However, existing approaches rely heavily on flow-based or statistical features, limiting their ability to detect fine-grained attack patterns. This st…
▽ More
Intrusion Detection Systems (IDS) are crucial for identifying malicious traffic, yet traditional signature-based methods struggle with zero-day attacks and high false positive rates. AI-driven packet-capture analysis offers a promising alternative. However, existing approaches rely heavily on flow-based or statistical features, limiting their ability to detect fine-grained attack patterns. This study proposes Xavier-CMAE, an enhanced Convolutional Multi-Head Attention Ensemble (CMAE) model that improves detection accuracy while reducing computational overhead. By replacing Word2Vec embeddings with a Hex2Int tokenizer and Xavier initialization, Xavier-CMAE eliminates pre-training, accelerates training, and achieves 99.971% accuracy with a 0.018% false positive rate, outperforming Word2Vec-based methods. Additionally, we introduce LLM-CMAE, which integrates pre-trained Large Language Model (LLM) tokenizers into CMAE. While LLMs enhance feature extraction, their computational cost hinders real-time detection. LLM-CMAE balances efficiency and performance, reaching 99.969% accuracy with a 0.019% false positive rate. This work advances AI-powered IDS by (1) introducing a payload-based detection framework, (2) enhancing efficiency with Xavier-CMAE, and (3) integrating LLM tokenizers for improved real-time detection.
△ Less
Submitted 22 March, 2025;
originally announced March 2025.
-
Learning Dexterous Bimanual Catch Skills through Adversarial-Cooperative Heterogeneous-Agent Reinforcement Learning
Authors:
Taewoo Kim,
Youngwoo Yoon,
Jaehong Kim
Abstract:
Robotic catching has traditionally focused on single-handed systems, which are limited in their ability to handle larger or more complex objects. In contrast, bimanual catching offers significant potential for improved dexterity and object handling but introduces new challenges in coordination and control. In this paper, we propose a novel framework for learning dexterous bimanual catching skills…
▽ More
Robotic catching has traditionally focused on single-handed systems, which are limited in their ability to handle larger or more complex objects. In contrast, bimanual catching offers significant potential for improved dexterity and object handling but introduces new challenges in coordination and control. In this paper, we propose a novel framework for learning dexterous bimanual catching skills using Heterogeneous-Agent Reinforcement Learning (HARL). Our approach introduces an adversarial reward scheme, where a throw agent increases the difficulty of throws-adjusting speed-while a catch agent learns to coordinate both hands to catch objects under these evolving conditions. We evaluate the framework in simulated environments using 15 different objects, demonstrating robustness and versatility in handling diverse objects. Our method achieved approximately a 2x increase in catching reward compared to single-agent baselines across 15 diverse objects.
△ Less
Submitted 16 February, 2025;
originally announced February 2025.
-
TopoCL: Topological Contrastive Learning for Time Series
Authors:
Namwoo Kim,
Hyungryul Baik,
Yoonjin Yoon
Abstract:
Universal time series representation learning is challenging but valuable in real-world applications such as classification, anomaly detection, and forecasting. Recently, contrastive learning (CL) has been actively explored to tackle time series representation. However, a key challenge is that the data augmentation process in CL can distort seasonal patterns or temporal dependencies, inevitably le…
▽ More
Universal time series representation learning is challenging but valuable in real-world applications such as classification, anomaly detection, and forecasting. Recently, contrastive learning (CL) has been actively explored to tackle time series representation. However, a key challenge is that the data augmentation process in CL can distort seasonal patterns or temporal dependencies, inevitably leading to a loss of semantic information. To address this challenge, we propose Topological Contrastive Learning for time series (TopoCL). TopoCL mitigates such information loss by incorporating persistent homology, which captures the topological characteristics of data that remain invariant under transformations. In this paper, we treat the temporal and topological properties of time series data as distinct modalities. Specifically, we compute persistent homology to construct topological features of time series data, representing them in persistence diagrams. We then design a neural network to encode these persistent diagrams. Our approach jointly optimizes CL within the time modality and time-topology correspondence, promoting a comprehensive understanding of both temporal semantics and topological properties of time series. We conduct extensive experiments on four downstream tasks-classification, anomaly detection, forecasting, and transfer learning. The results demonstrate that TopoCL achieves state-of-the-art performance.
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
MobiCLR: Mobility Time Series Contrastive Learning for Urban Region Representations
Authors:
Namwoo Kim,
Takahiro Yabe,
Chanyoung Park,
Yoonjin Yoon
Abstract:
Recently, learning effective representations of urban regions has gained significant attention as a key approach to understanding urban dynamics and advancing smarter cities. Existing approaches have demonstrated the potential of leveraging mobility data to generate latent representations, providing valuable insights into the intrinsic characteristics of urban areas. However, incorporating the tem…
▽ More
Recently, learning effective representations of urban regions has gained significant attention as a key approach to understanding urban dynamics and advancing smarter cities. Existing approaches have demonstrated the potential of leveraging mobility data to generate latent representations, providing valuable insights into the intrinsic characteristics of urban areas. However, incorporating the temporal dynamics and detailed semantics inherent in human mobility patterns remains underexplored. To address this gap, we propose a novel urban region representation learning model, Mobility Time Series Contrastive Learning for Urban Region Representations (MobiCLR), designed to capture semantically meaningful embeddings from inflow and outflow mobility patterns. MobiCLR uses contrastive learning to enhance the discriminative power of its representations, applying an instance-wise contrastive loss to capture distinct flow-specific characteristics. Additionally, we develop a regularizer to align output features with these flow-specific representations, enabling a more comprehensive understanding of mobility dynamics. To validate our model, we conduct extensive experiments in Chicago, New York, and Washington, D.C. to predict income, educational attainment, and social vulnerability. The results demonstrate that our model outperforms state-of-the-art models.
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
Integrating Urban Air Mobility with Highway Infrastructure: A Strategic Approach for Vertiport Location Selection in the Seoul Metropolitan Area
Authors:
Donghyun Yoon,
Minwoo Jeong,
Jinyong Lee,
Seyun Kim,
Yoonjin Yoon
Abstract:
This study focuses on identifying suitable locations for highway-transfer Vertiports to integrate Urban Air Mobility (UAM) with existing highway infrastructure. UAM offers an effective solution for enhancing transportation accessibility in the Seoul Metropolitan Area, where conventional transportation often struggle to connect suburban employment zones such as industrial parks. By integrating UAM…
▽ More
This study focuses on identifying suitable locations for highway-transfer Vertiports to integrate Urban Air Mobility (UAM) with existing highway infrastructure. UAM offers an effective solution for enhancing transportation accessibility in the Seoul Metropolitan Area, where conventional transportation often struggle to connect suburban employment zones such as industrial parks. By integrating UAM with ground transportation at highway facilities, an efficient connectivity solution can be achieved for regions with limited transportation options. Our proposed methodology for determining the suitable Vertiport locations utilizes data such as geographic information, origin-destination volume, and travel time. Vertiport candidates are evaluated and selected based on criteria including location desirability, combined transportation accessibility and transportation demand. Applying this methodology to the Seoul metropolitan area, we identify 56 suitable Vertiport locations out of 148 candidates. The proposed methodology offers a strategic approach for the selection of highway-transfer Vertiport locations, enhancing UAM integration with existing transportation systems. Our study provides valuable insights for urban planners and policymakers, with recommendations for future research to include real-time environmental data and to explore the impact of Mobility-as-a-Service on UAM operations.
△ Less
Submitted 1 February, 2025;
originally announced February 2025.
-
Combinatorial Rising Bandit
Authors:
Seockbean Song,
Youngsik Yoon,
Siwei Wang,
Wei Chen,
Jungseul Ok
Abstract:
Combinatorial online learning is a fundamental task for selecting the optimal action (or super arm) as a combination of base arms in sequential interactions with systems providing stochastic rewards. It is applicable to diverse domains such as robotics, social advertising, network routing, and recommendation systems. In many real-world scenarios, we often encounter rising rewards, where playing a…
▽ More
Combinatorial online learning is a fundamental task for selecting the optimal action (or super arm) as a combination of base arms in sequential interactions with systems providing stochastic rewards. It is applicable to diverse domains such as robotics, social advertising, network routing, and recommendation systems. In many real-world scenarios, we often encounter rising rewards, where playing a base arm not only provides an instantaneous reward but also contributes to the enhancement of future rewards, e.g., robots enhancing proficiency through practice and social influence strengthening in the history of successful recommendations. Moreover, the enhancement of a single base arm may affect multiple super arms that include it, introducing complex dependencies that are not captured by existing rising bandit models. To address this, we introduce the Combinatorial Rising Bandit (CRB) framework and propose a provably efficient algorithm, Combinatorial Rising Upper Confidence Bound (CRUCB). We establish an upper bound on regret CRUCB and show that it is nearly tight by deriving a matching lower bound. In addition, we empirically demonstrate the effectiveness of CRUCB not only in synthetic environments but also in realistic applications of deep reinforcement learning.
△ Less
Submitted 29 May, 2025; v1 submitted 1 December, 2024;
originally announced December 2024.
-
Mix from Failure: Confusion-Pairing Mixup for Long-Tailed Recognition
Authors:
Youngseok Yoon,
Sangwoo Hong,
Hyungjun Joo,
Yao Qin,
Haewon Jeong,
Jungwoo Lee
Abstract:
Long-tailed image recognition is a computer vision problem considering a real-world class distribution rather than an artificial uniform. Existing methods typically detour the problem by i) adjusting a loss function, ii) decoupling classifier learning, or iii) proposing a new multi-head architecture called experts. In this paper, we tackle the problem from a different perspective to augment a trai…
▽ More
Long-tailed image recognition is a computer vision problem considering a real-world class distribution rather than an artificial uniform. Existing methods typically detour the problem by i) adjusting a loss function, ii) decoupling classifier learning, or iii) proposing a new multi-head architecture called experts. In this paper, we tackle the problem from a different perspective to augment a training dataset to enhance the sample diversity of minority classes. Specifically, our method, namely Confusion-Pairing Mixup (CP-Mix), estimates the confusion distribution of the model and handles the data deficiency problem by augmenting samples from confusion pairs in real-time. In this way, CP-Mix trains the model to mitigate its weakness and distinguish a pair of classes it frequently misclassifies. In addition, CP-Mix utilizes a novel mixup formulation to handle the bias in decision boundaries that originated from the imbalanced dataset. Extensive experiments demonstrate that CP-Mix outperforms existing methods for long-tailed image recognition and successfully relieves the confusion of the classifier.
△ Less
Submitted 4 March, 2025; v1 submitted 12 November, 2024;
originally announced November 2024.
-
LLM-Driven Learning Analytics Dashboard for Teachers in EFL Writing Education
Authors:
Minsun Kim,
SeonGyeom Kim,
Suyoun Lee,
Yoosang Yoon,
Junho Myung,
Haneul Yoo,
Hyunseung Lim,
Jieun Han,
Yoonsu Kim,
So-Yeon Ahn,
Juho Kim,
Alice Oh,
Hwajung Hong,
Tak Yeon Lee
Abstract:
This paper presents the development of a dashboard designed specifically for teachers in English as a Foreign Language (EFL) writing education. Leveraging LLMs, the dashboard facilitates the analysis of student interactions with an essay writing system, which integrates ChatGPT for real-time feedback. The dashboard aids teachers in monitoring student behavior, identifying noneducational interactio…
▽ More
This paper presents the development of a dashboard designed specifically for teachers in English as a Foreign Language (EFL) writing education. Leveraging LLMs, the dashboard facilitates the analysis of student interactions with an essay writing system, which integrates ChatGPT for real-time feedback. The dashboard aids teachers in monitoring student behavior, identifying noneducational interaction with ChatGPT, and aligning instructional strategies with learning objectives. By combining insights from NLP and Human-Computer Interaction (HCI), this study demonstrates how a human-centered approach can enhance the effectiveness of teacher dashboards, particularly in ChatGPT-integrated learning.
△ Less
Submitted 19 October, 2024;
originally announced October 2024.
-
HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims
Authors:
Yejun Yoon,
Jaeyoon Jung,
Seunghyun Yoon,
Kunwoo Park
Abstract:
To tackle the AVeriTeC shared task hosted by the FEVER-24, we introduce a system that only employs publicly available large language models (LLMs) for each step of automated fact-checking, dubbed the Herd of Open LLMs for verifying real-world claims (HerO). For evidence retrieval, a language model is used to enhance a query by generating hypothetical fact-checking documents. We prompt pretrained a…
▽ More
To tackle the AVeriTeC shared task hosted by the FEVER-24, we introduce a system that only employs publicly available large language models (LLMs) for each step of automated fact-checking, dubbed the Herd of Open LLMs for verifying real-world claims (HerO). For evidence retrieval, a language model is used to enhance a query by generating hypothetical fact-checking documents. We prompt pretrained and fine-tuned LLMs for question generation and veracity prediction by crafting prompts with retrieved in-context samples. HerO achieved 2nd place on the leaderboard with the AVeriTeC score of 0.57, suggesting the potential of open LLMs for verifying real-world claims. For future research, we make our code publicly available at https://github.com/ssu-humane/HerO.
△ Less
Submitted 20 October, 2024; v1 submitted 16 October, 2024;
originally announced October 2024.
-
Towards a GENEA Leaderboard -- an Extended, Living Benchmark for Evaluating and Advancing Conversational Motion Synthesis
Authors:
Rajmund Nagy,
Hendric Voss,
Youngwoo Yoon,
Taras Kucherenko,
Teodor Nikolov,
Thanh Hoang-Minh,
Rachel McDonnell,
Stefan Kopp,
Michael Neff,
Gustav Eje Henter
Abstract:
Current evaluation practices in speech-driven gesture generation lack standardisation and focus on aspects that are easy to measure over aspects that actually matter. This leads to a situation where it is impossible to know what is the state of the art, or to know which method works better for which purpose when comparing two publications. In this position paper, we review and give details on issu…
▽ More
Current evaluation practices in speech-driven gesture generation lack standardisation and focus on aspects that are easy to measure over aspects that actually matter. This leads to a situation where it is impossible to know what is the state of the art, or to know which method works better for which purpose when comparing two publications. In this position paper, we review and give details on issues with existing gesture-generation evaluation, and present a novel proposal for remedying them. Specifically, we announce an upcoming living leaderboard to benchmark progress in conversational motion synthesis. Unlike earlier gesture-generation challenges, the leaderboard will be updated with large-scale user studies of new gesture-generation systems multiple times per year, and systems on the leaderboard can be submitted to any publication venue that their authors prefer. By evolving the leaderboard evaluation data and tasks over time, the effort can keep driving progress towards the most important end goals identified by the community. We actively seek community involvement across the entire evaluation pipeline: from data and tasks for the evaluation, via tooling, to the systems evaluated. In other words, our proposal will not only make it easier for researchers to perform good evaluations, but their collective input and contributions will also help drive the future of gesture-generation research.
△ Less
Submitted 8 October, 2024;
originally announced October 2024.
-
GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer
Authors:
Youngho Yoon,
Hyun-Kurl Jang,
Kuk-Jin Yoon
Abstract:
Novel view synthesis (NVS) aims to generate images at arbitrary viewpoints using multi-view images, and recent insights from neural radiance fields (NeRF) have contributed to remarkable improvements. Recently, studies on generalizable NeRF (G-NeRF) have addressed the challenge of per-scene optimization in NeRFs. The construction of radiance fields on-the-fly in G-NeRF simplifies the NVS process, m…
▽ More
Novel view synthesis (NVS) aims to generate images at arbitrary viewpoints using multi-view images, and recent insights from neural radiance fields (NeRF) have contributed to remarkable improvements. Recently, studies on generalizable NeRF (G-NeRF) have addressed the challenge of per-scene optimization in NeRFs. The construction of radiance fields on-the-fly in G-NeRF simplifies the NVS process, making it well-suited for real-world applications. Meanwhile, G-NeRF still struggles in representing fine details for a specific scene due to the absence of per-scene optimization, even with texture-rich multi-view source inputs. As a remedy, we propose a Geometry-driven Multi-reference Texture transfer network (GMT) available as a plug-and-play module designed for G-NeRF. Specifically, we propose ray-imposed deformable convolution (RayDCN), which aligns input and reference features reflecting scene geometry. Additionally, the proposed texture preserving transformer (TP-Former) aggregates multi-view source features while preserving texture information. Consequently, our module enables direct interaction between adjacent pixels during the image enhancement process, which is deficient in G-NeRF models with an independent rendering process per pixel. This addresses constraints that hinder the ability to capture high-frequency details. Experiments show that our plug-and-play module consistently improves G-NeRF models on various benchmark datasets.
△ Less
Submitted 1 October, 2024;
originally announced October 2024.
-
The Era of Foundation Models in Medical Imaging is Approaching : A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology
Authors:
Inwoo Seo,
Eunkyoung Bae,
Joo-Young Jeon,
Young-Sang Yoon,
Jiho Cha
Abstract:
Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution. Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models, showing potential to revolutionize the entire process of medical imaging. However, comprehensive reviews on their development status an…
▽ More
Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution. Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models, showing potential to revolutionize the entire process of medical imaging. However, comprehensive reviews on their development status and future challenges are currently lacking. This scoping review systematically organizes existing literature on the clinical value of large-scale generative AI applications by following PCC guidelines. A systematic search was conducted across four databases: PubMed, EMbase, IEEE-Xplore, and Google Scholar, and 15 studies meeting the inclusion/exclusion criteria set by the researchers were reviewed. Most of these studies focused on improving the efficiency of report generation in specific parts of the interpretation process or on translating reports to aid patient understanding, with the latest studies extending to AI applications performing direct interpretations. All studies were quantitatively evaluated by clinicians, with most utilizing LLMs and only three employing multi-modal models. Both LLMs and multi-modal models showed excellent results in specific areas, but none yet outperformed radiologists in diagnostic performance. Most studies utilized GPT, with few using models specialized for the medical imaging domain. This study provides insights into the current state and limitations of large-scale generative AI-based applications in the medical imaging field, offering foundational data and suggesting that the era of medical imaging foundation models is on the horizon, which may fundamentally transform clinical practice in the near future.
△ Less
Submitted 2 September, 2024;
originally announced September 2024.
-
Model Collapse in the Self-Consuming Chain of Diffusion Finetuning: A Novel Perspective from Quantitative Trait Modeling
Authors:
Youngseok Yoon,
Dainong Hu,
Iain Weissburg,
Yao Qin,
Haewon Jeong
Abstract:
Model collapse, the severe degradation of generative models when iteratively trained on their own outputs, has gained significant attention in recent years. This paper examines Chain of Diffusion, where a pretrained text-to-image diffusion model is finetuned on its own generated images. We demonstrate that severe image quality degradation was universal and identify CFG scale as the key factor impa…
▽ More
Model collapse, the severe degradation of generative models when iteratively trained on their own outputs, has gained significant attention in recent years. This paper examines Chain of Diffusion, where a pretrained text-to-image diffusion model is finetuned on its own generated images. We demonstrate that severe image quality degradation was universal and identify CFG scale as the key factor impacting this model collapse. Drawing on an analogy between the Chain of Diffusion and biological evolution, we then introduce a novel theoretical analysis based on quantitative trait modeling from statistical genetics. Our theoretical analysis aligns with empirical observations of the generated images in the Chain of Diffusion. Finally, we propose Reusable Diffusion Finetuning (ReDiFine), a simple yet effective strategy inspired by genetic mutations. It operates robustly across various scenarios without requiring any hyperparameter tuning, making it a plug-and-play solution for reusable image generation.
△ Less
Submitted 6 June, 2025; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels
Authors:
Jae Soon Baik,
In Young Yoon,
Kun Hoon Kim,
Jun Won Choi
Abstract:
Deep neural networks have demonstrated remarkable advancements in various fields using large, well-annotated datasets. However, real-world data often exhibit long-tailed distributions and label noise, significantly degrading generalization performance. Recent studies addressing these issues have focused on noisy sample selection methods that estimate the centroid of each class based on high-confid…
▽ More
Deep neural networks have demonstrated remarkable advancements in various fields using large, well-annotated datasets. However, real-world data often exhibit long-tailed distributions and label noise, significantly degrading generalization performance. Recent studies addressing these issues have focused on noisy sample selection methods that estimate the centroid of each class based on high-confidence samples within each target class. The performance of these methods is limited because they use only the training samples within each class for class centroid estimation, making the quality of centroids susceptible to long-tailed distributions and noisy labels. In this study, we present a robust training framework called Distribution-aware Sample Selection and Contrastive Learning (DaSC). Specifically, DaSC introduces a Distribution-aware Class Centroid Estimation (DaCC) to generate enhanced class centroids. DaCC performs weighted averaging of the features from all samples, with weights determined based on model predictions. Additionally, we propose a confidence-aware contrastive learning strategy to obtain balanced and robust representations. The training samples are categorized into high-confidence and low-confidence samples. Our method then applies Semi-supervised Balanced Contrastive Loss (SBCL) using high-confidence samples, leveraging reliable label information to mitigate class bias. For the low-confidence samples, our method computes Mixup-enhanced Instance Discrimination Loss (MIDL) to improve their representations in a self-supervised manner. Our experimental results on CIFAR and real-world noisy-label datasets demonstrate the superior performance of the proposed DaSC compared to previous approaches.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Harmful Suicide Content Detection
Authors:
Kyumin Park,
Myung Jae Baik,
YeongJun Hwang,
Yen Shin,
HoJae Lee,
Ruda Lee,
Sang Min Lee,
Je Young Hannah Sun,
Ah Rah Lee,
Si Yeun Yoon,
Dong-ho Lee,
Jihyung Moon,
JinYeong Bak,
Kyunghyun Cho,
Jong-Woo Paik,
Sungjoon Park
Abstract:
Harmful suicide content on the Internet is a significant risk factor inducing suicidal thoughts and behaviors among vulnerable populations. Despite global efforts, existing resources are insufficient, specifically in high-risk regions like the Republic of Korea. Current research mainly focuses on understanding negative effects of such content or suicide risk in individuals, rather than on automati…
▽ More
Harmful suicide content on the Internet is a significant risk factor inducing suicidal thoughts and behaviors among vulnerable populations. Despite global efforts, existing resources are insufficient, specifically in high-risk regions like the Republic of Korea. Current research mainly focuses on understanding negative effects of such content or suicide risk in individuals, rather than on automatically detecting the harmfulness of content. To fill this gap, we introduce a harmful suicide content detection task for classifying online suicide content into five harmfulness levels. We develop a multi-modal benchmark and a task description document in collaboration with medical professionals, and leverage large language models (LLMs) to explore efficient methods for moderating such content. Our contributions include proposing a novel detection task, a multi-modal Korean benchmark with expert annotations, and suggesting strategies using LLMs to detect illegal and harmful content. Owing to the potential harm involved, we publicize our implementations and benchmark, incorporating an ethical verification process.
△ Less
Submitted 2 June, 2024;
originally announced July 2024.
-
A Spatio-Temporal Representation Learning as an Alternative to Traditional Glosses in Sign Language Translation and Production
Authors:
Eui Jun Hwang,
Sukmin Cho,
Huije Lee,
Youngwoo Yoon,
Jong C. Park
Abstract:
This work addresses the challenges associated with the use of glosses in both Sign Language Translation (SLT) and Sign Language Production (SLP). While glosses have long been used as a bridge between sign language and spoken language, they come with two major limitations that impede the advancement of sign language systems. First, annotating the glosses is a labor-intensive and time-consuming proc…
▽ More
This work addresses the challenges associated with the use of glosses in both Sign Language Translation (SLT) and Sign Language Production (SLP). While glosses have long been used as a bridge between sign language and spoken language, they come with two major limitations that impede the advancement of sign language systems. First, annotating the glosses is a labor-intensive and time-consuming process, which limits the scalability of datasets. Second, the glosses oversimplify sign language by stripping away its spatio-temporal dynamics, reducing complex signs to basic labels and missing the subtle movements essential for precise interpretation. To address these limitations, we introduce Universal Gloss-level Representation (UniGloR), a framework designed to capture the spatio-temporal features inherent in sign language, providing a more dynamic and detailed alternative to the use of the glosses. The core idea of UniGloR is simple yet effective: We derive dense spatio-temporal representations from sign keypoint sequences using self-supervised learning and seamlessly integrate them into SLT and SLP tasks. Our experiments in a keypoint-based setting demonstrate that UniGloR either outperforms or matches the performance of previous SLT and SLP methods on two widely-used datasets: PHOENIX14T and How2Sign.
△ Less
Submitted 4 December, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
CLST: Cold-Start Mitigation in Knowledge Tracing by Aligning a Generative Language Model as a Students' Knowledge Tracer
Authors:
Heeseok Jung,
Jaesang Yoo,
Yohaan Yoon,
Yeonju Jang
Abstract:
Knowledge tracing (KT), wherein students' problem-solving histories are used to estimate their current levels of knowledge, has attracted significant interest from researchers. However, most existing KT models were developed with an ID-based paradigm, which exhibits limitations in cold-start performance. These limitations can be mitigated by leveraging the vast quantities of external knowledge pos…
▽ More
Knowledge tracing (KT), wherein students' problem-solving histories are used to estimate their current levels of knowledge, has attracted significant interest from researchers. However, most existing KT models were developed with an ID-based paradigm, which exhibits limitations in cold-start performance. These limitations can be mitigated by leveraging the vast quantities of external knowledge possessed by generative large language models (LLMs). In this study, we propose cold-start mitigation in knowledge tracing by aligning a generative language model as a students' knowledge tracer (CLST) as a framework that utilizes a generative LLM as a knowledge tracer. Upon collecting data from math, social studies, and science subjects, we framed the KT task as a natural language processing task, wherein problem-solving data are expressed in natural language, and fine-tuned the generative LLM using the formatted KT dataset. Subsequently, we evaluated the performance of the CLST in situations of data scarcity using various baseline models for comparison. The results indicate that the CLST significantly enhanced performance with a dataset of fewer than 100 students in terms of prediction, reliability, and cross-domain generalization.
△ Less
Submitted 17 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Designing Prompt Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing
Authors:
Minsun Kim,
SeonGyeom Kim,
Suyoun Lee,
Yoosang Yoon,
Junho Myung,
Haneul Yoo,
Hyunseung Lim,
Jieun Han,
Yoonsu Kim,
So-Yeon Ahn,
Juho Kim,
Alice Oh,
Hwajung Hong,
Tak Yeon Lee
Abstract:
While ChatGPT has significantly impacted education by offering personalized resources for students, its integration into educational settings poses unprecedented risks, such as inaccuracies and biases in AI-generated content, plagiarism and over-reliance on AI, and privacy and security issues. To help teachers address such risks, we conducted a two-phase iterative design process that comprises sur…
▽ More
While ChatGPT has significantly impacted education by offering personalized resources for students, its integration into educational settings poses unprecedented risks, such as inaccuracies and biases in AI-generated content, plagiarism and over-reliance on AI, and privacy and security issues. To help teachers address such risks, we conducted a two-phase iterative design process that comprises surveys, interviews, and prototype demonstration involving six EFL (English as a Foreign Language) teachers, who integrated ChatGPT into semester-long English essay writing classes. Based on the needs identified during the initial survey and interviews, we developed a prototype of Prompt Analytics Dashboard (PAD) that integrates the essay editing history and chat logs between students and ChatGPT. Teacher's feedback on the prototype informs additional features and unmet needs for designing future PAD, which helps them (1) analyze contextual analysis of student behaviors, (2) design an overall learning loop, and (3) develop their teaching skills.
△ Less
Submitted 18 October, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…
▽ More
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
△ Less
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
BlendX: Complex Multi-Intent Detection with Blended Patterns
Authors:
Yejin Yoon,
Jungyeon Lee,
Kangsan Kim,
Chanhee Park,
Taeuk Kim
Abstract:
Task-oriented dialogue (TOD) systems are commonly designed with the presumption that each utterance represents a single intent. However, this assumption may not accurately reflect real-world situations, where users frequently express multiple intents within a single utterance. While there is an emerging interest in multi-intent detection (MID), existing in-domain datasets such as MixATIS and MixSN…
▽ More
Task-oriented dialogue (TOD) systems are commonly designed with the presumption that each utterance represents a single intent. However, this assumption may not accurately reflect real-world situations, where users frequently express multiple intents within a single utterance. While there is an emerging interest in multi-intent detection (MID), existing in-domain datasets such as MixATIS and MixSNIPS have limitations in their formulation. To address these issues, we present BlendX, a suite of refined datasets featuring more diverse patterns than their predecessors, elevating both its complexity and diversity. For dataset construction, we utilize both rule-based heuristics as well as a generative tool -- OpenAI's ChatGPT -- which is augmented with a similarity-driven strategy for utterance selection. To ensure the quality of the proposed datasets, we also introduce three novel metrics that assess the statistical properties of an utterance related to word count, conjunction use, and pronoun usage. Extensive experiments on BlendX reveal that state-of-the-art MID models struggle with the challenges posed by the new datasets, highlighting the need to reexamine the current state of the MID field. The dataset is available at https://github.com/HYU-NLP/BlendX.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-modal matching ability
Authors:
Yejun Yoon,
Seunghyun Yoon,
Kunwoo Park
Abstract:
This paper addresses the critical challenge of assessing the representativeness of news thumbnail images, which often serve as the first visual engagement for readers when an article is disseminated on social media. We focus on whether a news image represents the actors discussed in the news text. To serve the challenge, we introduce NewsTT, a manually annotated dataset of 1000 news thumbnail imag…
▽ More
This paper addresses the critical challenge of assessing the representativeness of news thumbnail images, which often serve as the first visual engagement for readers when an article is disseminated on social media. We focus on whether a news image represents the actors discussed in the news text. To serve the challenge, we introduce NewsTT, a manually annotated dataset of 1000 news thumbnail images and text pairs. We found that the pretrained vision and language models, such as BLIP-2, struggle with this task. Since news subjects frequently involve named entities or proper nouns, the pretrained models could have a limited capability to match news actors' visual and textual appearances. We hypothesize that learning to contrast news text with its counterfactual, of which named entities are replaced, can enhance the cross-modal matching ability of vision and language models. We propose CFT-CLIP, a contrastive learning framework that updates vision and language bi-encoders according to the hypothesis. We found that our simple method can boost the performance for assessing news thumbnail representativeness, supporting our assumption. Code and data can be accessed at https://github.com/ssu-humane/news-images-acl24.
△ Less
Submitted 6 June, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents
Authors:
Jae-Woo Choi,
Youngwoo Yoon,
Hyobin Ong,
Jaehong Kim,
Minsu Jang
Abstract:
Large language models (LLMs) have recently received considerable attention as alternative solutions for task planning. However, comparing the performance of language-oriented task planners becomes difficult, and there exists a dearth of detailed exploration regarding the effects of various factors such as pre-trained model selection and prompt construction. To address this, we propose a benchmark…
▽ More
Large language models (LLMs) have recently received considerable attention as alternative solutions for task planning. However, comparing the performance of language-oriented task planners becomes difficult, and there exists a dearth of detailed exploration regarding the effects of various factors such as pre-trained model selection and prompt construction. To address this, we propose a benchmark system for automatically quantifying performance of task planning for home-service embodied agents. Task planners are tested on two pairs of datasets and simulators: 1) ALFRED and AI2-THOR, 2) an extension of Watch-And-Help and VirtualHome. Using the proposed benchmark system, we perform extensive experiments with LLMs and prompts, and explore several enhancements of the baseline planner. We expect that the proposed benchmark tool would accelerate the development of language-oriented task planners.
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
Motion-induced error reduction for high-speed dynamic digital fringe projection system
Authors:
Sanghoon Jeon,
Hyo-Geon Lee,
Jae-Sung Lee,
Bo-Min Kang,
Byung-Wook Jeon,
Jun Young Yoon,
Jae-Sang Hyun
Abstract:
In phase-shifting profilometry (PSP), any motion during the acquisition of fringe patterns can introduce errors because it assumes both the object and measurement system are stationary. Therefore, we propose a method to pixel-wise reduce the errors when the measurement system is in motion due to a motorized linear stage. The proposed method introduces motion-induced error reduction algorithm, whic…
▽ More
In phase-shifting profilometry (PSP), any motion during the acquisition of fringe patterns can introduce errors because it assumes both the object and measurement system are stationary. Therefore, we propose a method to pixel-wise reduce the errors when the measurement system is in motion due to a motorized linear stage. The proposed method introduces motion-induced error reduction algorithm, which leverages the motor's encoder and pinhole model of the camera and projector. 3D shape measurement is possible with only three fringe patterns by applying geometric constraints of the digital fringe projection system. We address the mismatch problem due to the motion-induced camera pixel disparities and reduce phase-shift errors. These processes are easy to implement and require low computational cost. Experimental results demonstrate that the presented method effectively reduces the errors even in non-uniform motion.
△ Less
Submitted 29 January, 2024;
originally announced January 2024.
-
Few-Shot Anomaly Detection with Adversarial Loss for Robust Feature Representations
Authors:
Jae Young Lee,
Wonjun Lee,
Jaehyun Choi,
Yongkwi Lee,
Young Seog Yoon
Abstract:
Anomaly detection is a critical and challenging task that aims to identify data points deviating from normal patterns and distributions within a dataset. Various methods have been proposed using a one-class-one-model approach, but these techniques often face practical problems such as memory inefficiency and the requirement of sufficient data for training. In particular, few-shot anomaly detection…
▽ More
Anomaly detection is a critical and challenging task that aims to identify data points deviating from normal patterns and distributions within a dataset. Various methods have been proposed using a one-class-one-model approach, but these techniques often face practical problems such as memory inefficiency and the requirement of sufficient data for training. In particular, few-shot anomaly detection presents significant challenges in industrial applications, where limited samples are available before mass production. In this paper, we propose a few-shot anomaly detection method that integrates adversarial training loss to obtain more robust and generalized feature representations. We utilize the adversarial loss previously employed in domain adaptation to align feature distributions between source and target domains, to enhance feature robustness and generalization in few-shot anomaly detection tasks. We hypothesize that adversarial loss is effective when applied to features that should have similar characteristics, such as those from the same layer in a Siamese network's parallel branches or input-output pairs of reconstruction-based methods. Experimental results demonstrate that the proposed method generally achieves better performance when utilizing the adversarial loss.
△ Less
Submitted 4 December, 2023;
originally announced December 2023.
-
A Unified Approach for Comprehensive Analysis of Various Spectral and Tissue Doppler Echocardiography
Authors:
Jaeik Jeon,
Jiyeon Kim,
Yeonggul Jang,
Yeonyee E. Yoon,
Dawun Jeong,
Youngtaek Hong,
Seung-Ah Lee,
Hyuk-Jae Chang
Abstract:
Doppler echocardiography offers critical insights into cardiac function and phases by quantifying blood flow velocities and evaluating myocardial motion. However, previous methods for automating Doppler analysis, ranging from initial signal processing techniques to advanced deep learning approaches, have been constrained by their reliance on electrocardiogram (ECG) data and their inability to proc…
▽ More
Doppler echocardiography offers critical insights into cardiac function and phases by quantifying blood flow velocities and evaluating myocardial motion. However, previous methods for automating Doppler analysis, ranging from initial signal processing techniques to advanced deep learning approaches, have been constrained by their reliance on electrocardiogram (ECG) data and their inability to process Doppler views collectively. We introduce a novel unified framework using a convolutional neural network for comprehensive analysis of spectral and tissue Doppler echocardiography images that combines automatic measurements and end-diastole (ED) detection into a singular method. The network automatically recognizes key features across various Doppler views, with novel Doppler shape embedding and anti-aliasing modules enhancing interpretation and ensuring consistent analysis. Empirical results indicate a consistent outperformance in performance metrics, including dice similarity coefficients (DSC) and intersection over union (IoU). The proposed framework demonstrates strong agreement with clinicians in Doppler automatic measurements and competitive performance in ED detection.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
US Microelectronics Packaging Ecosystem: Challenges and Opportunities
Authors:
Rouhan Noor,
Himanandhan Reddy Kottur,
Patrick J Craig,
Liton Kumar Biswas,
M Shafkat M Khan,
Nitin Varshney,
Hamed Dalir,
Elif Akçalı,
Bahareh Ghane Motlagh,
Charles Woychik,
Yong-Kyu Yoon,
Navid Asadizanjani
Abstract:
The semiconductor industry is experiencing a significant shift from traditional methods of shrinking devices and reducing costs. Chip designers actively seek new technological solutions to enhance cost-effectiveness while incorporating more features into the silicon footprint. One promising approach is Heterogeneous Integration (HI), which involves advanced packaging techniques to integrate indepe…
▽ More
The semiconductor industry is experiencing a significant shift from traditional methods of shrinking devices and reducing costs. Chip designers actively seek new technological solutions to enhance cost-effectiveness while incorporating more features into the silicon footprint. One promising approach is Heterogeneous Integration (HI), which involves advanced packaging techniques to integrate independently designed and manufactured components using the most suitable process technology. However, adopting HI introduces design and security challenges. To enable HI, research and development of advanced packaging is crucial. The existing research raises the possible security threats in the advanced packaging supply chain, as most of the Outsourced Semiconductor Assembly and Test (OSAT) facilities/vendors are offshore. To deal with the increasing demand for semiconductors and to ensure a secure semiconductor supply chain, there are sizable efforts from the United States (US) government to bring semiconductor fabrication facilities onshore. However, the US-based advanced packaging capabilities must also be ramped up to fully realize the vision of establishing a secure, efficient, resilient semiconductor supply chain. Our effort was motivated to identify the possible bottlenecks and weak links in the advanced packaging supply chain based in the US.
△ Less
Submitted 30 October, 2023; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Self supervised convolutional kernel based handcrafted feature harmonization: Enhanced left ventricle hypertension disease phenotyping on echocardiography
Authors:
Jina Lee,
Youngtaek Hong,
Dawun Jeong,
Yeonggul Jang,
Jaeik Jeon,
Sihyeon Jeong,
Taekgeun Jung,
Yeonyee E. Yoon,
Inki Moon,
Seung-Ah Lee,
Hyuk-Jae Chang
Abstract:
Radiomics, a medical imaging technique, extracts quantitative handcrafted features from images to predict diseases. Harmonization in those features ensures consistent feature extraction across various imaging devices and protocols. Methods for harmonization include standardized imaging protocols, statistical adjustments, and evaluating feature robustness. Myocardial diseases such as Left Ventricul…
▽ More
Radiomics, a medical imaging technique, extracts quantitative handcrafted features from images to predict diseases. Harmonization in those features ensures consistent feature extraction across various imaging devices and protocols. Methods for harmonization include standardized imaging protocols, statistical adjustments, and evaluating feature robustness. Myocardial diseases such as Left Ventricular Hypertrophy (LVH) and Hypertensive Heart Disease (HHD) are diagnosed via echocardiography, but variable imaging settings pose challenges. Harmonization techniques are crucial for applying handcrafted features in disease diagnosis in such scenario. Self-supervised learning (SSL) enhances data understanding within limited datasets and adapts to diverse data settings. ConvNeXt-V2 integrates convolutional layers into SSL, displaying superior performance in various tasks. This study focuses on convolutional filters within SSL, using them as preprocessing to convert images into feature maps for handcrafted feature harmonization. Our proposed method excelled in harmonization evaluation and exhibited superior LVH classification performance compared to existing methods.
△ Less
Submitted 22 November, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Improving Out-of-Distribution Detection in Echocardiographic View Classication through Enhancing Semantic Features
Authors:
Jaeik Jeon,
Seongmin Ha,
Yeonggul Jang,
Yeonyee E. Yoon,
Jiyeon Kim,
Hyunseok Jeong,
Dawun Jeong,
Youngtaek Hong,
Seung-Ah Lee Hyuk-Jae Chang
Abstract:
In echocardiographic view classification, accurately detecting out-of-distribution (OOD) data is essential but challenging, especially given the subtle differences between in-distribution and OOD data. While conventional OOD detection methods, such as Mahalanobis distance (MD) are effective in far-OOD scenarios with clear distinctions between distributions, they struggle to discern the less obviou…
▽ More
In echocardiographic view classification, accurately detecting out-of-distribution (OOD) data is essential but challenging, especially given the subtle differences between in-distribution and OOD data. While conventional OOD detection methods, such as Mahalanobis distance (MD) are effective in far-OOD scenarios with clear distinctions between distributions, they struggle to discern the less obvious variations characteristic of echocardiographic data. In this study, we introduce a novel use of label smoothing to enhance semantic feature representation in echocardiographic images, demonstrating that these enriched semantic features are key for significantly improving near-OOD instance detection. By combining label smoothing with MD-based OOD detection, we establish a new benchmark for accuracy in echocardiographic OOD detection.
△ Less
Submitted 23 November, 2023; v1 submitted 31 August, 2023;
originally announced August 2023.
-
The GENEA Challenge 2023: A large scale evaluation of gesture generation models in monadic and dyadic settings
Authors:
Taras Kucherenko,
Rajmund Nagy,
Youngwoo Yoon,
Jieyeon Woo,
Teodor Nikolov,
Mihail Tsakov,
Gustav Eje Henter
Abstract:
This paper reports on the GENEA Challenge 2023, in which participating teams built speech-driven gesture-generation systems using the same speech and motion dataset, followed by a joint evaluation. This year's challenge provided data on both sides of a dyadic interaction, allowing teams to generate full-body motion for an agent given its speech (text and audio) and the speech and motion of the int…
▽ More
This paper reports on the GENEA Challenge 2023, in which participating teams built speech-driven gesture-generation systems using the same speech and motion dataset, followed by a joint evaluation. This year's challenge provided data on both sides of a dyadic interaction, allowing teams to generate full-body motion for an agent given its speech (text and audio) and the speech and motion of the interlocutor. We evaluated 12 submissions and 2 baselines together with held-out motion-capture data in several large-scale user studies. The studies focused on three aspects: 1) the human-likeness of the motion, 2) the appropriateness of the motion for the agent's own speech whilst controlling for the human-likeness of the motion, and 3) the appropriateness of the motion for the behaviour of the interlocutor in the interaction, using a setup that controls for both the human-likeness of the motion and the agent's own speech. We found a large span in human-likeness between challenge submissions, with a few systems rated close to human mocap. Appropriateness seems far from being solved, with most submissions performing in a narrow range slightly above chance, far behind natural motion. The effect of the interlocutor is even more subtle, with submitted systems at best performing barely above chance. Interestingly, a dyadic system being highly appropriate for agent speech does not necessarily imply high appropriateness for the interlocutor. Additional material is available via the project website at https://svito-zar.github.io/GENEAchallenge2023/ .
△ Less
Submitted 24 August, 2023;
originally announced August 2023.
-
Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification
Authors:
Geon Lee,
Sanghoon Lee,
Dohyung Kim,
Younghoon Shin,
Yongsang Yoon,
Bumsub Ham
Abstract:
We present a novel unsupervised domain adaption method for person re-identification (reID) that generalizes a model trained on a labeled source domain to an unlabeled target domain. We introduce a camera-driven curriculum learning (CaCL) framework that leverages camera labels of person images to transfer knowledge from source to target domains progressively. To this end, we divide target domain da…
▽ More
We present a novel unsupervised domain adaption method for person re-identification (reID) that generalizes a model trained on a labeled source domain to an unlabeled target domain. We introduce a camera-driven curriculum learning (CaCL) framework that leverages camera labels of person images to transfer knowledge from source to target domains progressively. To this end, we divide target domain dataset into multiple subsets based on the camera labels, and initially train our model with a single subset (i.e., images captured by a single camera). We then gradually exploit more subsets for training, according to a curriculum sequence obtained with a camera-driven scheduling rule. The scheduler considers maximum mean discrepancies (MMD) between each subset and the source domain dataset, such that the subset closer to the source domain is exploited earlier within the curriculum. For each curriculum sequence, we generate pseudo labels of person images in a target domain to train a reID model in a supervised way. We have observed that the pseudo labels are highly biased toward cameras, suggesting that person images obtained from the same camera are likely to have the same pseudo labels, even for different IDs. To address the camera bias problem, we also introduce a camera-diversity (CD) loss encouraging person images of the same pseudo label, but captured across various cameras, to involve more for discriminative feature learning, providing person representations robust to inter-camera variations. Experimental results on standard benchmarks, including real-to-real and synthetic-to-real scenarios, demonstrate the effectiveness of our framework.
△ Less
Submitted 23 August, 2023;
originally announced August 2023.
-
Inductive Program Synthesis via Iterative Forward-Backward Abstract Interpretation
Authors:
Yongho Yoon,
Woosuk Lee,
Kwangkeun Yi
Abstract:
A key challenge in example-based program synthesis is the gigantic search space of programs. To address this challenge, various work proposed to use abstract interpretation to prune the search space. However, most of existing approaches have focused only on forward abstract interpretation, and thus cannot fully exploit the power of abstract interpretation. In this paper, we propose a novel approac…
▽ More
A key challenge in example-based program synthesis is the gigantic search space of programs. To address this challenge, various work proposed to use abstract interpretation to prune the search space. However, most of existing approaches have focused only on forward abstract interpretation, and thus cannot fully exploit the power of abstract interpretation. In this paper, we propose a novel approach to inductive program synthesis via iterative forward-backward abstract interpretation. The forward abstract interpretation computes possible outputs of a program given inputs, while the backward abstract interpretation computes possible inputs of a program given outputs. By iteratively performing the two abstract interpretations in an alternating fashion, we can effectively determine if any completion of each partial program as a candidate can satisfy the input-output examples. We apply our approach to a standard formulation, syntax-guided synthesis (SyGuS), thereby supporting a wide range of inductive synthesis tasks. We have implemented our approach and evaluated it on a set of benchmarks from the prior work. The experimental results show that our approach significantly outperforms the state-of-the-art approaches thanks to the sophisticated abstract interpretation techniques.
△ Less
Submitted 21 April, 2023;
originally announced April 2023.
-
Co-Speech Gesture Synthesis using Discrete Gesture Token Learning
Authors:
Shuhong Lu,
Youngwoo Yoon,
Andrew Feng
Abstract:
Synthesizing realistic co-speech gestures is an important and yet unsolved problem for creating believable motions that can drive a humanoid robot to interact and communicate with human users. Such capability will improve the impressions of the robots by human users and will find applications in education, training, and medical services. One challenge in learning the co-speech gesture model is tha…
▽ More
Synthesizing realistic co-speech gestures is an important and yet unsolved problem for creating believable motions that can drive a humanoid robot to interact and communicate with human users. Such capability will improve the impressions of the robots by human users and will find applications in education, training, and medical services. One challenge in learning the co-speech gesture model is that there may be multiple viable gesture motions for the same speech utterance. The deterministic regression methods can not resolve the conflicting samples and may produce over-smoothed or damped motions. We proposed a two-stage model to address this uncertainty issue in gesture synthesis by modeling the gesture segments as discrete latent codes. Our method utilizes RQ-VAE in the first stage to learn a discrete codebook consisting of gesture tokens from training data. In the second stage, a two-level autoregressive transformer model is used to learn the prior distribution of residual codes conditioned on input speech context. Since the inference is formulated as token sampling, multiple gesture sequences could be generated given the same speech input using top-k sampling. The quantitative results and the user study showed the proposed method outperforms the previous methods and is able to generate realistic and diverse gesture motions.
△ Less
Submitted 3 March, 2023;
originally announced March 2023.
-
Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022
Authors:
Taras Kucherenko,
Pieter Wolfert,
Youngwoo Yoon,
Carla Viegas,
Teodor Nikolov,
Mihail Tsakov,
Gustav Eje Henter
Abstract:
This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing diff…
▽ More
This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research papers, differences in results are here only due to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field.
The evaluation results show some synthetic gesture conditions being rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall's tau rank correlation of around $-0.5$. Based on the challenge results we formulate numerous recommendations for system building and evaluation.
△ Less
Submitted 28 March, 2024; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Design, Field Evaluation, and Traffic Analysis of a Competitive Autonomous Driving Model in a Congested Environment
Authors:
Daegyu Lee,
Hyunki Seong,
Seungil Han,
Gyuree Kang,
D. Hyunchul Shim,
Yoonjin Yoon
Abstract:
Recently, numerous studies have investigated cooperative traffic systems using the communication among vehicle-to-everything (V2X). Unfortunately, when multiple autonomous vehicles are deployed while exposed to communication failure, there might be a conflict of ideal conditions between various autonomous vehicles leading to adversarial situation on the roads. In South Korea, virtual and real-worl…
▽ More
Recently, numerous studies have investigated cooperative traffic systems using the communication among vehicle-to-everything (V2X). Unfortunately, when multiple autonomous vehicles are deployed while exposed to communication failure, there might be a conflict of ideal conditions between various autonomous vehicles leading to adversarial situation on the roads. In South Korea, virtual and real-world urban autonomous multi-vehicle races were held in March and November of 2021, respectively. During the competition, multiple vehicles were involved simultaneously, which required maneuvers such as overtaking low-speed vehicles, negotiating intersections, and obeying traffic laws. In this study, we introduce a fully autonomous driving software stack to deploy a competitive driving model, which enabled us to win the urban autonomous multi-vehicle races. We evaluate module-based systems such as navigation, perception, and planning in real and virtual environments. Additionally, an analysis of traffic is performed after collecting multiple vehicle position data over communication to gain additional insight into a multi-agent autonomous driving scenario. Finally, we propose a method for analyzing traffic in order to compare the spatial distribution of multiple autonomous vehicles. We study the similarity distribution between each team's driving log data to determine the impact of competitive autonomous driving on the traffic environment.
△ Less
Submitted 6 November, 2022; v1 submitted 31 October, 2022;
originally announced October 2022.
-
The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation
Authors:
Youngwoo Yoon,
Pieter Wolfert,
Taras Kucherenko,
Carla Viegas,
Teodor Nikolov,
Mihail Tsakov,
Gustav Eje Henter
Abstract:
This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing diff…
▽ More
This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research papers, differences in results are here only due to differences between methods, enabling direct comparison between systems. This year's dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which previously was a major challenge in the field.
The evaluation results are a revolution, and a revelation. Some synthetic conditions are rated as significantly more human-like than human motion capture. To the best of our knowledge, this has never been shown before on a high-fidelity avatar. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. Additional material is available via the project website at https://youngwoo-yoon.github.io/GENEAchallenge2022/
△ Less
Submitted 22 August, 2022;
originally announced August 2022.
-
Efficient and Privacy Preserving Group Signature for Federated Learning
Authors:
Sneha Kanchan,
Jae Won Jang,
Jun Yong Yoon,
Bong Jun Choi
Abstract:
Federated Learning (FL) is a Machine Learning (ML) technique that aims to reduce the threats to user data privacy. Training is done using the raw data on the users' device, called clients, and only the training results, called gradients, are sent to the server to be aggregated and generate an updated model. However, we cannot assume that the server can be trusted with private information, such as…
▽ More
Federated Learning (FL) is a Machine Learning (ML) technique that aims to reduce the threats to user data privacy. Training is done using the raw data on the users' device, called clients, and only the training results, called gradients, are sent to the server to be aggregated and generate an updated model. However, we cannot assume that the server can be trusted with private information, such as metadata related to the owner or source of the data. So, hiding the client information from the server helps reduce privacy-related attacks. Therefore, the privacy of the client's identity, along with the privacy of the client's data, is necessary to make such attacks more difficult. This paper proposes an efficient and privacy-preserving protocol for FL based on group signature. A new group signature for federated learning, called GSFL, is designed to not only protect the privacy of the client's data and identity but also significantly reduce the computation and communication costs considering the iterative process of federated learning. We show that GSFL outperforms existing approaches in terms of computation, communication, and signaling costs. Also, we show that the proposed protocol can handle various security attacks in the federated learning environment.
△ Less
Submitted 15 July, 2022; v1 submitted 12 July, 2022;
originally announced July 2022.
-
ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning
Authors:
Jae Soon Baik,
In Young Yoon,
Jun Won Choi
Abstract:
Modern deep learning has achieved great success in various fields. However, it requires the labeling of huge amounts of data, which is expensive and labor-intensive. Active learning (AL), which identifies the most informative samples to be labeled, is becoming increasingly important to maximize the efficiency of the training process. The existing AL methods mostly use only a single final fixed mod…
▽ More
Modern deep learning has achieved great success in various fields. However, it requires the labeling of huge amounts of data, which is expensive and labor-intensive. Active learning (AL), which identifies the most informative samples to be labeled, is becoming increasingly important to maximize the efficiency of the training process. The existing AL methods mostly use only a single final fixed model for acquiring the samples to be labeled. This strategy may not be good enough in that the structural uncertainty of a model for given training data is not considered to acquire the samples. In this study, we propose a novel acquisition criterion based on temporal self-ensemble generated by conventional stochastic gradient descent (SGD) optimization. These self-ensemble models are obtained by capturing the intermediate network weights obtained through SGD iterations. Our acquisition function relies on a consistency measure between the student and teacher models. The student models are given a fixed number of temporal self-ensemble models, and the teacher model is constructed by averaging the weights of the student models. Using the proposed acquisition criterion, we present an AL algorithm, namely student-teacher consistency-based AL (ST-CoNAL). Experiments conducted for image classification tasks on CIFAR-10, CIFAR-100, Caltech-256, and Tiny ImageNet datasets demonstrate that the proposed ST-CoNAL achieves significantly better performance than the existing acquisition methods. Furthermore, extensive experiments show the robustness and effectiveness of our methods.
△ Less
Submitted 16 October, 2022; v1 submitted 5 July, 2022;
originally announced July 2022.
-
DBN-Mix: Training Dual Branch Network Using Bilateral Mixup Augmentation for Long-Tailed Visual Recognition
Authors:
Jae Soon Baik,
In Young Yoon,
Jun Won Choi
Abstract:
There is growing interest in the challenging visual perception task of learning from long-tailed class distributions. The extreme class imbalance in the training dataset biases the model to prefer recognizing majority class data over minority class data. Furthermore, the lack of diversity in minority class samples makes it difficult to find a good representation. In this paper, we propose an effec…
▽ More
There is growing interest in the challenging visual perception task of learning from long-tailed class distributions. The extreme class imbalance in the training dataset biases the model to prefer recognizing majority class data over minority class data. Furthermore, the lack of diversity in minority class samples makes it difficult to find a good representation. In this paper, we propose an effective data augmentation method, referred to as bilateral mixup augmentation, which can improve the performance of long-tailed visual recognition. The bilateral mixup augmentation combines two samples generated by a uniform sampler and a re-balanced sampler and augments the training dataset to enhance the representation learning for minority classes. We also reduce the classifier bias using class-wise temperature scaling, which scales the logits differently per class in the training phase. We apply both ideas to the dual-branch network (DBN) framework, presenting a new model, named dual-branch network with bilateral mixup (DBN-Mix). Experiments on popular long-tailed visual recognition datasets show that DBN-Mix improves performance significantly over baseline and that the proposed method achieves state-of-the-art performance in some categories of benchmarks.
△ Less
Submitted 20 August, 2022; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Few-Shot Unlearning by Model Inversion
Authors:
Youngsik Yoon,
Jinhwan Nam,
Hyojeong Yun,
Jaeho Lee,
Dongwoo Kim,
Jungseul Ok
Abstract:
We consider a practical scenario of machine unlearning to erase a target dataset, which causes unexpected behavior from the trained model. The target dataset is often assumed to be fully identifiable in a standard unlearning scenario. Such a flawless identification, however, is almost impossible if the training dataset is inaccessible at the time of unlearning. Unlike previous approaches requiring…
▽ More
We consider a practical scenario of machine unlearning to erase a target dataset, which causes unexpected behavior from the trained model. The target dataset is often assumed to be fully identifiable in a standard unlearning scenario. Such a flawless identification, however, is almost impossible if the training dataset is inaccessible at the time of unlearning. Unlike previous approaches requiring a complete set of targets, we consider few-shot unlearning scenario when only a few samples of target data are available. To this end, we formulate the few-shot unlearning problem specifying intentions behind the unlearning request (e.g., purely unlearning, mislabel correction, privacy protection), and we devise a straightforward framework that (i) retrieves a proxy of the training data via model inversion fully exploiting information available in the context of unlearning; (ii) adjusts the proxy according to the unlearning intention; and (iii) updates the model with the adjusted proxy. We demonstrate that our method using only a subset of target data can outperform the state-of-the-art unlearning methods even with a complete indication of target data.
△ Less
Submitted 14 March, 2023; v1 submitted 31 May, 2022;
originally announced May 2022.
-
Evaluating the Quality of a Synthesized Motion with the Fréchet Motion Distance
Authors:
Antoine Maiorca,
Youngwoo Yoon,
Thierry Dutoit
Abstract:
Evaluating the Quality of a Synthesized Motion with the Fréchet Motion Distance
Evaluating the Quality of a Synthesized Motion with the Fréchet Motion Distance
△ Less
Submitted 27 April, 2022; v1 submitted 26 April, 2022;
originally announced April 2022.
-
How does fake news use a thumbnail? CLIP-based Multimodal Detection on the Unrepresentative News Image
Authors:
Hyewon Choi,
Yejun Yoon,
Seunghyun Yoon,
Kunwoo Park
Abstract:
This study investigates how fake news uses a thumbnail for a news article with a focus on whether a news article's thumbnail represents the news content correctly. A news article shared with an irrelevant thumbnail can mislead readers into having a wrong impression of the issue, especially in social media environments where users are less likely to click the link and consume the entire content. We…
▽ More
This study investigates how fake news uses a thumbnail for a news article with a focus on whether a news article's thumbnail represents the news content correctly. A news article shared with an irrelevant thumbnail can mislead readers into having a wrong impression of the issue, especially in social media environments where users are less likely to click the link and consume the entire content. We propose to capture the degree of semantic incongruity in the multimodal relation by using the pretrained CLIP representation. From a source-level analysis, we found that fake news employs a more incongruous image to the main content than general news. Going further, we attempted to detect news articles with image-text incongruity. Evaluation experiments suggest that CLIP-based methods can successfully detect news articles in which the thumbnail is semantically irrelevant to news text. This study contributes to the research by providing a novel view on tackling online fake news and misinformation. Code and datasets are available at https://github.com/ssu-humane/fake-news-thumbnail.
△ Less
Submitted 27 April, 2022; v1 submitted 12 April, 2022;
originally announced April 2022.
-
Collaborative Transformers for Grounded Situation Recognition
Authors:
Junhyeong Cho,
Youngseok Yoon,
Suha Kwak
Abstract:
Grounded situation recognition is the task of predicting the main activity, entities playing certain roles within the activity, and bounding-box groundings of the entities in the given image. To effectively deal with this challenging task, we introduce a novel approach where the two processes for activity classification and entity estimation are interactive and complementary. To implement this ide…
▽ More
Grounded situation recognition is the task of predicting the main activity, entities playing certain roles within the activity, and bounding-box groundings of the entities in the given image. To effectively deal with this challenging task, we introduce a novel approach where the two processes for activity classification and entity estimation are interactive and complementary. To implement this idea, we propose Collaborative Glance-Gaze TransFormer (CoFormer) that consists of two modules: Glance transformer for activity classification and Gaze transformer for entity estimation. Glance transformer predicts the main activity with the help of Gaze transformer that analyzes entities and their relations, while Gaze transformer estimates the grounded entities by focusing only on the entities relevant to the activity predicted by Glance transformer. Our CoFormer achieves the state of the art in all evaluation metrics on the SWiG dataset. Training code and model weights are available at https://github.com/jhcho99/CoFormer.
△ Less
Submitted 30 March, 2022;
originally announced March 2022.
-
Disperse rotation operator DRT and use in some stream ciphers
Authors:
Yong-Jin Kim,
Yong-Ho Yon,
Son-Gyong Kim
Abstract:
The rotation operator is frequently used in several stream ciphers, including HC-128, Rabbit, and Salsa20, the final candidates for eSTREAM. This is because the rotation operator (ROT) is simple but has very good dispersibility. In this paper, we propose a disperse rotation operator (DRT), which has the same structure as ROT but has better dispersibility. In addition, the use of DRT instead of ROT…
▽ More
The rotation operator is frequently used in several stream ciphers, including HC-128, Rabbit, and Salsa20, the final candidates for eSTREAM. This is because the rotation operator (ROT) is simple but has very good dispersibility. In this paper, we propose a disperse rotation operator (DRT), which has the same structure as ROT but has better dispersibility. In addition, the use of DRT instead of ROT has shown that the quality of the output stream of all three stream ciphers was significantly improved. However, the use of DRT instead of ROT in the HC-128 stream cipher prevents the expansion of differential attacks based on LSB.
△ Less
Submitted 2 March, 2022;
originally announced March 2022.
-
Effective Urban Region Representation Learning Using Heterogeneous Urban Graph Attention Network (HUGAT)
Authors:
Namwoo Kim,
Yoonjin Yoon
Abstract:
Revealing the hidden patterns shaping the urban environment is essential to understand its dynamics and to make cities smarter. Recent studies have demonstrated that learning the representations of urban regions can be an effective strategy to uncover the intrinsic characteristics of urban areas. However, existing studies lack in incorporating diversity in urban data sources. In this work, we prop…
▽ More
Revealing the hidden patterns shaping the urban environment is essential to understand its dynamics and to make cities smarter. Recent studies have demonstrated that learning the representations of urban regions can be an effective strategy to uncover the intrinsic characteristics of urban areas. However, existing studies lack in incorporating diversity in urban data sources. In this work, we propose heterogeneous urban graph attention network (HUGAT), which incorporates heterogeneity of diverse urban datasets. In HUGAT, heterogeneous urban graph (HUG) incorporates both the geo-spatial and temporal people movement variations in a single graph structure. Given a HUG, a set of meta-paths are designed to capture the rich urban semantics as composite relations between nodes. Region embedding is carried out using heterogeneous graph attention network (HAN). HUGAT is designed to consider multiple learning objectives of city's geo-spatial and mobility variations simultaneously. In our extensive experiments on NYC data, HUGAT outperformed all the state-of-the-art models. Moreover, it demonstrated a robust generalization capability across the various prediction tasks of crime, average personal income, and bike flow as well as the spatial clustering task.
△ Less
Submitted 17 February, 2022;
originally announced February 2022.