-
Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!
Authors:
Do-hyeon Yoon,
Minsoo Chun,
Thomas Allen,
Hans Müller,
Min Wang,
Rajesh Sharma
Abstract:
Large language models (LLMs) face significant copyright and intellectual property challenges as the cost of training increases and model reuse becomes prevalent. While watermarking techniques have been proposed to protect model ownership, they may not be robust to continued training and development, posing serious threats to model attribution and copyright protection. This work introduces a simple yet effective approach for robust LLM fingerprinting based on intrinsic model characteristics. We discover that the standard deviation distributions of attention parameter matrices across different layers exhibit distinctive patterns that remain stable even after extensive continued training. These parameter distribution signatures serve as robust fingerprints that can reliably identify model lineage and detect potential copyright infringement. Our experimental validation across multiple model families demonstrates the effectiveness of our method for model authentication. Notably, our investigation uncovers evidence that the Pangu Pro MoE model recently released by Huawei is derived from the Qwen-2.5 14B model through upcycling techniques rather than trained from scratch, highlighting potential cases of model plagiarism, copyright violation, and information fabrication. These findings underscore the critical importance of developing robust fingerprinting methods for protecting intellectual property in large-scale model development and emphasize that deliberate continued training alone is insufficient to completely obscure model origins.
Submitted 2 July, 2025;
originally announced July 2025.
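Illustrative sketch for the fingerprinting abstract above. It shows one way a per-layer standard-deviation profile could be computed and compared between two models. The abstract does not state how profiles are normalized or compared, so the z-normalization, the Pearson correlation, the toy "models" (lists of random weights), and every function name here are assumptions made for illustration, not the authors' procedure.

```python
import random
import statistics


def layer_std_profile(layers):
    """One standard deviation per layer, z-normalized across layers (assumed normalization)."""
    stds = [statistics.pstdev(weights) for weights in layers]
    mean = statistics.fmean(stds)
    spread = statistics.pstdev(stds)
    return [(s - mean) / spread for s in stds]


def pearson(a, b):
    """Pearson correlation between two equal-length profiles (assumed similarity metric)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return cov / norm


def toy_model(scales, seed):
    """Hypothetical stand-in for a model: one flat list of Gaussian weights per layer."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, s) for _ in range(2000)] for s in scales]


base = toy_model([0.02, 0.05, 0.03, 0.08, 0.04, 0.06], seed=0)

# Crude stand-in for continued training: perturb every weight with small noise.
noise = random.Random(1)
derived = [[w + noise.gauss(0.0, 0.005) for w in layer] for layer in base]

# An unrelated toy model with a different per-layer scale pattern.
unrelated = toy_model([0.07, 0.02, 0.06, 0.03, 0.08, 0.025], seed=2)

sim_derived = pearson(layer_std_profile(base), layer_std_profile(derived))
sim_unrelated = pearson(layer_std_profile(base), layer_std_profile(unrelated))
print(f"base vs derived:   {sim_derived:.3f}")
print(f"base vs unrelated: {sim_unrelated:.3f}")
```

In this toy setup the perturbed copy keeps a profile close to the base model while the unrelated model does not; real checkpoints would need the attention matrices extracted per layer first.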
-
A Multi-Pass Large Language Model Framework for Precise and Efficient Radiology Report Error Detection
Authors:
Songsoo Kim,
Seungtae Lee,
See Young Lee,
Joonho Kim,
Keechan Kan,
Dukyong Yoon
Abstract:
Background: The positive predictive value (PPV) of large language model (LLM)-based proofreading for radiology reports is limited due to the low error prevalence. Purpose: To assess whether a three-pass LLM framework enhances PPV and reduces operational costs compared with baseline approaches. Materials and Methods: A retrospective analysis was performed on 1,000 consecutive radiology reports (250 each: radiography, ultrasonography, CT, MRI) from the MIMIC-III database. Two external datasets (CheXpert and Open-i) were validation sets. Three LLM frameworks were tested: (1) single-prompt detector; (2) extractor plus detector; and (3) extractor, detector, and false-positive verifier. Precision was measured by PPV and absolute true positive rate (aTPR). Efficiency was calculated from model inference charges and reviewer remuneration. Statistical significance was tested using cluster bootstrap, exact McNemar tests, and Holm-Bonferroni correction. Results: Framework PPV increased from 0.063 (95% CI, 0.036-0.101, Framework 1) to 0.079 (0.049-0.118, Framework 2), and significantly to 0.159 (0.090-0.252, Framework 3; P<.001 vs. baselines). aTPR remained stable (0.012-0.014; P>=.84). Operational costs per 1,000 reports dropped to USD 5.58 (Framework 3) from USD 9.72 (Framework 1) and USD 6.85 (Framework 2), reflecting reductions of 42.6% and 18.5%, respectively. Human-reviewed reports decreased from 192 to 88. External validation supported Framework 3's superior PPV (CheXpert 0.133, Open-i 0.105) and stable aTPR (0.007). Conclusion: A three-pass LLM framework significantly enhanced PPV and reduced operational costs, maintaining detection performance, providing an effective strategy for AI-assisted radiology report quality assurance.
Submitted 25 June, 2025;
originally announced June 2025.
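Worked arithmetic for the radiology abstract above. The sketch recomputes the two cost reductions from the per-1,000-report costs quoted in the abstract and shows the definition of PPV. The function names are hypothetical, and the counts passed to `ppv` are toy values, not figures from the study.

```python
def ppv(true_positives, false_positives):
    """Positive predictive value: share of flagged items that are real errors."""
    return true_positives / (true_positives + false_positives)


def relative_reduction(baseline, new):
    """Fractional drop from baseline to new."""
    return (baseline - new) / baseline


# Costs per 1,000 reports (USD), as quoted in the abstract.
cost_f1, cost_f2, cost_f3 = 9.72, 6.85, 5.58

reduction_vs_f1 = round(100 * relative_reduction(cost_f1, cost_f3), 1)
reduction_vs_f2 = round(100 * relative_reduction(cost_f2, cost_f3), 1)
print(reduction_vs_f1, reduction_vs_f2)

# Toy counts only: 10 true errors among 100 flagged reports.
toy_ppv = ppv(true_positives=10, false_positives=90)
print(toy_ppv)
```

The two reductions come out to the 42.6% and 18.5% stated in the abstract.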
-
Stop learning it all to mitigate visual hallucination, Focus on the hallucination target
Authors:
Dokyoon Yoon,
Youngsook Song,
Woomyong Park
Abstract:
Multimodal Large Language Models (MLLMs) frequently suffer from hallucination issues, generating information about objects that are not present in input images during vision-language tasks. These hallucinations particularly undermine model reliability in practical applications requiring accurate object identification. To address this challenge, we propose \mymethod,\ a preference learning approach that mitigates hallucinations by focusing on targeted areas where they occur. To implement this, we build a dataset containing hallucinated responses, correct responses, and target information (i.e., objects present in the images and the corresponding chunk positions in responses affected by hallucinations). By applying a preference learning method restricted to these specific targets, the model can filter out irrelevant signals and focus on correcting hallucinations. This allows the model to produce more factual responses by concentrating solely on relevant information. Experimental results demonstrate that \mymethod\ effectively reduces hallucinations across multiple vision hallucination tasks, improving the reliability and performance of MLLMs without diminishing overall performance.
Submitted 12 June, 2025;
originally announced June 2025.
-
Data Augmentation For Small Object using Fast AutoAugment
Authors:
DaeEun Yoon,
Semin Kim,
SangWook Yoo,
Jongha Lee
Abstract:
In recent years, there has been tremendous progress in object detection performance. However, despite these advances, the detection performance for small objects is significantly inferior to that of large objects. Detecting small objects is one of the most challenging and important problems in computer vision. To improve the detection performance for small objects, we propose an optimal data augmentation method using Fast AutoAugment. Through our proposed method, we can quickly find optimal augmentation policies that can overcome degradation when detecting small objects, and we achieve a 20% performance improvement on the DOTA dataset.
Submitted 10 June, 2025;
originally announced June 2025.
-
Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy
Authors:
Utkarsh Pratiush,
Austin Houston,
Kamyar Barakati,
Aditya Raghavan,
Dasol Yoon,
Harikrishnan KP,
Zhaslan Baraissov,
Desheng Ma,
Samuel S. Welborn,
Mikolaj Jakowski,
Shawn-Patrick Barhorst,
Alexander J. Pattison,
Panayotis Manganaris,
Sita Sirisha Madugula,
Sai Venkata Gayathri Ayyagari,
Vishal Kennedy,
Ralph Bulanadi,
Michelle Wang,
Kieran J. Pang,
Ian Addison-Smith,
Willy Menacho,
Horacio V. Guzman,
Alexander Kiefer,
Nicholas Furth,
Nikola L. Kolev
, et al. (48 additional authors not shown)
Abstract:
Microscopy is a primary source of information on materials structure and functionality at nanometer and atomic scales. The data generated is often well-structured, enriched with metadata and sample histories, though not always consistent in detail or format. The adoption of Data Management Plans (DMPs) by major funding agencies promotes preservation and access. However, deriving insights remains difficult due to the lack of standardized code ecosystems, benchmarks, and integration strategies. As a result, data usage is inefficient and analysis time is extensive. In addition to post-acquisition analysis, new APIs from major microscope manufacturers enable real-time, ML-based analytics for automated decision-making and ML-agent-controlled microscope operation. Yet, a gap remains between the ML and microscopy communities, limiting the impact of these methods on physics, materials discovery, and optimization. Hackathons help bridge this divide by fostering collaboration between ML researchers and microscopy experts. They encourage the development of novel solutions that apply ML to microscopy, while preparing a future workforce for instrumentation, materials science, and applied ML. This hackathon produced benchmark datasets and digital twins of microscopes to support community growth and standardized workflows. All related code is available at GitHub: https://github.com/KalininGroup/Mic-hackathon-2024-codes-publication/tree/1.0.0.1
Submitted 27 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions
Authors:
Zining Wang,
Yuxuan Zhang,
Dongwook Yoon,
Nicholas Vincent,
Farhan Samir,
Vered Shwartz
Abstract:
With more than 11 times as many pageviews as the next, English Wikipedia dominates global knowledge access relative to other language editions. Readers are prone to assuming English Wikipedia as a superset of all language editions, leading many to prefer it even when their primary language is not English. Other language editions, however, comprise complementary facts rooted in their respective cultures and media environments, which are marginalized in English Wikipedia. While Wikipedia's user interface enables switching between language editions through its Interlanguage Link (ILL) system, it does not reveal to readers that other language editions contain valuable, complementary information. We present WikiGap, a system that surfaces complementary facts sourced from other Wikipedias within the English Wikipedia interface. Specifically, by combining a recent multilingual information-gap discovery method with a user-centered design, WikiGap enables access to complementary information from French, Russian, and Chinese Wikipedia. In a mixed-methods study (n=21), WikiGap significantly improved fact-finding accuracy, reduced task time, and received a 32-point higher usability score relative to Wikipedia's current ILL-based navigation system. Participants reported increased awareness of the availability of complementary information in non-English editions and reconsidered the completeness of English Wikipedia. WikiGap thus paves the way for improved epistemic equity across language editions.
Submitted 4 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
Reasoning Models Better Express Their Confidence
Authors:
Dongkeun Yoon,
Seungone Kim,
Sohee Yang,
Sunkyoung Kim,
Soyeon Kim,
Yongil Kim,
Eunbi Choi,
Yireun Kim,
Minjoon Seo
Abstract:
Despite their strengths, large language models (LLMs) often fail to communicate their confidence accurately, making it difficult to assess when they might be wrong and limiting their reliability. In this work, we demonstrate that reasoning models-LLMs that engage in extended chain-of-thought (CoT) reasoning-exhibit superior performance not only in problem-solving but also in accurately expressing their confidence. Specifically, we benchmark six reasoning models across six datasets and find that they achieve strictly better confidence calibration than their non-reasoning counterparts in 33 out of the 36 settings. Our detailed analysis reveals that these gains in calibration stem from the slow thinking behaviors of reasoning models-such as exploring alternative approaches and backtracking-which enable them to adjust their confidence dynamically throughout their CoT, making it progressively more accurate. In particular, we find that reasoning models become increasingly better calibrated as their CoT unfolds, a trend not observed in non-reasoning models. Moreover, removing slow thinking behaviors from the CoT leads to a significant drop in calibration. Lastly, we show that these gains are not exclusive to reasoning models-non-reasoning models also benefit when guided to perform slow thinking via in-context learning.
Submitted 20 May, 2025;
originally announced May 2025.
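Illustrative sketch for the confidence-calibration abstract above. The abstract does not say which calibration metric is used; expected calibration error (ECE) with equal-width bins is a common choice and is assumed here. The function name, bin count, and sample data are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted average gap between stated confidence and accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


# Toy data: a model that says 0.8 and is right 4 times out of 5 ...
calibrated = expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0])
# ... versus one that says 0.95 but is right only half the time.
overconfident = expected_calibration_error([0.95] * 4, [1, 0, 0, 1])
print(calibrated, overconfident)
```

A lower value means stated confidence tracks observed accuracy more closely.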
-
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
Authors:
Ryan Hoque,
Peide Huang,
David J. Yoon,
Mouli Sivapurapu,
Jian Zhang
Abstract:
Imitation learning for manipulation has a well-known data scarcity problem. Unlike natural language and 2D computer vision, there is no Internet-scale corpus of data for dexterous manipulation. One appealing option is egocentric human video, a passively scalable data source. However, existing large-scale datasets such as Ego4D do not have native hand pose annotations and do not focus on object manipulation. To this end, we use Apple Vision Pro to collect EgoDex: the largest and most diverse dataset of dexterous human manipulation to date. EgoDex has 829 hours of egocentric video with paired 3D hand and finger tracking data collected at the time of recording, where multiple calibrated cameras and on-device SLAM can be used to precisely track the pose of every joint of each hand. The dataset covers a wide range of diverse manipulation behaviors with everyday household objects in 194 different tabletop tasks ranging from tying shoelaces to folding laundry. Furthermore, we train and systematically evaluate imitation learning policies for hand trajectory prediction on the dataset, introducing metrics and benchmarks for measuring progress in this increasingly important area. By releasing this large-scale dataset, we hope to push the frontier of robotics, computer vision, and foundation models.
Submitted 16 May, 2025;
originally announced May 2025.
-
NSegment: Label-specific Deformations for Remote Sensing Image Segmentation
Authors:
Yechan Kim,
DongHo Yoon,
SooYeon Kim,
Moongu Jeon
Abstract:
Labeling errors in remote sensing (RS) image segmentation datasets often remain implicit and subtle due to ambiguous class boundaries, mixed pixels, shadows, complex terrain features, and subjective annotator bias. Furthermore, the scarcity of annotated RS data due to high image acquisition and labeling costs complicates training noise-robust models. While sophisticated mechanisms such as label selection or noise correction might address this issue, they tend to increase training time and add implementation complexity. In this letter, we propose NSegment-a simple yet effective data augmentation solution to mitigate this issue. Unlike traditional methods, it applies elastic transformations only to segmentation labels, varying deformation intensity per sample in each training epoch to address annotation inconsistencies. Experimental results demonstrate that our approach improves the performance of RS image segmentation on various state-of-the-art models.
Submitted 27 June, 2025; v1 submitted 28 April, 2025;
originally announced April 2025.
-
HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing
Authors:
Myunghyun Rhee,
Joonseop Sim,
Taeyoung Ahn,
Seungyong Lee,
Daegun Yoon,
Euiseok Kim,
Kyoung Park,
Youngpyo Joo,
Hosik Kim
Abstract:
The attention layer, a core component of Transformer-based LLMs, brings out inefficiencies in current GPU systems due to its low operational intensity and the substantial memory requirements of KV caches. We propose a High-bandwidth Processing Unit (HPU), a memory-intensive co-processor that enhances GPU resource utilization during large-batched LLM inference. By offloading memory-bound operations, the HPU allows the GPU to focus on compute-intensive tasks, increasing overall efficiency. Also, the HPU, as an add-on card, scales out to accommodate surging memory demands driven by large batch sizes and extended sequence lengths. In this paper, we show the HPU prototype implemented with PCIe-based FPGA cards mounted on a GPU system. Our novel GPU-HPU heterogeneous system demonstrates up to 4.1x performance gains and 4.6x energy efficiency improvements over a GPU-only system, providing scalability without increasing the number of GPUs.
Submitted 17 April, 2025;
originally announced April 2025.
-
M-Prometheus: A Suite of Open Multilingual LLM Judges
Authors:
José Pombal,
Dongkeun Yoon,
Patrick Fernandes,
Ian Wu,
Seungone Kim,
Ricardo Rei,
Graham Neubig,
André F. T. Martins
Abstract:
The use of language models for automatically evaluating long-form text (LLM-as-a-judge) is becoming increasingly common, yet most LLM judges are optimized exclusively for English, with strategies for enhancing their multilingual evaluation capabilities remaining largely unexplored in the current literature. This has created a disparity in the quality of automatic evaluation methods for non-English languages, ultimately hindering the development of models with better multilingual capabilities. To bridge this gap, we introduce M-Prometheus, a suite of open-weight LLM judges ranging from 3B to 14B parameters that can provide both direct assessment and pairwise comparison feedback on multilingual outputs. M-Prometheus models outperform state-of-the-art open LLM judges on multilingual reward benchmarks spanning more than 20 languages, as well as on literary machine translation (MT) evaluation covering 4 language pairs. Furthermore, M-Prometheus models can be leveraged at decoding time to significantly improve generated outputs across all 3 tested languages, showcasing their utility for the development of better multilingual models. Lastly, through extensive ablations, we identify the key factors for obtaining an effective multilingual judge, including backbone model selection and training on natively multilingual feedback data instead of translated data. We release our models, training dataset, and code.
Submitted 7 April, 2025;
originally announced April 2025.
-
Humanoid Policy ~ Human Policy
Authors:
Ri-Zhao Qiu,
Shiqi Yang,
Xuxin Cheng,
Chaitanya Chawla,
Jialong Li,
Tairan He,
Ge Yan,
David J. Yoon,
Ryan Hoque,
Lars Paulsen,
Ge Yang,
Jian Zhang,
Sha Yi,
Guanya Shi,
Xiaolong Wang
Abstract:
Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive, requiring expensive tele-operated data collection which is difficult to scale. This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embodiment training data for robot learning. We mitigate the embodiment gap between humanoids and humans from both the data and modeling perspectives. We collect an egocentric task-oriented dataset (PH2D) that is directly aligned with humanoid manipulation demonstrations. We then train a human-humanoid behavior policy, which we term Human Action Transformer (HAT). The state-action space of HAT is unified for both humans and humanoid robots and can be differentiably retargeted to robot actions. Co-trained with smaller-scale robot data, HAT directly models humanoid robots and humans as different embodiments without additional supervision. We show that human data improves both generalization and robustness of HAT with significantly better data collection efficiency. Code and data: https://human-as-robot.github.io/
Submitted 24 March, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
-
Balancing Act: Trading Off Doppler Odometry and Map Registration for Efficient Lidar Localization
Authors:
Katya M. Papais,
Daniil Lisus,
David J. Yoon,
Andrew Lambert,
Keith Y. K. Leung,
Timothy D. Barfoot
Abstract:
Most autonomous vehicles rely on accurate and efficient localization, which is achieved by comparing live sensor data to a preexisting map, to navigate their environment. Balancing the accuracy of localization with computational efficiency remains a significant challenge, as high-accuracy methods often come with higher computational costs. In this paper, we present two ways of improving lidar localization efficiency and study their impact on performance. First, we integrate a lightweight Doppler-based odometry method into a topometric localization pipeline and compare its performance against an iterative closest point (ICP)-based method. We highlight the trade-offs between these approaches: the Doppler estimator offers faster, lightweight updates, while ICP provides higher accuracy at the cost of increased computational load. Second, by controlling the frequency of localization updates and leveraging odometry estimates between them, we demonstrate that accurate localization can be maintained while optimizing for computational efficiency using either odometry method. Our experimental results show that localizing every 10 lidar frames strikes a favourable balance, achieving a localization accuracy below 0.05 meters in translation and below 0.1 degrees in orientation while reducing computational effort by over 30% in an ICP-based pipeline. We quantify the trade-off of accuracy to computational effort using over 100 kilometers of real-world driving data in different on-road environments.
Submitted 3 March, 2025;
originally announced March 2025.
-
Who Reaps All the Superchats? A Large-Scale Analysis of Income Inequality in Virtual YouTuber Livestreaming
Authors:
Ruijing Zhao,
Brian Diep,
Jiaxin Pei,
Dongwook Yoon,
David Jurgens,
Jian Zhu
Abstract:
The explosive growth of Virtual YouTubers (VTubers)-streamers who perform behind virtual anime avatars-has created a unique digital economy with profound implications for content creators, platforms, and viewers. Understanding the economic landscape of VTubers is crucial for designing equitable platforms, supporting content creator livelihoods, and fostering sustainable digital communities. To this end, we conducted a large-scale study of over 1 million hours of publicly available streaming records from 1,923 VTubers on YouTube, covering tens of millions of dollars in actual profits. Our analysis reveals stark inequality within the VTuber community and characterizes the sources of income for VTubers from multiple perspectives. Furthermore, we also found that the VTuber community is increasingly monopolized by two agencies, driving the financial disparity. This research illuminates the financial dynamics of VTuber communities, informing the design of equitable platforms and sustainable support systems for digital content creators.
Submitted 2 March, 2025;
originally announced March 2025.
-
Acoustic Anomaly Detection on UAM Propeller Defect with Acoustic dataset for Crack of drone Propeller (ADCP)
Authors:
Juho Lee,
Donghyun Yoon,
Gumoon Jeong,
Hyeoncheol Kim
Abstract:
The imminent commercialization of UAM requires stable, AI-based maintenance systems to ensure safety for both passengers and pedestrians. This paper presents a methodology for non-destructively detecting cracks in UAM propellers using drone propeller sound datasets. Normal operating sounds were recorded, and abnormal sounds (categorized as ripped and broken) were differentiated by varying the microphone-propeller angle and throttle power. Our novel approach integrates FFT and STFT preprocessing techniques to capture both global frequency patterns and local time-frequency variations, thereby enhancing anomaly detection performance. The constructed Acoustic Dataset for Crack of Drone Propeller (ADCP) demonstrates the potential for detecting propeller cracks and lays the groundwork for future UAM maintenance applications.
Submitted 2 March, 2025;
originally announced March 2025.
-
Integrating Urban Air Mobility with Highway Infrastructure: A Strategic Approach for Vertiport Location Selection in the Seoul Metropolitan Area
Authors:
Donghyun Yoon,
Minwoo Jeong,
Jinyong Lee,
Seyun Kim,
Yoonjin Yoon
Abstract:
This study focuses on identifying suitable locations for highway-transfer Vertiports to integrate Urban Air Mobility (UAM) with existing highway infrastructure. UAM offers an effective solution for enhancing transportation accessibility in the Seoul Metropolitan Area, where conventional transportation often struggles to connect suburban employment zones such as industrial parks. By integrating UAM with ground transportation at highway facilities, an efficient connectivity solution can be achieved for regions with limited transportation options. Our proposed methodology for determining suitable Vertiport locations utilizes data such as geographic information, origin-destination volume, and travel time. Vertiport candidates are evaluated and selected based on criteria including location desirability, combined transportation accessibility, and transportation demand. Applying this methodology to the Seoul Metropolitan Area, we identify 56 suitable Vertiport locations out of 148 candidates. The proposed methodology offers a strategic approach for the selection of highway-transfer Vertiport locations, enhancing UAM integration with existing transportation systems. Our study provides valuable insights for urban planners and policymakers, with recommendations for future research to include real-time environmental data and to explore the impact of Mobility-as-a-Service on UAM operations.
Submitted 1 February, 2025;
originally announced February 2025.
-
Domain-specific Question Answering with Hybrid Search
Authors:
Dewang Sultania,
Zhaoyu Lu,
Twisha Naik,
Franck Dernoncourt,
David Seunghyun Yoon,
Sanat Sharma,
Trung Bui,
Ashok Gupta,
Tushar Vatsa,
Suhas Suresha,
Ishita Verma,
Vibha Belavadi,
Cheng Chen,
Michael Friedrich
Abstract:
Domain specific question answering is an evolving field that requires specialized solutions to address unique challenges. In this paper, we show that a hybrid approach combining a fine-tuned dense retriever with keyword based sparse search methods significantly enhances performance. Our system leverages a linear combination of relevance signals, including cosine similarity from dense retrieval, BM25 scores, and URL host matching, each with tunable boost parameters. Experimental results indicate that this hybrid method outperforms our single-retriever system, achieving improved accuracy while maintaining robust contextual grounding. These findings suggest that integrating multiple retrieval methodologies with weighted scoring effectively addresses the complexities of domain specific question answering in enterprise settings.
Submitted 21 December, 2024; v1 submitted 4 December, 2024;
originally announced December 2024.
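The linear combination of relevance signals mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' system: the `Candidate` fields, the function names, and the default boost values are hypothetical, chosen only to show how cosine similarity, a BM25 score, and a URL host match might be summed with tunable weights.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    cosine: float      # cosine similarity from the dense retriever
    bm25: float        # BM25 score from the sparse keyword search
    host_match: bool   # True if the document URL host is a preferred host


def hybrid_score(c, w_dense=1.0, w_bm25=0.1, w_host=0.5):
    """Weighted sum of the three signals; the weights are illustrative defaults."""
    host_signal = 1.0 if c.host_match else 0.0
    return w_dense * c.cosine + w_bm25 * c.bm25 + w_host * host_signal


def rank(candidates, **boosts):
    """Return candidates sorted by descending hybrid score."""
    return sorted(candidates, key=lambda c: hybrid_score(c, **boosts), reverse=True)
```

Because each signal has its own boost, setting one weight to zero (for example `rank(candidates, w_host=0.0)`) shows how much that signal changes the ordering.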
-
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Authors:
Sheikh Shafayat,
Dongkeun Yoon,
Woori Jang,
Jiwoo Choi,
Alice Oh,
Seohyon Jung
Abstract:
In this work, we propose and evaluate the feasibility of a two-stage pipeline to evaluate literary machine translation, in a fine-grained manner, from English to Korean. The results show that our framework provides fine-grained, interpretable metrics suited for literary translation and obtains a higher correlation with human judgment than traditional machine translation metrics. Nonetheless, it still fails to match inter-human agreement, especially in metrics like Korean Honorifics. We also observe that LLMs tend to favor translations generated by other LLMs, and we highlight the necessity of developing more sophisticated evaluation methods to ensure accurate and culturally sensitive machine translation of literary works.
Submitted 1 January, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
MSG score: A Comprehensive Evaluation for Multi-Scene Video Generation
Authors:
Daewon Yoon,
Hyungsuk Lee,
Wonsik Shin
Abstract:
This paper addresses the metrics required for generating multi-scene videos based on a continuous scenario, as opposed to traditional short video generation. Scenario-based videos require a comprehensive evaluation that considers multiple factors such as character consistency, artistic coherence, aesthetic quality, and the alignment of the generated content with the intended prompt. Additionally, in video generation, unlike single images, the movement of characters across frames introduces potential issues like distortion or unintended changes, which must be effectively evaluated and corrected. In the context of probabilistic models like diffusion, generating the desired scene requires repeated sampling and manual selection, akin to how a film director chooses the best shots from numerous takes. We propose a score-based evaluation benchmark that automates this process, enabling a more objective and efficient assessment of these complexities. This approach allows for the generation of high-quality multi-scene videos by selecting the best outcomes based on automated scoring rather than manual inspection.
Submitted 28 November, 2024;
originally announced November 2024.
-
Unsupervised Training of Diffusion Models for Feasible Solution Generation in Neural Combinatorial Optimization
Authors:
Seong-Hyun Hong,
Hyun-Sung Kim,
Zian Jang,
Deunsol Yoon,
Hyungseok Song,
Byung-Jun Lee
Abstract:
Recent advancements in neural combinatorial optimization (NCO) methods have shown promising results in generating near-optimal solutions without the need for expert-crafted heuristics. However, the high performance of these approaches often relies on problem-specific human-expertise-based search after generating candidate solutions, limiting their applicability to commonly solved CO problems such as the Traveling Salesman Problem (TSP). In this paper, we present IC/DC, an unsupervised CO framework that directly trains a diffusion model from scratch. We train our model in a self-supervised way to minimize the cost of the solution while adhering to the problem-specific constraints. IC/DC is specialized in addressing CO problems involving two distinct sets of items, and it does not need problem-specific search processes to generate valid solutions. IC/DC employs a novel architecture capable of capturing the intricate relationships between items, thereby enabling effective optimization in challenging CO scenarios. IC/DC achieves state-of-the-art performance relative to existing NCO methods on the Parallel Machine Scheduling Problem (PMSP) and Asymmetric Traveling Salesman Problem (ATSP).
Submitted 12 February, 2025; v1 submitted 15 October, 2024;
originally announced November 2024.
-
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
Authors:
Guijin Son,
Dongkeun Yoon,
Juyoung Suk,
Javier Aula-Blasco,
Mano Aslan,
Vu Trong Kim,
Shayekh Bin Islam,
Jaume Prats-Cristià,
Lucía Tormo-Bañuelos,
Seungone Kim
Abstract:
As Large Language Models (LLMs) are now capable of producing fluent and coherent content in languages other than English, it is now imperative to precisely evaluate these non-English outputs. However, when assessing the outputs from multilingual LLMs, prior works often employed LLM-based evaluators that excel at assessing English outputs, without a thorough examination of whether these evaluators could effectively assess non-English text as well. Moreover, existing benchmarks to test evaluator LLMs (referred to as "meta-evaluation benchmarks") are mostly English-centric. To bridge this gap and examine whether evaluator LLMs can reliably assess the outputs of multilingual LLMs, we introduce MM-Eval, a multilingual meta-evaluation benchmark comprising five core subsets covering 18 languages and a Language Consistency subset spanning 122 languages. A core attribute of MM-Eval is that, instead of merely translating existing English meta-evaluation benchmarks, it is designed with multilingual-specific challenges in mind. Additionally, unlike existing meta-evaluation benchmarks that focus solely on ranking accuracy over pairwise data, MM-Eval also evaluates the consistency and fairness of absolute score values across a wide range of languages. Our results show that existing evaluator LLMs that excel in English contexts have considerable room for improvement when assessing non-English outputs. Furthermore, we find that evaluators are unfair and inconsistent when evaluating lower-resourced languages. Finally, we validate MM-Eval by measuring its correlation with Best-of-N rankings, finding a significantly stronger correlation compared to other meta-evaluation benchmarks. We publicly release our benchmark and code.
Submitted 29 March, 2025; v1 submitted 23 October, 2024;
originally announced October 2024.
-
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models
Authors:
Jangyeong Kim,
Donggoo Kang,
Junyoung Choi,
Jeonga Wi,
Junho Gwon,
Jiun Bae,
Dumim Yoon,
Junghyun Han
Abstract:
Text-to-texture generation has recently attracted increasing attention, but existing methods often suffer from the problems of view inconsistencies, apparent seams, and misalignment between textures and the underlying mesh. In this paper, we propose a robust text-to-texture method for generating consistent and seamless textures that are well aligned with the mesh. Our method leverages state-of-the-art 2D diffusion models, including SDXL and multiple ControlNets, to capture structural features and intricate details in the generated textures. The method also employs a symmetrical view synthesis strategy combined with regional prompts for enhancing view consistency. Additionally, it introduces novel texture blending and soft-inpainting techniques, which significantly reduce the seam regions. Extensive experiments demonstrate that our method outperforms existing state-of-the-art methods.
Submitted 30 September, 2024;
originally announced September 2024.
-
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Authors:
Seungone Kim,
Juyoung Suk,
Ji Yong Cho,
Shayne Longpre,
Chaeeun Kim,
Dongkeun Yoon,
Guijin Son,
Yejin Cho,
Sheikh Shafayat,
Jinheon Baek,
Sue Hyun Park,
Hyeonbin Hwang,
Jinkyung Jo,
Hyowon Cho,
Haebin Shin,
Seongyun Lee,
Hanseok Oh,
Noah Lee,
Namgyu Ho,
Se June Joo,
Miyoung Ko,
Yoonjoo Lee,
Hyungjoo Chae,
Jamin Shin,
Joel Jang
, et al. (7 additional authors not shown)
Abstract:
As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench.
Submitted 25 March, 2025; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Retrieval Augmented Generation for Domain-specific Question Answering
Authors:
Sanat Sharma,
David Seunghyun Yoon,
Franck Dernoncourt,
Dewang Sultania,
Karishma Bagga,
Mengjiao Zhang,
Trung Bui,
Varun Kotte
Abstract:
Question answering (QA) has become an important application in the advanced development of large language models. General pre-trained large language models for question answering are not trained to properly understand the knowledge or terminology of a specific domain, such as finance, healthcare, education, or customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop an approach for retrieval-aware fine-tuning of a large language model. We show that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping the latest retrieval information in context for contextual grounding.
Submitted 29 May, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Are Doppler Velocity Measurements Useful for Spinning Radar Odometry?
Authors:
Daniil Lisus,
Keenan Burnett,
David J. Yoon,
Richard Poulton,
John Marshall,
Timothy D. Barfoot
Abstract:
Spinning, frequency-modulated continuous-wave (FMCW) radars with 360-degree coverage have been gaining popularity for autonomous-vehicle navigation. However, unlike 'fixed' automotive radar, commercially available spinning radar systems typically do not produce radial velocities due to the lack of repeated measurements in the same direction and the fundamental hardware setup. To make these radial velocities observable, we modified the firmware of a commercial spinning radar to use triangular frequency modulation. In this paper, we develop a novel way to use this modulation to extract radial Doppler velocity measurements from consecutive azimuths of a radar intensity scan, without any data association. We show that these noisy, error-prone measurements contain enough information to provide good ego-velocity estimates, and incorporate these estimates into different modern odometry pipelines. We extensively evaluate the pipelines on over 110 km of driving data in progressively more geometrically challenging autonomous-driving environments. We show that Doppler velocity measurements improve odometry in well-defined geometric conditions and enable it to continue functioning even in severely geometrically degenerate environments, such as long tunnels.
Submitted 5 December, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning
Authors:
Daegun Yoon,
Sangyoon Oh
Abstract:
Communication overhead is a major obstacle to scaling distributed training systems. Gradient sparsification is a potential optimization approach to reduce the communication volume without significant loss of model fidelity. However, existing gradient sparsification methods have low scalability owing to inefficient design of their algorithms, which raises the communication overhead significantly. In particular, gradient build-up and inadequate sparsity control methods degrade the sparsification performance considerably. Moreover, communication traffic increases drastically owing to workload imbalance of gradient selection between workers.
To address these challenges, we propose a novel gradient sparsification scheme called ExDyna. In ExDyna, the gradient tensor of the model comprises fine-grained blocks, and contiguous blocks are grouped into non-overlapping partitions. Each worker selects gradients in its exclusively allocated partition so that gradient build-up never occurs. To balance the workload of gradient selection between workers, ExDyna adjusts the topology of partitions by comparing the workloads of adjacent partitions. In addition, ExDyna supports online threshold scaling, which estimates the accurate threshold of gradient selection on-the-fly. Accordingly, ExDyna can satisfy the user-required sparsity level during a training period regardless of models and datasets. Therefore, ExDyna can enhance the scalability of distributed training systems by preserving near-optimal gradient sparsification cost. In experiments, ExDyna outperformed state-of-the-art sparsifiers in terms of training speed and sparsification performance while achieving high accuracy.
Submitted 21 February, 2024;
originally announced February 2024.
-
LangBridge: Multilingual Reasoning Without Multilingual Supervision
Authors:
Dongkeun Yoon,
Joel Jang,
Sungdong Kim,
Seungone Kim,
Sheikh Shafayat,
Minjoon Seo
Abstract:
We introduce LangBridge, a zero-shot approach to adapt language models for multilingual reasoning tasks without multilingual supervision. LangBridge operates by bridging two models, each specialized in different aspects: (1) one specialized in understanding multiple languages (e.g., mT5 encoder) and (2) one specialized in reasoning (e.g., MetaMath). LangBridge connects the two models by introducing minimal trainable parameters between them. Despite utilizing only English data for training, LangBridge considerably enhances the performance of language models on low-resource languages across mathematical reasoning, code completion, logical reasoning, and commonsense reasoning. Our analysis suggests that the efficacy of LangBridge stems from the language-agnostic characteristics of multilingual representations. We publicly release our code and models.
Submitted 3 June, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Deterministic Guidance Diffusion Model for Probabilistic Weather Forecasting
Authors:
Donggeun Yoon,
Minseok Seo,
Doyi Kim,
Yeji Choi,
Donghyeon Cho
Abstract:
Weather forecasting requires not only accuracy but also the ability to perform probabilistic prediction. However, deterministic weather forecasting methods do not support probabilistic predictions, and conversely, probabilistic models tend to be less accurate. To address these challenges, in this paper, we introduce the Deterministic Guidance Diffusion Model (DGDM) for probabilistic weather forecasting, integrating benefits of both deterministic and probabilistic approaches. During the forward process, both the deterministic and probabilistic models are trained end-to-end. In the reverse process, weather forecasting leverages the predicted result from the deterministic model, using it as an intermediate starting point for the probabilistic model. By fusing deterministic models with probabilistic models in this manner, DGDM is capable of providing accurate forecasts while also offering probabilistic predictions. To evaluate DGDM, we assess it on the global weather forecasting dataset (WeatherBench) and the common video frame prediction benchmark (Moving MNIST). We also introduce and evaluate the Pacific Northwest Windstorm (PNW)-Typhoon weather satellite dataset to verify the effectiveness of DGDM in high-resolution regional forecasting. As a result of our experiments, DGDM achieves state-of-the-art results not only in global forecasting but also in regional forecasting. The code is available at: https://github.com/DongGeun-Yoon/DGDM.
Submitted 5 December, 2023;
originally announced December 2023.
-
MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training
Authors:
Daegun Yoon,
Sangyoon Oh
Abstract:
Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the increasing communication traffic for gradient aggregation. However, existing sparsifiers have poor scalability because of the high computational cost of gradient selection and/or increase in communication traffic. In particular, an increase in communication traffic is caused by gradient build-up and inappropriate threshold for gradient selection.
To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients from its partition, and the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates the accurate threshold to maintain the communication traffic as per user requirement by minimising the compression ratio error. MiCRO enables near-zero cost gradient sparsification by solving existing problems that hinder the scalability and acceleration of distributed DNN training. In our extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.
Submitted 20 February, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
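The partition-based selection described in the MiCRO abstract above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the threshold is passed in as a fixed value (the paper's threshold estimation is omitted), and gradients are plain Python lists.

```python
def partition_bounds(n, num_workers):
    """Split indices 0..n-1 into contiguous, non-overlapping ranges, one per worker."""
    base, extra = divmod(n, num_workers)
    bounds, start = [], 0
    for w in range(num_workers):
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def select_in_partition(grad, bounds, threshold):
    """Indices inside one worker's partition whose gradient magnitude meets the threshold."""
    start, end = bounds
    return [i for i in range(start, end) if abs(grad[i]) >= threshold]


def sparsify(worker_grads, threshold):
    """Union of per-worker selections; worker w only inspects partition w."""
    n = len(worker_grads[0])
    bounds = partition_bounds(n, len(worker_grads))
    selected = []
    for w, grad in enumerate(worker_grads):
        selected.extend(select_in_partition(grad, bounds[w], threshold))
    return selected
```

Since the partitions never overlap, no index can be selected by two workers, so the number of aggregated indices cannot grow through duplicate selections as workers are added.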
-
PDFTriage: Question Answering over Long, Structured Documents
Authors:
Jon Saad-Falcon,
Joe Barrow,
Alexa Siu,
Ani Nenkova,
David Seunghyun Yoon,
Ryan A. Rossi,
Franck Dernoncourt
Abstract:
Large Language Models (LLMs) have issues with document question answering (QA) in situations where the document is unable to fit in the small context length of an LLM. To overcome this issue, most existing works focus on retrieving the relevant context from the document, representing it as plain text. However, documents such as PDFs, web pages, and presentations are naturally structured with different pages, tables, sections, and so on. Representing such structured documents as plain text is incongruous with the user's mental model of these documents with rich structure. When a system has to query the document for context, this incongruity is brought to the fore, and seemingly trivial questions can trip up the QA system. To bridge this fundamental gap in handling structured documents, we propose an approach called PDFTriage that enables models to retrieve the context based on either structure or content. Our experiments demonstrate the effectiveness of the proposed PDFTriage-augmented models across several classes of questions where existing retrieval-augmented LLMs fail. To facilitate further research on this fundamental problem, we release our benchmark dataset consisting of 900+ human-generated questions over 80 structured documents from 10 different categories of question types for document QA. Our code and datasets will be released soon on GitHub.
Submitted 8 November, 2023; v1 submitted 16 September, 2023;
originally announced September 2023.
-
DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification
Authors:
Daegun Yoon,
Sangyoon Oh
Abstract:
Gradient sparsification is a widely adopted solution for reducing the excessive communication traffic in distributed deep learning. However, most existing gradient sparsifiers have relatively poor scalability because of considerable computational cost of gradient selection and/or increased communication traffic owing to gradient build-up. To address these challenges, we propose a novel gradient sparsification scheme, DEFT, that partitions the gradient selection task into sub tasks and distributes them to workers. DEFT differs from existing sparsifiers, wherein every worker selects gradients among all gradients. Consequently, the computational cost can be reduced as the number of workers increases. Moreover, gradient build-up can be eliminated because DEFT allows workers to select gradients in partitions that are non-intersecting (between workers). Therefore, even if the number of workers increases, the communication traffic can be maintained as per user requirement.
To avoid the loss of significance of gradient selection, DEFT selects more gradients in the layers that have a larger gradient norm than the other layers. Because every layer has a different computational load, DEFT allocates layers to workers using a bin-packing algorithm to maintain a balanced load of gradient selection between workers. In our empirical evaluation, DEFT shows a significant improvement in training performance in terms of speed in gradient selection over existing sparsifiers while achieving high convergence performance.
Submitted 13 July, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
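The DEFT abstract above says layers are allocated to workers with a bin-packing algorithm but does not name the variant. The sketch below substitutes a common greedy heuristic (largest cost first, always onto the least-loaded worker) purely as an illustration; the function name, the layer names, and the cost values are hypothetical.

```python
import heapq


def allocate_layers(layer_costs, num_workers):
    """Greedy load balancing: place the most expensive layers first,
    each onto the worker with the smallest accumulated load."""
    heap = [(0.0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    ordered = sorted(layer_costs.items(), key=lambda kv: kv[1], reverse=True)
    for layer, cost in ordered:
        load, w = heapq.heappop(heap)
        assignment[w].append(layer)
        heapq.heappush(heap, (load + cost, w))
    return assignment
```

A greedy heuristic like this does not guarantee an optimal packing, but it is cheap to run and usually keeps per-worker loads close to each other.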
-
Gradient Ascent Post-training Enhances Language Model Generalization
Authors:
Dongkeun Yoon,
Joel Jang,
Sungdong Kim,
Minjoon Seo
Abstract:
In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can allow LMs to become comparable to 2-3x larger LMs across 12 different NLP tasks. We also show that applying GAP on out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
Submitted 12 June, 2023;
originally announced June 2023.
-
Maximum-Width Rainbow-Bisecting Empty Annulus
Authors:
Sang Won Bae,
Sandip Banerjee,
Arpita Baral,
Priya Ranjan Sinha Mahapatra,
Sang Duk Yoon
Abstract:
Given a set of $n$ colored points with $k$ colors in the plane, we study the problem of computing a maximum-width rainbow-bisecting empty annulus, where the annulus is specifically an axis-parallel square, an axis-parallel rectangle, or a circle. We call a region rainbow if it contains at least one point of each color. The maximum-width rainbow-bisecting empty annulus problem asks to find an annulus $A$ of a particular shape with maximum possible width such that $A$ does not contain any input points and it bisects the input point set into two parts, each of which is a rainbow. We compute a maximum-width rainbow-bisecting empty axis-parallel square, axis-parallel rectangular, and circular annulus in $O(n^3)$ time using $O(n)$ space, in $O(k^2n^2\log n)$ time using $O(n\log n)$ space, and in $O(n^3)$ time using $O(n^2)$ space, respectively.
Submitted 26 March, 2024; v1 submitted 16 May, 2023;
originally announced May 2023.
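The definitions in the abstract above (rainbow, empty, bisecting) can be made concrete with a small validity check for the circular case. This is only a checker for a given annulus, not the paper's $O(n^3)$ search algorithm. The function names are hypothetical, colors are assumed to be integers 0..k-1, and points lying exactly on either boundary circle are treated as contained in the annulus, which is an assumption the abstract does not settle.

```python
import math


def is_rainbow(colors, k):
    """True if every color 0..k-1 appears at least once."""
    return set(colors) == set(range(k))


def is_valid_circular_annulus(points, k, center, r_in, r_out):
    """Check that the annulus between radii r_in and r_out is empty and that
    the points inside and outside it each form a rainbow set.
    Each point is a tuple (x, y, color)."""
    inside, outside = [], []
    for x, y, color in points:
        d = math.hypot(x - center[0], y - center[1])
        if d < r_in:
            inside.append(color)
        elif d > r_out:
            outside.append(color)
        else:
            return False  # the point lies in the annulus, so it is not empty
    return is_rainbow(inside, k) and is_rainbow(outside, k)
```

The width of a valid annulus is `r_out - r_in`; the optimization problem in the paper looks for the valid annulus that maximizes this width.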
-
PROBE3.0: A Systematic Framework for Design-Technology Pathfinding with Improved Design Enablement
Authors:
Suhyeong Choi,
Jinwook Jung,
Andrew B. Kahng,
Minsoo Kim,
Chul-Hong Park,
Bodhisatta Pramanik,
Dooseok Yoon
Abstract:
We propose a systematic framework to conduct design-technology pathfinding for PPAC in advanced nodes. Our goal is to provide configurable, scalable generation of process design kit (PDK) and standard-cell library, spanning key scaling boosters (backside PDN and buried power rail), to explore PPAC across given technology and design parameters. We build on PROBE2.0, which addressed only area and cost (AC), to include power and performance (PP) evaluations through automated generation of full design enablements. We also improve the use of artificial designs in the PPAC assessment of technology and design configurations. We generate more realistic artificial designs by applying a machine learning-based parameter tuning flow. We further employ clustering-based cell width-regularized placements at the core of routability assessment, enabling more realistic placement utilization and improved experimental efficiency. We demonstrate PPAC evaluation across scaling boosters and artificial designs in a predictive technology node.
Submitted 25 April, 2023;
originally announced April 2023.
-
RPLKG: Robust Prompt Learning with Knowledge Graph
Authors:
YongTaek Lim,
Yewon Kim,
Suho Kang,
Dokyung Yoon,
KyungWoo Song
Abstract:
Large-scale pre-trained models excel in transferability and robust generalization across diverse datasets. The emergence of multimodal pre-trained models like CLIP has significantly boosted performance in various experiments. However, generalizing to new datasets or domains remains challenging, especially with limited labeled data. Also, existing methods often lack interpretability and impose high computational costs. To address this, we propose Robust Prompt Learning with Knowledge Graph (RPLKG), leveraging the knowledge graph to curate diverse, interpretable prompt sets automatically. Our method autonomously selects the optimal interpretable prompt based on dataset characteristics, achieving performance improvements over zero-shot learning and competitive performance compared to various prompt learning methods. Also, RPLKG efficiently reuses cached prompt embeddings from a single model pass and optimizes prompt selection via Gumbel-Softmax, enabling low-memory, fast training. Moreover, RPLKG advances few-shot learning effectiveness while enhancing interpretability and efficiency in model adaptation. Our
Submitted 21 June, 2025; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Rethinking Evaluation Protocols of Visual Representations Learned via Self-supervised Learning
Authors:
Jae-Hun Lee,
Doyoung Yoon,
ByeongMoon Ji,
Kyungyul Kim,
Sangheum Hwang
Abstract:
Linear probing (LP) (and $k$-NN) on the upstream dataset with labels (e.g., ImageNet) and transfer learning (TL) to various downstream datasets are commonly employed to evaluate the quality of visual representations learned via self-supervised learning (SSL). Although existing SSL methods have shown good performance under those evaluation protocols, we observe that performance is very sensitive to the hyperparameters involved in LP and TL. We argue that this is undesirable behavior, since truly generic representations should be easily adapted to any other visual recognition task, i.e., the learned representations should be robust to the settings of LP and TL hyperparameters. In this work, we investigate the cause of this performance sensitivity by conducting extensive experiments with state-of-the-art SSL methods. First, we find that input normalization for LP is crucial to eliminate performance variations according to the hyperparameters. Specifically, batch normalization before feeding inputs to a linear classifier considerably improves the stability of evaluation, and also resolves the inconsistency between $k$-NN and LP metrics. Second, for TL, we demonstrate that a weight decay parameter in SSL significantly affects the transferability of learned representations, which cannot be identified by LP or $k$-NN evaluations on the upstream dataset. We believe that the findings of this study will be beneficial for the community by drawing attention to the shortcomings in the current SSL evaluation schemes and underscoring the need to reconsider them.
Submitted 6 April, 2023;
originally announced April 2023.
-
Need for Speed: Fast Correspondence-Free Lidar-Inertial Odometry Using Doppler Velocity
Authors:
David J. Yoon,
Keenan Burnett,
Johann Laconte,
Yi Chen,
Heethesh Vhavle,
Soeren Kammel,
James Reuther,
Timothy D. Barfoot
Abstract:
In this paper, we present a fast, lightweight odometry method that uses the Doppler velocity measurements from a Frequency-Modulated Continuous-Wave (FMCW) lidar without data association. FMCW lidar is a recently emerging technology that enables per-return relative radial velocity measurements via the Doppler effect. Since the Doppler measurement model is linear with respect to the 6-degrees-of-freedom (DOF) vehicle velocity, we can formulate a linear continuous-time estimation problem for the velocity and numerically integrate for the 6-DOF pose estimate afterward. The caveat is that angular velocity is not observable with a single FMCW lidar. We address this limitation by also incorporating the angular velocity measurements from a gyroscope. This results in an extremely efficient odometry method that processes lidar frames at an average wall-clock time of 5.64 ms on a single thread, well within the 100 ms frame period implied by the 10 Hz operating rate of the lidar we tested. We show experimental results on real-world driving sequences and compare against state-of-the-art Iterative Closest Point (ICP)-based odometry methods, presenting a compelling trade-off between accuracy and computation. We also present an algebraic observability study, where we demonstrate in theory that the Doppler measurements from multiple FMCW lidars are capable of observing all 6 degrees of freedom (translational and angular velocity).
Submitted 29 September, 2023; v1 submitted 11 March, 2023;
originally announced March 2023.
-
Towards Consistent Batch State Estimation Using a Time-Correlated Measurement Noise Model
Authors:
David J. Yoon,
Timothy D. Barfoot
Abstract:
In this paper, we present an algorithm for learning time-correlated measurement covariances for application in batch state estimation. We parameterize the inverse measurement covariance matrix to be block-banded, which conveniently factorizes and results in a computationally efficient approach for correlating measurements across the entire trajectory. We train our covariance model through supervised learning using the groundtruth trajectory. In applications where the measurements are time-correlated, we demonstrate improved performance in both the mean posterior estimate and the covariance (i.e., improved estimator consistency). We use an experimental dataset collected using a mobile robot equipped with a laser rangefinder to demonstrate the improvement in performance. We also verify estimator consistency in a controlled simulation using a statistical test over several trials.
Submitted 11 March, 2023;
originally announced March 2023.
-
DIFAI: Diverse Facial Inpainting using StyleGAN Inversion
Authors:
Dongsik Yoon,
Jeong-gi Kwak,
Yuanming Li,
David Han,
Hanseok Ko
Abstract:
Image inpainting is an old problem in computer vision that restores occluded regions and completes damaged images. In the case of facial image inpainting, most methods generate only one result for each masked image, even though there are other reasonable possibilities. To prevent any potential biases and unnatural constraints stemming from generating only one image, we propose a novel framework for diverse facial inpainting that exploits the embedding space of StyleGAN. Our framework employs the pSp encoder and the SeFa algorithm to identify semantic components of the StyleGAN embeddings and feeds them into our proposed SPARN decoder, which adopts region normalization for plausible inpainting. We demonstrate that our proposed method outperforms several state-of-the-art methods.
Submitted 20 January, 2023;
originally announced January 2023.
-
Reference Guided Image Inpainting using Facial Attributes
Authors:
Dongsik Yoon,
Jeonggi Kwak,
Yuanming Li,
David Han,
Youngsaeng Jin,
Hanseok Ko
Abstract:
Image inpainting is a technique for completing missing pixels, with applications such as occluded region restoration, distracting object removal, and facial completion. Among these inpainting tasks, facial completion algorithms perform face inpainting according to user direction. Existing approaches require delicate and well-controlled input from the user, so it is difficult for an average user to provide guidance accurate enough for the algorithm to generate the desired results. To overcome this limitation, we propose an alternative user-guided inpainting architecture that manipulates facial attributes using a single reference image as the guide. Our end-to-end model consists of attribute extractors for accurate reference image attribute transfer and an inpainting model to map the attributes realistically and accurately to generated images. We customize the MS-SSIM loss and learnable bidirectional attention maps in which important structures remain intact even with irregularly shaped masks. Based on our evaluation using the publicly available dataset CelebA-HQ, we demonstrate that the proposed method delivers superior performance compared to some state-of-the-art methods specialized in inpainting tasks.
Submitted 19 January, 2023;
originally announced January 2023.
-
Graphlets over Time: A New Lens for Temporal Network Analysis
Authors:
Deukryeol Yoon,
Dongjin Lee,
Minyoung Choe,
Kijung Shin
Abstract:
Graphs are widely used for modeling various types of interactions, such as email communications and online discussions. Many such real-world graphs are temporal, and specifically, they grow over time with new nodes and edges.
Counting the instances of each graphlet (i.e., an induced subgraph isomorphism class) has been successful in characterizing local structures of graphs, with many applications. While graphlets have been extended for temporal graphs, the extensions are designed for examining temporally-local subgraphs composed of edges with close arrival times, instead of long-term changes in local structures.
In this paper, as a new lens for temporal graph analysis, we study the evolution of distributions of graphlet instances over time in real-world graphs at three different levels (graphs, nodes, and edges). At the graph level, we first discover that the evolution patterns are significantly different from those in random graphs. Then, we suggest a graphlet transition graph for measuring the similarity of the evolution patterns of graphs, and we find a surprising similarity between graphs from the same domain. At the node and edge levels, we demonstrate that the local structures around nodes and edges in their early stages provide a strong signal regarding their future importance. In particular, we significantly improve the predictability of the future importance of nodes and edges using the counts of the roles (a.k.a., orbits) that they take within graphlets.
Submitted 3 January, 2023; v1 submitted 31 December, 2022;
originally announced January 2023.
-
Region-Conditioned Orthogonal 3D U-Net for Weather4Cast Competition
Authors:
Taehyeon Kim,
Shinhwan Kang,
Hyeonjeong Shin,
Deukryeol Yoon,
Seongha Eom,
Kijung Shin,
Se-Young Yun
Abstract:
The Weather4Cast competition (hosted by NeurIPS 2022) required competitors to predict super-resolution rain movies in various regions of Europe, given low-resolution satellite contexts covering wider regions. In this paper, we show that a general baseline 3D U-Net can be significantly improved with region-conditioned layers as well as orthogonality regularizations on 1x1x1 convolutional layers. Additionally, we facilitate generalization with a bag of training strategies: mixup data augmentation, self-distillation, and feature-wise linear modulation (FiLM). The presented modifications outperform the baseline algorithm (3D U-Net) by up to 19.54% with less than 1% additional parameters, earning 4th place on the core test leaderboard.
Submitted 5 December, 2022;
originally announced December 2022.
-
Universal convex covering problems under translation and discrete rotations
Authors:
Mook Kwon Jung,
Sang Duk Yoon,
Hee-Kap Ahn,
Takeshi Tokuyama
Abstract:
We consider the smallest-area universal covering of planar objects of perimeter 2 (or equivalently closed curves of length 2) allowing translation and discrete rotations. In particular, we show that the solution is an equilateral triangle of height 1 when translation and discrete rotation of $π$ are allowed. Our proof is purely geometric and elementary. We also give convex coverings of closed curves of length 2 under translation and discrete rotations of multiples of $π/2$ and $2π/3$. We show a minimality of the covering for discrete rotation of multiples of $π/2$, which is an equilateral triangle of height smaller than 1, and conjecture that the covering is the smallest-area convex covering. Finally, we give the smallest-area convex coverings of all unit segments under translation and discrete rotations $2π/k$ for all integers $k\ge 3$.
Submitted 27 November, 2022;
originally announced November 2022.
-
Lightweight Alpha Matting Network Using Distillation-Based Channel Pruning
Authors:
Donggeun Yoon,
Jinsun Park,
Donghyeon Cho
Abstract:
Recently, alpha matting has received a lot of attention because of its usefulness in mobile applications such as selfies. As a result, there is demand for a lightweight alpha matting model that fits the limited computational resources of commercial portable devices. To this end, we suggest a distillation-based channel pruning method for alpha matting networks. In the pruning step, we remove the channels of a student network that have less impact on mimicking the knowledge of a teacher network. Then, the pruned lightweight student network is trained with the same distillation loss. A lightweight alpha matting model obtained with the proposed method outperforms existing lightweight methods. To show the superiority of our algorithm, we provide various quantitative and qualitative experiments with in-depth analyses. Furthermore, we demonstrate the versatility of the proposed distillation-based channel pruning method by applying it to semantic segmentation.
Submitted 14 October, 2022;
originally announced October 2022.
-
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Authors:
Joel Jang,
Dongkeun Yoon,
Sohee Yang,
Sungmin Cha,
Moontae Lee,
Lajanugen Logeswaran,
Minjoon Seo
Abstract:
Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both requiring re-training the underlying LM. We propose knowledge unlearning as an alternative method to reduce privacy risks for LMs post hoc. We show that simply performing gradient ascent on target token sequences is effective at forgetting them with little to no degradation of general language modeling performance for larger LMs; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once and that unlearning is highly dependent on which kind of data (domain) is forgotten. By showing comparisons with a previous data preprocessing method and a decoding method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori while being much more efficient and robust. We release the code and dataset needed to replicate our results at https://github.com/joeljang/knowledge-unlearning.
Submitted 19 December, 2022; v1 submitted 4 October, 2022;
originally announced October 2022.
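The gradient-ascent idea in the abstract above can be illustrated on a toy model. The following is a minimal sketch, assuming a context-free softmax over a three-token vocabulary rather than a real language model; `softmax`, `unlearn_step`, the learning rate, and the step count are hypothetical names and values chosen for illustration, not the authors' released code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def unlearn_step(logits, target, lr):
    """One gradient-ascent step on the negative log-likelihood of `target`."""
    probs = softmax(logits)
    # d(NLL)/d(logit_j) = p_j - 1[j == target]; ascent ADDS this gradient,
    # which lowers the target logit and raises the others.
    return [z + lr * (p - (1.0 if j == target else 0.0))
            for j, (z, p) in enumerate(zip(logits, probs))]

# Toy "model": token 0 starts out as the most likely token.
logits = [2.0, 0.5, 0.1]
before = softmax(logits)[0]
for _ in range(5):  # "just a few iterations", as an assumed setting
    logits = unlearn_step(logits, target=0, lr=0.5)
after = softmax(logits)[0]
```

After the loop, `after` is smaller than `before`: the probability assigned to the target token drops, which is the forgetting effect in miniature. A real setup would apply the same sign flip to the sequence-level loss of a pretrained LM.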
-
Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment
Authors:
Daegun Yoon,
Sangyoon Oh
Abstract:
To train deep learning models faster, distributed training on multiple GPUs has become a very popular scheme in recent years. However, communication bandwidth is still a major bottleneck of training performance. To improve overall training performance, recent works have proposed gradient sparsification methods that significantly reduce communication traffic. Most of them, such as Top-k gradient sparsification (Top-k SGD), require gradient sorting to select meaningful gradients. However, Top-k SGD is limited in how much it can speed up overall training because gradient sorting is significantly inefficient on GPUs. In this paper, we conduct experiments that show the inefficiency of Top-k SGD and provide insight into its low performance. Based on observations from our empirical analysis, we plan to develop a high-performance gradient sparsification method as future work.
Submitted 18 September, 2022;
originally announced September 2022.
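The Top-k selection step discussed in the abstract above can be sketched on the CPU as follows. This is a minimal illustration, not the paper's implementation: `top_k_sparsify` is a hypothetical helper, it uses heap-based selection from the standard library instead of a GPU sort, and returning a residual for local accumulation is an assumption borrowed from common practice rather than something the abstract states.

```python
import heapq

def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a flat gradient.

    Returns (indices, values, residual). The residual holds the dropped
    entries so a worker could accumulate them locally (an assumption here).
    """
    if not 0 < k <= len(grad):
        raise ValueError("k must be in 1..len(grad)")
    # Magnitude-based selection: the step that the abstract identifies as
    # inefficient on GPUs when done by sorting.
    indices = sorted(
        heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    )
    kept = set(indices)
    values = [grad[i] for i in indices]
    residual = [0.0 if i in kept else g for i, g in enumerate(grad)]
    return indices, values, residual

grad = [0.1, -2.0, 0.03, 1.5, -0.4]
indices, values, residual = top_k_sparsify(grad, 2)
# indices -> [1, 3], values -> [-2.0, 1.5]
```

Only the `(indices, values)` pair would need to be communicated between workers, which is where the traffic reduction comes from.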
-
Picking Up Speed: Continuous-Time Lidar-Only Odometry using Doppler Velocity Measurements
Authors:
Yuchen Wu,
David J. Yoon,
Keenan Burnett,
Soeren Kammel,
Yi Chen,
Heethesh Vhavle,
Timothy D. Barfoot
Abstract:
Frequency-Modulated Continuous-Wave (FMCW) lidar is a recently emerging technology that additionally enables per-return instantaneous relative radial velocity measurements via the Doppler effect. In this letter, we present the first continuous-time lidar-only odometry algorithm using these Doppler velocity measurements from an FMCW lidar to aid odometry in geometrically degenerate environments. We apply an existing continuous-time framework that efficiently estimates the vehicle trajectory using Gaussian process regression to compensate for motion distortion due to the scanning-while-moving nature of any mechanically actuated lidar (FMCW and non-FMCW). We evaluate our proposed algorithm on several real-world datasets, including publicly available ones and datasets we collected. Our algorithm outperforms the only existing method that also uses Doppler velocity measurements, and we study difficult conditions where including this extra information greatly improves performance. We additionally demonstrate state-of-the-art performance of lidar-only odometry with and without using Doppler velocity measurements in nominal conditions. Code for this project can be found at: https://github.com/utiasASRL/steam_icp.
Submitted 3 December, 2022; v1 submitted 7 September, 2022;
originally announced September 2022.
-
DiVa: An Accelerator for Differentially Private Machine Learning
Authors:
Beomsik Park,
Ranggi Hwang,
Dongho Yoon,
Yoonhyuk Choi,
Minsoo Rhu
Abstract:
The widespread deployment of machine learning (ML) is raising serious concerns about protecting the privacy of users who contributed to the collection of training data. Differential privacy (DP) is rapidly gaining momentum in the industry as a practical standard for privacy protection. Despite DP's importance, however, little has been explored within the computer systems community regarding the implications of this emerging ML algorithm for system designs. In this work, we conduct a detailed workload characterization of a state-of-the-art differentially private ML training algorithm named DP-SGD. We uncover several unique properties of DP-SGD (e.g., its high memory capacity and computation requirements vs. non-private ML), root-causing its key bottlenecks. Based on our analysis, we propose an accelerator for differentially private ML named DiVa, which provides a significant improvement in compute utilization, leading to 2.6x higher energy efficiency vs. conventional systolic arrays.
Submitted 25 August, 2022;
originally announced August 2022.
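For readers unfamiliar with DP-SGD, the algorithm characterized in the abstract above, its core update as commonly described clips each example's gradient to a fixed norm, sums the clipped gradients, adds Gaussian noise, and averages. The sketch below is a plain-Python illustration under that description; `dp_sgd_gradient`, `clip_norm`, and `noise_multiplier` are illustrative names and are not DiVa's interface.

```python
import math
import random

def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient to clip_norm, sum, add noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down only gradients whose L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(dim):
            total[j] += g[j] * scale
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]

grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
noiseless = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=0.0,
                            rng=random.Random(0))
noisy = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1,
                        rng=random.Random(0))
```

Because clipping is applied per example, every example's gradient has to be materialized separately instead of being reduced across the batch, which is one plausible reading of the memory and compute overheads the abstract mentions.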
-
Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis
Authors:
Jeong-gi Kwak,
Yuanming Li,
Dongsik Yoon,
Donghyeon Kim,
David Han,
Hanseok Ko
Abstract:
Over the years, 2D GANs have achieved great success in photorealistic portrait generation. However, they lack 3D understanding in the generation process, and thus suffer from multi-view inconsistency. To alleviate the issue, many 3D-aware GANs have been proposed and have shown notable results, but 3D GANs struggle with editing semantic attributes. The controllability and interpretability of 3D GANs have not been much explored. In this work, we propose two solutions to overcome these weaknesses of 2D GANs and 3D-aware GANs. We first introduce a novel 3D-aware GAN, SURF-GAN, which is capable of discovering semantic attributes during training and controlling them in an unsupervised manner. After that, we inject the prior of SURF-GAN into StyleGAN to obtain a high-fidelity 3D-controllable generator. Unlike existing latent-based methods that allow only implicit pose control, the proposed 3D-controllable StyleGAN enables explicit pose control over portrait generation. This distillation allows direct compatibility between 3D control and many StyleGAN-based techniques (e.g., inversion and stylization), and also brings an advantage in terms of computational resources. Our code is available at https://github.com/jgkwak95/SURF-GAN.
Submitted 26 July, 2022; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Interaction Design for VR Applications: Understanding Needs for University Curricula
Authors:
Oloff C. Biermann,
Daniel Ajisafe,
Dongwook Yoon
Abstract:
As virtual reality (VR) is emerging in the tech sector, developers and designers are under pressure to create immersive experiences for their products. However, the current curricula from top institutions focus primarily on technical considerations for building VR applications, missing out on concerns and usability problems specific to VR interaction design. To better understand current needs, we examined the status quo of existing university pedagogies by carrying out a content analysis of undergraduate and graduate courses about VR and related areas offered at major universities and by conducting interviews with 7 industry experts. Our analysis reveals that current teaching practices underemphasize design thinking, prototyping, and evaluation skills, while focusing on technical implementation. We recommend that VR curricula emphasize design principles and guidelines, offer training in prototyping and ideation, prioritize practical design exercises while providing industry insights, and encourage students to solve VR design problems beyond the classroom.
Submitted 9 June, 2022;
originally announced June 2022.