-
WriteViT: Handwritten Text Generation with Vision Transformer
Authors:
Dang Hoai Nam,
Huynh Tong Dang Khoa,
Vo Nguyen Le Duy
Abstract:
Humans can quickly generalize handwriting styles from a single example by intuitively separating content from style. Machines, however, struggle with this task, especially in low-data settings, often missing subtle spatial and stylistic cues. Motivated by this gap, we introduce WriteViT, a one-shot handwritten text synthesis framework that incorporates Vision Transformers (ViT), a family of models…
▽ More
Humans can quickly generalize handwriting styles from a single example by intuitively separating content from style. Machines, however, struggle with this task, especially in low-data settings, often missing subtle spatial and stylistic cues. Motivated by this gap, we introduce WriteViT, a one-shot handwritten text synthesis framework that incorporates Vision Transformers (ViT), a family of models that have shown strong performance across various computer vision tasks. WriteViT integrates a ViT-based Writer Identifier for extracting style embeddings, a multi-scale generator built with Transformer encoder-decoder blocks enhanced by conditional positional encoding (CPE), and a lightweight ViT-based recognizer. While previous methods typically rely on CNNs or CRNNs, our design leverages transformers in key components to better capture both fine-grained stroke details and higher-level style information. Although handwritten text synthesis has been widely explored, its application to Vietnamese -- a language rich in diacritics and complex typography -- remains limited. Experiments on Vietnamese and English datasets demonstrate that WriteViT produces high-quality, style-consistent handwriting while maintaining strong recognition performance in low-resource scenarios. These results highlight the promise of transformer-based designs for multilingual handwriting generation and efficient style adaptation.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Prompting LLMs for Code Editing: Struggles and Remedies
Authors:
Daye Nam,
Ahmed Omran,
Ambar Murillo,
Saksham Thakur,
Abner Araujo,
Marcel Blistein,
Alexander Frömmgen,
Vincent Hellendoorn,
Satish Chandra
Abstract:
Large Language Models (LLMs) are rapidly transforming software engineering, with coding assistants embedded in an IDE becoming increasingly prevalent. While research has focused on improving the tools and understanding developer perceptions, a critical gap exists in understanding how developers actually use these tools in their daily workflows, and, crucially, where they struggle. This paper addre…
▽ More
Large Language Models (LLMs) are rapidly transforming software engineering, with coding assistants embedded in an IDE becoming increasingly prevalent. While research has focused on improving the tools and understanding developer perceptions, a critical gap exists in understanding how developers actually use these tools in their daily workflows, and, crucially, where they struggle. This paper addresses part of this gap through a multi-phased investigation of developer interactions with an LLM-powered code editing and transformation feature, Transform Code, in an IDE widely used at Google. First, we analyze telemetry logs of the feature usage, revealing that frequent re-prompting can be an indicator of developer struggles with using Transform Code. Second, we conduct a qualitative analysis of unsatisfactory requests, identifying five key categories of information often missing from developer prompts. Finally, based on these findings, we propose and evaluate a tool, AutoPrompter, for automatically improving prompts by inferring missing information from the surrounding code context, leading to a 27% improvement in edit correctness on our test set.
△ Less
Submitted 28 April, 2025;
originally announced April 2025.
-
From Prompts to Propositions: A Logic-Based Lens on Student-LLM Interactions
Authors:
Ali Alfageeh,
Sadegh AlMahdi Kazemi Zarkouei,
Daye Nam,
Daniel Prol,
Matin Amoozadeh,
Souti Chattopadhyay,
James Prather,
Paul Denny,
Juho Leinonen,
Michael Hilton,
Sruti Srinivasa Ragavan,
Mohammad Amin Alipour
Abstract:
Background and Context. The increasing integration of large language models (LLMs) in computing education presents an emerging challenge in understanding how students use LLMs and craft prompts to solve computational tasks. Prior research has used both qualitative and quantitative methods to analyze prompting behavior, but these approaches lack scalability or fail to effectively capture the semant…
▽ More
Background and Context. The increasing integration of large language models (LLMs) in computing education presents an emerging challenge in understanding how students use LLMs and craft prompts to solve computational tasks. Prior research has used both qualitative and quantitative methods to analyze prompting behavior, but these approaches lack scalability or fail to effectively capture the semantic evolution of prompts. Objective. In this paper, we investigate whether students prompts can be systematically analyzed using propositional logic constraints. We examine whether this approach can identify patterns in prompt evolution, detect struggling students, and provide insights into effective and ineffective strategies. Method. We introduce Prompt2Constraints, a novel method that translates students prompts into logical constraints. The constraints are able to represent the intent of the prompts in succinct and quantifiable ways. We used this approach to analyze a dataset of 1,872 prompts from 203 students solving introductory programming tasks. Findings. We find that while successful and unsuccessful attempts tend to use a similar number of constraints overall, when students fail, they often modify their prompts more significantly, shifting problem-solving strategies midway. We also identify points where specific interventions could be most helpful to students for refining their prompts. Implications. This work offers a new and scalable way to detect students who struggle in solving natural language programming tasks. This work could be extended to investigate more complex tasks and integrated into programming tools to provide real-time support.
△ Less
Submitted 25 April, 2025;
originally announced April 2025.
-
Kanana: Compute-efficient Bilingual Language Models
Authors:
Kanana LLM Team,
Yunju Bak,
Hojin Lee,
Minho Ryu,
Jiyeon Ham,
Seungjae Jung,
Daniel Wontae Nam,
Taegyeong Eo,
Donghun Lee,
Doohae Jung,
Boseop Kim,
Nayeon Kim,
Jaesun Park,
Hyunho Kim,
Hyunwoong Ko,
Changmin Lee,
Kyoung-Woon On,
Seulye Baeg,
Junrae Cho,
Sunghee Jung,
Jieun Kang,
EungGyun Kim,
Eunhwa Kim,
Byeongil Ko,
Daniel Lee
, et al. (4 additional authors not shown)
Abstract:
We introduce Kanana, a series of bilingual language models that demonstrate exceeding performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high quality dat…
▽ More
We introduce Kanana, a series of bilingual language models that demonstrate exceeding performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. Furthermore, the report outlines the methodologies utilized during the post-training of the Kanana models, encompassing supervised fine-tuning and preference optimization, aimed at enhancing their capability for seamless interaction with users. Lastly, the report elaborates on plausible approaches used for language model adaptation to specific scenarios, such as embedding, retrieval augmented generation, and function calling. The Kanana model series spans from 2.1B to 32.5B parameters with 2.1B models (base, instruct, embedding) publicly released to promote research on Korean language models.
△ Less
Submitted 28 February, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Vision-Ultrasound Robotic System based on Deep Learning for Gas and Arc Hazard Detection in Manufacturing
Authors:
Jin-Hee Lee,
Dahyun Nam,
Robin Inho Kee,
YoungKey Kim,
Seok-Jun Buu
Abstract:
Gas leaks and arc discharges present significant risks in industrial environments, requiring robust detection systems to ensure safety and operational efficiency. Inspired by human protocols that combine visual identification with acoustic verification, this study proposes a deep learning-based robotic system for autonomously detecting and classifying gas leaks and arc discharges in manufacturing…
▽ More
Gas leaks and arc discharges present significant risks in industrial environments, requiring robust detection systems to ensure safety and operational efficiency. Inspired by human protocols that combine visual identification with acoustic verification, this study proposes a deep learning-based robotic system for autonomously detecting and classifying gas leaks and arc discharges in manufacturing settings. The system is designed to execute all experimental tasks entirely onboard the robot. Utilizing a 112-channel acoustic camera operating at a 96 kHz sampling rate to capture ultrasonic frequencies, the system processes real-world datasets recorded in diverse industrial scenarios. These datasets include multiple gas leak configurations (e.g., pinhole, open end) and partial discharge types (Corona, Surface, Floating) under varying environmental noise conditions. Proposed system integrates visual detection and a beamforming-enhanced acoustic analysis pipeline. Signals are transformed using STFT and refined through Gamma Correction, enabling robust feature extraction. An Inception-inspired CNN further classifies hazards, achieving 99% gas leak detection accuracy. The system not only detects individual hazard sources but also enhances classification reliability by fusing multi-modal data from both vision and acoustic sensors. When tested in reverberation and noise-augmented environments, the system outperformed conventional models by up to 44%p, with experimental tasks meticulously designed to ensure fairness and reproducibility. Additionally, the system is optimized for real-time deployment, maintaining an inference time of 2.1 seconds on a mobile robotic platform. By emulating human-like inspection protocols and integrating vision with acoustic modalities, this study presents an effective solution for industrial automation, significantly improving safety and operational reliability.
△ Less
Submitted 8 February, 2025;
originally announced February 2025.
-
How much does AI impact development speed? An enterprise-based randomized controlled trial
Authors:
Elise Paradis,
Kate Grey,
Quinn Madison,
Daye Nam,
Andrew Macvean,
Vahid Meimand,
Nan Zhang,
Ben Ferrari-Church,
Satish Chandra
Abstract:
How much does AI assistance impact developer productivity? To date, the software engineering literature has provided a range of answers, targeting a diversity of outcomes: from perceived productivity to speed on task and developer throughput. Our randomized controlled trial with 96 full-time Google software engineers contributes to this literature by sharing an estimate of the impact of three AI f…
▽ More
How much does AI assistance impact developer productivity? To date, the software engineering literature has provided a range of answers, targeting a diversity of outcomes: from perceived productivity to speed on task and developer throughput. Our randomized controlled trial with 96 full-time Google software engineers contributes to this literature by sharing an estimate of the impact of three AI features on the time developers spent on a complex, enterprise-grade task. We found that AI significantly shortened the time developers spent on task. Our best estimate of the size of this effect, controlling for factors known to influence developer time on task, stands at about 21\%, although our confidence interval is large. We also found an interesting effect whereby developers who spend more hours on code-related activities per day were faster with AI. Product and future research considerations are discussed. In particular, we invite further research that explores the impact of AI at the ecosystem level and across multiple suites of AI-enhanced tools, since we cannot assume that the effect size obtained in our lab study will necessarily apply more broadly, or that the effect of AI found using internal Google tooling in the summer of 2024 will translate across tools and over time.
△ Less
Submitted 11 November, 2024; v1 submitted 16 October, 2024;
originally announced October 2024.
-
Semantic Parsing with Candidate Expressions for Knowledge Base Question Answering
Authors:
Daehwan Nam,
Gary Geunbae Lee
Abstract:
Semantic parsers convert natural language to logical forms, which can be evaluated on knowledge bases (KBs) to produce denotations. Recent semantic parsers have been developed with sequence-to-sequence (seq2seq) pre-trained language models (PLMs) or large language models, where the models treat logical forms as sequences of tokens. For syntactic and semantic validity, the semantic parsers use gram…
▽ More
Semantic parsers convert natural language to logical forms, which can be evaluated on knowledge bases (KBs) to produce denotations. Recent semantic parsers have been developed with sequence-to-sequence (seq2seq) pre-trained language models (PLMs) or large language models, where the models treat logical forms as sequences of tokens. For syntactic and semantic validity, the semantic parsers use grammars that enable constrained decoding. However, the grammars lack the ability to utilize large information of KBs, although logical forms contain representations of KB elements, such as entities or relations. In this work, we propose a grammar augmented with candidate expressions for semantic parsing on a large KB with a seq2seq PLM. The grammar defines actions as production rules, and our semantic parser predicts actions during inference under the constraints by types and candidate expressions. We apply the grammar to knowledge base question answering, where the constraints by candidate expressions assist a semantic parser to generate valid KB elements. We also introduce two special rules, sub-type inference and union types, and a mask caching algorithm. In particular, sub-type inference and the mask caching algorithm greatly increase the decoding speed of our semantic parser. We experimented on two benchmarks, KQA Pro and Overnight, where the constraints by candidate expressions increased the accuracy of our semantic parser, whether it was trained with strong supervision or weak supervision. In addition, our semantic parser had a fast decoding speed in the experiments.
△ Less
Submitted 7 April, 2025; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Deep-learning real-time phase retrieval of imperfect diffraction patterns from X-ray free-electron lasers
Authors:
Sung Yun Lee,
Do Hyung Cho,
Chulho Jung,
Daeho Sung,
Daewoong Nam,
Sangsoo Kim,
Changyong Song
Abstract:
Machine learning is attracting surging interest across nearly all scientific areas by enabling the analysis of large datasets and the extraction of scientific information from incomplete data. Data-driven science is rapidly growing, especially in X-ray methodologies, where advanced light sources and detection technologies accumulate vast amounts of data that exceed meticulous human inspection capa…
▽ More
Machine learning is attracting surging interest across nearly all scientific areas by enabling the analysis of large datasets and the extraction of scientific information from incomplete data. Data-driven science is rapidly growing, especially in X-ray methodologies, where advanced light sources and detection technologies accumulate vast amounts of data that exceed meticulous human inspection capabilities. Despite the increasing demands, the full application of machine learning has been hindered by the need for data-specific optimizations. In this study, we introduce a new deep-learning-based phase retrieval method for imperfect diffraction data. This method provides robust phase retrieval for simulated data and performs well on weak-signal single-pulse diffraction data from X-ray free-electron lasers. Moreover, the method significantly reduces data processing time, facilitating real-time image reconstructions that are crucial for high-repetition-rate data acquisition. Thus, this approach offers a reliable solution to the phase problem and is expected to be widely adopted across various research areas.
△ Less
Submitted 24 September, 2024;
originally announced September 2024.
-
GraspSAM: When Segment Anything Model Meets Grasp Detection
Authors:
Sangjun Noh,
Jongwon Kim,
Dongwoo Nam,
Seunghyeok Back,
Raeyoung Kang,
Kyoobin Lee
Abstract:
Grasp detection requires flexibility to handle objects of various shapes without relying on prior knowledge of the object, while also offering intuitive, user-guided control. This paper introduces GraspSAM, an innovative extension of the Segment Anything Model (SAM), designed for prompt-driven and category-agnostic grasp detection. Unlike previous methods, which are often limited by small-scale tr…
▽ More
Grasp detection requires flexibility to handle objects of various shapes without relying on prior knowledge of the object, while also offering intuitive, user-guided control. This paper introduces GraspSAM, an innovative extension of the Segment Anything Model (SAM), designed for prompt-driven and category-agnostic grasp detection. Unlike previous methods, which are often limited by small-scale training data, GraspSAM leverages the large-scale training and prompt-based segmentation capabilities of SAM to efficiently support both target-object and category-agnostic grasping. By utilizing adapters, learnable token embeddings, and a lightweight modified decoder, GraspSAM requires minimal fine-tuning to integrate object segmentation and grasp prediction into a unified framework. The model achieves state-of-the-art (SOTA) performance across multiple datasets, including Jacquard, Grasp-Anything, and Grasp-Anything++. Extensive experiments demonstrate the flexibility of GraspSAM in handling different types of prompts (such as points, boxes, and language), highlighting its robustness and effectiveness in real-world robotic applications.
△ Less
Submitted 23 September, 2024; v1 submitted 19 September, 2024;
originally announced September 2024.
-
TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback
Authors:
Eunseop Yoon,
Hee Suk Yoon,
SooHwan Eom,
Gunsoo Han,
Daniel Wontae Nam,
Daejin Jo,
Kyoung-Woon On,
Mark A. Hasegawa-Johnson,
Sungwoong Kim,
Chang D. Yoo
Abstract:
Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and tokens, which are autoregressively generated from the language model. Although several recent approaches have tri…
▽ More
Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and tokens, which are autoregressively generated from the language model. Although several recent approaches have tried to provide token-level (i.e., dense) rewards for each individual token, these typically rely on predefined discrete reward values (e.g., positive: +1, negative: -1, neutral: 0), failing to account for varying degrees of preference inherent to each token. To address this limitation, we introduce TLCR (Token-Level Continuous Reward) for RLHF, which incorporates a discriminator trained to distinguish positive and negative tokens, and the confidence of the discriminator is used to assign continuous rewards to each token considering the context. Extensive experiments show that our proposed TLCR leads to consistent performance improvements over previous sequence-level or token-level discrete rewards on open-ended generation benchmarks.
△ Less
Submitted 8 December, 2024; v1 submitted 23 July, 2024;
originally announced July 2024.
-
Student-AI Interaction: A Case Study of CS1 students
Authors:
Matin Amoozadeh,
Daye Nam,
Daniel Prol,
Ali Alfageeh,
James Prather,
Michael Hilton,
Sruti Srinivasa Ragavan,
Mohammad Amin Alipour
Abstract:
The new capabilities of generative artificial intelligence tools Generative AI, such as ChatGPT, allow users to interact with the system in intuitive ways, such as simple conversations, and receive (mostly) good-quality answers. These systems can support students' learning objectives by providing accessible explanations and examples even with vague queries. At the same time, they can encourage und…
▽ More
The new capabilities of generative artificial intelligence tools Generative AI, such as ChatGPT, allow users to interact with the system in intuitive ways, such as simple conversations, and receive (mostly) good-quality answers. These systems can support students' learning objectives by providing accessible explanations and examples even with vague queries. At the same time, they can encourage undesired help-seeking behaviors by providing solutions to the students' homework. Therefore, it is important to better understand how students approach such tools and the potential issues such approaches might present for the learners. In this paper, we present a case study for understanding student-AI collaboration to solve programming tasks in the CS1 introductory programming course. To this end, we recruited a gender-balanced majority non-white set of 15 CS1 students at a large public university in the US. We observed them solving programming tasks. We used a mixed-method approach to study their interactions as they tackled Python programming tasks, focusing on when and why they used ChatGPT for problem-solving. We analyze and classify the questions submitted by the 15 participants to ChatGPT. Additionally, we analyzed user interaction patterns, their reactions to ChatGPT's responses, and the potential impacts of Generative AI on their perception of self-efficacy. Our results suggest that in about a third of the cases, the student attempted to complete the task by submitting the full description of the tasks to ChatGPT without making any effort on their own. We also observed that few students verified their solutions. We discuss the results and their potential implications.
△ Less
Submitted 10 October, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
Binary Classifier Optimization for Large Language Model Alignment
Authors:
Seungjae Jung,
Gunsoo Han,
Daniel Wontae Nam,
Kyoung-Woon On
Abstract:
In real-world services such as ChatGPT, aligning models based on user feedback is crucial for improving model performance. However, due to the simplicity and convenience of providing feedback, users typically offer only basic binary signals, such as 'thumbs-up' or 'thumbs-down'. Most existing alignment research, on the other hand, relies on preference-based approaches that require both positive an…
▽ More
In real-world services such as ChatGPT, aligning models based on user feedback is crucial for improving model performance. However, due to the simplicity and convenience of providing feedback, users typically offer only basic binary signals, such as 'thumbs-up' or 'thumbs-down'. Most existing alignment research, on the other hand, relies on preference-based approaches that require both positive and negative responses as a pair. We propose Binary Classifier Optimization (BCO), a technique that effectively aligns LLMs using only binary feedback. BCO trains a binary classifier, where the logit serves as an implicit reward, effectively minimizing the Direct Preference Optimization (DPO) loss. We demonstrate that the binary cross-entropy loss employed in classifier training acts as an upper bound for the DPO loss. Additionally, a novel reward shift technique further minimizes the gap between the losses. We validate our methodology in two settings: first, on a paired preference dataset, where our method performs on par with DPO; and second, on a Likert-5 scale annotation dataset which stems from real users' queries. Our model consistently demonstrates effective and robust alignment across four base LLMs and three different datasets, showcasing the strength of our approach to learning from binary signals.
△ Less
Submitted 9 June, 2025; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Classifying Proposals of Decentralized Autonomous Organizations Using Large Language Models
Authors:
Christian Ziegler,
Marcos Miranda,
Guangye Cao,
Gustav Arentoft,
Doo Wan Nam
Abstract:
Our study demonstrates the effective use of Large Language Models (LLMs) for automating the classification of complex datasets. We specifically target proposals of Decentralized Autonomous Organizations (DAOs), as the clas-sification of this data requires the understanding of context and, therefore, depends on human expertise, leading to high costs associated with the task. The study applies an it…
▽ More
Our study demonstrates the effective use of Large Language Models (LLMs) for automating the classification of complex datasets. We specifically target proposals of Decentralized Autonomous Organizations (DAOs), as the clas-sification of this data requires the understanding of context and, therefore, depends on human expertise, leading to high costs associated with the task. The study applies an iterative approach to specify categories and further re-fine them and the prompt in each iteration, which led to an accuracy rate of 95% in classifying a set of 100 proposals. With this, we demonstrate the po-tential of LLMs to automate data labeling tasks that depend on textual con-text effectively.
△ Less
Submitted 3 July, 2024; v1 submitted 13 January, 2024;
originally announced January 2024.
-
Understanding Documentation Use Through Log Analysis: An Exploratory Case Study of Four Cloud Services
Authors:
Daye Nam,
Andrew Macvean,
Brad Myers,
Bogdan Vasilescu
Abstract:
Almost no modern software system is written from scratch, and developers are required to effectively learn to use third-party libraries or software services. Thus, many practitioners and researchers have looked for ways to create effective documentation that supports developers' learning. However, few efforts have focused on how people actually use the documentation. In this paper, we report on an…
▽ More
Almost no modern software system is written from scratch, and developers are required to effectively learn to use third-party libraries or software services. Thus, many practitioners and researchers have looked for ways to create effective documentation that supports developers' learning. However, few efforts have focused on how people actually use the documentation. In this paper, we report on an exploratory, multi-phase, mixed methods empirical study of documentation page-view logs from four cloud-based industrial services. By analyzing page-view logs for over 100,000 users, we find diverse patterns of documentation page visits. Moreover, we show statistically that which documentation pages people visit often correlates with user characteristics such as past experience with the specific product, on the one hand, and with future adoption of the API on the other hand. We discuss the implications of these results on documentation design and propose documentation page-view log analysis as a feasible technique for design audits of documentation, from ones written for software developers to ones designed to support end users (e.g., Adobe Photoshop).
△ Less
Submitted 29 February, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Hexa: Self-Improving for Knowledge-Grounded Dialogue System
Authors:
Daejin Jo,
Daniel Wontae Nam,
Gunsoo Han,
Kyoung-Woon On,
Taehwan Kwon,
Seungeun Rho,
Sungwoong Kim
Abstract:
A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the gene…
▽ More
A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the generative performances of intermediate steps without the ground truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves the performances on the task of knowledge-grounded dialogue generation.
△ Less
Submitted 2 April, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Trust in Generative AI among students: An Exploratory Study
Authors:
Matin Amoozadeh,
David Daniels,
Daye Nam,
Aayush Kumar,
Stella Chen,
Michael Hilton,
Sruti Srinivasa Ragavan,
Mohammad Amin Alipour
Abstract:
Generative artificial systems (GenAI) have experienced exponential growth in the past couple of years. These systems offer exciting capabilities, such as generating programs, that students can well utilize for their learning. Among many dimensions that might affect the effective adoption of GenAI, in this paper, we investigate students' \textit{trust}. Trust in GenAI influences the extent to which…
▽ More
Generative artificial systems (GenAI) have experienced exponential growth in the past couple of years. These systems offer exciting capabilities, such as generating programs, that students can well utilize for their learning. Among many dimensions that might affect the effective adoption of GenAI, in this paper, we investigate students' \textit{trust}. Trust in GenAI influences the extent to which students adopt GenAI, in turn affecting their learning. In this study, we surveyed 253 students at two large universities to understand how much they trust \genai tools and their feedback on how GenAI impacts their performance in CS courses. Our results show that students have different levels of trust in GenAI. We also observe different levels of confidence and motivation, highlighting the need for further understanding of factors impacting trust.
△ Less
Submitted 1 February, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Using an LLM to Help With Code Understanding
Authors:
Daye Nam,
Andrew Macvean,
Vincent Hellendoorn,
Bogdan Vasilescu,
Brad Myers
Abstract:
Understanding code is challenging, especially when working in new and complex development environments. Code comments and documentation can help, but are typically scarce or hard to navigate. Large language models (LLMs) are revolutionizing the process of writing code. Can they do the same for helping understand it? In this study, we provide a first investigation of an LLM-based conversational UI…
▽ More
Understanding code is challenging, especially when working in new and complex development environments. Code comments and documentation can help, but are typically scarce or hard to navigate. Large language models (LLMs) are revolutionizing the process of writing code. Can they do the same for helping understand it? In this study, we provide a first investigation of an LLM-based conversational UI built directly in the IDE that is geared towards code understanding. Our IDE plugin queries OpenAI's GPT-3.5-turbo model with four high-level requests without the user having to write explicit prompts: to explain a highlighted section of code, provide details of API calls used in the code, explain key domain-specific terms, and provide usage examples for an API. The plugin also allows for open-ended prompts, which are automatically contextualized to the LLM with the program being edited. We evaluate this system in a user study with 32 participants, which confirms that using our plugin can aid task completion more than web search. We additionally provide a thorough analysis of the ways developers use, and perceive the usefulness of, our system, among others finding that the usage and benefits differ between students and professionals. We conclude that in-IDE prompt-less interaction with LLMs is a promising future direction for tool builders.
△ Less
Submitted 16 January, 2024; v1 submitted 16 July, 2023;
originally announced July 2023.
-
Effortless Integration of Memory Management into Open-Domain Conversation Systems
Authors:
Eunbi Choi,
Kyoung-Woon On,
Gunsoo Han,
Sungwoong Kim,
Daniel Wontae Nam,
Daejin Jo,
Seung Eun Rho,
Taehwan Kwon,
Minjoon Seo
Abstract:
Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. One of the limitations of the system, however, is the absence of management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exists for this purpose, we propo…
▽ More
Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. One of the limitations of the system, however, is the absence of management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exists for this purpose, we propose an automating dataset creation for memory management. Our method 1) requires little cost for data construction, 2) does not affect performance in other tasks, and 3) reduces external memory. We show that our proposed model BlenderBot3-M^3, which is multi-task trained with memory management, outperforms BlenderBot3 with a relative 4% performance gain in terms of F1 score.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
TRACE: Table Reconstruction Aligned to Corner and Edges
Authors:
Youngmin Baek,
Daehyun Nam,
Jaeheung Surh,
Seung Shin,
Seonghyeon Kim
Abstract:
A table is an object that captures structured and informative content within a document, and recognizing a table in an image is challenging due to the complexity and variety of table layouts. Many previous works typically adopt a two-stage approach; (1) Table detection(TD) localizes the table region in an image and (2) Table Structure Recognition(TSR) identifies row- and column-wise adjacency rela…
▽ More
A table is an object that captures structured and informative content within a document, and recognizing a table in an image is challenging due to the complexity and variety of table layouts. Many previous works typically adopt a two-stage approach; (1) Table detection(TD) localizes the table region in an image and (2) Table Structure Recognition(TSR) identifies row- and column-wise adjacency relations between the cells. The use of a two-stage approach often entails the consequences of error propagation between the modules and raises training and inference inefficiency. In this work, we analyze the natural characteristics of a table, where a table is composed of cells and each cell is made up of borders consisting of edges. We propose a novel method to reconstruct the table in a bottom-up manner. Through a simple process, the proposed method separates cell boundaries from low-level features, such as corners and edges, and localizes table positions by combining the cells. A simple design makes the model easier to train and requires less computation than previous two-stage methods. We achieve state-of-the-art performance on the ICDAR2013 table competition benchmark and Wired Table in the Wild(WTW) dataset.
△ Less
Submitted 30 April, 2023;
originally announced May 2023.
-
Detecting Pump&Dump Stock Market Manipulation from Online Forums
Authors:
D. Nam,
D. B. Skillicorn
Abstract:
The intersection of social media, low-cost trading platforms, and naive investors has created an ideal situation for information-based market manipulations, especially pump&dumps. Manipulators accumulate small-cap stocks, disseminate false information on social media to inflate their price, and sell at the peak. We collect a dataset of stocks whose price and volume profiles have the characteristic…
▽ More
The intersection of social media, low-cost trading platforms, and naive investors has created an ideal situation for information-based market manipulations, especially pump&dumps. Manipulators accumulate small-cap stocks, disseminate false information on social media to inflate their price, and sell at the peak. We collect a dataset of stocks whose price and volume profiles have the characteristic shape of a pump&dump, and social media posts for those same stocks that match the timing of the initial price rises. From these we build predictive models for pump&dump events based on the language used in the social media posts.
There are multiple difficulties: not every post will cause the intended market reaction, some pump&dump events may be triggered by posts in other forums, and there may be accidental confluences of post timing and market movements. Nevertheless, our best model achieves a prediction accuracy of 85% and an F1-score of 62%. Such a tool can provide early warning to investors and regulators that a pump&dump may be underway.
△ Less
Submitted 26 January, 2023;
originally announced January 2023.
-
LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward
Authors:
Daejin Jo,
Sungwoong Kim,
Daniel Wontae Nam,
Taehwan Kwon,
Seungeun Rho,
Jongmin Kim,
Donghoon Lee
Abstract:
Episodic count has been widely used to design a simple yet effective intrinsic motivation for reinforcement learning with a sparse reward. However, the use of episodic count in a high-dimensional state space as well as over a long episode time requires a thorough state compression and fast hashing, which hinders rigorous exploitation of it in such hard and complex exploration environments. Moreove…
▽ More
Episodic count has been widely used to design a simple yet effective intrinsic motivation for reinforcement learning with a sparse reward. However, the use of episodic count in a high-dimensional state space as well as over a long episode time requires a thorough state compression and fast hashing, which hinders rigorous exploitation of it in such hard and complex exploration environments. Moreover, the interference from task-irrelevant observations in the episodic count may cause its intrinsic motivation to overlook task-related important changes of states, and the novelty in an episodic manner can lead to repeatedly revisit the familiar states across episodes. In order to resolve these issues, in this paper, we propose a learnable hash-based episodic count, which we name LECO, that efficiently performs as a task-specific intrinsic reward in hard exploration problems. In particular, the proposed intrinsic reward consists of the episodic novelty and the task-specific modulation where the former employs a vector quantized variational autoencoder to automatically obtain the discrete state codes for fast counting while the latter regulates the episodic novelty by learning a modulator to optimize the task-specific extrinsic reward. The proposed LECO specifically enables the automatic transition from exploration to exploitation during reinforcement learning. We experimentally show that in contrast to the previous exploration methods LECO successfully solves hard exploration problems and also scales to large state spaces through the most difficult tasks in MiniGrid and DMLab environments.
△ Less
Submitted 11 October, 2022;
originally announced October 2022.
-
Predictive Synthesis of API-Centric Code
Authors:
Daye Nam,
Baishakhi Ray,
Seohyun Kim,
Xianshan Qu,
Satish Chandra
Abstract:
Today's programmers, especially data science practitioners, make heavy use of data-processing libraries (APIs) such as PyTorch, Tensorflow, NumPy, Pandas, and the like. Program synthesizers can provide significant coding assistance to this community of users; however program synthesis also can be slow due to enormous search spaces. In this work, we examine ways in which machine learning can be use…
▽ More
Today's programmers, especially data science practitioners, make heavy use of data-processing libraries (APIs) such as PyTorch, Tensorflow, NumPy, Pandas, and the like. Program synthesizers can provide significant coding assistance to this community of users; however program synthesis also can be slow due to enormous search spaces. In this work, we examine ways in which machine learning can be used to accelerate enumerative program synthesis. We present a deep-learning-based model to predict the sequence of API functions that would be needed to go from a given input to a desired output, both being numeric vectors. Our work is based on two insights. First, it is possible to learn, based on a large number of input-output examples, to predict the likely API function needed in a given situation. Second, and crucially, it is also possible to learn to compose API functions into a sequence, given an input and the desired final output, without explicitly knowing the intermediate values. We show that we can speed up an enumerative program synthesizer by using predictions from our model variants. These speedups significantly outperform previous ways (e.g. DeepCoder) in which researchers have used ML models in enumerative synthesis.
△ Less
Submitted 17 May, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
-
One-step replica symmetry breaking of random regular NAE-SAT II
Authors:
Danny Nam,
Allan Sly,
Youngtak Sohn
Abstract:
Continuing our earlier work in \cite{nss20a}, we study the random regular k-NAE-SAT model in the condensation regime. In \cite{nss20a}, the 1RSB properties of the model were established with positive probability. In this paper, we improve the result to probability arbitrarily close to one. To do so, we introduce a new framework which is the synthesis of two approaches: the small subgraph condition…
▽ More
Continuing our earlier work in \cite{nss20a}, we study the random regular k-NAE-SAT model in the condensation regime. In \cite{nss20a}, the 1RSB properties of the model were established with positive probability. In this paper, we improve the result to probability arbitrarily close to one. To do so, we introduce a new framework which is the synthesis of two approaches: the small subgraph conditioning and a variance decomposition technique using Doob martingales and discrete Fourier analysis. The main challenge is a delicate integration of the two methods to overcome the difficulty arising from applying the moment method to an unbounded state space.
△ Less
Submitted 17 December, 2023; v1 submitted 30 November, 2021;
originally announced December 2021.
-
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents
Authors:
Teakgyu Hong,
Donghyun Kim,
Mingi Ji,
Wonseok Hwang,
Daehyun Nam,
Sungrae Park
Abstract:
Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies try to solve the task by developing pre-trained language models focusing on combining visual features from document images with texts and their layout. On the other hand, this paper tackles the problem by going back to the bas…
▽ More
Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies try to solve the task by developing pre-trained language models focusing on combining visual features from document images with texts and their layout. On the other hand, this paper tackles the problem by going back to the basic: effective combination of text and layout. Specifically, we propose a pre-trained language model, named BROS (BERT Relying On Spatiality), that encodes relative positions of texts in 2D space and learns from unlabeled documents with area-masking strategy. With this optimized training scheme for understanding texts in 2D space, BROS shows comparable or better performance compared to previous methods on four KIE benchmarks (FUNSD, SROIE*, CORD, and SciTSR) without relying on visual features. This paper also reveals two real-world challenges in KIE tasks-(1) minimizing the error from incorrect text ordering and (2) efficient learning from fewer downstream examples-and demonstrates the superiority of BROS over previous methods. Code is available at https://github.com/clovaai/bros.
△ Less
Submitted 5 April, 2022; v1 submitted 10 August, 2021;
originally announced August 2021.
-
GMAC: A Distributional Perspective on Actor-Critic Framework
Authors:
Daniel Wontae Nam,
Younghoon Kim,
Chan Y. Park
Abstract:
In this paper, we devise a distributional framework on actor-critic as a solution to distributional instability, action type restriction, and conflation between samples and statistics. We propose a new method that minimizes the Cramér distance with the multi-step Bellman target distribution generated from a novel Sample-Replacement algorithm denoted SR($λ$), which learns the correct value distribu…
▽ More
In this paper, we devise a distributional framework on actor-critic as a solution to distributional instability, action type restriction, and conflation between samples and statistics. We propose a new method that minimizes the Cramér distance with the multi-step Bellman target distribution generated from a novel Sample-Replacement algorithm denoted SR($λ$), which learns the correct value distribution under multiple Bellman operations. Parameterizing a value distribution with Gaussian Mixture Model further improves the efficiency and the performance of the method, which we name GMAC. We empirically show that GMAC captures the correct representation of value distributions and improves the performance of a conventional actor-critic method with low computational cost, in both discrete and continuous action spaces using Arcade Learning Environment (ALE) and PyBullet environment.
△ Less
Submitted 15 July, 2021; v1 submitted 24 May, 2021;
originally announced May 2021.
-
Character Region Attention For Text Spotting
Authors:
Youngmin Baek,
Seung Shin,
Jeonghun Baek,
Sungrae Park,
Junyeop Lee,
Daehyun Nam,
Hwalsuk Lee
Abstract:
A scene text spotter is composed of text detection and recognition modules. Many studies have been conducted to unify these modules into an end-to-end trainable model to achieve better performance. A typical architecture places detection and recognition modules into separate branches, and a RoI pooling is commonly used to let the branches share a visual feature. However, there still exists a chanc…
▽ More
A scene text spotter is composed of text detection and recognition modules. Many studies have been conducted to unify these modules into an end-to-end trainable model to achieve better performance. A typical architecture places detection and recognition modules into separate branches, and a RoI pooling is commonly used to let the branches share a visual feature. However, there still exists a chance of establishing a more complimentary connection between the modules when adopting recognizer that uses attention-based decoder and detector that represents spatial information of the character regions. This is possible since the two modules share a common sub-task which is to find the location of the character regions. Based on the insight, we construct a tightly coupled single pipeline model. This architecture is formed by utilizing detection outputs in the recognizer and propagating the recognition loss through the detection stage. The use of character score map helps the recognizer attend better to the character center points, and the recognition loss propagation to the detector module enhances the localization of the character regions. Also, a strengthened sharing stage allows feature rectification and boundary localization of arbitrary-shaped text regions. Extensive experiments demonstrate state-of-the-art performance in publicly available straight and curved benchmark dataset.
△ Less
Submitted 19 July, 2020;
originally announced July 2020.
-
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
Authors:
Youngmin Baek,
Daehyun Nam,
Sungrae Park,
Junyeop Lee,
Seung Shin,
Jeonghun Baek,
Chae Young Lee,
Hwalsuk Lee
Abstract:
Despite the recent success of text detection and recognition methods, existing evaluation metrics fail to provide a fair and reliable comparison among those methods. In addition, there exists no end-to-end evaluation metric that takes characteristics of OCR tasks into account. Previous end-to-end metric contains cascaded errors from the binary scoring process applied in both detection and recognit…
▽ More
Despite the recent success of text detection and recognition methods, existing evaluation metrics fail to provide a fair and reliable comparison among those methods. In addition, there exists no end-to-end evaluation metric that takes characteristics of OCR tasks into account. Previous end-to-end metric contains cascaded errors from the binary scoring process applied in both detection and recognition tasks. Ignoring partially correct results raises a gap between quantitative and qualitative analysis, and prevents fine-grained assessment. Based on the fact that character is a key element of text, we hereby propose a Character-Level Evaluation metric (CLEval). In CLEval, the \textit{instance matching} process handles split and merge detection cases, and the \textit{scoring process} conducts character-level evaluation. By aggregating character-level scores, the CLEval metric provides a fine-grained evaluation of end-to-end results composed of the detection and recognition as well as individual evaluations for each module from the end-performance perspective. We believe that our metrics can play a key role in developing and analyzing state-of-the-art text detection and recognition methods. The evaluation code is publicly available at https://github.com/clovaai/CLEval.
△ Less
Submitted 11 June, 2020;
originally announced June 2020.
-
3D Display Calibration by Visual Pattern Analysis
Authors:
Hyoseok Hwang,
Hyun Sung Chang,
Dongkyung Nam,
In So Kweon
Abstract:
Nearly all 3D displays need calibration for correct rendering. More often than not, the optical elements in a 3D display are misaligned from the designed parameter setting. As a result, 3D magic does not perform well as intended. The observed images tend to get distorted. In this paper, we propose a novel display calibration method to fix the situation. In our method, a pattern image is displayed…
▽ More
Nearly all 3D displays need calibration for correct rendering. More often than not, the optical elements in a 3D display are misaligned from the designed parameter setting. As a result, 3D magic does not perform well as intended. The observed images tend to get distorted. In this paper, we propose a novel display calibration method to fix the situation. In our method, a pattern image is displayed on the panel and a camera takes its pictures twice at different positions. Then, based on a quantitative model, we extract all display parameters (i.e., pitch, slanted angle, gap or thickness, offset) from the observed patterns in the captured images. For high accuracy and robustness, our method analyzes the patterns mostly in frequency domain. We conduct two types of experiments for validation; one with optical simulation for quantitative results and the other with real-life displays for qualitative assessment. Experimental results demonstrate that our method is quite accurate, about a half order of magnitude higher than prior work; is efficient, spending less than 2 s for computation; and is robust to noise, working well in the SNR regime as low as 6 dB.
△ Less
Submitted 22 June, 2016;
originally announced June 2016.