
Showing 1–50 of 126 results for author: Baek, J

Searching in archive cs.
  1. arXiv:2505.09666

    cs.CL cs.AI cs.LG

    System Prompt Optimization with Meta-Learning

    Authors: Yumin Choi, Jinheon Baek, Sung Ju Hwang

    Abstract: Large Language Models (LLMs) have shown remarkable capabilities, with optimizing their input prompts playing a pivotal role in maximizing their performance. However, while LLM prompts consist of both the task-agnostic system prompts and task-specific user prompts, existing work on prompt optimization has focused on user prompts specific to individual queries or tasks, and largely overlooked the sy…

    Submitted 14 May, 2025; originally announced May 2025.

  2. arXiv:2504.20734

    cs.CL cs.AI cs.CV cs.IR cs.LG

    UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities

    Authors: Woongyeong Yeo, Kangsan Kim, Soyeong Jeong, Jinheon Baek, Sung Ju Hwang

    Abstract: Retrieval-Augmented Generation (RAG) has shown substantial promise in improving factual accuracy by grounding model responses with external knowledge relevant to queries. However, most existing RAG approaches are limited to a text-only corpus, and while recent efforts have extended RAG to other modalities such as images and videos, they typically operate over a single modality-specific corpus. In…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Project page: https://universalrag.github.io

  3. arXiv:2504.17192

    cs.CL

    Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

    Authors: Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang

    Abstract: Despite the rapid growth of machine learning research, corresponding code implementations are often unavailable, making it slow and labor-intensive for researchers to reproduce results and build upon prior work. In the meantime, recent Large Language Models (LLMs) excel at understanding scientific documents and generating high-quality code. Inspired by this, we introduce PaperCoder, a multi-agent…

    Submitted 26 April, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  4. arXiv:2504.01081

    cs.CV cs.CL eess.IV

    ShieldGemma 2: Robust and Tractable Image Content Moderation

    Authors: Wenjun Zeng, Dana Kurniawan, Ryan Mullins, Yuchi Liu, Tamoghna Saha, Dirichi Ike-Njoku, Jindong Gu, Yiwen Song, Cai Xu, Jingjing Zhou, Aparna Joshi, Shravan Dheep, Mani Malek, Hamid Palangi, Joon Baek, Rick Pereira, Karthik Narasimhan

    Abstract: We introduce ShieldGemma 2, a 4B parameter image content moderation model built on Gemma 3. This model provides robust safety risk predictions across the following key harm categories: Sexually Explicit, Violence & Gore, and Dangerous Content for synthetic images (e.g. output of any image generation model) and natural images (e.g. any image input to a Vision-Language Model). We evaluated on both…

    Submitted 8 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  5. arXiv:2503.22931

    cs.AI

    Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use

    Authors: Nicholas Roth, Christopher Hidey, Lucas Spangher, William F. Arnold, Chang Ye, Nick Masiewicki, Jinoo Baek, Peter Grabowski, Eugene Ie

    Abstract: In this paper, we propose a novel factored agent architecture designed to overcome the limitations of traditional single-agent systems in agentic AI. Our approach decomposes the agent into two specialized components: (1) a large language model (LLM) that serves as a high level planner and in-context learner, which may use dynamically available information in user prompts, (2) a smaller language mo…

    Submitted 2 April, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  6. arXiv:2503.22087

    cs.CV

    Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction

    Authors: Seokha Moon, Janghyun Baek, Giseop Kim, Jinkyu Kim, Sunwook Choi

    Abstract: 3D occupancy prediction has emerged as a key perception task for autonomous driving, as it reconstructs 3D environments to provide a comprehensive scene understanding. Recent studies focus on integrating spatiotemporal information obtained from past observations to improve prediction accuracy, using a multi-frame fusion approach that processes multiple past frames together. However, these methods…

    Submitted 27 March, 2025; originally announced March 2025.

  7. arXiv:2503.17728

    cs.CV cs.AI

    DynASyn: Multi-Subject Personalization Enabling Dynamic Action Synthesis

    Authors: Yongjin Choi, Chanhun Park, Seung Jun Baek

    Abstract: Recent advances in text-to-image diffusion models have spurred research on personalization, i.e., customized image synthesis of subjects within reference images. Although existing personalization methods are able to alter the subjects' positions or to personalize multiple subjects simultaneously, they often struggle to modify the behaviors of subjects or their dynamic interactions. The difficulty is…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: Accepted at AAAI 2025

  8. arXiv:2503.07660

    cs.AI cs.CY cs.LG

    Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity

    Authors: HyunJin Kim, Xiaoyuan Yi, Jing Yao, Muhua Huang, JinYeong Bak, James Evans, Xing Xie

    Abstract: The recent leap in AI capabilities, driven by big generative models, has sparked the possibility of achieving Artificial General Intelligence (AGI) and further triggered discussions on Artificial Superintelligence (ASI), a system surpassing all humans across all domains. This gives rise to the critical research question of: If we realize ASI, how do we align it with human values, ensuring it benef…

    Submitted 7 March, 2025; originally announced March 2025.

  9. arXiv:2503.06832

    cs.CV

    GUIDE-CoT: Goal-driven and User-Informed Dynamic Estimation for Pedestrian Trajectory using Chain-of-Thought

    Authors: Sungsik Kim, Janghyun Baek, Jinkyu Kim, Jaekoo Lee

    Abstract: While Large Language Models (LLMs) have recently shown impressive results in reasoning tasks, their application to pedestrian trajectory prediction remains challenging due to two key limitations: insufficient use of visual information and the difficulty of predicting entire trajectories. To address these challenges, we propose Goal-driven and User-Informed Dynamic Estimation for pedestrian traject…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 10 pages, 5 figures; to be published at the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2025)

    MSC Class: 93C85 ACM Class: I.2.10

  10. arXiv:2503.05179

    cs.CL cs.AI cs.LG

    Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

    Authors: Simon A. Aytes, Jinheon Baek, Sung Ju Hwang

    Abstract: Recent advances in large language models have demonstrated remarkable reasoning capabilities through Chain of Thought (CoT) prompting, but often at the cost of excessive verbosity in their intermediate outputs, which increases computational overhead. We introduce Sketch-of-Thought (SoT), a novel prompting framework that combines cognitive-inspired reasoning paradigms with linguistic constraints to…

    Submitted 7 March, 2025; originally announced March 2025.

  11. arXiv:2503.02338

    cs.AI

    Enhancing the Product Quality of the Injection Process Using eXplainable Artificial Intelligence

    Authors: Jisoo Hong, Yongmin Hong, Jung-Woo Baek, Sung-Woo Kang

    Abstract: The injection molding process is a traditional technique for making products in various industries such as electronics and automobiles via solidifying liquid resin into certain molds. Although the process is not related to creating the main part of engines or semiconductors, this manufacturing methodology sets the final form of the products. Recently, research has continued to reduce the defect r…

    Submitted 4 March, 2025; originally announced March 2025.

  12. arXiv:2502.20063

    cs.GT cs.CY cs.LG

    Hiring under Congestion and Algorithmic Monoculture: Value of Strategic Behavior

    Authors: Jackie Baek, Hamsa Bastani, Shihan Chen

    Abstract: We study the impact of strategic behavior in a setting where firms compete to hire from a shared pool of applicants, and firms use a common algorithm to evaluate them. Each applicant is associated with a scalar score that is observed by all firms, provided by the algorithm. Firms simultaneously make interview decisions, where the number of interviews is capacity-constrained. Job offers are given t…

    Submitted 27 February, 2025; originally announced February 2025.

  13. arXiv:2502.17956

    cs.CL

    Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments

    Authors: Patomporn Payoungkhamdee, Pume Tuchinda, Jinheon Baek, Samuel Cahyawijaya, Can Udomcharoenchaikit, Potsawee Manakul, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong

    Abstract: Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting separates reasoning from execution, offering a promising alternative but shifting the challenge…

    Submitted 25 February, 2025; originally announced February 2025.

  14. arXiv:2502.16691

    cs.CL cs.DC cs.MA

    Toward Responsible Federated Large Language Models: Leveraging a Safety Filter and Constitutional AI

    Authors: Eunchung Noh, Jeonghun Baek

    Abstract: Recent research has increasingly focused on training large language models (LLMs) using federated learning, known as FedLLM. However, responsible AI (RAI), which aims to ensure safe responses, remains underexplored in the context of FedLLM. In FedLLM, client data used for training may contain harmful content, leading to unsafe LLMs that generate harmful responses. Aggregating such unsafe LLMs into…

    Submitted 23 February, 2025; originally announced February 2025.

    Comments: 5 pages, 3 figures

  15. arXiv:2502.14778

    cs.CL cs.AI cs.CV

    Harnessing PDF Data for Improving Japanese Large Multimodal Models

    Authors: Jeonghun Baek, Akiko Aizawa, Kiyoharu Aizawa

    Abstract: Large Multimodal Models (LMMs) have demonstrated strong performance in English, but their effectiveness in Japanese remains limited due to the lack of high-quality training data. Current Japanese LMMs often rely on translated English datasets, restricting their ability to capture Japan-specific cultural knowledge. To address this, we explore the potential of Japanese PDF data as a training resourc…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 15 pages, 8 figures

  16. arXiv:2501.18103

    cs.HC cs.CL

    Beyond Turn-taking: Introducing Text-based Overlap into Human-LLM Interactions

    Authors: JiWoo Kim, Minsuk Chang, JinYeong Bak

    Abstract: Traditional text-based human-AI interactions often adhere to a strict turn-taking approach. In this research, we propose a novel approach that incorporates overlapping messages, mirroring natural human conversations. Through a formative study, we observed that even in text-based contexts, users instinctively engage in overlapping behaviors like "A: Today I went to-" "B: yeah." To capitalize on the…

    Submitted 29 January, 2025; originally announced January 2025.

    Comments: 16 pages, 9 figures

  17. arXiv:2501.14174

    cs.CV cs.AI cs.LG

    Dreamweaver: Learning Compositional World Models from Pixels

    Authors: Junyeob Baek, Yi-Fu Wu, Gautam Singh, Sungjin Ahn

    Abstract: Humans have an innate ability to decompose their perceptions of the world into objects and their attributes, such as colors, shapes, and movement patterns. This cognitive process enables us to imagine novel futures by recombining familiar concepts. However, replicating this ability in artificial intelligence systems has proven challenging, particularly when it comes to modeling videos into composi…

    Submitted 10 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  18. arXiv:2501.07824

    cs.CL cs.AI cs.LG

    Real-time Verification and Refinement of Language Model Text Generation

    Authors: Joonho Ko, Jinheon Baek, Sung Ju Hwang

    Abstract: Large language models (LLMs) have shown remarkable performance across a wide range of natural language tasks. However, a critical challenge remains in that they sometimes generate factually incorrect answers. To address this, while much previous work has focused on identifying errors in their generation and further refining them, these methods are slow in deployment since they are designed to verify the re…

    Submitted 13 April, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

  19. arXiv:2501.05874

    cs.CV cs.AI cs.CL cs.IR cs.LG

    VideoRAG: Retrieval-Augmented Generation over Video Corpus

    Authors: Soyeong Jeong, Kangsan Kim, Jinheon Baek, Sung Ju Hwang

    Abstract: Retrieval-Augmented Generation (RAG) is a powerful strategy for improving the factual accuracy of models by retrieving external knowledge relevant to queries and incorporating it into the generation process. However, existing approaches primarily focus on text, with some recent advancements considering images, and they largely overlook videos, a rich source of multimodal knowledge capable of repre…

    Submitted 4 March, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

  20. arXiv:2412.18232

    cs.IR

    Efficient Long Context Language Model Retrieval with Compression

    Authors: Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang

    Abstract: Long Context Language Models (LCLMs) have emerged as a new paradigm to perform Information Retrieval (IR), which enables the direct ingestion and retrieval of information by processing an entire corpus in their single context, showcasing the potential to surpass traditional sparse and dense retrieval methods. However, processing a large number of passages in context for retrieval is computa…

    Submitted 24 December, 2024; originally announced December 2024.

  21. arXiv:2412.16926

    cs.CL cs.AI cs.LG

    Revisiting In-Context Learning with Long Context Language Models

    Authors: Jinheon Baek, Sun Jae Lee, Prakhar Gupta, Geunseob Oh, Siddharth Dalmia, Prateek Kolhar

    Abstract: In-Context Learning (ICL) is a technique by which language models make predictions based on examples provided in their input context. Previously, their context window size imposed a limit on the number of examples that can be shown, making example selection techniques crucial for identifying the maximally effective set of examples. However, the recent advent of Long Context Language Models (LCLMs)…

    Submitted 6 January, 2025; v1 submitted 22 December, 2024; originally announced December 2024.

  22. arXiv:2412.16468

    cs.LG

    The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

    Authors: HyunJin Kim, Xiaoyuan Yi, Jing Yao, Jianxun Lian, Muhua Huang, Shitong Duan, JinYeong Bak, Xing Xie

    Abstract: The emergence of large language models (LLMs) has sparked the possibility of Artificial Superintelligence (ASI), a hypothetical AI system surpassing human intelligence. However, existing alignment paradigms struggle to guide such advanced AI systems. Superalignment, the alignment of AI systems with human values and safety requirements at superhuman levels of capability, aims to address two…

    Submitted 25 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.

  23. arXiv:2411.06387

    cs.LG cs.AI cs.CL

    Self-Training Meets Consistency: Improving LLMs' Reasoning with Consistency-Driven Rationale Evaluation

    Authors: Jaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak

    Abstract: The self-training approach for large language models (LLMs) improves reasoning abilities by training the models on their self-generated rationales. Previous approaches have labeled rationales that produce correct answers for a given question as appropriate for training. However, a single measure risks misjudging rationale quality, leading the models to learn flawed reasoning patterns. To address this…

    Submitted 6 February, 2025; v1 submitted 10 November, 2024; originally announced November 2024.

    Comments: Accepted to NAACL 2025

  24. arXiv:2411.06071

    cs.CV

    GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection

    Authors: Jiyul Ham, Yonggon Jung, Jun-Geol Baek

    Abstract: Zero-shot anomaly detection (ZSAD) is crucial for detecting anomalous patterns in target datasets without using training samples, specifically in scenarios where there are distributional differences between the target domain and training data or where data scarcity arises because of restricted access. Although recently pretrained vision-language models demonstrate strong zero-shot performance acro…

    Submitted 8 December, 2024; v1 submitted 9 November, 2024; originally announced November 2024.

    Comments: 29 pages, 36 figures

  25. arXiv:2410.22375

    cs.SE cs.AI cs.CL

    Rethinking Code Refinement: Learning to Judge Code Efficiency

    Authors: Minju Seo, Jinheon Baek, Sung Ju Hwang

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in understanding and generating code. Due to these capabilities, many recent methods have been proposed to automatically refine code with LLMs. However, we should reconsider this, since refined code (from LLMs and even humans) is not always more efficient than its original version. On the other hand, running two different versio…

    Submitted 29 October, 2024; originally announced October 2024.

  26. arXiv:2410.17250

    cs.CL cs.AI cs.CV

    JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

    Authors: Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Kazuki Egashira, Jeonghun Baek, Xiang Yue, Graham Neubig, Kiyoharu Aizawa

    Abstract: Accelerating research on Large Multimodal Models (LMMs) in non-English languages is crucial for enhancing user experiences across broader populations. In this paper, we introduce JMMMU (Japanese MMMU), the first large-scale Japanese benchmark designed to evaluate LMMs on expert-level tasks based on the Japanese cultural context. To facilitate comprehensive culture-aware evaluation, JMMMU features…

    Submitted 19 March, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: Accepted at NAACL 2025. Project page: https://mmmu-japanese-benchmark.github.io/JMMMU/

  27. arXiv:2410.02729

    cs.CL cs.AI cs.IR

    Unified Multimodal Interleaved Document Representation for Retrieval

    Authors: Jaewoo Lee, Joonho Ko, Jinheon Baek, Soyeong Jeong, Sung Ju Hwang

    Abstract: Information Retrieval (IR) methods aim to identify documents relevant to a query, which have been widely applied in various natural language tasks. However, existing approaches typically consider only the textual content within documents, overlooking the fact that documents can contain multiple modalities, including images and tables. Also, they often segment each long document into multiple discr…

    Submitted 16 December, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: Preprint

  28. arXiv:2410.00328

    cs.PF

    Tuning Fast Memory Size based on Modeling of Page Migration for Tiered Memory

    Authors: Shangye Chen, Jin Huang, Shuangyan Yang, Jie Liu, Huaicheng Li, Dimitrios Nikolopoulos, Junhee Ryu, Jinho Baek, Kwangsik Shin, Dong Li

    Abstract: Tiered memory, built upon a combination of fast memory and slow memory, provides a cost-effective solution to meet ever-increasing requirements from emerging applications for large memory capacity. Reducing the size of fast memory is valuable to improve memory utilization in production and reduce production costs because fast memory tends to be expensive. However, deciding the fast memory size is…

    Submitted 30 September, 2024; originally announced October 2024.

  29. arXiv:2408.15180

    cs.LO math.RA

    Formalizing Mason-Stothers Theorem and its Corollaries in Lean 4

    Authors: Jineon Baek, Seewoo Lee

    Abstract: The ABC conjecture implies many conjectures and theorems in number theory, including the celebrated Fermat's Last Theorem. Mason-Stothers Theorem is a function field analogue of the ABC conjecture that admits a much more elementary proof with many interesting consequences, including a polynomial version of Fermat's Last Theorem. While years of dedicated effort are expected for a full formalization…

    Submitted 27 August, 2024; originally announced August 2024.

  30. arXiv:2408.10107

    cs.LG cs.AI stat.ML

    Perturb-and-Compare Approach for Detecting Out-of-Distribution Samples in Constrained Access Environments

    Authors: Heeyoung Lee, Hoyoon Byun, Changdae Oh, JinYeong Bak, Kyungwoo Song

    Abstract: Accessing machine learning models through remote APIs has been gaining prevalence following the recent trend of scaling up model parameters for increased performance. Even though these models exhibit remarkable ability, detecting out-of-distribution (OOD) samples remains a crucial safety concern for end users as these samples may induce unreliable outputs from the model. In this work, we propose a…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Artificial Intelligence (ECAI) 2024

  31. arXiv:2407.13942

    cs.CY cs.AI cs.CL cs.SI

    Harmful Suicide Content Detection

    Authors: Kyumin Park, Myung Jae Baik, YeongJun Hwang, Yen Shin, HoJae Lee, Ruda Lee, Sang Min Lee, Je Young Hannah Sun, Ah Rah Lee, Si Yeun Yoon, Dong-ho Lee, Jihyung Moon, JinYeong Bak, Kyunghyun Cho, Jong-Woo Paik, Sungjoon Park

    Abstract: Harmful suicide content on the Internet is a significant risk factor inducing suicidal thoughts and behaviors among vulnerable populations. Despite global efforts, existing resources are insufficient, specifically in high-risk regions like the Republic of Korea. Current research mainly focuses on understanding negative effects of such content or suicide risk in individuals, rather than on automati…

    Submitted 2 June, 2024; originally announced July 2024.

    Comments: 30 pages, 7 figures

  32. arXiv:2407.07413

    cs.CL

    KpopMT: Translation Dataset with Terminology for Kpop Fandom

    Authors: JiWoo Kim, Yunsu Kim, JinYeong Bak

    Abstract: While machines learn from existing corpora, humans have the unique capability to establish and accept new language systems. This allows humans to form unique language systems within social groups. Aligning with this, we focus on a gap remaining in addressing translation challenges within social groups, where in-group members utilize unique terminologies. We propose KpopMT dataset, which aims to fill th…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to LoresMT 2024

  33. arXiv:2407.02736

    cs.CL

    MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control

    Authors: Yeonji Lee, Sangjun Park, Kyunghyun Cho, JinYeong Bak

    Abstract: As mental health issues globally escalate, there is a tremendous need for advanced digital support systems. We introduce MentalAgora, a novel framework employing large language models enhanced by interaction between multiple agents for tailored mental health support. This framework operates through three stages: strategic debating, tailored counselor creation, and response generation, enabling the…

    Submitted 2 July, 2024; originally announced July 2024.

  34. arXiv:2406.16042

    cs.CV

    Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

    Authors: Inès Hyeonsu Kim, JoungBin Lee, Woojeong Jin, Soowon Son, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim

    Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. We propose Pose-dIVE, a novel data augmentation approach that incorpor…

    Submitted 15 October, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  35. arXiv:2406.16013

    cs.CL cs.AI cs.IR

    Database-Augmented Query Representation for Information Retrieval

    Authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

    Abstract: Information retrieval models that aim to search for the documents relevant to the given query have shown many successes, which have been applied to diverse tasks. However, the query provided by the user is oftentimes very short, which challenges the retrievers to correctly fetch relevant documents. To tackle this, existing studies have proposed expanding the query with a couple of additional (user…

    Submitted 23 June, 2024; originally announced June 2024.

  36. arXiv:2406.06929

    cs.GT

    Social Learning with Bounded Rationality: Negative Reviews Persist under Newest First

    Authors: Jackie Baek, Atanas Dinev, Thodoris Lykouris

    Abstract: We study a model of social learning from reviews where customers are computationally limited and make purchases based on reading only the first few reviews displayed by the platform. Under this bounded rationality, we establish that the review ordering policy can have a significant impact. In particular, the popular Newest First ordering induces a negative review to persist as the most recent revi…

    Submitted 22 August, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: An extended abstract appeared at the Twenty-Fifth ACM Conference on Economics and Computation (EC 2024)

  37. arXiv:2406.06793

    cs.LG cs.AI

    PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

    Authors: Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

    Abstract: Despite the recent advancements in offline RL, no unified algorithm could achieve superior performance across a broad range of tasks. Offline value function learning, in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of solving credit assignment and extrapolation errors that accumulate as the horizon of the task grows. On the other hand, models that ca…

    Submitted 10 June, 2024; originally announced June 2024.

  38. arXiv:2406.05967

    cs.CV cs.AI cs.CL cs.LG

    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

    Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, et al. (51 additional authors not shown)

    Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…

    Submitted 4 November, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

  39. arXiv:2406.05761

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 25 March, 2025; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: NAACL 2025 (Main Conference)

  40. arXiv:2404.07738

    cs.CL cs.AI cs.LG

    ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models

    Authors: Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, Sung Ju Hwang

    Abstract: Scientific research, vital for improving human life, is complex, slow, and needs specialized expertise. Meanwhile, novel, impactful research often stems from both a deep understanding of prior work and a cross-pollination of ideas across domains and fields. To enhance the productivity of researchers, we propose ResearchAgent, which leverages the encyclopedic knowledge and linguistic r…

    Submitted 9 February, 2025; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: NAACL 2025

  41. arXiv:2404.02949

    cs.LG cs.AI

    The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability

    Authors: Stephen Casper, Jieun Yun, Joonhyuk Baek, Yeseong Jung, Minhwan Kim, Kiwan Kwon, Saerom Park, Hayden Moore, David Shriver, Marissa Connor, Keltin Grimes, Angus Nicolson, Arush Tagade, Jessica Rumbelow, Hieu Minh Nguyen, Dylan Hadfield-Menell

    Abstract: Interpretability techniques are valuable for helping humans understand and oversee AI systems. The SaTML 2024 CNN Interpretability Competition solicited novel methods for studying convolutional neural networks (CNNs) at the ImageNet scale. The objective of the competition was to help human crowd-workers identify trojans in CNNs. This report showcases the methods and results of four featured compet…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Competition for SaTML 2024

  42. arXiv:2403.14403  [pdf, other]

    cs.CL cs.AI

    Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

    Authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

    Abstract: Retrieval-Augmented Large Language Models (LLMs), which incorporate the non-parametric knowledge from external knowledge bases into LLMs, have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA). However, even though there are various approaches dealing with queries of different complexities, they either handle simple queries with unnece…

    Submitted 28 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: NAACL 2024

  43. arXiv:2402.13482  [pdf, other]

    cs.CL cs.AI cs.LG

    Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks

    Authors: Minju Seo, Jinheon Baek, James Thorne, Sung Ju Hwang

    Abstract: Despite large successes of recent language models on diverse tasks, they suffer from severe performance degeneration in low-resource settings with limited training data available. Many existing works tackle this problem by generating synthetic data from the training data and then training models on them, recently using Large Language Models (LLMs). However, in low-resource settings, the amount of…

    Submitted 20 February, 2024; originally announced February 2024.

  44. arXiv:2401.10404  [pdf, other]

    cs.CV

    Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

    Authors: Xin Yuan, Jinoo Baek, Keyang Xu, Omer Tov, Hongliang Fei

    Abstract: We propose an efficient diffusion-based text-to-video super-resolution (SR) tuning approach that leverages the readily learned capacity of a pixel-level image diffusion model to capture spatial information for video generation. To accomplish this goal, we design an efficient architecture by inflating the weightings of the text-to-image SR model into our video generation framework. Additionally, we i…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: WACV'24 workshop

  45. arXiv:2401.08544  [pdf]

    math.NA cs.LG

    N-Adaptive Ritz Method: A Neural Network Enriched Partition of Unity for Boundary Value Problems

    Authors: Jonghyuk Baek, Yanran Wang, J. S. Chen

    Abstract: Conventional finite element methods are known to be tedious in adaptive refinements due to their conformal regularity requirements. Further, the enrichment functions for adaptive refinements are often not readily available in general applications. This work introduces a novel neural network-enriched Partition of Unity (NN-PU) approach for solving boundary value problems via artificial neural netwo…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 66 pages, 41 figures, 7 tables

  46. arXiv:2312.14492  [pdf, other]

    cs.CV

    Context Enhanced Transformer for Single Image Object Detection

    Authors: Seungjun An, Seonghoon Park, Gyeongnyeon Kim, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim

    Abstract: With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-bas…

    Submitted 26 December, 2023; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Project page: https://ku-cvlab.github.io/CETR

  47. arXiv:2312.10806  [pdf, other]

    cs.CV

    Cross-Lingual Learning in Multilingual Scene Text Recognition

    Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa

    Abstract: In this paper, we investigate cross-lingual learning (CLL) for multilingual scene text recognition (STR). CLL transfers knowledge from one language to another. We aim to find the condition that exploits knowledge from high-resource languages for improving performance in low-resource languages. To do so, we first examine if two general insights about CLL discussed in previous works are applied to m…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted at ICASSP 2024, 5 pages, 2 figures

  48. 3D Teeth Reconstruction from Panoramic Radiographs using Neural Implicit Functions

    Authors: Sihwa Park, Seongjun Kim, In-Seok Song, Seung Jun Baek

    Abstract: Panoramic radiography is a widely used imaging modality in dental practice and research. However, it only provides flattened 2D images, which limits the detailed assessment of dental structures. In this paper, we propose Occudent, a framework for 3D teeth reconstruction from panoramic radiographs using neural implicit functions, which, to the best of our knowledge, is the first work to do so. For…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 12 pages, 2 figures, accepted to the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2023)

  49. arXiv:2311.08590  [pdf, other]

    cs.CL

    PEMA: An Offsite-Tunable Plug-in External Memory Adaptation for Language Models

    Authors: HyunJin Kim, Young Jin Kim, JinYeong Bak

    Abstract: Pre-trained language models (PLMs) show impressive performance in various downstream NLP tasks. However, pre-training large language models demands substantial memory and training compute. Furthermore, due to the substantial resources required, many PLM weights are confidential. Consequently, users are compelled to share their data with model owners for fine-tuning specific tasks. To overcome the…

    Submitted 29 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024

  50. arXiv:2311.06318  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion

    Authors: Jinheon Baek, Nirupama Chandrasekaran, Silviu Cucerzan, Allen Herring, Sujay Kumar Jauhar

    Abstract: Large Language Models (LLMs) excel at tackling various natural language tasks. However, due to the significant costs involved in re-training or fine-tuning them, they remain largely static and difficult to personalize. Nevertheless, a variety of applications could benefit from generations that are tailored to users' preferences, goals, and knowledge. Among them is web search, where knowing what a…

    Submitted 19 February, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: The Web Conference (WWW) 2024