Showing 1–15 of 15 results for author: Sohn, S S

Searching in archive cs.
  1. arXiv:2503.02907  [pdf, other]

    cs.SD cs.LG eess.AS

    Fine-Tuning Whisper for Inclusive Prosodic Stress Analysis

    Authors: Samuel S. Sohn, Sten Knutsen, Karin Stromswold

    Abstract: Prosody plays a crucial role in speech perception, influencing both human understanding and automatic speech recognition (ASR) systems. Despite its importance, prosodic stress remains under-studied due to the challenge of efficiently analyzing it. This study explores fine-tuning OpenAI's Whisper large-v2 ASR model to recognize phrasal, lexical, and contrastive stress in speech. Using a dataset of…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Appears in Proceedings of the ISCA/ITG Workshop on Diversity in Large Speech and Language Models

  2. arXiv:2502.07128  [pdf, other]

    cs.CL cs.AI cs.MM

    Cardiverse: Harnessing LLMs for Novel Card Game Prototyping

    Authors: Danrui Li, Sen Zhang, Sam S. Sohn, Kaidong Hu, Muhammad Usman, Mubbasir Kapadia

    Abstract: The prototyping of computer games, particularly card games, requires extensive human effort in creative ideation and gameplay evaluation. Recent advances in Large Language Models (LLMs) offer opportunities to automate and streamline these processes. However, it remains challenging for LLMs to design novel game mechanics beyond existing databases, generate consistent gameplay environments, and deve…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 13 pages, 7 figures, 3 tables

  3. TrajDiffuse: A Conditional Diffusion Model for Environment-Aware Trajectory Prediction

    Authors: Qingze Liu, Danrui Li, Samuel S. Sohn, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic

    Abstract: Accurate prediction of human or vehicle trajectories with good diversity that captures their stochastic nature is an essential task for many applications. However, many trajectory prediction models produce unreasonable trajectory samples that focus on improving diversity or accuracy while neglecting other key requirements, such as collision avoidance with the surrounding environment. In this work,…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: Accepted for publication in the proceedings of the 2024 International Conference on Pattern Recognition (ICPR)

  4. arXiv:2406.10478  [pdf, other]

    cs.CL cs.AI cs.GR

    From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent

    Authors: Samuel S. Sohn, Danrui Li, Sen Zhang, Che-Jui Chang, Mubbasir Kapadia

    Abstract: Digital storytelling, essential in entertainment, education, and marketing, faces challenges in production scalability and flexibility. The StoryAgent framework, introduced in this paper, utilizes Large Language Models and generative tools to automate and refine digital storytelling. Employing a top-down story drafting and bottom-up asset generation approach, StoryAgent tackles key issues such as…

    Submitted 21 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 16 pages, 13 figures

  5. arXiv:2406.05431  [pdf]

    cs.CL

    MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature

    Authors: Gyeong Hoon Yi, Jiwoo Choi, Hyeongyun Song, Olivia Miano, Jaewoong Choi, Kihoon Bang, Byungju Lee, Seok Su Sohn, David Buttler, Anna Hiszpanski, Sang Soo Han, Donghun Kim

    Abstract: Efficiently extracting data from tables in the scientific literature is pivotal for building large-scale databases. However, the tables reported in materials science papers exist in highly diverse forms; thus, rule-based extractions are an ineffective approach. To overcome this challenge, we present MaTableGPT, which is a GPT-based table data extractor from the materials science literature. MaTabl…

    Submitted 8 June, 2024; originally announced June 2024.

  6. arXiv:2309.15311  [pdf, other]

    cs.HC cs.AI cs.GR

    The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents

    Authors: Che-Jui Chang, Samuel S. Sohn, Sen Zhang, Rajath Jayashankar, Muhammad Usman, Mubbasir Kapadia

    Abstract: Previous studies regarding the perception of emotions for embodied virtual agents have shown the effectiveness of using virtual characters in conveying emotions through interactions with humans. However, creating an autonomous embodied conversational agent with expressive behaviors presents two major challenges. The first challenge is the difficulty of synthesizing the conversational behaviors for…

    Submitted 6 December, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  7. arXiv:2306.16772  [pdf, other]

    cs.CV cs.AI cs.LG

    M3Act: Learning from Synthetic Human Group Activities

    Authors: Che-Jui Chang, Danrui Li, Deep Patel, Parth Goel, Honglu Zhou, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia

    Abstract: The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the challenges of obtaining large-scale labeled datasets from real-world scenarios. To address the limitation, we introduce M3Act, a synthetic data generator for multi-view multi-group multi-person human atomic actions and g…

    Submitted 2 May, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

  8. arXiv:2212.04673  [pdf, other]

    cs.CV

    MSI: Maximize Support-Set Information for Few-Shot Segmentation

    Authors: Seonghyeon Moon, Samuel S. Sohn, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Muhammad Haris Khan, Mubbasir Kapadia

    Abstract: Few-shot segmentation (FSS) aims to segment a target class using a small number of labeled images (the support set). To extract information relevant to the target class, a dominant approach in best-performing FSS methods removes background features using a support mask. We observe that this feature excision through a limiting support mask introduces an information bottleneck in several challenging FSS c…

    Submitted 10 November, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: ICCV 2023

  9. arXiv:2211.00817  [pdf, other]

    cs.LG cs.MA

    An Information-Theoretic Approach for Estimating Scenario Generalization in Crowd Motion Prediction

    Authors: Gang Qiao, Kaidong Hu, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic

    Abstract: Learning-based approaches to modeling crowd motion have become increasingly successful but require training and evaluation on large datasets, coupled with complex model selection and parameter tuning. To circumvent this tremendously time-consuming process, we propose a novel scoring method, which characterizes generalization of models trained on source crowd scenarios and applied to target crowd s…

    Submitted 1 November, 2022; originally announced November 2022.

  10. arXiv:2205.09075  [pdf]

    cond-mat.mtrl-sci cs.LG

    Predicting failure characteristics of structural materials via deep learning based on nondestructive void topology

    Authors: Leslie Ching Ow Tiong, Gunjick Lee, Seok Su Sohn, Donghun Kim

    Abstract: Accurate prediction of the failure progression of structural materials is critical for preventing failure-induced accidents. Despite considerable mechanics modeling-based efforts, accurate prediction remains a challenging task in real-world environments due to unexpected damage factors and defect evolutions. Here, we report a novel method for predicting material failure characteristics that uniqu…

    Submitted 17 May, 2022; originally announced May 2022.

  11. arXiv:2203.12826  [pdf, other]

    cs.CV

    HM: Hybrid Masking for Few-Shot Segmentation

    Authors: Seonghyeon Moon, Samuel S. Sohn, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Muhammad Haris Khan, Mubbasir Kapadia

    Abstract: We study few-shot semantic segmentation that aims to segment a target object from a query image when provided with a few annotated support images of the target class. Several recent methods resort to a feature masking (FM) technique to discard irrelevant feature activations which eventually facilitates the reliable prediction of segmentation mask. A fundamental limitation of FM is the inability to…

    Submitted 24 July, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: 14 pages

    MSC Class: 68T45

  12. arXiv:2201.07189  [pdf, other]

    cs.CV

    MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction

    Authors: Mihee Lee, Samuel S. Sohn, Seonghyeon Moon, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic

    Abstract: Accurate long-term trajectory prediction in complex scenes, where multiple agents (e.g., pedestrians or vehicles) interact with each other and the environment while attempting to accomplish diverse and often unknown goals, is a challenging stochastic forecasting problem. In this work, we propose MUSE, a new probabilistic modeling framework based on a cascade of Conditional VAEs, which tackles the…

    Submitted 18 January, 2022; originally announced January 2022.

  13. arXiv:2112.11734  [pdf, other]

    cs.LG cs.AI

    D-HYPR: Harnessing Neighborhood Modeling and Asymmetry Preservation for Digraph Representation Learning

    Authors: Honglu Zhou, Advith Chegu, Samuel S. Sohn, Zuohui Fu, Gerard de Melo, Mubbasir Kapadia

    Abstract: Digraph Representation Learning (DRL) aims to learn representations for directed homogeneous graphs (digraphs). Prior work in DRL is largely constrained (e.g., limited to directed acyclic graphs), or has poor generalizability across tasks (e.g., evaluated solely on one task). Most Graph Neural Networks (GNNs) exhibit poor performance on digraphs due to the neglect of modeling neighborhoods and pre…

    Submitted 28 September, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: CIKM 2022

  14. arXiv:1910.05810  [pdf, other]

    cs.AI cs.CV

    Deep Crowd-Flow Prediction in Built Environments

    Authors: Samuel S. Sohn, Seonghyeon Moon, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia

    Abstract: Predicting the behavior of crowds in complex environments is a key requirement in a multitude of application areas, including crowd and disaster management, architectural design, and urban planning. Given a crowd's immediate state, current approaches simulate crowd movement to arrive at a future state. However, most applications require the ability to predict hundreds of possible simulation outcom…

    Submitted 13 October, 2019; originally announced October 2019.

  15. arXiv:1910.00767  [pdf]

    cs.MA cs.AI

    Cognitive Agent Based Simulation Model For Improving Disaster Response Procedures

    Authors: Rohit K. Dubey, Samuel S. Sohn, Christoph Hoelscher, Mubbasir Kapadia

    Abstract: In the event of a disaster, saving human lives is of utmost importance. For developing proper evacuation procedures and guidance systems, behavioural data on how people respond during panic and stress is crucial. In the absence of real human data on building evacuation, there is a need for a crowd simulator to model egress and decision-making under uncertainty. In this paper, we propose an agent-b…

    Submitted 1 October, 2019; originally announced October 2019.