Search | arXiv e-print repository

arXiv:2507.06261 [pdf, ps, other]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

Comments: 72 pages, 17 figures

arXiv:2504.13216 [pdf, other]

KFinEval-Pilot: A Comprehensive Benchmark Suite for Korean Financial Language Understanding

Authors: Bokwang Hwang, Seonkyu Lim, Taewoong Kim, Yongjae Geun, Sunghyun Bang, Sohyun Park, Jihyun Park, Myeonggyu Lee, Jinwoo Lee, Yerin Kim, Jinsun Yoo, Jingyeong Hong, Jina Park, Yongchan Kim, Suhyun Kim, Younggyun Hahm, Yiseul Lee, Yejee Kang, Chanhyuk Yoon, Chansu Lee, Heeyewon Jeong, Jiyeon Lee, Seonhye Gu, Hyebin Kang, Yousang Cho, et al. (2 additional authors not shown)

Abstract: We introduce KFinEval-Pilot, a benchmark suite specifically designed to evaluate large language models (LLMs) in the Korean financial domain. Addressing the limitations of existing English-centric benchmarks, KFinEval-Pilot comprises over 1,000 curated questions across three critical areas: financial knowledge, legal reasoning, and financial toxicity. The benchmark is constructed through a semi-automated pipeline that combines GPT-4-generated prompts with expert validation to ensure domain relevance and factual accuracy. We evaluate a range of representative LLMs and observe notable performance differences across models, with trade-offs between task accuracy and output safety across different model families. These results highlight persistent challenges in applying LLMs to high-stakes financial applications, particularly in reasoning and safety. Grounded in real-world financial use cases and aligned with the Korean regulatory and linguistic context, KFinEval-Pilot serves as an early diagnostic tool for developing safer and more reliable financial AI systems.

Submitted 16 April, 2025; originally announced April 2025.

arXiv:2503.14035 [pdf, other]

A Revisit to the Decoder for Camouflaged Object Detection

Authors: Seung Woo Ko, Joopyo Hong, Suyoung Kim, Seungjai Bang, Sungzoon Cho, Nojun Kwak, Hyung-Sin Kim, Joonseok Lee

Abstract: Camouflaged object detection (COD) aims to generate a fine-grained segmentation map of camouflaged objects hidden in their background. Due to the hidden nature of camouflaged objects, it is essential for the decoder to be tailored to effectively extract proper features of camouflaged objects and extra-carefully generate their complex boundaries. In this paper, we propose a novel architecture that augments the prevalent decoding strategy in COD with Enrich Decoder and Retouch Decoder, which help to generate a fine-grained segmentation map. Specifically, the Enrich Decoder amplifies the channels of features that are important for COD using channel-wise attention. Retouch Decoder further refines the segmentation maps by spatially attending to important pixels, such as the boundary regions. With extensive experiments, we demonstrate that ENTO shows superior performance using various encoders, with the two novel components playing their unique roles that are mutually complementary.

Submitted 18 March, 2025; originally announced March 2025.

Comments: Published in BMVC 2024, 13 pages, 7 figures (Appendix: 5 pages, 2 figures)

Journal ref: British Machine Vision Conference (BMVC) 2024

arXiv:2502.14541 [pdf, ps, other]

LLM-based User Profile Management for Recommender System

Authors: Seunghwan Bang, Hwanjun Song

Abstract: The rapid advancement of Large Language Models (LLMs) has opened new opportunities in recommender systems by enabling zero-shot recommendation without conventional training. Despite their potential, most existing works rely solely on users' purchase histories, leaving significant room for improvement by incorporating user-generated textual data, such as reviews and product descriptions. Addressing this gap, we propose PURE, a novel LLM-based recommendation framework that builds and maintains evolving user profiles by systematically extracting and summarizing key information from user reviews. PURE consists of three core components: a Review Extractor for identifying user preferences and key product features, a Profile Updater for refining and updating user profiles, and a Recommender for generating personalized recommendations using the most current profile. To evaluate PURE, we introduce a continuous sequential recommendation task that reflects real-world scenarios by adding reviews over time and updating predictions incrementally. Our experimental results on Amazon datasets demonstrate that PURE outperforms existing LLM-based methods, effectively leveraging long-term user information while managing token limitations.

Submitted 9 July, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

Comments: Accepted GENNEXT@SIGIR'25 Workshop

arXiv:2411.16767 [pdf, other]

Background-Aware Defect Generation for Robust Industrial Anomaly Detection

Authors: Youngjae Cho, Gwangyeol Kim, Sirojbek Safarov, Seongdeok Bang, Jaewoo Park

Abstract: Detecting anomalies in industrial settings is challenging due to the scarcity of labeled anomalous data. Generative models can mitigate this issue by synthesizing realistic defect samples, but existing approaches often fail to model the crucial interplay between defects and their background. This oversight leads to unrealistic anomalies, especially in scenarios where contextual consistency is essential (i.e., logical anomaly). To address this, we propose a novel background-aware defect generation framework, where the background influences defect denoising without affecting the background itself by ensuring realistic synthesis while preserving structural integrity. Our method leverages a disentanglement loss to separate the background's denoising process from the defect, enabling controlled defect synthesis through DDIM Inversion. We theoretically demonstrate that our approach maintains background fidelity while generating contextually accurate defects. Extensive experiments on MVTec AD and MVTec Loco benchmarks validate our method's superiority over existing techniques in both defect generation quality and anomaly detection performance.

Submitted 28 February, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

Comments: 16 pages

arXiv:2411.10947 [pdf, other]

Direct and Explicit 3D Generation from a Single Image

Authors: Haoyu Wu, Meher Gitika Karumuri, Chuhang Zou, Seungbae Bang, Yuelong Li, Dimitris Samaras, Sunil Hadap

Abstract: Current image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework to directly generate explicit surface geometry and texture using multi-view 2D depth and RGB images along with 3D Gaussian features using a repurposed Stable Diffusion model. We introduce a depth branch into U-Net for efficient and high quality multi-view, cross-domain generation and incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency. By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation that can be either rendered via Gaussian splatting or extracted to high-quality meshes, thereby leveraging additional novel view synthesis loss to further improve our performance. Extensive experiments demonstrate that our method surpasses existing baselines in geometry and texture quality while achieving significantly faster generation time.

Submitted 16 November, 2024; originally announced November 2024.

Comments: 3DV 2025, Project page: https://hao-yu-wu.github.io/gen3d/

arXiv:2409.10015 [pdf, other]

RPC: A Modular Framework for Robot Planning, Control, and Deployment

Authors: Seung Hyeon Bang, Carlos Gonzalez, Gabriel Moore, Dong Ho Kang, Mingyo Seo, Luis Sentis

Abstract: This paper presents an open-source, lightweight, yet comprehensive software framework, named RPC, which integrates physics-based simulators, planning and control libraries, debugging tools, and a user-friendly operator interface. RPC enables users to thoroughly evaluate and develop control algorithms for robotic systems. While existing software frameworks provide some of these capabilities, integrating them into a cohesive system can be challenging and cumbersome. To overcome this challenge, we have modularized each component in RPC to ensure easy and seamless integration or replacement with new modules. Additionally, our framework currently supports a variety of model-based planning and control algorithms for robotic manipulators and legged robots, alongside essential debugging tools, making it easier for users to design and execute complex robotics tasks. The code and usage instructions of RPC are available at https://github.com/shbang91/rpc.

Submitted 16 September, 2024; originally announced September 2024.

Comments: 7 pages, 4 figures

arXiv:2407.17683 [pdf, other]

RL-augmented MPC Framework for Agile and Robust Bipedal Footstep Locomotion Planning and Control

Authors: Seung Hyeon Bang, Carlos Arribalzaga Jové, Luis Sentis

Abstract: This paper proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL) to achieve agile and robust bipedal maneuvers. While MPC-based foot placement controllers have demonstrated their effectiveness in achieving dynamic locomotion, their performance is often limited by the use of simplified models and assumptions. To address this challenge, we develop a novel foot placement controller that leverages a learned policy to bridge the gap between the use of a simplified model and the more complex full-order robot system. Specifically, our approach employs a unique combination of an ALIP-based MPC foot placement controller for sub-optimal footstep planning and the learned policy for refining footstep adjustments, enabling the resulting footstep policy to capture the robot's whole-body dynamics effectively. This integration synergizes the predictive capability of MPC with the flexibility and adaptability of RL. We validate the effectiveness of our framework through a series of experiments using the full-body humanoid robot DRACO 3. The results demonstrate significant improvements in dynamic locomotion performance, including better tracking of a wide range of walking speeds, enabling reliable turning and traversing challenging terrains while preserving the robustness and stability of the walking gaits compared to the baseline ALIP-based MPC approach.

Submitted 24 July, 2024; originally announced July 2024.

Comments: 8 pages, 7 figures

arXiv:2407.16811 [pdf, other]

Variable Inertia Model Predictive Control for Fast Bipedal Maneuvers

Authors: Seung Hyeon Bang, Jaemin Lee, Carlos Gonzalez, Luis Sentis

Abstract: This paper proposes a novel control framework for agile and robust bipedal locomotion, addressing model discrepancies between full-body and reduced-order models. Specifically, assumptions such as constant centroidal inertia have introduced significant challenges and limitations in locomotion tasks. To enhance the agility and versatility of full-body humanoid robots, we formalize a Model Predictive Control (MPC) problem that accounts for the variable centroidal inertia of humanoid robots within a convex optimization framework, ensuring computational efficiency for real-time operations. In the proposed formulation, we incorporate a centroidal inertia network designed to predict the variable centroidal inertia over the MPC horizon, taking into account the swing foot trajectories -- an aspect often overlooked in ROM-based MPC frameworks. By integrating the MPC-based contact wrench planning with our low-level whole-body controller, we significantly improve the locomotion performance, achieving stable walking at higher velocities that are not attainable with the baseline method. The effectiveness of our proposed framework is validated through high-fidelity simulations using our full-body bipedal humanoid robot DRACO 3, demonstrating dynamic behaviors.

Submitted 14 September, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures

arXiv:2401.03123 [pdf, ps, other]

A least distance estimator for a multivariate regression model using deep neural networks

Authors: Jungmin Shin, Seung Jun Shin, Sungwan Bang

Abstract: We propose a deep neural network (DNN) based least distance (LD) estimator (DNN-LD) for a multivariate regression problem, addressing the limitations of the conventional methods. Due to the flexibility of a DNN structure, both linear and nonlinear conditional mean functions can be easily modeled, and a multivariate regression model can be realized by simply adding extra nodes at the output layer. The proposed method is more efficient in capturing the dependency structure among responses than the least squares loss, and robust to outliers. In addition, we consider $L_1$-type penalization for variable selection, crucial in analyzing high-dimensional data. Namely, we propose what we call (A)GDNN-LD estimator that enjoys variable selection and model estimation simultaneously, by applying the (adaptive) group Lasso penalty to weight parameters in the DNN structure. For the computation, we propose a quadratic smoothing approximation method to facilitate optimizing the non-smooth objective function based on the least distance loss. The simulation studies and a real data analysis demonstrate the promising performance of the proposed method.

Submitted 5 January, 2024; originally announced January 2024.

Comments: Submitted to 'Journal of Statistical Computation and Simulation'

arXiv:2311.14312 [pdf, other]

An Adaptive Fast-Multipole-Accelerated Hybrid Boundary Integral Equation Method for Accurate Diffusion Curves

Authors: Seungbae Bang, Kirill Serkh, Oded Stein, Alec Jacobson

Abstract: In theory, diffusion curves promise complex color gradations for infinite-resolution vector graphics. In practice, existing realizations suffer from poor scaling, discretization artifacts, or insufficient support for rich boundary conditions. Previous applications of the boundary element method to diffusion curves have relied on polygonal approximations, which either forfeit the high-order smoothness of Bézier curves, or, when the polygonal approximation is extremely detailed, result in large and costly systems of equations that must be solved. In this paper, we utilize the boundary integral equation method to accurately and efficiently solve the underlying partial differential equation. Given a desired resolution and viewport, we then interpolate this solution and use the boundary element method to render it. We couple this hybrid approach with the fast multipole method on a non-uniform quadtree for efficient computation. Furthermore, we introduce an adaptive strategy to enable truly scalable infinite-resolution diffusion curves.

Submitted 24 November, 2023; originally announced November 2023.

Comments: 28 pages, 22 figures

arXiv:2310.10893 [pdf, other]

Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction

Authors: Pengfei Zhang, Seojin Bang, Heewook Lee

Abstract: T cell receptors (TCRs) are critical components of adaptive immune systems, responsible for responding to threats by recognizing epitope sequences presented on host cell surface. Computational prediction of binding affinity between TCRs and epitope sequences using machine/deep learning has attracted intense attention recently. However, its success is hindered by the lack of large collections of annotated TCR-epitope pairs. Annotating their binding affinity requires expensive and time-consuming wet-lab evaluation. To reduce annotation cost, we present ActiveTCR, a framework that incorporates active learning and TCR-epitope binding affinity prediction models. Starting with a small set of labeled training pairs, ActiveTCR iteratively searches for unlabeled TCR-epitope pairs that are "worth" annotating. It aims to maximize performance gains while minimizing the cost of annotation. We compared four query strategies with a random sampling baseline and demonstrated that ActiveTCR reduces annotation costs by approximately 40%. Furthermore, we showed that providing ground truth labels of TCR-epitope pairs to query strategies can help identify and reduce more than 40% redundancy among already annotated pairs without compromising model performance, enabling users to train equally powerful prediction models with less training data. Our work is the first systematic investigation of data optimization for TCR-epitope binding affinity prediction.

Submitted 30 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 10 pages, 7 figures, this paper has been accepted for publication in the proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2023

arXiv:2309.01952 [pdf, other]

Deep Imitation Learning for Humanoid Loco-manipulation through Human Teleoperation

Authors: Mingyo Seo, Steve Han, Kyutae Sim, Seung Hyeon Bang, Carlos Gonzalez, Luis Sentis, Yuke Zhu

Abstract: We tackle the problem of developing humanoid loco-manipulation skills with deep imitation learning. The difficulty of collecting task demonstrations and training policies for humanoids with a high degree of freedom presents substantial challenges. We introduce TRILL, a data-efficient framework for training humanoid loco-manipulation policies from human demonstrations. In this framework, we collect human demonstration data through an intuitive Virtual Reality (VR) interface. We employ the whole-body control formulation to transform task-space commands by human operators into the robot's joint-torque actuation while stabilizing its dynamics. By employing high-level action abstractions tailored for humanoid loco-manipulation, our method can efficiently learn complex sensorimotor skills. We demonstrate the effectiveness of TRILL in simulation and on a real-world robot for performing various loco-manipulation tasks. Videos and additional materials can be found on the project page: https://ut-austin-rpl.github.io/TRILL.

Submitted 19 November, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted to Humanoids 2023

arXiv:2210.00961 [pdf]

Control and Evaluation of a Humanoid Robot with Rolling Contact Knees

Authors: Seung Hyeon Bang, Carlos Gonzalez, Junhyeok Ahn, Nicholas Paine, Luis Sentis

Abstract: In this paper, we introduce the humanoid robot DRACO 3 by providing a high-level description of its design and control. This robot features proximal actuation and mechanical artifacts to provide a high range of hip, knee and ankle motion. Its versatile design brings interesting problems as it requires a more elaborate control system to perform its motions. For this reason, we introduce a whole body controller (WBC) with support for rolling contact joints and show how it can be easily integrated into our previously presented open-source Planning and Control (PnC) framework. We then validate our controller experimentally on DRACO 3 by showing preliminary results carrying out two postural tasks. Lastly, we analyze the impact of the proximal actuation design and show where it stands in comparison to other adult-size humanoids.

Submitted 3 October, 2022; originally announced October 2022.

arXiv:2206.09667 [pdf, other]

MSANet: Multi-Similarity and Attention Guidance for Boosting Few-Shot Segmentation

Authors: Ehtesham Iqbal, Sirojbek Safarov, Seongdeok Bang

Abstract: Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples. Prototype learning, where the support feature yields a single or several prototypes by averaging global and local object information, has been widely used in FSS. However, utilizing only prototype vectors may be insufficient to represent the features for all training data. To extract abundant features and make more precise predictions, we propose a Multi-Similarity and Attention Network (MSANet) including two novel modules, a multi-similarity module and an attention module. The multi-similarity module exploits multiple feature-maps of support images and query images to estimate accurate semantic relationships. The attention module instructs the network to concentrate on class-relevant information. The network is tested on standard FSS datasets, PASCAL-5i 1-shot, PASCAL-5i 5-shot, COCO-20i 1-shot, and COCO-20i 5-shot. The MSANet with the backbone of ResNet-101 achieves the state-of-the-art performance for all 4-benchmark datasets with mean intersection over union (mIoU) of 69.13%, 73.99%, 51.09%, 56.80%, respectively. Code is available at https://github.com/AIVResearch/MSANet

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2206.00244 [pdf, other]

Fair Comparison between Efficient Attentions

Authors: Jiuk Hong, Chaehyeon Lee, Soyoun Bang, Heechul Jung

Abstract: Transformers have been successfully used in various fields and are becoming the standard tools in computer vision. However, self-attention, a core component of transformers, has a quadratic complexity problem, which limits the use of transformers in various vision tasks that require dense prediction. Many studies aiming at solving this problem have been proposed. However, no comparative study of these methods using the same scale has been reported due to different model configurations, training schemes, and new methods. In our paper, we validate these efficient attention models on the ImageNet1K classification task by changing only the attention operation and examining which efficient attention is better.

Submitted 1 June, 2022; originally announced June 2022.

Comments: 4 pages abstract

arXiv:2202.12399 [pdf, other]

Data-Driven Safety Verification for Legged Robots

Authors: Junhyeok Ahn, Seung Hyeon Bang, Carlos Gonzalez, Yuanchen Yuan, Luis Sentis

Abstract: Planning safe motions for legged robots requires sophisticated safety verification tools. However, designing such tools for such complex systems is challenging due to the nonlinear and high-dimensional nature of these systems' dynamics. In this letter, we present a probabilistic verification framework for legged systems, which evaluates the safety of planned trajectories by learning an assessment function from trajectories collected from a closed-loop system. Our approach does not require an analytic expression of the closed-loop dynamics, thus enabling safety verification of systems with complex models and controllers. Our framework consists of an offline stage that initializes a safety assessment function by simulating a nominal model and an online stage that adapts the function to address the sim-to-real gap. The performance of the proposed approach for safety verification is demonstrated using a quadruped balancing task and a humanoid reaching task. The results demonstrate that our framework accurately predicts the systems' safety both at the planning phase to generate robust trajectories and at execution phase to detect unexpected external disturbances.

Submitted 24 February, 2022; originally announced February 2022.

Comments: 8 pages, 8 figures, submitted to RA-L with IROS option

arXiv:2106.05161 [pdf, other]

doi 10.1145/3450626.3459769

Interactive Modelling of Volumetric Musculoskeletal Anatomy

Authors: Rinat Abdrashitov, Seungbae Bang, David I. W. Levin, Karan Singh, Alec Jacobson

Abstract: We present a new approach for modelling musculoskeletal anatomy. Unlike previous methods, we do not model individual muscle shapes as geometric primitives (polygonal meshes, NURBS etc.). Instead, we adopt a volumetric segmentation approach where every point in our volume is assigned to a muscle, fat, or bone tissue. We provide an interactive modelling tool where the user controls the segmentation via muscle curves and we visualize the muscle shapes using volumetric rendering. Muscle curves enable intuitive yet powerful control over the muscle shapes. This representation allows us to automatically handle intersections between different tissues (muscle-muscle, muscle-bone, and muscle-skin) during the modelling and automates computation of muscle fiber fields. We further introduce a novel algorithm for converting the volumetric muscle representation into tetrahedral or surface geometry for use in downstream tasks. Additionally, we introduce an interactive skeleton authoring tool that allows the users to create skeletal anatomy starting from only a skin mesh using a library of bone parts.

Submitted 9 June, 2021; originally announced June 2021.

Comments: 13 pages, 20 figures, SIGGRAPH 2021

Journal ref: ACM Trans. Graph., Vol. 40, No. 4, Article 122. Publication date: August 2021

arXiv:2105.07571 [pdf, other]

Classifying Argumentative Relations Using Logical Mechanisms and Argumentation Schemes

Authors: Yohan Jo, Seojin Bang, Chris Reed, Eduard Hovy

Abstract: While argument mining has achieved significant success in classifying argumentative relations between statements (support, attack, and neutral), we have a limited computational understanding of logical mechanisms that constitute those relations. Most recent studies rely on black-box models, which are not as linguistically insightful as desired. On the other hand, earlier studies use rather simple lexical features, missing logical relations between statements. To overcome these limitations, our work classifies argumentative relations based on four logical and theory-informed mechanisms between two statements, namely (i) factual consistency, (ii) sentiment coherence, (iii) causal relation, and (iv) normative relation. We demonstrate that our operationalization of these logical mechanisms classifies argumentative relations without directly training on data labeled with the relations, significantly better than several unsupervised baselines. We further demonstrate that these mechanisms also improve supervised classifiers through representation learning.

Submitted 16 May, 2021; originally announced May 2021.

Comments: To Appear in TACL 2021

arXiv:2010.02660 [pdf, other]

Detecting Attackable Sentences in Arguments

Authors: Yohan Jo, Seojin Bang, Emaad Manzoor, Eduard Hovy, Chris Reed

Abstract: Finding attackable sentences in an argument is the first step toward successful refutation in argumentation. We present a first large-scale analysis of sentence attackability in online arguments. We analyze driving reasons for attacks in argumentation and identify relevant characteristics of sentences. We demonstrate that a sentence's attackability is associated with many of these characteristics regarding the sentence's content, proposition types, and tone, and that an external knowledge source can provide useful information about attackability. Building on these findings, we demonstrate that machine learning models can automatically detect attackable sentences in arguments, significantly better than several baselines and comparably well to laypeople.

Submitted 6 October, 2020; originally announced October 2020.

Comments: EMNLP 2020

arXiv:2009.05891 [pdf, other]

MPC-Based Hierarchical Task Space Control of Underactuated and Constrained Robots for Execution of Multiple Tasks

Authors: Jaemin Lee, Seung Hyeon Bang, Efstathios Bakolas, Luis Sentis

Abstract: This paper proposes an MPC-based controller to efficiently execute multiple hierarchical tasks for underactuated and constrained robotic systems. Existing task-space controllers or whole-body controllers solve instantaneous optimization problems given task trajectories and the robot plant dynamics. However, the task-space control method we propose here relies on the prediction of future state trajectories and the corresponding costs-to-go terms over a finite time-horizon for computing control commands. We employ acceleration energy error as the performance index for the optimization problem and extend it over the finite-time horizon of our MPC. Our approach employs quadratically constrained quadratic programming, which includes quadratic constraints to handle multiple hierarchical tasks, and is computationally more efficient than nonlinear MPC-based approaches that rely on nonlinear programming. We validate our approach using numerical simulations of a new type of robot manipulator system, which contains underactuated and constrained mechanical structures.

Submitted 12 September, 2020; originally announced September 2020.

Comments: 8 pages, 5 figures

arXiv:2009.02462 [pdf, other]

doi 10.1145/3414685.3417819

Complementary Dynamics

Authors: Jiayi Eris Zhang, Seungbae Bang, David I. W. Levin, Alec Jacobson

Abstract: We present a novel approach to enrich arbitrary rig animations with elastodynamic secondary effects. Unlike previous methods which pit rig displacements and physical forces as adversaries against each other, we advocate that physics should complement artists' intentions. We propose optimizing for elastodynamic displacements in the subspace orthogonal to displacements that can be created by the rig. This ensures that the additional dynamic motions do not undo the rig animation. The complementary space is high dimensional, algebraically constructed without manual oversight, and capable of rich high-frequency dynamics. Unlike prior tracking methods, we do not require extra painted weights, segmentation into fixed and free regions or tracking clusters. Our method is agnostic to the physical model and plugs into non-linear FEM simulations, geometric as-rigid-as-possible energies, or mass-spring models. Our method does not require a particular type of rig and adds secondary effects to skeletal animations, cage-based deformations, wire deformers, motion capture data, and rigid-body simulations.

Submitted 5 September, 2020; originally announced September 2020.

Comments: 11 pages, 16 figures, ACM SIGGRAPH ASIA 2020

arXiv:2002.01598 [pdf, other]

Dropout Prediction over Weeks in MOOCs via Interpretable Multi-Layer Representation Learning

Authors: Byungsoo Jeon, Namyong Park, Seojin Bang

Abstract: Massive Open Online Courses (MOOCs) have become popular platforms for online learning. While MOOCs enable students to study at their own pace, this flexibility makes it easy for students to drop out of class. In this paper, our goal is to predict if a learner is going to drop out within the next week, given clickstream data for the current week. To this end, we present a multi-layer representation learning solution based on the branch and bound (BB) algorithm, which learns from low-level clickstreams in an unsupervised manner, produces interpretable results, and avoids manual feature engineering. In experiments on Coursera data, we show that our model learns a representation that allows a simple model to perform similarly well to more complex, task-specific models, and how the BB algorithm enables interpretable results. In our analysis of the observed limitations, we discuss promising future directions.

Submitted 4 February, 2020; originally announced February 2020.

Comments: Accepted at AAAI 2020 AI4Edu Workshop

arXiv:1906.03811 [pdf, other]

doi 10.1109/Humanoids43949.2019.9035023

Control of A High Performance Bipedal Robot using Viscoelastic Liquid Cooled Actuators

Authors: Junhyeok Ahn, Donghyun Kim, SeungHyeon Bang, Nick Paine, Luis Sentis

Abstract: This paper describes the control and evaluation of a new human-scaled biped robot with liquid cooled viscoelastic actuators (VLCA). Based on the lessons learned from previous work from our team on VLCA [1], we present a new system design embodying a Reaction Force Sensing Series Elastic Actuator (RFSEA) and a Force Sensing Series Elastic Actuator (FSEA). These designs are aimed at reducing the size and weight of the robot's actuation system while inheriting the advantages of our designs such as energy efficiency, torque density, impact resistance and position/force controllability. The system design takes into consideration human-inspired kinematics and range-of-motion (ROM), while relying on foot placement to balance. In terms of actuator control, we perform a stability analysis on a Disturbance Observer (DOB) designed for force control. We then evaluate various position control algorithms both in the time and frequency domains for our VLCA actuators. Having the low level baseline established, we first perform a controller evaluation on the legs using Operational Space Control (OSC) [2]. Finally, we move on to evaluating the full bipedal robot by accomplishing unsupported dynamic walking by means of the algorithms to appear in [3].

Submitted 19 September, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

Comments: 8 pages, 8 figures

arXiv:1902.06918 [pdf, other]

Explaining a black-box using Deep Variational Information Bottleneck Approach

Authors: Seojin Bang, Pengtao Xie, Heewook Lee, Wei Wu, Eric Xing

Abstract: Interpretable machine learning has gained much attention recently. Briefness and comprehensiveness are necessary in order to provide a large amount of information concisely when explaining a black-box decision system. However, existing interpretable machine learning methods fail to consider briefness and comprehensiveness simultaneously, leading to redundant explanations. We propose the variational information bottleneck for interpretation, VIBI, a system-agnostic interpretable method that provides a brief but comprehensive explanation. VIBI adopts an information theoretic principle, the information bottleneck principle, as a criterion for finding such explanations. For each instance, VIBI selects key features that are maximally compressed about an input (briefness), and informative about a decision made by a black-box system on that input (comprehensiveness). We evaluate VIBI on three datasets and compare with state-of-the-art interpretable machine learning methods in terms of both interpretability and fidelity evaluated by human and quantitative metrics.

Submitted 3 October, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

arXiv:1803.02458 [pdf, other]

Robust Multiple Kernel k-means Clustering using Min-Max Optimization

Authors: Seojin Bang, Yaoliang Yu, Wei Wu

Abstract: Multiple kernel learning is a type of multiview learning that combines different data modalities by capturing view-specific patterns using kernels. Although supervised multiple kernel learning has been extensively studied, until recently, only a few unsupervised approaches have been proposed. Meanwhile, adversarial learning has recently received much attention. Many works have been proposed to defend against adversarial examples. However, little is known about the effect of adversarial perturbation in the context of multiview learning, and even less in the unsupervised case. In this study, we show that adversarial features added to a view can make the existing approaches with the min-max formulation in multiple kernel clustering yield unfavorable clusters. To address this problem and inspired by recent works in adversarial learning, we propose a multiple kernel clustering method with the min-max framework that aims to be robust to such adversarial perturbation. We evaluate the robustness of our method on simulation data under different types of adversarial perturbations and show that it outperforms several compared existing methods. In the real data analysis, we demonstrate the utility of our method on a real-world problem.

Submitted 10 September, 2019; v1 submitted 6 March, 2018; originally announced March 2018.

Comments: R package is available at https://github.com/SeojinBang/MKKC

arXiv:cs/0111018 [pdf]

Data Acquisition and Database Management System for Samsung Superconductor Test Facility

Authors: Y. Chu, S. Baek, H. Yonekawa, A. Chertovskikh, M. Kim, J. S. Kim, K. Park, S. Baang, Y. Chang, J. H. Kim, S. Lee, B. Lim, W. Chung, H. Park, K. Kim

Abstract: In order to fulfill the test requirement of KSTAR (Korea Superconducting Tokamak Advanced Research) superconducting magnet system, a large scale superconducting magnet and conductor test facility, SSTF (Samsung Superconductor Test Facility), has been constructed at Samsung Advanced Institute of Technology. The computer system for SSTF DAC (Data Acquisition and Control) is based on UNIX system and VxWorks is used for the real-time OS of the VME system. EPICS (Experimental Physics and Industrial Control System) is used for the communication between IOC server and client. A database program has been developed for the efficient management of measured data and a Linux workstation with PENTIUM-4 CPU is used for the database server. In this paper, the current status of SSTF DAC system, the database management system and recent test results are presented.

Submitted 8 November, 2001; originally announced November 2001.

Comments: 3 pages, 3 figures, ICALEPCS 2001

ACM Class: B.1.1

Journal ref: eConf C011127 (2001) TUAP018

Showing 1–27 of 27 results for author: Bang, S