Showing 1–29 of 29 results for author: Hara, K

  1. arXiv:2503.11331  [pdf, other]

    cs.LG cs.AI cs.CV

    Cardiomyopathy Diagnosis Model from Endomyocardial Biopsy Specimens: Appropriate Feature Space and Class Boundary in Small Sample Size Data

    Authors: Masaya Mori, Yuto Omae, Yutaka Koyama, Kazuyuki Hara, Jun Toyotani, Yasuo Okumura, Hiroyuki Hao

    Abstract: As the number of patients with heart failure increases, machine learning (ML) has garnered attention in cardiomyopathy diagnosis, driven by the shortage of pathologists. However, endomyocardial biopsy specimens are often small sample size and require techniques such as feature extraction and dimensionality reduction. This study aims to determine whether texture features are effective for feature e…

    Submitted 14 March, 2025; originally announced March 2025.

  2. Can masking background and object reduce static bias for zero-shot action recognition?

    Authors: Takumi Fukuzawa, Kensho Hara, Hirokatsu Kataoka, Toru Tamaki

    Abstract: In this paper, we address the issue of static bias in zero-shot action recognition. Action recognition models need to represent the action itself, not the appearance. However, some fully-supervised works show that models often rely on static appearances, such as the background and objects, rather than human actions. This issue, known as static bias, has not been investigated for zero-shot. Althoug…

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: In proc. of MMM2025

    Journal ref: MMM2025

  3. arXiv:2501.09278  [pdf, other]

    cs.CV

    Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding

    Authors: Kohei Torimi, Ryosuke Yamada, Daichi Otsuka, Kensho Hara, Yuki M. Asano, Hirokatsu Kataoka, Yoshimitsu Aoki

    Abstract: Zero-shot recognition models require extensive training data for generalization. However, in zero-shot 3D classification, collecting 3D data and captions is costly and labor-intensive, posing a significant barrier compared to 2D vision. Recent advances in generative models have achieved unprecedented realism in synthetic data production, and recent research shows the potential for using generated d…

    Submitted 17 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

  4. arXiv:2409.13535  [pdf, other]

    cs.CV

    Formula-Supervised Visual-Geometric Pre-training

    Authors: Ryosuke Yamada, Kensho Hara, Hirokatsu Kataoka, Koshi Makihara, Nakamasa Inoue, Rio Yokota, Yutaka Satoh

    Abstract: Throughout the history of computer vision, while research has explored the integration of images (visual) and point clouds (geometric), many advancements in image and 3D object recognition have tended to process these modalities separately. We aim to bridge this divide by integrating images and point clouds on a unified transformer model. This approach integrates the modality-specific properties o…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted to ECCV2024

  5. Audio Description Customization

    Authors: Rosiana Natalie, Ruei-Che Chang, Smitha Sheshadri, Anhong Guo, Kotaro Hara

    Abstract: Blind and low-vision (BLV) people use audio descriptions (ADs) to access videos. However, current ADs are unalterable by end users, thus are incapable of supporting BLV individuals' potentially diverse needs and preferences. This research investigates if customizing AD could improve how BLV individuals consume videos. We conducted an interview study (Study 1) with fifteen BLV participants, which r…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: ASSETS 2024

  6. arXiv:2408.11347  [pdf, other]

    cs.AI

    Multimodal Datasets and Benchmarks for Reasoning about Dynamic Spatio-Temporality in Everyday Environments

    Authors: Takanori Ugai, Kensho Hara, Shusaku Egami, Ken Fukuda

    Abstract: We used a 3D simulator to create artificial video data with standardized annotations, aiming to aid in the development of Embodied AI. Our question answering (QA) dataset measures the extent to which a robot can understand human behavior and the environment in a home setting. Preliminary experiments suggest our dataset is useful in measuring AI's comprehension of daily life.

    Submitted 16 September, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 5 pages, 1 figure, 1 table, accepted in Embodied AI 2024 Workshop held in conjunction with CVPR 2024

  7. How People Prompt to Create Interactive VR Scenes

    Authors: Setareh Aghel Manesh, Tianyi Zhang, Yuki Onishi, Kotaro Hara, Scott Bateman, Jiannan Li, Anthony Tang

    Abstract: Generative AI tools can provide people with the ability to create virtual environments and scenes with natural language prompts. Yet, how people will formulate such prompts is unclear -- particularly when they inhabit the environment that they are designing. For instance, it is likely that a person might say, "Put a chair here", while pointing at a location. If such linguistic features are common…

    Submitted 29 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted at ACM 2024 Designing Interactive Systems (DIS)

  8. arXiv:2401.09323  [pdf, other]

    cs.LG

    BENO: Boundary-embedded Neural Operators for Elliptic PDEs

    Authors: Haixin Wang, Jiaxin Li, Anubhav Dwivedi, Kentaro Hara, Tailin Wu

    Abstract: Elliptic partial differential equations (PDEs) are a major class of time-independent PDEs that play a key role in many scientific and engineering domains such as fluid dynamics, plasma physics, and solid mechanics. Recently, neural operators have emerged as a promising technique to solve elliptic PDEs more efficiently by directly mapping the input to solutions. However, existing networks typically…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024

  9. arXiv:2312.10737  [pdf, other]

    cs.CV cs.RO

    Traffic Incident Database with Multiple Labels Including Various Perspective Environmental Information

    Authors: Shota Nishiyama, Takuma Saito, Ryo Nakamura, Go Ohtani, Hirokatsu Kataoka, Kensho Hara

    Abstract: A large dataset of annotated traffic accidents is necessary to improve the accuracy of traffic accident recognition using deep learning models. Conventional traffic accident datasets provide annotations on traffic accidents and other teacher labels, improving traffic accident recognition performance. However, the labels annotated in conventional datasets need to be more comprehensive to describe t…

    Submitted 19 December, 2023; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Conference paper accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023. Reason for revision: Corrected due to a missing space between sentences in the preview's abstract, which led to an unintended URL interpretation

  10. arXiv:2309.14759  [pdf, other]

    cs.GR cs.CV

    Diffusion-based Holistic Texture Rectification and Synthesis

    Authors: Guoqing Hao, Satoshi Iizuka, Kensho Hara, Edgar Simo-Serra, Hirokatsu Kataoka, Kazuhiro Fukui

    Abstract: We present a novel framework for rectifying occlusions and distortions in degraded texture samples from natural images. Traditional texture synthesis approaches focus on generating textures from pristine samples, which necessitate meticulous preparation by humans and are often unattainable in most natural images. These challenges stem from the frequent occlusions and distortions of texture samples…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: SIGGRAPH Asia 2023 Conference Paper

  11. arXiv:2308.04535  [pdf, other]

    cs.CV

    Estimation of Human Condition at Disaster Site Using Aerial Drone Images

    Authors: Tomoki Arai, Kenji Iwata, Kensho Hara, Yutaka Satoh

    Abstract: Drones are being used to assess the situation in various disasters. In this study, we investigate a method to automatically estimate the damage status of people based on their actions in aerial drone images in order to understand disaster sites faster and save labor. We constructed a new dataset of aerial images of human actions in a hypothetical disaster that occurred in an urban area, and classi…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: In submission to the ICCV 2023 Artificial Intelligence for Humanitarian Assistance and Disaster Response Workshop

  12. Effectiveness of the COVID-19 Contact-Confirming Application (COCOA) based on a Multi Agent Simulation

    Authors: Yuto Omae, Jun Toyotani, Kazuyuki Hara, Yasuhiro Gon, Hirotaka Takahashi

    Abstract: As of Aug. 2020, coronavirus disease 2019 (COVID-19) is still spreading in the world. In Japan, the Ministry of Health, Labor, and Welfare developed "COVID-19 Contact-Confirming Application (COCOA)," which was released on Jun. 19, 2020. By utilizing COCOA, users can know whether or not they had contact with infected persons. If those who had contact with infectors keep staying at home, they may no…

    Submitted 30 August, 2020; originally announced August 2020.

    Comments: 10 pages, 7 figures

    ACM Class: I.2.11; I.6.3; I.6.5; I.6.6

    Journal ref: JACIII, 2022

  13. arXiv:2005.09183  [pdf, other]

    cs.CV cs.CL cs.IR

    Retrieving and Highlighting Action with Spatiotemporal Reference

    Authors: Seito Kasai, Yuchi Ishikawa, Masaki Hayashi, Yoshimitsu Aoki, Kensho Hara, Hirokatsu Kataoka

    Abstract: In this paper, we present a framework that jointly retrieves and spatiotemporally highlights actions in videos by enhancing current deep cross-modal retrieval methods. Our work takes on the novel task of action highlighting, which visualizes where and when actions occur in an untrimmed video setting. Action highlighting is a fine-grained task, compared to conventional action recognition tasks whic…

    Submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted to ICIP 2020

  14. arXiv:2004.04968  [pdf, other]

    cs.CV

    Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

    Authors: Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, Yutaka Satoh

    Abstract: How can we collect and use a video dataset to further improve spatiotemporal 3D Convolutional Neural Networks (3D CNNs)? In order to positively answer this open question in video recognition, we have conducted an exploration study using a couple of large-scale video datasets and 3D CNNs. In the early era of deep neural networks, 2D CNNs have been better than 3D CNNs in the context of video recogni…

    Submitted 10 April, 2020; originally announced April 2020.

    Comments: Codes and pre-trained models are publicly available: https://github.com/kenshohara/3D-ResNets-PyTorch

  15. arXiv:1812.07045  [pdf, other]

    cs.CV cs.LG

    EventNet: Asynchronous Recursive Event Processing

    Authors: Yusuke Sekikawa, Kosuke Hara, Hideo Saito

    Abstract: Event cameras are bio-inspired vision sensors that mimic retinas to asynchronously report per-pixel intensity changes rather than outputting an actual intensity image at regular intervals. This new paradigm of image sensor offers significant potential advantages; namely, sparse and non-redundant data representation. Unfortunately, however, most of the existing artificial neural network architectur…

    Submitted 1 April, 2019; v1 submitted 7 December, 2018; originally announced December 2018.

  16. arXiv:1712.05796  [pdf]

    cs.CY cs.HC

    A Data-Driven Analysis of Workers' Earnings on Amazon Mechanical Turk

    Authors: Kotaro Hara, Abi Adams, Kristy Milland, Saiph Savage, Chris Callison-Burch, Jeffrey Bigham

    Abstract: A growing number of people are working as part of on-line crowd work, which has been characterized by its low wages; yet, we know little about wage distribution and causes of low/high earnings. We recorded 2,676 workers performing 3.8 million tasks on Amazon Mechanical Turk. Our task-level analysis revealed that workers earned a median hourly wage of only ~$2/h, and only 4% earned more than $7.2…

    Submitted 28 December, 2017; v1 submitted 14 December, 2017; originally announced December 2017.

    Comments: Conditionally accepted for inclusion in the 2018 ACM Conference on Human Factors in Computing Systems (CHI'18) Papers program

  17. arXiv:1711.09577  [pdf, other]

    cs.CV

    Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

    Authors: Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh

    Abstract: The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. Recently, the performance levels of 3D CNNs in the field of action recognition have improved significantly. However, to date, conventional research has only explored relatively shallow 3D archi…

    Submitted 1 April, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

    Comments: Accepted to CVPR 2018

  18. arXiv:1711.03343  [pdf, ps, other]

    cs.LG stat.ML

    Analysis of Dropout in Online Learning

    Authors: Kazuyuki Hara

    Abstract: Deep learning is the state-of-the-art in fields such as visual object recognition and speech recognition. This learning uses a large number of layers and a huge number of units and connections. Therefore, overfitting is a serious problem with it, and the dropout which is a kind of regularization tool is used. However, in online learning, the effect of dropout is not well known. This paper presents…

    Submitted 9 November, 2017; originally announced November 2017.

    Comments: 8 pages, 6 pages

    Journal ref: IEICE Technical Report IBIS2017-61

  19. arXiv:1708.07632  [pdf, other]

    cs.CV

    Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

    Authors: Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh

    Abstract: Convolutional neural networks with spatio-temporal 3D kernels (3D CNNs) have an ability to directly extract spatio-temporal features from videos for action recognition. Although the 3D kernels tend to overfit because of a large number of their parameters, the 3D CNNs are greatly improved by using recent huge video databases. However, the architecture of 3D CNNs is relatively shallow against to the…

    Submitted 25 August, 2017; originally announced August 2017.

    Comments: To appear in ICCV 2017 Workshop (Chalearn)

  20. Statistical Mechanics of Node-perturbation Learning with Noisy Baseline

    Authors: Kazuyuki Hara, Kentaro Katahira, Masato Okada

    Abstract: Node-perturbation learning is a type of statistical gradient descent algorithm that can be applied to problems where the objective function is not explicitly formulated, including reinforcement learning. It estimates the gradient of an objective function by using the change in the object function in response to the perturbation. The value of the objective function for an unperturbed output is call…

    Submitted 20 June, 2017; originally announced June 2017.

    Comments: 16 pages, 7 figures, submitted to JPSJ

    Journal ref: Journal of the Physical Society of Japan 86, 024002 (2017)

  21. Analysis of dropout learning regarded as ensemble learning

    Authors: Kazuyuki Hara, Daisuke Saitoh, Hayaru Shouno

    Abstract: Deep learning is the state-of-the-art in fields such as visual object recognition and speech recognition. This learning uses a large number of layers, huge number of units, and connections. Therefore, overfitting is a serious problem. To avoid this problem, dropout learning is proposed. Dropout learning neglects some inputs and hidden units in the learning process with a probability, p, and then,…

    Submitted 20 June, 2017; originally announced June 2017.

    Comments: 9 pages, 8 figures, submitted to Conference

    Journal ref: A. E. P. Villa et al. (Eds.): ICANN 2016 (Part II, LNCS 9887, pp. 1-8, 2016)

  22. arXiv:1702.01499  [pdf, other]

    cs.CV

    Designing Deep Convolutional Neural Networks for Continuous Object Orientation Estimation

    Authors: Kota Hara, Raviteja Vemulapalli, Rama Chellappa

    Abstract: Deep Convolutional Neural Networks (DCNN) have been proven to be effective for various computer vision problems. In this work, we demonstrate its effectiveness on a continuous object orientation estimation task, which requires prediction of 0 to 360 degrees orientation of the objects. We do so by proposing and comparing three continuous orientation prediction approaches designed for the DCNNs. The…

    Submitted 6 February, 2017; originally announced February 2017.

  23. arXiv:1702.01478  [pdf, other]

    cs.CV

    Attentional Network for Visual Object Detection

    Authors: Kota Hara, Ming-Yu Liu, Oncel Tuzel, Amir-massoud Farahmand

    Abstract: We propose augmenting deep neural networks with an attention mechanism for the visual object detection task. As perceiving a scene, humans have the capability of multiple fixation points, each attended to scene content at different locations and scales. However, such a mechanism is missing in the current state-of-the-art visual object detection methods. Inspired by the human vision system, we prop…

    Submitted 5 February, 2017; originally announced February 2017.

  24. Easy-setup eye movement recording system for human-computer interaction

    Authors: Manh Duong Phung, Quang Vinh Tran, Kenji Hara, Hirohito Inagaki, Masanobu Abe

    Abstract: Tracking the movement of human eyes is expected to yield natural and convenient applications based on human-computer interaction (HCI). To implement an effective eye-tracking system, eye movements must be recorded without placing any restriction on the user's behavior or user discomfort. This paper describes an eye movement recording system that offers free-head, simple configuration. It does not…

    Submitted 28 November, 2016; originally announced November 2016.

    Comments: In IEEE International Conference on Research, Innovation and Vision for the Future (RIVF), 2008

  25. arXiv:1608.07876  [pdf, other]

    cs.CV cs.MM

    Human Action Recognition without Human

    Authors: Hirokatsu Kataoka, Kensho Hara, Yutaka Satoh

    Abstract: The objective of this paper is to evaluate "human action recognition without human". Motion representation is frequently discussed in human action recognition. We have examined several sophisticated options, such as dense trajectories (DT) and the two-stream convolutional neural network (CNN). However, some features from the background could be too strong, as shown in some recent studies on human…

    Submitted 23 October, 2024; v1 submitted 28 August, 2016; originally announced August 2016.

    Comments: This paper is an extension of the work presented at the ECCV 2016 Workshop and was primarily conducted in 2017

  26. arXiv:1507.00825  [pdf, other]

    cs.LG stat.ML

    Ridge Regression, Hubness, and Zero-Shot Learning

    Authors: Yutaro Shigeto, Ikumi Suzuki, Kazuo Hara, Masashi Shimbo, Yuji Matsumoto

    Abstract: This paper discusses the effect of hubness in zero-shot learning, when ridge regression is used to find a mapping between the example space to the label space. Contrary to the existing approach, which attempts to find a mapping from the example space to the label space, we show that mapping labels into the example space is desirable to suppress the emergence of hubs in the subsequent nearest neigh…

    Submitted 3 July, 2015; originally announced July 2015.

    Comments: To be presented at ECML/PKDD 2015

  27. Fashion Apparel Detection: The Role of Deep Convolutional Neural Network and Pose-dependent Priors

    Authors: Kota Hara, Vignesh Jagadeesh, Robinson Piramuthu

    Abstract: In this work, we propose and address a new computer vision task, which we call fashion item detection, where the aim is to detect various fashion items a person in the image is wearing or carrying. The types of fashion items we consider in this work include hat, glasses, bag, pants, shoes and so on. The detection of fashion items can be an important first step of various e-commerce applications fo…

    Submitted 24 January, 2016; v1 submitted 19 November, 2014; originally announced November 2014.

    Comments: Accepted for publication at IEEE Winter Conference on Applications of Computer Vision (WACV) 2016

  28. arXiv:1312.6430  [pdf, other]

    cs.CV cs.LG stat.ML

    Growing Regression Forests by Classification: Applications to Object Pose Estimation

    Authors: Kota Hara, Rama Chellappa

    Abstract: In this work, we propose a novel node splitting method for regression trees and incorporate it into the regression forest framework. Unlike traditional binary splitting, where the splitting rule is selected from a predefined set of binary splitting rules via trial-and-error, the proposed node splitting method first finds clusters of the training data which at least locally minimize the empirical l…

    Submitted 14 July, 2014; v1 submitted 22 December, 2013; originally announced December 2013.

    Comments: Paper accepted by ECCV 2014

  29. arXiv:1310.6110  [pdf, ps, other]

    cs.IR

    A two-step model and the algorithm for recalling in recommender systems

    Authors: Keisuke Hara, Tomihisa Kamada

    Abstract: When a user finds an interesting recommendation in a recommender system, the user may want to recall related items recommended in the past to reconsider or to enjoy them again. If the system can pick up such "recalled" items at each user's request, it must deepen the user experience. We propose a model and the algorithm for such personalized "recalling" in conventional recommender systems, which…

    Submitted 23 October, 2013; originally announced October 2013.

    Comments: 6 pages, No figure