Showing 1–25 of 25 results for author: Akiba, T

Searching in archive cs.
  1. arXiv:2506.14202  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion

    Authors: Makoto Shing, Takuya Akiba

    Abstract: Training large neural networks with end-to-end backpropagation creates significant memory bottlenecks, limiting accessibility to state-of-the-art AI research. We propose $\textit{DiffusionBlocks}$, a novel training framework that interprets neural network blocks as performing denoising operations in a continuous-time diffusion process. By partitioning the network into independently trainable block…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: To appear at TTODLer-FM Workshop of the 42nd International Conference on Machine Learning

  2. arXiv:2506.09050  [pdf, other]

    cs.AI

    ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

    Authors: Yuki Imajuku, Kohki Horie, Yoichi Iwata, Kensho Aoki, Naohiro Takahashi, Takuya Akiba

    Abstract: How well do AI systems perform in algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing? We introduce ALE-Bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests. Drawing on real tasks from the AtCoder Heuristic Contests, ALE-Bench presents opt…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 36 pages

  3. arXiv:2503.04412  [pdf, ps, other]

    cs.AI

    Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

    Authors: Yuichi Inoue, Kou Misaki, Yuki Imajuku, So Kuroki, Taishi Nakamura, Takuya Akiba

    Abstract: Recent advances demonstrate that increasing inference-time computation can significantly boost the reasoning capabilities of large language models (LLMs). Although repeated sampling (i.e., generating multiple candidate outputs) is a highly effective strategy, it does not leverage external feedback signals for refinement, which are often available in tasks like coding. In this work, we propose Adap…

    Submitted 27 June, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: Presented at ICLR 2025 Workshop on Foundation Models in the Wild

  4. arXiv:2502.19261  [pdf, other]

    cs.CL cs.AI cs.LG

    Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization

    Authors: Taishi Nakamura, Takuya Akiba, Kazuki Fujii, Yusuke Oda, Rio Yokota, Jun Suzuki

    Abstract: The Mixture of Experts (MoE) architecture reduces the training and inference cost significantly compared to a dense model of equivalent capacity. Upcycling is an approach that initializes and trains an MoE model using a pre-trained dense model. While upcycling leads to initial performance gains, the training progresses slower than when trained from scratch, leading to suboptimal performance in the…

    Submitted 15 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: To appear at the 13th International Conference on Learning Representations (ICLR 2025)

  5. arXiv:2501.16937  [pdf, other]

    cs.LG cs.AI cs.CL

    TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

    Authors: Makoto Shing, Kou Misaki, Han Bao, Sho Yokoi, Takuya Akiba

    Abstract: Causal language models have demonstrated remarkable capabilities, but their size poses significant challenges for deployment in resource-constrained environments. Knowledge distillation, a widely-used technique for transferring knowledge from a large teacher model to a small student model, presents a promising approach for model compression. A significant remaining issue lies in the major differen…

    Submitted 27 February, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

    Comments: To appear at the 13th International Conference on Learning Representations (ICLR 2025) as a Spotlight presentation

  6. arXiv:2410.14735  [pdf, other]

    cs.CL cs.AI cs.NE

    Agent Skill Acquisition for Large Language Models via CycleQD

    Authors: So Kuroki, Taishi Nakamura, Takuya Akiba, Yujin Tang

    Abstract: Training large language models to acquire specific skills remains a challenging endeavor. Conventional training approaches often struggle with data distribution imbalances and inadequacies in objective functions that do not align well with task-specific performance. To address these challenges, we introduce CycleQD, a novel approach that leverages the Quality Diversity framework through a cyclic a…

    Submitted 17 February, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: To appear at the 13th International Conference on Learning Representations (ICLR 2025)

  7. Evolutionary Optimization of Model Merging Recipes

    Authors: Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha

    Abstract: Large language models (LLMs) have become increasingly capable, but their development often requires substantial computational resources. While model merging has emerged as a cost-effective promising approach for creating new models by combining existing ones, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcome…

    Submitted 27 January, 2025; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Authors' submitted version before final edits. Published in Nature Machine Intelligence on January 27, 2025: https://www.nature.com/articles/s42256-024-00975-8

    Journal ref: Nat Mach Intell (2025)

  8. arXiv:1910.11534  [pdf, other]

    cs.CV

    Team PFDet's Methods for Open Images Challenge 2019

    Authors: Yusuke Niitani, Toru Ogawa, Shuji Suzuki, Takuya Akiba, Tommi Kerola, Kohei Ozaki, Shotaro Sano

    Abstract: We present the instance segmentation and the object detection method used by team PFDet for Open Images Challenge 2019. We tackle a massive dataset size, huge class imbalance and federated annotations. Using this method, the team PFDet achieved 3rd and 4th place in the instance segmentation and the object detection track, respectively.

    Submitted 25 October, 2019; originally announced October 2019.

  9. arXiv:1908.00213  [pdf, other]

    cs.LG cs.CV cs.DC stat.ML

    Chainer: A Deep Learning Framework for Accelerating the Research Cycle

    Authors: Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, Hiroyuki Yamazaki Vincent

    Abstract: Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units…

    Submitted 1 August, 2019; originally announced August 2019.

    Comments: Accepted for Applied Data Science Track in KDD'19

  10. arXiv:1907.10902  [pdf, other]

    cs.LG stat.ML

    Optuna: A Next-generation Hyperparameter Optimization Framework

    Authors: Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama

    Abstract: The purpose of this study is to introduce new design-criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) easy-to-setup, versatile architecture that can be deployed for various purpo…

    Submitted 25 July, 2019; originally announced July 2019.

    Comments: 10 pages, Accepted at KDD 2019 Applied Data Science track

  11. arXiv:1905.11722  [pdf, ps, other]

    cs.LG stat.ML

    A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation

    Authors: Mitsuru Kusumoto, Takuya Inoue, Gentaro Watanabe, Takuya Akiba, Masanori Koyama

    Abstract: Recomputation algorithms collectively refer to a family of methods that aims to reduce the memory consumption of the backpropagation by selectively discarding the intermediate results of the forward propagation and recomputing the discarded results as needed. In this paper, we will propose a novel and efficient recomputation method that can be applied to a wider range of neural nets than previous…

    Submitted 28 May, 2019; originally announced May 2019.

  12. arXiv:1811.10862  [pdf, other]

    cs.CV

    Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects

    Authors: Yusuke Niitani, Takuya Akiba, Tommi Kerola, Toru Ogawa, Shotaro Sano, Shuji Suzuki

    Abstract: Efficient and reliable methods for training of object detectors are in higher demand than ever, and more and more data relevant to the field is becoming available. However, large datasets like Open Images Dataset v4 (OID) are sparsely annotated, and some measure must be taken in order to ensure the training of a reliable detector. In order to take the incompleteness of these datasets into account,…

    Submitted 21 April, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: CVPR2019 oral

  13. arXiv:1809.00778  [pdf, other]

    cs.CV

    PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track

    Authors: Takuya Akiba, Tommi Kerola, Yusuke Niitani, Toru Ogawa, Shotaro Sano, Shuji Suzuki

    Abstract: We present a large-scale object detection system by team PFDet. Our system enables training with huge datasets using 512 GPUs, handles sparsely verified classes, and massive class imbalance. Using our method, we achieved 2nd place in the Google AI Open Images Object Detection Track 2018 on Kaggle.

    Submitted 3 September, 2018; originally announced September 2018.

    Comments: Technical report for Open Images Challenge 2018 Object Detection Track

  14. arXiv:1804.00097  [pdf, other]

    cs.CV cs.CR cs.LG stat.ML

    Adversarial Attacks and Defences Competition

    Authors: Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, Motoki Abe

    Abstract: To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them. In this chapter, we describe the structure and organization of the competition and the solutions developed by several o…

    Submitted 30 March, 2018; originally announced April 2018.

    Comments: 36 pages, 10 figures

  15. arXiv:1802.06058  [pdf, other]

    cs.LG

    Variance-based Gradient Compression for Efficient Distributed Deep Learning

    Authors: Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

    Abstract: Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently communicate gradients, causing severe bottlenecks, especially on lower bandwidth connections. A few methods have been proposed to compress gradient for efficient comm…

    Submitted 19 February, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

    Comments: ICLR 2018 Workshop

  16. ShakeDrop Regularization for Deep Residual Learning

    Authors: Yoshihiro Yamada, Masakazu Iwamura, Takuya Akiba, Koichi Kise

    Abstract: Overfitting is a crucial problem in deep neural networks, even in the latest network architectures. In this paper, to relieve the overfitting effect of ResNet and its improvements (i.e., Wide ResNet, PyramidNet, and ResNeXt), we propose a new regularization method called ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake, which is an effective regularization method, but can be applied…

    Submitted 6 January, 2020; v1 submitted 7 February, 2018; originally announced February 2018.

    Journal ref: IEEE Access, 7, 1, pp.186126-186136 (2019)

  17. arXiv:1711.04325  [pdf, other]

    cs.DC cs.CV cs.LG

    Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

    Authors: Takuya Akiba, Shuji Suzuki, Keisuke Fukuda

    Abstract: We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also desc…

    Submitted 12 November, 2017; originally announced November 2017.

    Comments: NIPS'17 Workshop: Deep Learning at Supercomputer Scale

  18. arXiv:1710.11351  [pdf, other]

    cs.DC cs.LG cs.NE

    ChainerMN: Scalable Distributed Deep Learning Framework

    Authors: Takuya Akiba, Keisuke Fukuda, Shuji Suzuki

    Abstract: One of the keys for deep learning to have made a breakthrough in various fields was to utilize high computing powers centering around GPUs. Enabling the use of further computing abilities by distributed processing is essential not only to make the deep learning bigger and faster but also to tackle unsolved challenges. We present the design, implementation, and evaluation of ChainerMN, the distribu…

    Submitted 31 October, 2017; originally announced October 2017.

  19. arXiv:1609.08723  [pdf, other]

    cs.DS cs.SI

    Cut Tree Construction from Massive Graphs

    Authors: Takuya Akiba, Yoichi Iwata, Yosuke Sameshima, Naoto Mizuno, Yosuke Yano

    Abstract: The construction of cut trees (also known as Gomory-Hu trees) for a given graph enables the minimum-cut size of the original graph to be obtained for any pair of vertices. Cut trees are a powerful back-end for graph management and mining, as they support various procedures related to the minimum cut, maximum flow, and connectivity. However, the crucial drawback with cut trees is the computational…

    Submitted 27 September, 2016; originally announced September 2016.

    Comments: Short version will appear at ICDM'16

  20. arXiv:1609.07994  [pdf, other]

    cs.DS cs.SI physics.soc-ph

    Fractality of Massive Graphs: Scalable Analysis with Sketch-Based Box-Covering Algorithm

    Authors: Takuya Akiba, Kenko Nakamura, Taro Takaguchi

    Abstract: Analysis and modeling of networked objects are fundamental pieces of modern data mining. Most real-world networks, from biological to social ones, are known to have common structural properties. These properties allow us to model the growth processes of networks and to develop useful algorithms. One remarkable example is the fractality of networks, which suggests the self-similar organization of g…

    Submitted 26 September, 2016; originally announced September 2016.

    Comments: Short version will appear at ICDM'16

  21. arXiv:1411.2680  [pdf, ps, other]

    cs.DS

    Branch-and-Reduce Exponential/FPT Algorithms in Practice: A Case Study of Vertex Cover

    Authors: Takuya Akiba, Yoichi Iwata

    Abstract: We investigate the gap between theory and practice for exact branching algorithms. In theory, branch-and-reduce algorithms currently have the best time complexity for numerous important problems. On the other hand, in practice, state-of-the-art methods are based on different approaches, and the empirical efficiency of such theoretical algorithms has seldom been investigated probably because they…

    Submitted 10 November, 2014; originally announced November 2014.

    Comments: To appear in ALENEX 2015

  22. arXiv:1304.4661  [pdf, ps, other]

    cs.DS cs.DB

    Fast Exact Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling

    Authors: Takuya Akiba, Yoichi Iwata, Yuichi Yoshida

    Abstract: We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Seemingly too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadth-first searches. While we can still answer the correct distance for any pair of ver…

    Submitted 16 April, 2013; originally announced April 2013.

    Comments: To appear in SIGMOD 2013

  23. arXiv:cs/0407028  [pdf, ps, other]

    cs.CL

    Effects of Language Modeling on Speech-driven Question Answering

    Authors: Tomoyosi Akiba, Atsushi Fujii, Katunobu Itou

    Abstract: We integrate automatic speech recognition (ASR) and question answering (QA) to realize a speech-driven QA system, and evaluate its performance. We adapt an N-gram language model to natural language questions, so that the input of our system can be recognized with a high accuracy. We target WH-questions which consist of the topic part and fixed phrase used to ask about something. We first produce…

    Submitted 10 July, 2004; originally announced July 2004.

    Comments: 4 pages, Proceedings of the 8th International Conference on Spoken Language Processing (to appear)

    ACM Class: I.2.7; H.3.3; H.3.4; H.5.1

    Journal ref: Proceedings of the 8th International Conference on Spoken Language Processing (ICSLP 2004), pp.1053-1056, Oct. 2004

  24. arXiv:cs/0407027  [pdf, ps, other]

    cs.CL

    Unsupervised Topic Adaptation for Lecture Speech Retrieval

    Authors: Atsushi Fujii, Katunobu Itou, Tomoyosi Akiba, Tetsuya Ishikawa

    Abstract: We are developing a cross-media information retrieval system, in which users can view specific segments of lecture videos by submitting text queries. To produce a text index, the audio track is extracted from a lecture video and a transcription is generated by automatic speech recognition. In this paper, to improve the quality of our retrieval system, we extensively investigate the effects of ad…

    Submitted 10 July, 2004; originally announced July 2004.

    Comments: 4 pages, Proceedings of the 8th International Conference on Spoken Language Processing (to appear)

    ACM Class: I.2.7; H.3.3; H.5.1

    Journal ref: Proceedings of the 8th International Conference on Spoken Language Processing (ICSLP 2004), pp.2957-2960, Oct. 2004

  25. arXiv:cs/0309021  [pdf, ps, other]

    cs.CL

    A Cross-media Retrieval System for Lecture Videos

    Authors: Atsushi Fujii, Katunobu Itou, Tomoyosi Akiba, Tetsuya Ishikawa

    Abstract: We propose a cross-media lecture-on-demand system, in which users can selectively view specific segments of lecture videos by submitting text queries. Users can easily formulate queries by using the textbook associated with a target lecture, even if they cannot come up with effective keywords. Our system extracts the audio track from a target lecture video, generates a transcription by large voc…

    Submitted 13 September, 2003; originally announced September 2003.

    ACM Class: I.2.7; H.3.3; H.5.1

    Journal ref: Proceedings of the 8th European Conference on Speech Communication and Technology (Eurospeech 2003), pp.1149-1152, Sep. 2003