Showing 1–9 of 9 results for author: Aktı, Ş

  1. arXiv:2506.04013

    cs.SD cs.AI eess.AS

    Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion

    Authors: Seymanur Akti, Tuan Nam Nguyen, Alexander Waibel

    Abstract: Expressive voice conversion aims to transfer both speaker identity and expressive attributes from a target speech to a given source speech. In this work, we improve over a self-supervised, non-autoregressive framework with a conditional variational autoencoder, focusing on reducing source timbre leakage and improving linguistic-acoustic disentanglement for better style transfer. To minimize style…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  2. arXiv:2505.13036

    cs.CL cs.AI

    KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025

    Authors: Sai Koneru, Maike Züfle, Thai-Binh Nguyen, Seymanur Akti, Jan Niehues, Alexander Waibel

    Abstract: The scope of the International Workshop on Spoken Language Translation (IWSLT) has recently broadened beyond traditional Speech Translation (ST) to encompass a wider array of tasks, including Speech Question Answering and Summarization. This shift is partly driven by the growing capabilities of modern systems, particularly with the success of Large Language Models (LLMs). In this paper, we present…

    Submitted 19 May, 2025; originally announced May 2025.

  3. arXiv:2410.14997

    cs.SD cs.AI eess.AS

    Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS

    Authors: Tuan Nam Nguyen, Seymanur Akti, Ngoc Quan Pham, Alexander Waibel

    Abstract: Previous approaches on accent conversion (AC) mainly aimed at making non-native speech sound more native while maintaining the original content and speaker identity. However, non-native speakers sometimes have pronunciation issues, which can make it difficult for listeners to understand them. Hence, we developed a new AC approach that not only focuses on accent conversion but also improves pronunc…

    Submitted 4 March, 2025; v1 submitted 19 October, 2024; originally announced October 2024.

    Comments: Accepted at ICASSP 2025

  4. arXiv:2405.04327

    cs.CV

    Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation

    Authors: Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Seymanur Aktı, Hazım Kemal Ekenel, Alexander Waibel

    Abstract: In the task of talking face generation, the objective is to generate a face video with lips synchronized to the corresponding audio while preserving visual details and identity information. Current methods face the challenge of learning accurate lip synchronization while avoiding detrimental effects on visual quality, as well as robustly evaluating such synchronization. To tackle these problems, w…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: CVPR2024 NTIRE Workshop

  5. arXiv:2204.09432

    cs.CV

    A Mobile Food Recognition System for Dietary Assessment

    Authors: Şeymanur Aktı, Marwa Qaraqe, Hazım Kemal Ekenel

    Abstract: Food recognition is an important task for a variety of applications, including managing health conditions and assisting visually impaired people. Several food recognition studies have focused on generic types of food or specific cuisines, however, food recognition with respect to Middle Eastern cuisines has remained unexplored. Therefore, in this paper we focus on developing a mobile friendly, Mid…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: Accepted at GoodBrotherVI4IAAL Workshop @ICIAP2021

  6. arXiv:2111.08370

    cs.CV

    Fight Detection from Still Images in the Wild

    Authors: Şeymanur Aktı, Ferda Ofli, Muhammad Imran, Hazım Kemal Ekenel

    Abstract: Detecting fights from still images shared on social media is an important task required to limit the distribution of violent scenes in order to prevent their negative effects. For this reason, in this study, we address the problem of fight detection from still images collected from the web and social media. We explore how well one can detect fights from just a single still image. We also propose a…

    Submitted 17 November, 2021; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: Accepted for publication at Winter Conference of Applications on Computer Vision Workshops (WACV-W 2022), Workshop on Real-World Surveillance: Applications and Challenges

  7. arXiv:2109.07739

    cs.LG cs.AI cs.CV

    A Comparative Study of Machine Learning Methods for Predicting the Evolution of Brain Connectivity from a Baseline Timepoint

    Authors: Şeymanur Aktı, Doğay Kamar, Özgür Anıl Özlü, Ihsan Soydemir, Muhammet Akcan, Abdullah Kul, Islem Rekik

    Abstract: Predicting the evolution of the brain network, also called connectome, by foreseeing changes in the connectivity weights linking pairs of anatomical regions makes it possible to spot connectivity-related neurological disorders in earlier stages and detect the development of potential connectomic anomalies. Remarkably, such a challenging prediction problem remains least explored in the predictive c…

    Submitted 16 September, 2021; originally announced September 2021.

  8. MOCCA: Multi-Layer One-Class ClassificAtion for Anomaly Detection

    Authors: Fabio Valerio Massoli, Fabrizio Falchi, Alperen Kantarci, Şeymanur Akti, Hazim Kemal Ekenel, Giuseppe Amato

    Abstract: Anomalies are ubiquitous in all scientific fields and can express an unexpected event due to incomplete knowledge about the data distribution or an unknown process that suddenly comes into play and distorts observations. Due to such events' rarity, to train deep learning models on the Anomaly Detection (AD) task, scientists only rely on "normal" data, i.e., non-anomalous samples. Thus, letting the…

    Submitted 27 November, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

    Comments: The paper has been accepted for publication in the IEEE Transactions on Neural Networks and Learning Systems, Special Issue on Deep Learning for Anomaly Detection

    MSC Class: 68-XX; ACM Class: I.5

    Journal ref: IEEE TNNLS (2021)

  9. arXiv:2002.04355

    cs.CV cs.LG eess.IV

    Vision-based Fight Detection from Surveillance Cameras

    Authors: Şeymanur Aktı, Gözde Ayşe Tataroğlu, Hazım Kemal Ekenel

    Abstract: Vision-based action recognition is one of the most challenging research topics of computer vision and pattern recognition. A specific application of it, namely, detecting fights from surveillance cameras in public areas, prisons, etc., is desired to quickly get under control these violent incidents. This paper addresses this research problem and explores LSTM-based approaches to solve it. Moreover…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: 6 pages, 5 figures, 4 tables, International Conference on Image Processing Theory, Tools and Applications, IPTA 2019