Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities
Authors:
George Saon,
Avihu Dekel,
Alexander Brooks,
Tohru Nagano,
Abraham Daniels,
Aharon Satt,
Ashish Mittal,
Brian Kingsbury,
David Haws,
Edmilson Morais,
Gakuto Kurata,
Hagai Aronowitz,
Ibrahim Ibrahim,
Jeff Kuo,
Kate Soule,
Luis Lastras,
Masayuki Suzuki,
Ron Hoory,
Samuel Thomas,
Sashi Novitasari,
Takashi Fukuda,
Vishal Sunder,
Xiaodong Cui,
Zvi Kons
Abstract:
Granite-speech LLMs are compact and efficient speech language models specifically designed for English ASR and automatic speech translation (AST). The models were trained by modality aligning the 2B and 8B parameter variants of granite-3.3-instruct to speech on publicly available open-source corpora containing audio inputs and text targets consisting of either human transcripts for ASR or automatically generated translations for AST. Comprehensive benchmarking shows that on English ASR, which was our primary focus, they outperform several competitors' models that were trained on orders of magnitude more proprietary data, and they keep pace on English-to-X AST for major European languages, Japanese, and Chinese. The speech-specific components are: a conformer acoustic encoder using block attention and self-conditioning trained with connectionist temporal classification, a windowed query-transformer speech modality adapter used to do temporal downsampling of the acoustic embeddings and map them to the LLM text embedding space, and LoRA adapters to further fine-tune the text LLM. Granite-speech-3.3 operates in two modes: in speech mode, it performs ASR and AST by activating the encoder, projector, and LoRA adapters; in text mode, it calls the underlying granite-3.3-instruct model directly (without LoRA), essentially preserving all the text LLM capabilities and safety. Both models are freely available on HuggingFace (https://huggingface.co/ibm-granite/granite-speech-3.3-2b and https://huggingface.co/ibm-granite/granite-speech-3.3-8b) and can be used for both research and commercial purposes under a permissive Apache 2.0 license.
Submitted 13 May, 2025; v1 submitted 13 May, 2025;
originally announced May 2025.
Resource allocation for D2D-Based AMI Communications Underlaying LTE Cellular Networks
Authors:
H. H. Esmat,
Mahmoud M. Elmesalawy,
I. I. Ibrahim
Abstract:
Smart meters (SMs) transmit consumption information to the metering data management system for monitoring and management in smart grid advanced metering infrastructure (AMI) systems. Meanwhile, for efficient spectrum utilization, Device-to-Device (D2D) communications underlaying LTE networks are a promising wireless communication technology for AMI, since they support reuse of the same radio resources (RRs) as the LTE network. We therefore examine the use of D2D communication technology for AMI communications underlaying LTE networks. A novel approach is proposed for provisioning the required communication between a serving data concentrator and its set of smart meters using this technology. The approach consists of two main stages. In the first stage, the set of permissible cellular user equipment reuse candidates for each smart meter is computed, taking into account the quality-of-service requirements of both the cellular user devices and the smart meters. In the second stage, the optimal RR allocation for each smart meter is determined by maximizing the access rate of smart meters that can be admitted and operated in D2D reuse mode. Simulation results demonstrate the efficacy of the proposed approach for efficient AMI communication underlaying LTE systems, admitting a remarkable number of SMs and achieving an outstanding throughput gain.
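The two-stage structure described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the QoS model or the optimization method, so the sketch assumes (hypothetically) that QoS is checked with per-pair SINR thresholds, that each cellular user's RR can be reused by at most one smart meter, and that "maximizing the access rate" means admitting as many smart meters as possible. Under those assumptions the second stage reduces to a maximum bipartite matching, solved here with augmenting paths. All function and parameter names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (smart meter id, cellular user id); hypothetical identifiers


def stage1_candidates(
    sm_sinr: Dict[Pair, float],
    cue_sinr: Dict[Pair, float],
    sm_min: float,
    cue_min: float,
) -> Dict[str, List[str]]:
    """Assumed stage 1: keep the cellular users whose RR a smart meter may
    reuse, i.e. pairs where both links still meet their SINR threshold."""
    candidates: Dict[str, List[str]] = {}
    for (sm, cue), sinr in sm_sinr.items():
        candidates.setdefault(sm, [])
        if sinr >= sm_min and cue_sinr[(sm, cue)] >= cue_min:
            candidates[sm].append(cue)
    return candidates


def stage2_allocate(candidates: Dict[str, List[str]]) -> Dict[str, str]:
    """Assumed stage 2: admit as many smart meters as possible, one RR each,
    using augmenting-path maximum bipartite matching."""
    owner: Dict[str, str] = {}  # cellular user id -> smart meter reusing its RR

    def try_assign(sm: str, seen: Set[str]) -> bool:
        for cue in candidates[sm]:
            if cue in seen:
                continue
            seen.add(cue)
            # Take a free RR, or move the current owner to another RR.
            if cue not in owner or try_assign(owner[cue], seen):
                owner[cue] = sm
                return True
        return False

    for sm in candidates:
        try_assign(sm, set())
    return {sm: cue for cue, sm in owner.items()}
```

Smart meters with no permissible candidate, or that lose out in the matching, are simply absent from the returned mapping, which corresponds to not being admitted in D2D reuse mode under the assumptions above.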
△ Less
Submitted 1 June, 2021;
originally announced June 2021.