
Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data

Authors: Anup Shirgaonkar, Nikhil Pandey, Nazmiye Ceren Abay, Tolga Aktas, Vijay Aski

Abstract: Leading open-source large language models (LLMs) such as Llama-3.1-Instruct-405B are extremely capable at generating text, answering questions, and solving a variety of natural language understanding tasks. However, they incur higher inference cost and latency compared to smaller LLMs. Knowledge distillation provides a way to use outputs from these large, capable teacher models to train smaller student models which can be used for inference at lower cost and latency, while retaining comparable accuracy. We investigate the efficacy of distillation using the Llama-3.1-405B-Instruct teacher and the smaller Llama-3.1-8B-Instruct and Llama-3.1-70B-Instruct student models. Contributions of this work include (a) We evaluate the generalizability of distillation with the above Llama-3.1 teacher-student pairs across different tasks and datasets (b) We show that using synthetic data during distillation significantly improves the accuracy of 8B and 70B models, and when used with reasoning chains, even matches or surpasses the zero-shot accuracy of 405B model on some datasets (c) We empirically show that distillation enables 8B and 70B models to internalize 405B's reasoning ability by using only standard fine-tuning (without customizing any loss function). This allows cost and latency-efficient student model inference. (d) We show pitfalls in evaluation of distillation, and present task-specific evaluation, including both human and LLM-grading, and ground-truth based traditional accuracy benchmarks. This methodical study brings out the fundamental importance of synthetic data quality in knowledge distillation, and of combining multiple, task-specific ways of accuracy and quality evaluation in assessing the effectiveness of distillation.

Submitted 24 October, 2024; originally announced October 2024.

Comments: 25 pages, 4 figures

arXiv:2404.00213 [pdf, other]

Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning

Authors: Nick Mecklenburg, Yiyou Lin, Xiaoxiao Li, Daniel Holstein, Leonardo Nunes, Sara Malvar, Bruno Silva, Ranveer Chandra, Vijay Aski, Pavan Kumar Reddy Yannam, Tolga Aktas, Todd Hendry

Abstract: In recent years, Large Language Models (LLMs) have shown remarkable performance in generating human-like text, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge remains a challenge, particularly for facts and events that occur after the model's knowledge cutoff date. This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in LLMs, specifically focusing on the domain of recent sporting events. We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information. Our experiments on GPT-4 demonstrate that while token-based scaling can lead to improvements in Q&A accuracy, it may not provide uniform coverage of new knowledge. Fact-based scaling, on the other hand, offers a more systematic approach to ensure even coverage across all facts. We present a novel dataset generation process that leads to more effective knowledge ingestion through SFT, and our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge. This study contributes to the understanding of domain adaptation for LLMs and highlights the potential of SFT in enhancing the factuality of LLM responses in specific knowledge domains.

Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

Comments: 16 pages; 7 figures. updated authors list

arXiv:2002.00842 [pdf, other]

Mi YouTube es Su YouTube? Analyzing the Cultures using YouTube Thumbnails of Popular Videos

Authors: Songyang Zhang, Tolga Aktas, Jiebo Luo

Abstract: YouTube, a world-famous video sharing website, maintains a list of the top trending videos on the platform. Due to its huge number of users, it enables researchers to understand people's preferences by analyzing the trending videos. Trending videos vary from country to country. By analyzing such differences and changes, we can tell how users' preferences differ over locations. Previous work focuses on analyzing such culture preferences from videos' metadata, while the culture information hidden within the visual content has not been discovered. In this study, we explore culture preferences among countries using the thumbnails of YouTube trending videos. We first process the thumbnail images of the videos using object detectors. The collected object information is then used for various statistical analyses. In particular, we examine the data from three perspectives: geographical locations, video genres and users' reactions. Experimental results indicate that the users from similar cultures share interests in watching similar videos on YouTube. Our study demonstrates that discovering the culture preference through the thumbnails can be an effective mechanism for video social media analysis.

Submitted 27 January, 2020; originally announced February 2020.

arXiv:1609.01322 [pdf, ps, other]

Mobile Relays for Smart Cities: Mathematical Proofs

Authors: Tugcan Aktas, Giorgio Quer, Tara Javidi, Ramesh R. Rao

Abstract: The increasing number of connected vehicles in densely populated urban areas provides an interesting opportunity to counteract the high wireless data demands in high density and highly mobile scenarios. The idea is to support the macro base station (BS) with a secondary communication tier composed of a set of smart and connected vehicles that are in movement in the urban area. As a first step towards a comprehensive cost-benefit analysis of this architecture, this paper considers the case where these vehicles are equipped with femto-mobile Access Points (fmAPs) and constitute a mobile out-of-band relay infrastructure. In particular, three techniques to select an fmAP (if more than one is available) are proposed and the maximal feasible gain in the packet delivery rate and data rate as a function of the vehicle density, average vehicle speeds, handoff overhead cost, as well as physical layer parameters is characterized. The analytical and simulation results provide a first benchmark characterizing this architecture and the definition of guidelines for its future realistic study and implementation.

Submitted 5 September, 2016; originally announced September 2016.

Comments: Technical Report of detailed mathematical proofs for supporting "From Connected Vehicles to Mobile Relays: Enhanced Wireless Infrastructure for Smarter Cities", which is to appear in IEEE Global Communications Conference, Dec. 2016

arXiv:1312.1593 [pdf, ps, other]

Performance Analysis of Network Coded Systems Under Quasi-static Rayleigh Fading Channels

Authors: Tugcan Aktas, A. Ozgur Yilmaz, Emre Aktas

Abstract: In the area of basic and network coded cooperative communication, the expected end-to-end bit error rate (BER) values are frequently required to compare the proposed coding, relaying, and decoding techniques. Instead of obtaining these values via time-consuming Monte Carlo simulations, deriving closed form expressions using approximations is crucial. In this work, the ultimate goal is to derive an approximate average BER expression for a network coded system. While reaching this goal, we first consider the cooperative systems' instantaneous BER values, which are commonly composed of Q-functions of more than one variable. For these Q-functions, we investigate the convergence characteristics of the sampling property and generalize this property to arbitrary functions of multiple variables. Second, we adapt the equivalent channel approach to the network coded scenario for the ease of analysis and propose a network decoder with reduced complexity. Finally, by combining these techniques, we show that the obtained closed form expressions agree well with simulation results in a wide SNR range.

Submitted 5 December, 2013; originally announced December 2013.

Comments: 22 pages, 7 figures, Submitted to IEEE Transactions on Communications. arXiv admin note: text overlap with arXiv:1301.6471

arXiv:1310.3381 [pdf, ps, other]

doi 10.1109/WCNC.2014.6952123

A Low-Complexity Graph-Based LMMSE Receiver Designed for Colored Noise Induced by FTN-Signaling

Authors: Pinar Sen, Tugcan Aktas, A. Ozgur Yilmaz

Abstract: We propose a low complexity graph-based linear minimum mean square error (LMMSE) equalizer which considers both the intersymbol interference (ISI) and the effect of non-white noise inherent in Faster-than-Nyquist (FTN) signaling. In order to incorporate the statistics of noise signal into the factor graph over which the LMMSE algorithm is implemented, we suggest a method that models it as an autoregressive (AR) process. Furthermore, we develop a new mechanism for exchange of information between the proposed equalizer and the channel decoder through turbo iterations. Based on these improvements, we show that the proposed low complexity receiver structure performs close to the optimal decoder operating in ISI-free ideal scenario without FTN signaling through simulations.

Submitted 13 May, 2014; v1 submitted 12 October, 2013; originally announced October 2013.

Comments: 6 pages, 6 figures, IEEE Wireless Communications and Networking Conference 2014, Istanbul, Turkey

arXiv:1301.6471 [pdf, ps, other]

doi 10.1109/ISIT.2013.6620185

Generalizing the Sampling Property of the Q-function for Error Rate Analysis of Cooperative Communication in Fading Channels

Authors: Tugcan Aktas, Ali Ozgur Yilmaz, Emre Aktas

Abstract: This paper extends some approximation methods that are used to identify closed form Bit Error Rate (BER) expressions which are frequently utilized in investigation and comparison of performance for wireless communication systems in the literature. By using this group of approximation methods, some expectation integrals, which are complicated to analyze and have high computational complexity to evaluate through Monte Carlo simulations, are computed. For these integrals, by using the sampling property of the integrand functions of one or more arguments, reliable BER expressions revealing the diversity and coding gains are derived. Although the methods we present are valid for a larger class of integration problems, in this work we show the step by step derivation of the BER expressions for a canonical cooperative communication scenario in addition to a network coded system starting from basic building blocks. The derived expressions agree with the simulation results for a very wide range of signal-to-noise ratio (SNR) values.

Submitted 28 January, 2013; originally announced January 2013.

Comments: 5 pages, 5 figures, Submitted to IEEE International Symposium on Information Theory, ISIT 2013, Istanbul, Turkey

arXiv:1112.3208 [pdf, ps, other]

Practical Methods for Wireless Network Coding with Multiple Unicast Transmissions

Authors: Tugcan Aktas, A. Ozgur Yilmaz, Emre Aktas

Abstract: We propose a simple yet effective wireless network coding and decoding technique for a multiple unicast network. It utilizes spatial diversity through cooperation between nodes which carry out distributed encoding operations dictated by generator matrices of linear block codes. In order to exemplify the technique, we make use of greedy codes over the binary field and show that the arbitrary diversity orders can be flexibly assigned to nodes. Furthermore, we present the optimal detection rule for the given model that accounts for intermediate node errors and suggest a low-complexity network decoder using the sum-product (SP) algorithm. The proposed SP detector exhibits near optimal performance. We also show asymptotic superiority of network coding over a method that utilizes the wireless channel in a repetitive manner without network coding (NC) and give related rate-diversity trade-off curves. Finally, we extend the given encoding method through selective encoding in order to obtain extra coding gains.

Submitted 5 September, 2012; v1 submitted 14 December, 2011; originally announced December 2011.

Comments: 29 pages, 9 figures, Submitted to the IEEE Transactions on Communications on 14.12.2011, revised on 18.05.2012 and on 04.09.2012. arXiv admin note: text overlap with arXiv:1110.0594

arXiv:1110.0594 [pdf, ps, other]

doi 10.1109/WCNC.2012.6214460

Practical Wireless Network Coding and Decoding Methods for Multiple Unicast Transmissions

Authors: Tugcan Aktas, Ali Ozgur Yilmaz, Emre Aktas

Abstract: We propose a simple yet effective wireless network coding and decoding technique. It utilizes spatial diversity through cooperation between nodes which carry out distributed encoding operations dictated by generator matrices of linear block codes. For this purpose, we make use of greedy codes over the binary field and show that desired diversity orders can be flexibly assigned to nodes in a multiple unicast network, contrary to the previous findings in the literature. Furthermore, we present the optimal detection rule for the given model that accounts for intermediate node errors and suggest a network decoder using the sum-product algorithm. The proposed sum-product detector exhibits near optimal performance.

Submitted 4 October, 2011; originally announced October 2011.

Comments: 6 pages, 5 figures, Submitted to WCNC 2012, IEEE Wireless Communication and Networking Conference

Showing 1–9 of 9 results for author: Aktas, T