Search | arXiv e-print repository

Mitigating Domain Shift in Federated Learning via Intra- and Inter-Domain Prototypes

Authors: Huy Q. Le, Ye Lin Tun, Yu Qiao, Minh N. H. Nguyen, Keon Oh Kim, Choong Seon Hong

Abstract: Federated Learning (FL) has emerged as a decentralized machine learning technique, allowing clients to train a global model collaboratively without sharing private data. However, most FL studies ignore the crucial challenge of heterogeneous domains where each client has a distinct feature distribution, which is common in real-world scenarios. Prototype learning, which leverages the mean feature vectors within the same classes, has become a prominent solution for federated learning under domain shift. However, existing federated prototype learning methods focus solely on inter-domain prototypes and neglect intra-domain perspectives. In this work, we introduce a novel federated prototype learning method, namely I$^2$PFL, which incorporates $\textbf{I}$ntra-domain and $\textbf{I}$nter-domain $\textbf{P}$rototypes, to mitigate domain shift from both perspectives and learn a generalized global model across multiple domains in federated learning. To construct intra-domain prototypes, we propose feature alignment with MixUp-based augmented prototypes to capture the diversity within local domains and enhance the generalization of local features. Additionally, we introduce a reweighting mechanism for inter-domain prototypes to generate generalized prototypes that reduce domain shift while providing inter-domain knowledge across multiple clients. Extensive experiments on the Digits, Office-10, and PACS datasets illustrate the superior performance of our method compared to other baselines.

Submitted 9 March, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

Comments: 13 pages, 11 figures, 7 tables
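
The two building blocks named in the abstract above, class prototypes (mean feature vectors per class) and MixUp, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and applying MixUp directly to feature vectors is an assumption, since the abstract does not say how I$^2$PFL builds its augmented prototypes or how it reweights inter-domain prototypes.

```python
from collections import defaultdict


def class_prototypes(features, labels):
    """Return {label: mean feature vector} over the samples of each class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        # Accumulate the element-wise sum of features for this class.
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def mixup(a, b, lam):
    """Standard MixUp interpolation: lam * a + (1 - lam) * b, element-wise."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


# Toy local data: two samples of class 0, one sample of class 1.
features = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
labels = [0, 0, 1]
protos = class_prototypes(features, labels)
mixed = mixup([2.0, 0.0], [0.0, 2.0], 0.5)
```

Here `protos[0]` is the average of the two class-0 feature vectors, and `mixed` is the midpoint of its two inputs.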

arXiv:2412.03871 [pdf, other]

CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance

Authors: Chu Myaet Thwal, Ye Lin Tun, Minh N. H. Nguyen, Eui-Nam Huh, Choong Seon Hong

Abstract: Beyond the success of Contrastive Language-Image Pre-training (CLIP), recent trends mark a shift toward exploring the applicability of lightweight vision-language models for resource-constrained scenarios. These models often deliver suboptimal performance when relying solely on a single image-text contrastive learning objective, spotlighting the need for more effective training mechanisms that guarantee robust cross-modal feature alignment. In this work, we propose CLIP-PING: Contrastive Language-Image Pre-training with Proximus Intrinsic Neighbors Guidance, a novel yet simple and efficient training paradigm designed to boost the performance of lightweight vision-language models with minimal computational overhead and lower data demands. CLIP-PING bootstraps unimodal features extracted from arbitrary pre-trained encoders to obtain intrinsic guidance of proximus neighbor samples, i.e., nearest-neighbor (NN) and cross nearest-neighbor (XNN). We find that extra contrastive supervision from these neighbors substantially boosts cross-modal alignment, enabling lightweight models to learn more generic features with rich semantic diversity. Extensive experiments reveal that CLIP-PING notably surpasses its peers in zero-shot generalization and cross-modal retrieval tasks. Specifically, CLIP-PING achieves a 5.5% gain on zero-shot ImageNet1K classification, with 10.7% (I2T) and 5.7% (T2I) on Flickr30K retrieval, compared to the original CLIP when using a ViT-XS image encoder trained on 3 million (image, text) pairs. Moreover, CLIP-PING showcases strong transferability under the linear evaluation protocol across several downstream tasks.

Submitted 18 March, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

Comments: 14 pages, 5 figures, 24 tables

arXiv:2407.15426 [pdf, other]

Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training

Authors: Ye Lin Tun, Chu Myaet Thwal, Minh N. H. Nguyen, Choong Seon Hong

Abstract: Combining different data modalities enables deep neural networks to tackle complex tasks more effectively, making multimodal learning increasingly popular. To harness multimodal data closer to end users, it is essential to integrate multimodal learning with privacy-preserving approaches like federated learning (FL). However, compared to conventional unimodal learning, the multimodal setting requires dedicated encoders for each modality, resulting in larger and more complex models. Training these models requires significant resources, presenting a substantial challenge for FL clients operating with limited computation and communication resources. To address these challenges, we introduce LW-FedMML, a layer-wise federated multimodal learning approach which decomposes the training process into multiple stages. Each stage focuses on training only a portion of the model, thereby significantly reducing the memory and computational requirements. Moreover, FL clients only need to exchange the trained model portion with the central server, lowering the resulting communication cost. We conduct extensive experiments across various FL and multimodal learning settings to validate the effectiveness of our proposed method. The results demonstrate that LW-FedMML can compete with conventional end-to-end federated multimodal learning (FedMML) while significantly reducing the resource burden on FL clients. Specifically, LW-FedMML reduces memory usage by up to $2.7\times$, computational operations (FLOPs) by $2.4\times$, and total communication cost by $2.3\times$. We also explore a progressive training approach called Prog-FedMML. While it offers lower resource efficiency than LW-FedMML, Prog-FedMML has the potential to surpass the performance of end-to-end FedMML, making it a viable option for scenarios with fewer resource constraints.

Submitted 20 October, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

arXiv:2406.03773 [pdf, other]

doi 10.1109/LCOMM.2024.3499956

Optimizing Multi-User Semantic Communication via Transfer Learning and Knowledge Distillation

Authors: Loc X. Nguyen, Kitae Kim, Ye Lin Tun, Sheikh Salman Hassan, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

Abstract: Semantic communication, notable for ensuring quality of service by jointly optimizing source and channel coding, effectively extracts data semantics, reduces transmission length, and mitigates channel noise. However, most studies overlook multi-user scenarios and resource availability, limiting real-world application. This paper addresses this gap by focusing on downlink communication from a base station to multiple users with varying computing capacities. Users employ variants of Swin transformer models for source decoding and a simple architecture for channel decoding. We propose a novel training regimen, incorporating transfer learning and knowledge distillation to improve low-computing users' performance. Extensive simulations validate the proposed methods.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 5 pages, 5 figures

Journal ref: in IEEE Communications Letters, vol. 29, no. 1, pp. 90-94, Jan. 2025

arXiv:2402.06638 [pdf, other]

doi 10.1109/ICOIN56518.2023.10048928

Transformers with Attentive Federated Aggregation for Time Series Stock Forecasting

Authors: Chu Myaet Thwal, Ye Lin Tun, Kitae Kim, Seong-Bae Park, Choong Seon Hong

Abstract: Recent innovations in transformers have shown their superior performance in natural language processing (NLP) and computer vision (CV). The ability to capture long-range dependencies and interactions in sequential data has also triggered a great interest in time series modeling, leading to the widespread use of transformers in many time series applications. However, being the most common and crucial application, the adaptation of transformers to time series forecasting has remained limited, with both promising and inconsistent results. In contrast to the challenges in NLP and CV, time series problems not only add the complexity of order or temporal dependence among input sequences but also involve trend, level, and seasonality information, much of which is valuable for decision making. The conventional training scheme has shown deficiencies regarding model overfitting, data scarcity, and privacy issues when working with transformers for a forecasting task. In this work, we propose attentive federated transformers for time series stock forecasting with better performance while preserving the privacy of participating enterprises. Empirical results on various stock data from the Yahoo! Finance website indicate the superiority of our proposed scheme in dealing with the above challenges and data heterogeneity in federated learning.

Submitted 22 January, 2024; originally announced February 2024.

Comments: Published in IEEE ICOIN 2023

arXiv:2401.13898 [pdf, other]

Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality

Authors: Huy Q. Le, Chu Myaet Thwal, Yu Qiao, Ye Lin Tun, Minh N. H. Nguyen, Eui-Nam Huh, Choong Seon Hong

Abstract: Multimodal federated learning (MFL) has emerged as a decentralized machine learning paradigm, allowing multiple clients with different modalities to collaborate on training a global model across diverse data sources without sharing their private data. However, challenges such as data heterogeneity and severely missing modalities pose crucial hindrances to the robustness of MFL, significantly impacting the performance of the global model. The occurrence of missing modalities in real-world applications, such as autonomous driving, often arises from factors like sensor failures, leading to knowledge gaps during the training process. Specifically, the absence of a modality introduces misalignment during the local training phase, stemming from zero-filling in the case of clients with missing modalities. Consequently, achieving robust generalization in the global model becomes imperative, especially when dealing with clients that have incomplete data. In this paper, we propose $\textbf{Multimodal Federated Cross Prototype Learning (MFCPL)}$, a novel approach for MFL under severely missing modalities. Our MFCPL leverages the complete prototypes to provide diverse modality knowledge at the modality-shared level with cross-modal regularization and at the modality-specific level with a cross-modal contrastive mechanism. Additionally, our approach introduces cross-modal alignment to provide regularization for modality-specific features, thereby enhancing the overall performance, particularly in scenarios involving severely missing modalities. Through extensive experiments on three multimodal datasets, we demonstrate the effectiveness of MFCPL in mitigating the challenges of data heterogeneity and severely missing modalities while improving the overall performance and robustness of MFL.

Submitted 6 March, 2025; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: 14 pages, 8 figures, 11 tables

arXiv:2401.11736 [pdf, other]

doi 10.1109/BigComp51126.2021.00035

Attention on Personalized Clinical Decision Support System: Federated Learning Approach

Authors: Chu Myaet Thwal, Kyi Thar, Ye Lin Tun, Choong Seon Hong

Abstract: Health management has become a primary problem as new kinds of diseases and complex symptoms are introduced to a rapidly growing modern society. Building a better and smarter healthcare infrastructure is one of the ultimate goals of a smart city. To the best of our knowledge, neural network models are already employed to assist healthcare professionals in achieving this goal. Typically, training a neural network requires a rich amount of data, but the heterogeneous and vulnerable properties of clinical data introduce a challenge for the traditional centralized network. Moreover, adding new inputs to a medical database requires re-training an existing model from scratch. To tackle these challenges, we proposed a deep learning-based clinical decision support system trained and managed under a federated learning paradigm. We focused on a novel strategy to guarantee the safety of patient privacy and overcome the risk of cyberattacks while enabling large-scale clinical data mining. As a result, we can leverage rich clinical data for training each local neural network without the need for exchanging the confidential data of patients. Moreover, we implemented the proposed scheme as a sequence-to-sequence model architecture integrating the attention mechanism. Thus, our objective is to provide a personalized clinical decision support system with evolvable characteristics that can deliver accurate solutions and assist healthcare professionals in medical diagnosis.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Published in IEEE BigComp 2021

arXiv:2401.11652 [pdf, other]

doi 10.1016/j.neunet.2023.11.044

OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning

Authors: Chu Myaet Thwal, Minh N. H. Nguyen, Ye Lin Tun, Seong Tae Kim, My T. Thai, Choong Seon Hong

Abstract: Federated learning (FL) has emerged as a promising approach to collaboratively train machine learning models across multiple edge devices while preserving privacy. The success of FL hinges on the efficiency of participating models and their ability to handle the unique challenges of distributed learning. While several variants of Vision Transformer (ViT) have shown great potential as alternatives to modern convolutional neural networks (CNNs) for centralized training, the unprecedented size and higher computational demands hinder their deployment on resource-constrained edge devices, challenging their widespread application in FL. Since client devices in FL typically have limited computing resources and communication bandwidth, models intended for such devices must strike a balance between model size, computational efficiency, and the ability to adapt to the diverse and non-IID data distributions encountered in FL. To address these challenges, we propose OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks with limited training data and resources. Our models incorporate image-specific inductive biases through the LCT tokenizer by leveraging efficient depthwise separable convolutions in residual linear bottleneck blocks to extract local features, while the multi-head self-attention (MHSA) mechanism in the LCT encoder implicitly facilitates capturing global representations of images. Extensive experiments on benchmark image datasets indicate that our models outperform existing lightweight vision models while having fewer parameters and lower computational demands, making them suitable for FL scenarios with data heterogeneity and communication bottlenecks.

Submitted 21 January, 2024; originally announced January 2024.

Comments: Published in Neural Networks

arXiv:2401.11647 [pdf, other]

LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning

Authors: Ye Lin Tun, Chu Myaet Thwal, Huy Q. Le, Minh N. H. Nguyen, Choong Seon Hong

Abstract: Many studies integrate federated learning (FL) with self-supervised learning (SSL) to take advantage of raw data distributed across edge devices. However, edge devices often struggle with high computational and communication costs imposed by SSL and FL algorithms. With the deployment of more complex and large-scale models, such as Transformers, these challenges are exacerbated. To tackle this, we propose the Layer-Wise Federated Self-Supervised Learning (LW-FedSSL) approach, which allows edge devices to incrementally train a small part of the model at a time. Specifically, in LW-FedSSL, training is decomposed into multiple stages, with each stage responsible for only a specific layer (or a block of layers) of the model. Since only a portion of the model is active for training at any given time, LW-FedSSL significantly reduces computational requirements. Additionally, only the active model portion needs to be exchanged between the FL server and clients, reducing the communication overhead. This enables LW-FedSSL to jointly address both computational and communication challenges in FL. Depending on the SSL algorithm used, it can achieve up to a $3.34 \times$ reduction in memory usage, $4.20 \times$ fewer computational operations (GFLOPs), and a $5.07 \times$ lower communication cost while maintaining performance comparable to its end-to-end training counterpart. Furthermore, we explore a progressive training strategy called Prog-FedSSL, which offers a $1.84\times$ reduction in GFLOPs and a $1.67\times$ reduction in communication costs while maintaining the same memory requirements as end-to-end training. While the resource efficiency of Prog-FedSSL is lower than that of LW-FedSSL, its performance improvements make it a viable candidate for FL environments with more lenient resource constraints.

Submitted 26 February, 2025; v1 submitted 21 January, 2024; originally announced January 2024.
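
The staged schedule described in the abstract above can be illustrated with a rough parameter-count comparison. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, every stage is assumed to run the same number of rounds, the progressive variant is assumed to train and exchange all layers added so far, and the toy layer sizes are not taken from the paper.

```python
def active_layers(stage, mode):
    """Indices of the layers trained (and exchanged) at a given stage."""
    if mode == "layer-wise":
        return [stage]                     # only the newest layer is active
    if mode == "progressive":
        return list(range(stage + 1))      # assumed: all layers added so far
    raise ValueError(f"unknown mode: {mode}")


def params_exchanged(layer_sizes, rounds_per_stage, mode):
    """Total parameters sent in one direction across all training stages."""
    total = 0
    for stage in range(len(layer_sizes)):
        per_round = sum(layer_sizes[i] for i in active_layers(stage, mode))
        total += per_round * rounds_per_stage
    return total


def params_exchanged_end_to_end(layer_sizes, rounds_per_stage):
    """Baseline: the full model is exchanged in every round."""
    total_rounds = rounds_per_stage * len(layer_sizes)
    return sum(layer_sizes) * total_rounds


# Toy model: three layers of 100 parameters each, 10 rounds per stage.
sizes = [100, 100, 100]
layer_wise = params_exchanged(sizes, 10, "layer-wise")
progressive = params_exchanged(sizes, 10, "progressive")
end_to_end = params_exchanged_end_to_end(sizes, 10)
```

In this toy setting the ordering is layer-wise < progressive < end-to-end, which matches the qualitative ranking in the abstract; the reported ratios depend on the actual model and SSL algorithm and are not reproduced here.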

arXiv:2311.16538 [pdf, other]

Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks

Authors: Ye Lin Tun, Chu Myaet Thwal, Ji Su Yoon, Sun Moo Kang, Chaoning Zhang, Choong Seon Hong

Abstract: Diffusion models have shown great potential for vision-related tasks, particularly for image generation. However, their training is typically conducted in a centralized manner, relying on data collected from publicly available sources. This approach may not be feasible or practical in many domains, such as the medical field, which involves privacy concerns over data collection. Despite the challenges associated with privacy-sensitive data, such domains could still benefit from valuable vision services provided by diffusion models. Federated learning (FL) plays a crucial role in enabling decentralized model training without compromising data privacy. Instead of collecting data, an FL system gathers model parameters, effectively safeguarding the private data of different parties involved. This makes FL systems vital for managing decentralized learning tasks, especially in scenarios where privacy-sensitive data is distributed across a network of clients. Nonetheless, FL presents its own set of challenges due to its distributed nature and privacy-preserving properties. Therefore, in this study, we explore the FL strategy to train diffusion models, paving the way for the development of federated diffusion models. We conduct experiments on various FL scenarios, and our findings demonstrate that federated diffusion models have great potential to deliver vision services to privacy-sensitive domains.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.16535 [pdf, other]

doi 10.1016/j.neunet.2023.06.010

Contrastive encoder pre-training-based clustered federated learning for heterogeneous data

Authors: Ye Lin Tun, Minh N. H. Nguyen, Chu Myaet Thwal, Jinwoo Choi, Choong Seon Hong

Abstract: Federated learning (FL) is a promising approach that enables distributed clients to collaboratively train a global model while preserving their data privacy. However, FL often suffers from data heterogeneity problems, which can significantly affect its performance. To address this, clustered federated learning (CFL) has been proposed to construct personalized models for different client clusters. One effective client clustering strategy is to allow clients to choose their own local models from a model pool based on their performance. However, without pre-trained model parameters, such a strategy is prone to clustering failure, in which all clients choose the same model. Unfortunately, collecting a large amount of labeled data for pre-training can be costly and impractical in distributed environments. To overcome this challenge, we leverage self-supervised contrastive learning to exploit unlabeled data for the pre-training of FL systems. Together, self-supervised pre-training and client clustering can be crucial components for tackling the data heterogeneity issues of FL. Leveraging these two crucial strategies, we propose contrastive pre-training-based clustered federated learning (CP-CFL) to improve the model convergence and overall performance of FL systems. In this work, we demonstrate the effectiveness of CP-CFL through extensive experiments in heterogeneous FL settings, and present various interesting observations.

Submitted 28 November, 2023; originally announced November 2023.

Comments: Published in Neural Networks
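
The client clustering strategy mentioned in the abstract above (each client picks a model from a shared pool, and clustering fails when every client picks the same one) can be sketched as below. The function names are hypothetical, and using the lowest local loss as the selection criterion is an assumption; the abstract only says clients choose "based on their performance".

```python
def select_models(client_losses):
    """For each client, return the index of the pool model with the lowest loss.

    client_losses[c][m] is the loss of pool model m on client c's local data.
    """
    return [
        min(range(len(losses)), key=losses.__getitem__)
        for losses in client_losses
    ]


def is_clustering_failure(assignments):
    """True when more than one client exists and all chose the same model."""
    return len(assignments) > 1 and len(set(assignments)) == 1


# Three clients, pool of two models: clients split across both models.
healthy = select_models([[0.9, 0.2], [0.1, 0.8], [0.7, 0.3]])

# Two clients that both prefer model 0: every client lands in one cluster.
collapsed = select_models([[0.1, 0.5], [0.2, 0.9]])
```

Clients that select the same index form one cluster; `is_clustering_failure(collapsed)` flags the degenerate case the abstract describes.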

arXiv:2310.13236 [pdf, other]

doi 10.1109/TVT.2024.3401140

An Efficient Federated Learning Framework for Training Semantic Communication System

Authors: Loc X. Nguyen, Huy Q. Le, Ye Lin Tun, Pyae Sone Aung, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

Abstract: Semantic communication has emerged as a pillar for the next generation of communication systems due to its capabilities in alleviating data redundancy. Most semantic communication systems are built upon advanced deep learning models whose training performance heavily relies on data availability. Existing studies often make unrealistic assumptions of a readily accessible data source, whereas in practice, data is mainly created on the client side. Due to privacy and security concerns, the transmission of data, which conventional centralized training schemes require, is restricted. To address this challenge, we explore semantic communication in a federated learning (FL) setting that utilizes client data without leaking privacy. Additionally, we design our system to tackle the communication overhead by reducing the quantity of information delivered in each global round. In this way, we can save significant bandwidth for resource-limited devices and reduce overall network traffic. Finally, we introduce a mechanism to aggregate the global model from clients, called FedLol. Extensive simulation results demonstrate the effectiveness of our proposed technique compared to baseline methods.

Submitted 9 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: 5 pages, 3 figures

Journal ref: in IEEE Transactions on Vehicular Technology, vol. 73, no. 10, pp. 15872-15877, Oct. 2024

arXiv:2307.03402 [pdf, other]

doi 10.1109/TVT.2024.3362328

Swin Transformer-Based Dynamic Semantic Communication for Multi-User with Different Computing Capacity

Authors: Loc X. Nguyen, Ye Lin Tun, Yan Kyaw Tun, Minh N. H. Nguyen, Chaoning Zhang, Zhu Han, Choong Seon Hong

Abstract: Semantic communication has gained significant attention from researchers as a promising technique to replace conventional communication in the next generation of communication systems, primarily due to its ability to reduce communication costs. However, little literature has studied its effectiveness in multi-user scenarios, particularly when there are variations in the model architectures used by users and their computing capacities. To address this issue, we explore a semantic communication system that caters to multiple users with different model architectures by using a multi-purpose transmitter at the base station (BS). Specifically, the BS in the proposed framework employs semantic and channel encoders to encode the image for transmission, while the receiver utilizes its local channel and semantic decoder to reconstruct the original image. Our joint source-channel encoder at the BS can effectively extract and compress semantic features for specific users by considering the signal-to-noise ratio (SNR) and computing capacity of the user. Based on the network status, the joint source-channel encoder at the BS can adaptively adjust the length of the transmitted signal. A longer signal ensures more information for high-quality image reconstruction for the user, while a shorter signal helps avoid network congestion. In addition, we propose a hybrid loss function for training, which enhances the perceptual details of reconstructed images. Finally, we conduct a series of extensive evaluations and ablation studies to validate the effectiveness of the proposed system.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 14 pages, 10 figures

Journal ref: in IEEE Transactions on Vehicular Technology, vol. 73, no. 6, pp. 8957-8972, June 2024

arXiv:2303.11717 [pdf, other]

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

Authors: Chaoning Zhang, Chenshuang Zhang, Sheng Zheng, Yu Qiao, Chenghao Li, Mengchun Zhang, Sumit Kumar Dam, Chu Myaet Thwal, Ye Lin Tun, Le Luang Huy, Donguk kim, Sung-Ho Bae, Lik-Hang Lee, Yang Yang, Heng Tao Shen, In So Kweon, Choong Seon Hong

Abstract: As ChatGPT goes viral, generative AI (AIGC, a.k.a. AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond. With such overwhelming media coverage, it is almost impossible for us to miss the opportunity to glimpse AIGC from a certain angle. In the era of AI transitioning from pure analysis to creation, it is worth noting that ChatGPT, with its most recent language model GPT-4, is just a tool out of numerous AIGC tasks. Impressed by the capability of ChatGPT, many people are wondering about its limits: can GPT-5 (or other future GPT variants) help ChatGPT unify all AIGC tasks for diversified content creation? Toward answering this question, a comprehensive review of existing AIGC tasks is needed. As such, our work comes to fill this gap promptly by offering a first look at AIGC, ranging from its techniques to applications. Modern generative AI relies on various technical foundations, ranging from model architecture and self-supervised pretraining to generative modeling methods (like GAN and diffusion models). After introducing the fundamental techniques, this work focuses on the technological development of various AIGC tasks based on their output type, including text, images, videos, 3D content, etc., which depicts the full potential of ChatGPT's future. Moreover, we summarize their significant applications in some mainstream industries, such as education and creativity content. Finally, we discuss the challenges currently faced and present an outlook on how generative AI might evolve in the near future.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 56 pages, 548 citations

arXiv:2210.15850 [pdf, other]

doi 10.1109/BigComp51126.2021.00039

Federated Learning based Energy Demand Prediction with Clustered Aggregation

Authors: Ye Lin Tun, Kyi Thar, Chu Myaet Thwal, Choong Seon Hong

Abstract: To reduce negative environmental impacts, power stations and energy grids need to optimize the resources required for power production. Thus, predicting the energy consumption of clients is becoming an important part of every energy management system. Energy usage information collected by the clients' smart homes can be used to train a deep neural network to predict the future energy demand. Collecting data from a large number of distributed clients for centralized model training is expensive in terms of communication resources. To take advantage of distributed data in edge systems, centralized training can be replaced by federated learning where each client only needs to upload model updates produced by training on its local data. These model updates are aggregated into a single global model by the server. But since different clients can have different attributes, model updates can have diverse weights and as a result, it can take a long time for the aggregated global model to converge. To speed up the convergence process, we can apply clustering to group clients based on their properties and aggregate model updates from the same cluster together to produce a cluster-specific global model. In this paper, we propose a recurrent neural network based energy demand predictor, trained with federated learning on clustered clients to take advantage of distributed data and speed up the convergence process.

Submitted 27 October, 2022; originally announced October 2022.

Comments: Accepted by BigComp 2021
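
Illustrative note on the clustered aggregation described in the abstract above: the following is a minimal sketch, not the authors' implementation. It assumes that each client's update has already been flattened into a vector, that cluster labels have already been assigned by some clustering step (the abstract does not say which), and that updates are weighted by local sample count in the FedAvg style. The function name `clustered_aggregate` and all parameter names are hypothetical.

```python
import numpy as np

def clustered_aggregate(updates, cluster_ids, num_samples):
    """Aggregate client updates separately within each cluster.

    updates:      one flattened model update (1-D sequence) per client
    cluster_ids:  one cluster label per client (assumed to be given)
    num_samples:  local dataset size per client, used as the
                  averaging weight (an assumption, FedAvg-style)
    Returns a dict mapping cluster label -> aggregated update.
    """
    updates = np.asarray(updates, dtype=float)
    weights = np.asarray(num_samples, dtype=float)
    cluster_models = {}
    for label in set(cluster_ids):
        # Indices of the clients that belong to this cluster.
        idx = [i for i, c in enumerate(cluster_ids) if c == label]
        # Weighted mean over the clients of this cluster only.
        cluster_models[label] = np.average(
            updates[idx], axis=0, weights=weights[idx]
        )
    return cluster_models

# Three clients, two clusters: clients 0 and 1 share cluster "a".
models = clustered_aggregate(
    updates=[[1.0, 1.0], [3.0, 3.0], [10.0, 10.0]],
    cluster_ids=["a", "a", "b"],
    num_samples=[1, 3, 5],
)
```

Each cluster ends up with its own model, so clients with dissimilar properties never average into one another's model.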

arXiv:2210.15827 [pdf, other]

doi 10.1109/BigComp57234.2023.00017

Federated Learning with Intermediate Representation Regularization

Authors: Ye Lin Tun, Chu Myaet Thwal, Yu Min Park, Seong-Bae Park, Choong Seon Hong

Abstract: In contrast to centralized model training that involves data collection, federated learning (FL) enables remote clients to collaboratively train a model without exposing their private data. However, model performance usually degrades in FL due to the heterogeneous data generated by clients of diverse characteristics. One promising strategy to maintain good performance is by limiting the local training from drifting far away from the global model. Previous studies accomplish this by regularizing the distance between the representations learned by the local and global models. However, they only consider representations from the early layers of a model or the layer preceding the output layer. In this study, we introduce FedIntR, which provides a more fine-grained regularization by integrating the representations of intermediate layers into the local training process. Specifically, FedIntR computes a regularization term that encourages the closeness between the intermediate layer representations of the local and global models. Additionally, FedIntR automatically determines the contribution of each layer's representation to the regularization term based on the similarity between local and global representations. We conduct extensive experiments on various datasets to show that FedIntR can achieve equivalent or higher performance compared to the state-of-the-art approaches. Our code is available at https://github.com/YLTun/FedIntR.

Submitted 20 April, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: IEEE BigComp 2023
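
Illustrative note on the regularization idea in the abstract above: the sketch below shows one possible way to build a per-layer, similarity-weighted regularization term. It is not FedIntR's actual formulation, which the abstract does not specify. The mean squared distance, the cosine similarity, the softmax weighting, and the choice to give more weight to more similar layers are all assumptions made here for illustration; the name `intermediate_regularizer` is hypothetical.

```python
import numpy as np

def intermediate_regularizer(local_reps, global_reps):
    """Similarity-weighted distance between per-layer representations.

    local_reps, global_reps: lists of 1-D arrays, one per intermediate
    layer, holding the local and global models' representations of the
    same input. Returns (regularization value, per-layer weights).
    """
    sims, dists = [], []
    for h_local, h_global in zip(local_reps, global_reps):
        h_local = np.asarray(h_local, dtype=float)
        h_global = np.asarray(h_global, dtype=float)
        # Cosine similarity between the two representations (assumed measure).
        denom = np.linalg.norm(h_local) * np.linalg.norm(h_global) + 1e-12
        sims.append(float(h_local @ h_global) / denom)
        # Mean squared distance as the closeness penalty (assumed measure).
        dists.append(float(np.mean((h_local - h_global) ** 2)))
    sims = np.array(sims)
    # Softmax over similarities gives each layer's contribution (assumed rule).
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()
    return float(weights @ np.array(dists)), weights

# Two layers: the first matches exactly, the second has drifted.
reg, layer_weights = intermediate_regularizer(
    local_reps=[[1.0, 0.0], [0.0, 1.0]],
    global_reps=[[1.0, 0.0], [1.0, 0.0]],
)
```

In local training, a term like `reg` would be added to the task loss with a scaling coefficient; that coefficient is likewise not given in the abstract.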

Showing 1–16 of 16 results for author: Tun, Y L