Search | arXiv e-print repository

Permissioned LLMs: Enforcing Access Control in Large Language Models

Authors: Bargav Jayaraman, Virendra J. Marathe, Hamid Mozaffari, William F. Shen, Krishnaram Kenthapadi

Abstract: In enterprise settings, organizational data is segregated, siloed and carefully protected by elaborate access control frameworks. These access control structures can completely break down if an LLM fine-tuned on the siloed data serves requests, for downstream tasks, from individuals with disparate access privileges. We propose Permissioned LLMs (PermLLM), a new class of LLMs that superimpose the organizational data access control structures on query responses they generate. We formalize abstractions underpinning the means to determine whether access control enforcement happens correctly over LLM query responses. Our formalism introduces the notion of a relevant response that can be used to prove whether a PermLLM mechanism has been implemented correctly. We also introduce a novel metric, called access advantage, to empirically evaluate the efficacy of a PermLLM mechanism. We introduce three novel PermLLM mechanisms that build on Parameter Efficient Fine-Tuning to achieve the desired access control. We furthermore present two instantiations of access advantage--(i) Domain Distinguishability Index (DDI) based on Membership Inference Attacks, and (ii) Utility Gap Index (UGI) based on LLM utility evaluation. We demonstrate the efficacy of our PermLLM mechanisms through extensive experiments on four public datasets (GPQA, RCV1, SimpleQA, and WMDP), in addition to evaluating the validity of DDI and UGI metrics themselves for quantifying access control in LLMs.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2406.10218 [pdf, other]

Semantic Membership Inference Attack against Large Language Models

Authors: Hamid Mozaffari, Virendra J. Marathe

Abstract: Membership Inference Attacks (MIAs) determine whether a specific data point was included in the training set of a target model. In this paper, we introduce the Semantic Membership Inference Attack (SMIA), a novel approach that enhances MIA performance by leveraging the semantic content of inputs and their perturbations. SMIA trains a neural network to analyze the target model's behavior on perturbed inputs, effectively capturing variations in output probability distributions between members and non-members. We conduct comprehensive evaluations on the Pythia and GPT-Neo model families using the Wikipedia dataset. Our results show that SMIA significantly outperforms existing MIAs; for instance, SMIA achieves an AUC-ROC of 67.39% on Pythia-12B, compared to 58.90% by the second-best attack.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2403.06319 [pdf, other]

Fake or Compromised? Making Sense of Malicious Clients in Federated Learning

Authors: Hamid Mozaffari, Sunav Choudhary, Amir Houmansadr

Abstract: Federated learning (FL) is a distributed machine learning paradigm that enables training models on decentralized data. The field of FL security against poisoning attacks is plagued with confusion due to the proliferation of research that makes different assumptions about the capabilities of adversaries and the adversary models they operate under. Our work aims to clarify this confusion by presenting a comprehensive analysis of the various poisoning attacks and defensive aggregation rules (AGRs) proposed in the literature, and connecting them under a common framework. To connect existing adversary models, we present a hybrid adversary model, which lies in the middle of the spectrum of adversaries, where the adversary compromises a few clients, trains a generative (e.g., DDPM) model with their compromised samples, and generates new synthetic data to solve an optimization for a stronger (e.g., cheaper, more practical) attack against different robust aggregation rules. By presenting the spectrum of FL adversaries, we aim to provide practitioners and researchers with a clear understanding of the different types of threats they need to consider when designing FL systems, and identify areas where further research is needed.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2208.07922 [pdf, other]

FedPerm: Private and Robust Federated Learning by Parameter Permutation

Authors: Hamid Mozaffari, Virendra J. Marathe, Dave Dice

Abstract: Federated Learning (FL) is a distributed learning paradigm that enables mutually untrusting clients to collaboratively train a common machine learning model. Client data privacy is paramount in FL. At the same time, the model must be protected from poisoning attacks from adversarial clients. Existing solutions address these two problems in isolation. We present FedPerm, a new FL algorithm that addresses both these problems by combining a novel intra-model parameter shuffling technique that amplifies data privacy, with Private Information Retrieval (PIR) based techniques that permit cryptographic aggregation of clients' model updates. The combination of these techniques further helps the federation server constrain parameter updates from clients so as to curtail effects of model poisoning attacks by adversarial clients. We further present FedPerm's unique hyperparameters that can be used effectively to trade off computation overheads with model utility. Our empirical evaluation on the MNIST dataset demonstrates FedPerm's effectiveness over existing Differential Privacy (DP) enforcement solutions in FL.

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2205.10454 [pdf, other]

E2FL: Equal and Equitable Federated Learning

Authors: Hamid Mozaffari, Amir Houmansadr

Abstract: Federated Learning (FL) enables data owners to train a shared global model without sharing their private data. Unfortunately, FL is susceptible to an intrinsic fairness issue: due to heterogeneity in clients' data distributions, the final trained model can give disproportionate advantages across the participating clients. In this work, we present Equal and Equitable Federated Learning (E2FL) to produce fair federated learning models by preserving two main fairness properties, equity and equality, concurrently. We validate the efficiency and fairness of E2FL in different real-world FL applications, and show that E2FL outperforms existing baselines in terms of the resulting efficiency, fairness of different groups, and fairness among all individual clients.

Submitted 16 August, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

arXiv:2110.13189 [pdf]

Spectral unmixing of Raman microscopic images of single human cells using Independent Component Analysis

Authors: M. Hamed Mozaffari, Li-Lin Tay

Abstract: Application of independent component analysis (ICA) as an unmixing and image clustering technique for high spatial resolution Raman maps is reported. A hyperspectral map of a fixed human cell was collected by a Raman micro spectrometer in a raster pattern on a 0.5 µm grid. Unlike previously used unsupervised machine learning techniques such as principal component analysis, ICA is based on non-Gaussianity and statistical independence of data, which is the case for mixture Raman spectra. Hence, ICA is a great candidate for assembling pseudo-colour maps from the spectral hypercube of Raman spectra. Our experimental results revealed that ICA is capable of reconstructing false colour maps of Raman hyperspectral data of human cells, showing the nuclear region constituents as well as subcellular organelles in the cytoplasm and the distribution of mitochondria in the perinuclear region. Minimum preprocessing requirements and the label-free nature of the ICA method make it a great unmixing method for extraction of endmembers in Raman hyperspectral maps of living cells.

Submitted 25 October, 2021; originally announced October 2021.

Comments: 10 pages, 5 figures

arXiv:2110.04350 [pdf, other]

FRL: Federated Rank Learning

Authors: Hamid Mozaffari, Virat Shejwalkar, Amir Houmansadr

Abstract: Federated learning (FL) allows mutually untrusted clients to collaboratively train a common machine learning model without sharing their private/proprietary training data among each other. FL is unfortunately susceptible to poisoning by malicious clients who aim to hamper the accuracy of the commonly trained model through sending malicious model updates during FL's training process. We argue that the key factor to the success of poisoning attacks against existing FL systems is the large space of model updates available to the clients, allowing malicious clients to search for the most poisonous model updates, e.g., by solving an optimization problem. To address this, we propose Federated Rank Learning (FRL). FRL reduces the space of client updates from model parameter updates (a continuous space of float numbers) in standard FL to the space of parameter rankings (a discrete space of integer values). To be able to train the global model using parameter ranks (instead of parameter weights), FRL leverages ideas from recent supermasks training mechanisms. Specifically, FRL clients rank the parameters of a randomly initialized neural network (provided by the server) based on their local training data. The FRL server uses a voting mechanism to aggregate the parameter rankings submitted by clients in each training epoch to generate the global ranking of the next training epoch. Intuitively, our voting-based aggregation mechanism prevents poisoning clients from making significant adversarial modifications to the global model, as each client will have a single vote!
We demonstrate the robustness of FRL to poisoning through analytical proofs and experimentation. We also show FRL's high communication efficiency. Our experiments demonstrate the superiority of FRL in real-world FL settings.

Submitted 16 August, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

arXiv:2106.05316 [pdf]

Raman spectral analysis of mixtures with one-dimensional convolutional neural network

Authors: M. Hamed Mozaffari, Li-Lin Tay

Abstract: Recently, the combination of robust one-dimensional convolutional neural networks (1-D CNNs) and Raman spectroscopy has shown great promise in rapid identification of unknown substances with good accuracy. Using this technique, researchers can recognize a pure compound and distinguish it from unknown substances in a mixture. The novelty of this approach is that the trained neural network operates automatically without any pre- or post-processing of data. Some studies have attempted to extend this technique to the classification of pure compounds in an unknown mixture. However, the application of 1-D CNNs has typically been restricted to binary classifications of pure compounds. Here we will highlight a new approach in spectral recognition and quantification of chemical components in a multicomponent mixture. Two 1-D CNN models, RaMixNet I and II, have been developed for this purpose. The former is for rapid classification of components in a mixture while the latter is for quantitative determination of those constituents. In the proposed method, there is no limit to the number of compounds in a mixture. A data augmentation method is also introduced by adding random baselines to the Raman spectra. The experimental results revealed that the classification accuracy of RaMixNet I and II is 100% for analysis of unknown test mixtures; at the same time, the RaMixNet II model may achieve a regression accuracy of 88% for the quantification of each component.

Submitted 1 June, 2021; originally announced June 2021.

Comments: 9 pages, 5 tables, 3 figures

arXiv:2104.12839 [pdf]

One-dimensional Active Contour Models for Raman Spectrum Baseline Correction

Authors: M. Hamed Mozaffari, Li-Lin Tay

Abstract: Raman spectroscopy is a powerful and non-invasive method for analysis of chemicals and detection of unknown substances. However, the Raman signal is so weak that background noise can distort it. The resulting baseline shifts in the Raman spectrum might degrade analytical results. In this paper, a modified version of active contour models in one-dimensional space has been proposed for the baseline correction of Raman spectra. Our technique, inspired by principles of physics and heuristic optimization methods, iteratively deforms an initialized curve toward the desired baseline. The performance of the proposed algorithm was evaluated and compared with similar techniques using simulated Raman spectra. The results showed that the 1D active contour model outperforms many iterative baseline correction methods. The proposed algorithm was successfully applied to experimental Raman spectral data, and the results indicate that the baseline of Raman spectra can be automatically subtracted.

Submitted 26 April, 2021; originally announced April 2021.

Comments: 4 figures, and 9 pages

arXiv:2009.10380 [pdf, other]

PS8-Net: A Deep Convolutional Neural Network to Predict the Eight-State Protein Secondary Structure

Authors: Md Aminur Rab Ratul, Maryam Tavakol Elahi, M. Hamed Mozaffari, WonSook Lee

Abstract: Protein secondary structure is crucial to creating an information bridge between the primary and tertiary (3D) structures. Precise prediction of eight-state protein secondary structure (PSS) has been significantly utilized in the structural and functional analysis of proteins in bioinformatics. Deep learning techniques have been recently applied in this research area and have raised the eight-state (Q8) protein secondary structure prediction accuracy remarkably. Nevertheless, from a theoretical standpoint, there is still much room for improvement, specifically in the eight-state PSS prediction. In this study, we have presented a new deep convolutional neural network (DCNN), namely PS8-Net, to enhance the accuracy of eight-class PSS prediction. The input of this architecture is a carefully constructed feature matrix from the protein's sequence features and profile features. We introduce a new PS8 module in the network, which is applied with skip connections to extract the long-term inter-dependencies from higher layers, obtain local contexts in earlier layers, and achieve global information during secondary structure prediction. Our proposed PS8-Net achieves 76.89%, 71.94%, 76.86%, and 75.26% Q8 accuracy respectively on benchmark CullPdb6133, CB513, CASP10, and CASP11 datasets. This architecture enables the efficient processing of local and global interdependencies between amino acids to make an accurate prediction of each class.
To the best of our knowledge, PS8-Net experiment results demonstrate that it outperforms all the state-of-the-art methods on the aforementioned benchmark datasets.

Submitted 22 September, 2020; originally announced September 2020.

arXiv:2006.10575 [pdf]

A Review of 1D Convolutional Neural Networks toward Unknown Substance Identification in Portable Raman Spectrometer

Authors: M. Hamed Mozaffari, Li-Lin Tay

Abstract: Raman spectroscopy is a powerful analytical tool with applications ranging from quality control to cutting edge biomedical research. One particular area which has seen tremendous advances in the past decade is the development of powerful handheld Raman spectrometers. They have been adopted widely by first responders and law enforcement agencies for the field analysis of unknown substances. Field detection and identification of unknown substances with Raman spectroscopy rely heavily on the spectral matching capability of the devices on hand. Conventional spectral matching algorithms (such as correlation, dot product, etc.) have been used in identifying an unknown Raman spectrum by comparing the unknown to a large reference database. This is typically achieved through brute-force summation of pixel-by-pixel differences between the reference and the unknown spectrum. Conventional algorithms have noticeable drawbacks. For example, they tend to work well with identifying pure compounds but less so for mixture compounds. For instance, limited reference spectra in accessible databases with a large number of classes relative to the number of samples have been a setback for the widespread usage of Raman spectroscopy for field analysis applications. State-of-the-art deep learning methods (specifically convolutional neural networks, CNNs), as an alternative approach, present a number of advantages over conventional spectral comparison algorithms. With optimization, they are ideal to be deployed in handheld spectrometers for field detection of unknown substances.
In this study, we present a comprehensive survey on the use of one-dimensional CNNs for Raman spectrum identification. Specifically, we highlight the use of this powerful deep learning technique for handheld Raman spectrometers, taking into consideration the potential limit in power consumption and computation ability of handheld systems.

Submitted 18 June, 2020; originally announced June 2020.

Comments: 19 pages, 1 figure, 5 tables

arXiv:2003.08808 [pdf, other]

Deep Learning for Automatic Tracking of Tongue Surface in Real-time Ultrasound Videos, Landmarks instead of Contours

Authors: M. Hamed Mozaffari, Won-Sook Lee

Abstract: One usage of medical ultrasound imaging is to visualize and characterize human tongue shape and motion during real-time speech to study healthy or impaired speech production. Due to the low-contrast characteristic and noisy nature of ultrasound images, it might require expertise for non-expert users to recognize tongue gestures in applications such as visual training of a second language. Moreover, quantitative analysis of tongue motion needs the tongue dorsum contour to be extracted, tracked, and visualized. Manual tongue contour extraction is a cumbersome, subjective, and error-prone task. Furthermore, it is not a feasible solution for real-time applications. The growth of deep learning has been vigorously exploited in various computer vision tasks, including ultrasound tongue contour tracking. In the current methods, the process of tongue contour extraction comprises two steps of image segmentation and post-processing. This paper presents a novel approach to automatic and real-time tongue contour tracking using deep neural networks. In the proposed method, instead of the two-step procedure, landmarks of the tongue surface are tracked. This novel idea enables researchers in this field to benefit from available previously annotated databases to achieve high accuracy results. Our experiment disclosed the outstanding performance of the proposed technique in terms of generalization, performance, and accuracy.

Submitted 15 March, 2020; originally announced March 2020.

Comments: 8 pages, 5 figures

arXiv:1911.09840 [pdf, other]

Real-time Ultrasound-enhanced Multimodal Imaging of Tongue using 3D Printable Stabilizer System: A Deep Learning Approach

Authors: M. Hamed Mozaffari, Won-Sook Lee

Abstract: Despite renewed awareness of the importance of articulation, it remains a challenge for instructors to handle the pronunciation needs of language learners. There are relatively scarce pedagogical tools for pronunciation teaching and learning. Unlike inefficient, traditional pronunciation instructions like listening and repeating, electronic visual feedback (EVF) systems such as ultrasound technology have been employed in new approaches. Recently, an ultrasound-enhanced multimodal method has been developed for visualizing tongue movements of a language learner overlaid on the face-side of the speaker's head. That system was evaluated for several language courses via a blended learning paradigm at the university level. The results indicated that visualizing the articulator's system as biofeedback to language learners will significantly improve articulation learning efficiency. In spite of the successful usage of multimodal techniques for pronunciation training, it still requires manual work and human manipulation. In this article, we aim to contribute to this growing body of research by addressing difficulties of the previous approaches by proposing a new comprehensive, automatic, real-time multimodal pronunciation training system that benefits from powerful artificial intelligence techniques. The main objective of this research was to combine the advantages of ultrasound technology, three-dimensional printing, and deep learning algorithms to enhance the performance of previous systems.
Our preliminary pedagogical evaluation of the proposed system revealed a significant improvement in flexibility, control, robustness, and autonomy.

Submitted 21 November, 2019; originally announced November 2019.

Comments: 12 figures, 1 table

Journal ref: Canadian Acoustics. 48, 1 (Mar. 2020)

arXiv:1911.03972 [pdf]

IrisNet: Deep Learning for Automatic and Real-time Tongue Contour Tracking in Ultrasound Video Data using Peripheral Vision

Authors: M. Hamed Mozaffari, Md. Aminur Rab Ratul, Won-Sook Lee

Abstract: The progress of deep convolutional neural networks has been successfully exploited in various real-time computer vision tasks such as image classification and segmentation. Owing to the development of computational units, availability of digital datasets, and improved performance of deep learning models, fully automatic and accurate tracking of tongue contours in real-time ultrasound data became practical only in recent years. Recent studies have shown that the performance of deep learning techniques is significant in the tracking of ultrasound tongue contours in real-time applications such as pronunciation training using multimodal ultrasound-enhanced approaches. Due to the high correlation between ultrasound tongue datasets, it is feasible to have a general model that accomplishes automatic tongue tracking for almost all datasets. In this paper, we propose a deep learning model comprising a convolutional module mimicking the peripheral vision ability of the human eye to handle real-time, accurate, and fully automatic tongue contour tracking tasks, applicable to almost all primary ultrasound tongue datasets. Qualitative and quantitative assessment of IrisNet on different ultrasound tongue datasets and PASCAL VOC2012 revealed its outstanding generalization achievement in comparison with similar techniques.

Submitted 17 April, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

arXiv:1906.04301 [pdf, other]

doi 10.1121/1.5137211

Transfer Learning for Ultrasound Tongue Contour Extraction with Different Domains

Authors: M. Hamed Mozaffari, Won-Sook Lee

Abstract: Medical ultrasound technology is widely used in routine clinical applications such as disease diagnosis and treatment as well as other applications like real-time monitoring of human tongue shapes and motions as visual feedback in second language training. Due to the low-contrast characteristic and noisy nature of ultrasound images, it might require expertise for non-expert users to recognize tongue gestures. Manual tongue segmentation is a cumbersome, subjective, and error-prone task. Furthermore, it is not a feasible solution for real-time applications. In the last few years, deep learning methods have been used for delineating and tracking the tongue dorsum. Deep convolutional neural networks (DCNNs), which have been shown to be successful in medical image analysis tasks, are typically weak for the same task on different domains. In many cases, DCNNs trained on data acquired with one ultrasound device do not perform well on data from a different ultrasound device or acquisition protocol. Domain adaptation is an alternative solution for this difficulty by transferring the weights from the model trained on a large annotated legacy dataset to a new model for adapting on another different dataset using fine-tuning. In this study, after conducting extensive experiments, we addressed the problem of domain adaptation on small ultrasound datasets for tongue contour extraction.
We trained a U-net network comprising an encoder-decoder path from scratch, and then with several surrogate scenarios, some parts of the trained network were fine-tuned on another dataset as the domain-adapted networks. We repeat scenarios from target to source domains to find a balance point for knowledge transfer from source to target and vice versa. The performance of new fine-tuned networks was evaluated on the same task with images from different domains.

Submitted 10 June, 2019; originally announced June 2019.

Comments: 3 figures, 9 pages, 1 table, 16 references

Journal ref: The Journal of the Acoustical Society of America 146, 2940 (2019)

arXiv:1906.04232 [pdf]

doi 10.1121/1.5137212

BowNet: Dilated Convolution Neural Network for Ultrasound Tongue Contour Extraction

Authors: M. Hamed Mozaffari, Won-Sook Lee

Abstract: Ultrasound imaging is safe, relatively affordable, and capable of real-time performance. One application of this technology is to visualize and to characterize human tongue shape and motion during real-time speech to study healthy or impaired speech production. Due to the noisy nature of ultrasound images with low-contrast characteristic, it might require expertise for non-expert users to recognize organ shape such as tongue surface (dorsum). To alleviate this difficulty for quantitative analysis of tongue shape and motion, tongue surface can be extracted, tracked, and visualized instead of the whole tongue region. Delineating the tongue surface from each frame is a cumbersome, subjective, and error-prone task. Furthermore, the rapidity and complexity of tongue gestures have made it a challenging task, and manual segmentation is not a feasible solution for real-time applications. Employing the power of state-of-the-art deep neural network models and training techniques, it is feasible to implement new fully-automatic, accurate, and robust segmentation methods with the capability of real-time performance, applicable for tracking of the tongue contours during speech. This paper presents two novel deep neural network models, named BowNet and wBowNet, that benefit from the ability of global prediction of decoding-encoding models, with integrated multi-scale contextual information, and the capability of full-resolution (local) extraction of dilated convolutions.
Experimental results using several ultrasound tongue image datasets revealed that the combination of both localization and globalization searching could improve prediction results significantly. Assessment of BowNet models using both qualitative and quantitative studies showed their outstanding achievements in terms of accuracy and robustness in comparison with similar techniques.

Submitted 10 June, 2019; originally announced June 2019.

Comments: 23 pages, 15 figures, 10 tables

Journal ref: BowNet: Dilated convolutional neural network for ultrasound tongue contour extraction, 2019, The Journal of the Acoustical Society of America, pages 2940-2941, volume 146, number 4

arXiv:1611.09811 [pdf, other]

3D Ultrasound image segmentation: A Survey

Authors: Mohammad Hamed Mozaffari, WonSook Lee

Abstract: Three-dimensional Ultrasound image segmentation methods are surveyed in this paper. The focus of this report is to investigate applications of these techniques and to review the original ideas and concepts. Although many two-dimensional image segmentation methods in the literature have mistakenly been considered three-dimensional approaches, we review them as three-dimensional techniques. We select the studies that have addressed the problem of medical three-dimensional Ultrasound image segmentation utilizing their proposed techniques. The evaluation methods and a comparison between them are presented and tabulated in terms of evaluation techniques, interactivity, and robustness.

Submitted 29 November, 2016; originally announced November 2016.

arXiv:1605.04806 [pdf]

Multilevel Thresholding Segmentation of T2 weighted Brain MRI images using Convergent Heterogeneous Particle Swarm Optimization

Authors: Mohammad Hamed Mozaffari, Won-Sook Lee

Abstract: This paper proposes a new image thresholding segmentation approach using a heuristic method, the Convergent Heterogeneous Particle Swarm Optimization algorithm. The proposed algorithm incorporates a new strategy of searching the problem space by dividing the swarm into subswarms. The particles of each subswarm search for a better solution separately, which leads to better exploitation, while they cooperate with each other to find the best global position. This cooperation yields better exploration and convergence, and it enables the algorithm to jump from a local optimal solution to better spots. A practical application of this method is demonstrated for the problem of medical image thresholding segmentation. We considered two classical thresholding techniques, Otsu and Kapur, separately as the objective function for the optimization method and applied them to a set of brain MR images. Comparative experimental results reveal that the proposed method outperforms another state-of-the-art method from the literature in terms of accuracy, computation time, and stability of results.

Submitted 16 May, 2016; originally announced May 2016.

Comments: Journal

Showing 1–18 of 18 results for author: Mozaffari, H