-
FAIRE: Assessing Racial and Gender Bias in AI-Driven Resume Evaluations
Authors:
Athena Wen,
Tanush Patil,
Ansh Saxena,
Yicheng Fu,
Sean O'Brien,
Kevin Zhu
Abstract:
In an era where AI-driven hiring is transforming recruitment practices, concerns about fairness and bias have become increasingly important. To explore these issues, we introduce a benchmark, FAIRE (Fairness Assessment In Resume Evaluation), to test for racial and gender bias in large language models (LLMs) used to evaluate resumes across different industries. We use two methods, direct scoring and ranking, to measure how model performance changes when resumes are slightly altered to reflect different racial or gender identities. Our findings reveal that while every model exhibits some degree of bias, the magnitude and direction vary considerably. This benchmark provides a clear way to examine these differences and offers valuable insights into the fairness of AI-based hiring tools. It highlights the urgent need for strategies to reduce bias in AI-driven recruitment. Our benchmark code and dataset are open-sourced at our repository: https://github.com/athenawen/FAIRE-Fairness-Assessment-In-Resume-Evaluation.git.
Submitted 2 April, 2025;
originally announced April 2025.
-
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Authors:
Yang Sui,
Yu-Neng Chuang,
Guanchu Wang,
Jiamu Zhang,
Tianyi Zhang,
Jiayi Yuan,
Hongyi Liu,
Andrew Wen,
Shaochen Zhong,
Hanjie Chen,
Xia Hu
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by harnessing supervised fine-tuning (SFT) and reinforcement learning (RL) techniques to enhance the Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, known as the "overthinking phenomenon". In this paper, we provide the first structured survey to systematically investigate and explore the current progress toward achieving efficient reasoning in LLMs. Overall, relying on the inherent mechanism of LLMs, we categorize existing works into several key directions: (1) model-based efficient reasoning, which considers optimizing full-length reasoning models into more concise reasoning models or directly training efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning, which seeks to enhance reasoning efficiency based on input prompt properties such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking.
Submitted 23 April, 2025; v1 submitted 20 March, 2025;
originally announced March 2025.
-
The Science of Evaluating Foundation Models
Authors:
Jiayi Yuan,
Jiamu Zhang,
Andrew Wen,
Xia Hu
Abstract:
The emergent phenomena of large foundation models have revolutionized natural language processing. However, evaluating these models presents significant challenges due to their size, capabilities, and deployment across diverse applications. Existing literature often focuses on individual aspects, such as benchmark performance or specific tasks, but fails to provide a cohesive process that integrates the nuances of diverse use cases with broader ethical and operational considerations. This work focuses on three key aspects: (1) Formalizing the Evaluation Process by providing a structured framework tailored to specific use-case contexts, (2) Offering Actionable Tools and Frameworks such as checklists and templates to ensure thorough, reproducible, and practical evaluations, and (3) Surveying Recent Work with a targeted review of advancements in LLM evaluation, emphasizing real-world applications.
Submitted 12 February, 2025;
originally announced February 2025.
-
Explainable Diagnosis Prediction through Neuro-Symbolic Integration
Authors:
Qiuhao Lu,
Rui Li,
Elham Sagheb,
Andrew Wen,
Jinlian Wang,
Liwei Wang,
Jungwei W. Fan,
Hongfang Liu
Abstract:
Diagnosis prediction is a critical task in healthcare, where timely and accurate identification of medical conditions can significantly impact patient outcomes. Traditional machine learning and deep learning models have achieved notable success in this domain but often lack interpretability, which is a crucial requirement in clinical settings. In this study, we explore the use of neuro-symbolic methods, specifically Logical Neural Networks (LNNs), to develop explainable models for diagnosis prediction. Essentially, we design and implement LNN-based models that integrate domain-specific knowledge through logical rules with learnable thresholds. Our models, particularly $M_{\text{multi-pathway}}$ and $M_{\text{comprehensive}}$, demonstrate superior performance over traditional models such as Logistic Regression, SVM, and Random Forest, achieving higher accuracy (up to 80.52%) and AUROC scores (up to 0.8457) in the case study of diabetes prediction. The learned weights and thresholds within the LNN models provide direct insights into feature contributions, enhancing interpretability without compromising predictive power. These findings highlight the potential of neuro-symbolic approaches in bridging the gap between accuracy and explainability in healthcare AI applications. By offering transparent and adaptable diagnostic models, our work contributes to the advancement of precision medicine and supports the development of equitable healthcare solutions. Future research will focus on extending these methods to larger and more diverse datasets to further validate their applicability across different medical conditions and populations.
Submitted 7 January, 2025; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Accelerating the Low-Rank Decomposed Models
Authors:
Habib Hajimolahoseini,
Walid Ahmed,
Austin Wen,
Yang Liu
Abstract:
Tensor decomposition is a mathematically supported technique for data compression. It consists of applying a low-rank decomposition to tensors or matrices in order to reduce the redundancy of the data. However, it is not a popular technique for compressing AI models due to the high number of new layers added to the architecture after decomposition. Although the number of parameters could shrink significantly, the model could become more than twice as deep, which could add latency to training or inference. In this paper, we present a comprehensive study of how to modify the low-rank decomposition technique in AI models so that we can benefit from high accuracy and low memory consumption while also speeding up training and inference.
Submitted 24 July, 2024;
originally announced July 2024.
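The trade-off named in the abstract above (fewer parameters, but more layers) can be illustrated with simple arithmetic. The sketch below is not the authors' method: it assumes the simplest case, a single dense m x n weight matrix replaced by two factors of rank r, and all function names are hypothetical.

```python
def dense_params(m: int, n: int) -> int:
    """Weights in a dense m x n layer (biases ignored)."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Weights after replacing W (m x n) with U (m x r) times V (r x n).

    One layer becomes two, which is the added depth the abstract mentions.
    """
    return r * (m + n)


def break_even_rank(m: int, n: int) -> float:
    """Rank below which the factorized form has fewer weights than the dense one."""
    return (m * n) / (m + n)


# Illustrative sizes only (assumed, not taken from the paper).
m, n, r = 1024, 1024, 128
ratio = low_rank_params(m, n, r) / dense_params(m, n)
print(f"kept {ratio:.0%} of the weights; break-even rank is {break_even_rank(m, n):.0f}")
```

Parameter count alone says nothing about latency, which is the abstract's point: the factorized model runs two smaller matrix multiplications in sequence instead of one.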
-
Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis?
Authors:
Habib Hajimolahoseini,
Walid Ahmed,
Austin Wen,
Yang Liu
Abstract:
In this paper, we present a comprehensive study and propose several novel techniques for implementing 3D convolutional blocks using 2D and/or 1D convolutions with only 4D and/or 3D tensors. Our motivation is that 3D convolutions with 5D tensors are computationally very expensive and they may not be supported by some of the edge devices used in real-time applications such as robots. The existing approaches mitigate this by splitting the 3D kernels into spatial and temporal domains, but they still use 3D convolutions with 5D tensors in their implementations. We resolve this issue by introducing some appropriate 4D/3D tensor reshaping as well as new combination techniques for spatial and temporal splits. The proposed implementation methods show significant improvement both in terms of efficiency and accuracy. The experimental results confirm that the proposed spatio-temporal processing structure outperforms the original model in terms of speed and accuracy using only 4D tensors with fewer parameters.
Submitted 23 July, 2024;
originally announced July 2024.
-
The Collection of a Human Robot Collaboration Dataset for Cooperative Assembly in Glovebox Environments
Authors:
Shivansh Sharma,
Mathew Huang,
Sanat Nair,
Alan Wen,
Christina Petlowany,
Juston Moore,
Selma Wanna,
Mitch Pryor
Abstract:
Industry 4.0 introduced AI as a transformative solution for modernizing manufacturing processes. Its successor, Industry 5.0, envisions humans as collaborators and experts guiding these AI-driven manufacturing solutions. Developing these techniques necessitates algorithms capable of safe, real-time identification of human positions in a scene, particularly their hands, during collaborative assembly. Although substantial efforts have curated datasets for hand segmentation, most focus on residential or commercial domains. Existing datasets targeting industrial settings predominantly rely on synthetic data, which we demonstrate does not effectively transfer to real-world operations. Moreover, these datasets lack uncertainty estimations critical for safe collaboration. Addressing these gaps, we present HAGS: Hand and Glove Segmentation Dataset. This dataset provides challenging examples to build applications toward hand and glove segmentation in industrial human-robot collaboration scenarios as well as assess out-of-distribution images, constructed via green screen augmentations, to determine ML-classifier robustness. We study state-of-the-art, real-time segmentation models to evaluate existing methods. Our dataset and baselines are publicly available.
Submitted 13 January, 2025; v1 submitted 19 July, 2024;
originally announced July 2024.
-
Large Language Models Struggle in Token-Level Clinical Named Entity Recognition
Authors:
Qiuhao Lu,
Rui Li,
Andrew Wen,
Jinlian Wang,
Liwei Wang,
Hongfang Liu
Abstract:
Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stands out as an essential task and it plays a crucial role in extracting relevant information from clinical texts. Despite the promise of LLMs, current research mostly concentrates on document-level NER, identifying entities in a more general context across entire documents, without extracting their precise location. Additionally, efforts have been directed towards adapting ChatGPT for token-level NER. However, there is a significant research gap when it comes to employing token-level NER for clinical texts, especially with the use of local open-source LLMs. This study aims to bridge this gap by investigating the effectiveness of both proprietary and local LLMs in token-level clinical NER. Essentially, we delve into the capabilities of these models through a series of experiments involving zero-shot prompting, few-shot prompting, retrieval-augmented generation (RAG), and instruction-fine-tuning. Our exploration reveals the inherent challenges LLMs face in token-level NER, particularly in the context of rare diseases, and suggests possible improvements for their application in healthcare. This research contributes to narrowing a significant gap in healthcare informatics and offers insights that could lead to a more refined application of LLMs in the healthcare sector.
Submitted 16 August, 2024; v1 submitted 30 June, 2024;
originally announced July 2024.
-
Experimental investigation of trans-scale displacement responses of wrinkle defects in fiber reinforced composite laminates
Authors:
Li Ma,
Shoulong Wang,
Changchen Liu,
Ange Wen,
Kaidi Ying,
Jing Guo
Abstract:
Wrinkle defects are widely found in industrial products such as wind turbine blades and filament-wound composite pressure vessels. The wrinkle wavelength varies from several millimeters to over one hundred millimeters. Locating wrinkle defects and measuring their responses are very important to the assessment of structures that contain them. A meso-mechanical model based on the homogenization method is presented to obtain the effective stiffness of a graded wrinkle. The finite element simulation predicts the trans-scale response of out-of-plane displacement of wrinkled laminates, where the maximum displacement ranges from nanoscale to millimeter scale. Such a trans-scale effect requires different measurement approaches to observe the displacement responses. Here we employed Shearography (Speckle Pattern Shearing Interferometry) and the fringe projection profilometry (FPP) method, respectively, according to the magnitude of displacement. For the FPP method, a displacement extraction algorithm is presented to obtain the out-of-plane displacement. The measurement sensitivity and accuracy of Shearography and FPP are compared, which provides a quantitative reference for industrial non-destructive testing.
Submitted 21 May, 2024;
originally announced May 2024.
-
SkipViT: Speeding Up Vision Transformers with a Token-Level Skip Connection
Authors:
Foozhan Ataiefard,
Walid Ahmed,
Habib Hajimolahoseini,
Saina Asani,
Farnoosh Javadi,
Mohammad Hassanpour,
Omar Mohamed Awad,
Austin Wen,
Kangling Liu,
Yang Liu
Abstract:
Vision transformers are known to be more computationally and data-intensive than CNN models. These transformer models, such as ViT, require all the input image tokens to learn the relationship among them. However, many of these tokens are not informative and may contain irrelevant information such as unrelated background or unimportant scenery. These tokens are overlooked by the multi-head self-attention (MHSA), resulting in many redundant and unnecessary computations in MHSA and the feed-forward network (FFN). In this work, we propose a method to reduce the number of unnecessary interactions between unimportant tokens by separating them and sending them through a different low-cost computational path. Our method does not add any parameters to the ViT model and aims to find the best trade-off between training throughput and achieving a 0% loss in the Top-1 accuracy of the final model. Our experimental results on training ViT-small from scratch show that SkipViT is capable of effectively dropping 55% of the tokens while gaining more than 13% training throughput and maintaining classification accuracy at the level of the baseline model on Huawei Ascend910A.
Submitted 26 January, 2024;
originally announced January 2024.
-
SwiftLearn: A Data-Efficient Training Method of Deep Learning Models using Importance Sampling
Authors:
Habib Hajimolahoseini,
Omar Mohamed Awad,
Walid Ahmed,
Austin Wen,
Saina Asani,
Mohammad Hassanpour,
Farnoosh Javadi,
Mehdi Ahmadi,
Foozhan Ataiefard,
Kangling Liu,
Yang Liu
Abstract:
In this paper, we present SwiftLearn, a data-efficient approach to accelerate training of deep learning models using a subset of data samples selected during the warm-up stages of training. This subset is selected based on an importance criterion measured over the entire dataset during warm-up stages, aiming to preserve the model performance with fewer examples during the rest of training. The importance measure we propose can be updated periodically during training, so that all data samples have a chance to return to the training loop if they show a higher importance. The model architecture is unchanged, but since the number of data samples controls the number of forward and backward passes during training, we can reduce the training time by reducing the number of training samples used in each epoch. Experimental results on a variety of CV and NLP models during both pretraining and finetuning show that the model performance can be preserved while achieving a significant speed-up during training. More specifically, BERT finetuning on the GLUE benchmark shows that almost 90% of the data can be dropped, achieving an end-to-end average speedup of 3.36x while keeping the average accuracy drop below 0.92%.
Submitted 25 November, 2023;
originally announced November 2023.
-
GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
Authors:
Farnoosh Javadi,
Walid Ahmed,
Habib Hajimolahoseini,
Foozhan Ataiefard,
Mohammad Hassanpour,
Saina Asani,
Austin Wen,
Omar Mohamed Awad,
Kangling Liu,
Yang Liu
Abstract:
Massive transformer-based models face several challenges, including slow and computationally intensive pre-training and over-parametrization. This paper addresses these challenges by proposing a versatile method called GQKVA, which generalizes query, key, and value grouping techniques. GQKVA is designed to speed up transformer pre-training while reducing the model size. Our experiments with various GQKVA variants highlight a clear trade-off between performance and model size, allowing for customized choices based on resource and time limitations. Our findings also indicate that the conventional multi-head attention approach is not always the best choice, as there are lighter and faster alternatives available. We tested our method on ViT, which achieved an approximate 0.3% increase in accuracy while reducing the model size by about 4% in the task of image classification. Additionally, our most aggressive model reduction experiment resulted in a reduction of approximately 15% in model size, with only around a 1% drop in accuracy.
Submitted 13 December, 2023; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Speeding up Resnet Architecture with Layers Targeted Low Rank Decomposition
Authors:
Walid Ahmed,
Habib Hajimolahoseini,
Austin Wen,
Yang Liu
Abstract:
Compression of a neural network can help in speeding up both the training and the inference of the network. In this research, we study applying compression using low rank decomposition on network layers. Our research demonstrates that to achieve a speedup, the compression methodology should be aware of the underlying hardware, and that analysis is needed to choose which layers to compress. The advantage of our approach is demonstrated via a case study of compressing ResNet50 and training on full ImageNet-ILSVRC2012. We tested on two different hardware systems: Nvidia V100 and Huawei Ascend910. With hardware-targeted compression, results showed a 5.36% training speedup on Ascend910 and a 15.79% inference speedup on Ascend310 with only a 1% drop in accuracy compared to the original uncompressed model.
Submitted 21 September, 2023;
originally announced September 2023.
-
Test-time Detection and Repair of Adversarial Samples via Masked Autoencoder
Authors:
Yun-Yun Tsai,
Ju-Chin Chao,
Albert Wen,
Zhaoyuan Yang,
Chengzhi Mao,
Tapan Shah,
Junfeng Yang
Abstract:
Training-time defenses, known as adversarial training, incur high training costs and do not generalize to unseen attacks. Test-time defenses solve these issues but most existing test-time defenses require adapting the model weights, therefore they do not work on frozen models and complicate model memory management. The only test-time defense that does not adapt model weights aims to adapt the input with self-supervision tasks. However, we empirically found these self-supervision tasks are not sensitive enough to detect adversarial attacks accurately. In this paper, we propose DRAM, a novel defense method to detect and repair adversarial samples at test time via Masked autoencoder (MAE). We demonstrate how to use MAE losses to build a Kolmogorov-Smirnov test to detect adversarial samples. Moreover, we use the MAE losses to calculate input reversal vectors that repair adversarial samples resulting from previously unseen attacks. Results on large-scale ImageNet dataset show that, compared to all detection baselines evaluated, DRAM achieves the best detection rate (82% on average) on all eight adversarial attacks evaluated. For attack repair, DRAM improves the robust accuracy by 6% ~ 41% for standard ResNet50 and 3% ~ 8% for robust ResNet50 compared with the baselines that use contrastive learning and rotation prediction.
Submitted 2 April, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
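The abstract above mentions building a Kolmogorov-Smirnov test from MAE losses. As a rough illustration of the statistic involved (not the DRAM implementation), the sketch below computes a two-sample KS statistic between a reference set of losses and a test batch. The function name and all loss values are made up for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, batch):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples; the gap is always attained at one of the observed values.
    """
    ref_sorted = sorted(reference)
    batch_sorted = sorted(batch)
    largest_gap = 0.0
    for x in ref_sorted + batch_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        cdf_batch = bisect_right(batch_sorted, x) / len(batch_sorted)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_batch))
    return largest_gap


# Hypothetical reconstruction losses, for illustration only.
clean_losses = [0.10, 0.12, 0.11, 0.13, 0.09]
suspect_losses = [0.31, 0.28, 0.35, 0.30, 0.29]
print(ks_statistic(clean_losses, suspect_losses))
```

A detector along these lines would compare the statistic (or its p-value) against a threshold chosen on held-out clean data; how the paper sets that threshold is not stated in the abstract.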
-
Understanding the Challenges of Team-Based Live Streaming for First-person Shooter Games
Authors:
Jiaye Li,
Minghao Li,
Zikai Alex Wen,
Wei Cai
Abstract:
First-person shooter (FPS) game tournaments take place across the globe. A growing number of people choose to watch FPS games online instead of attending the game events in person. However, live streaming might miss critical highlight moments in the game, including kills and tactics. We identify how and why the live streaming team fails to capture highlight moments to reduce such live streaming mistakes. We named such mistakes jarring observations. We conducted a field study of live streaming competitions of Game For Peace, a popular FPS mobile game, to summarize five typical jarring observations and identify three primary reasons that caused the issues. We further studied how to improve the live streaming system to prevent jarring observations from happening by doing semi-structured interviews with two professional streaming teams for Game For Peace. The study showed that a better system should (1) add a new sub-team role to share the director's responsibility of managing observers; (2) provide interfaces customized for three roles of live streamers in the team; (3) abstract more geographical info; (4) predict the priority of observation targets; and (5) provide non-verbal interfaces for sync-up between sub-teams. Our work provides insights for esports streaming system researchers and developers to improve the system for a smoother audience experience.
Submitted 16 August, 2022;
originally announced August 2022.
-
What Features Influence Impact Feel? A Study of Impact Feedback in Action Games
Authors:
Zhonghao Lin,
Haihan Duan,
Zikai Alex Wen,
Wei Cai
Abstract:
Making the hit effect satisfy players is a long-standing problem faced by action game designers. However, no research has systematically analyzed which game design elements affect such game feel. There is not even a term to describe it. So, we propose to use impact feel to describe the player's feeling when receiving juicy impact feedback. After collecting players' comments on action games from Steam's top seller list, we trained a natural language processing (NLP) model to rank action games by their performance on impact feel. We presented a 19-feature framework of impact feedback design and examined it in the top eight and last eight games. We listed an inventory of the usage of features and found that hit stop, sound coherence, and camera control may strongly influence players' impact feel. A lack of dedicated design on one of these three features may ruin players' impact feel. Our findings may become an evaluation metric for future studies.
Submitted 22 August, 2022; v1 submitted 12 August, 2022;
originally announced August 2022.
-
New Differential Privacy Communication Pipeline and Design Framework
Authors:
Jingyu Jia,
Zikai Alex Wen,
Zheli Liu,
Changyu Dong
Abstract:
Organizations started to adopt differential privacy (DP) techniques hoping to persuade more users to share personal data with them. However, many users do not understand DP techniques, thus may not be willing to share. Previous research suggested that the design of DP mechanism communication could influence users' willingness to share data. Based on the prior work, we propose a new communication pipeline that starts by asking users about their privacy concerns and then provides a customized DP mechanism and communication. We also propose a design framework that systemically explores effective communication designs ranging from a text-based high-level description to a step-by-step interactive storyboard. Based on the framework, we created 17 designs and recruited five people to evaluate. Our user study showed that text-based descriptions have the highest clarity in all scenarios, while the step-by-step interactive storyboards have the potential to persuade users to trust central DP. Our future work will optimize the design and conduct a large-scale efficacy study.
Submitted 4 August, 2022;
originally announced August 2022.
-
An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C)
Authors:
Sijia Liu,
Andrew Wen,
Liwei Wang,
Huan He,
Sunyang Fu,
Robert Miller,
Andrew Williams,
Daniel Harris,
Ramakanth Kavuluru,
Mei Liu,
Noor Abu-el-rub,
Dalton Schutte,
Rui Zhang,
Masoud Rouhizadeh,
John D. Osborne,
Yongqun He,
Umit Topaloglu,
Stephanie S Hong,
Joel H Saltz,
Thomas Schaffter,
Emily Pfaff,
Christopher G. Chute,
Tim Duong,
Melissa A. Haendel,
Rafael Fuentes
, et al. (7 additional authors not shown)
Abstract:
While we pay attention to the latest advances in clinical natural language processing (NLP), we can notice some resistance in the clinical and translational research community to adopt NLP models due to limited transparency, interpretability, and usability. In this study, we proposed an open natural language processing development framework. We evaluated it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C). Based on the interests in information extraction from COVID-19 related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow to generate texts for information extraction tasks without involving human subjects. The corpora were derived from texts from three different institutions (Mayo Clinic, University of Kentucky, University of Minnesota). The gold standard annotations were tested with a single institution's (Mayo) ruleset. This resulted in performances of 0.876, 0.706, and 0.694 in F-scores for Mayo, Minnesota, and Kentucky test datasets, respectively. The study as a consortium effort of the N3C NLP subgroup demonstrates the feasibility of creating a federated NLP algorithm development and benchmarking platform to enhance multi-institution clinical NLP study and adoption. Although we use COVID-19 as a use case in this effort, our framework is general enough to be applied to other domains of interest in clinical NLP.
Submitted 21 March, 2022; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Leveraging a Joint of Phenotypic and Genetic Features on Cancer Patient Subgrouping
Authors:
David Oniani,
Chen Wang,
Yiqing Zhao,
Andrew Wen,
Hongfang Liu,
Feichen Shen
Abstract:
Cancer is responsible for millions of deaths worldwide every year. Although significant progress has been achieved in cancer medicine, many issues remain to be addressed for improving cancer therapy. Appropriate cancer patient stratification is the prerequisite for selecting an appropriate treatment plan, as cancer patients are of known heterogeneous genetic make-ups and phenotypic differences. In this study, built upon deep phenotypic characterizations extractable from Mayo Clinic electronic health records (EHRs) and genetic test reports for a collection of cancer patients, we developed a system leveraging a joint of phenotypic and genetic features for cancer patient subgrouping.
The workflow is roughly divided into three parts: feature preprocessing, cancer patient classification, and cancer patient clustering. In the feature preprocessing step, we performed filtering, retaining the most relevant features. In cancer patient classification, we utilized joint categorical features to build a patient-feature matrix and applied nine different machine learning models, Random Forests (RF), Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), Multilayer Perceptron (MLP), Gradient Boosting (GB), Convolutional Neural Network (CNN), and Feedforward Neural Network (FNN), for classification purposes. Finally, in the cancer patient clustering step, we leveraged joint embeddings features and patient-feature associations to build an undirected feature graph and then trained the cancer feature node embeddings.
Submitted 30 March, 2021;
originally announced March 2021.
-
Comparisons of Graph Neural Networks on Cancer Classification Leveraging a Joint of Phenotypic and Genetic Features
Authors:
David Oniani,
Chen Wang,
Yiqing Zhao,
Andrew Wen,
Hongfang Liu,
Feichen Shen
Abstract:
Cancer is responsible for millions of deaths worldwide every year. Although significant progress has been achieved in cancer medicine, many issues remain to be addressed for improving cancer therapy. Appropriate cancer patient stratification is the prerequisite for selecting an appropriate treatment plan, as cancer patients are of known heterogeneous genetic make-ups and phenotypic differences. In this study, built upon deep phenotypic characterizations extractable from Mayo Clinic electronic health records (EHRs) and genetic test reports for a collection of cancer patients, we evaluated various graph neural networks (GNNs) leveraging a joint of phenotypic and genetic features for cancer type classification. Models were applied and fine-tuned on the Mayo Clinic cancer disease dataset. The assessment was done through the reported accuracy, precision, recall, and F1 values as well as through F1 scores based on the disease class. Per our evaluation results, GNNs on average outperformed the baseline models with mean statistics always being higher than those of the baseline models (0.849 vs 0.772 for accuracy, 0.858 vs 0.794 for precision, 0.843 vs 0.759 for recall, and 0.843 vs 0.855 for F1 score). Among GNNs, ChebNet, GraphSAGE, and TAGCN showed the best performance, while GAT showed the worst. We applied and compared eight GNN models including AGNN, ChebNet, GAT, GCN, GIN, GraphSAGE, SGC, and TAGCN on the Mayo Clinic cancer disease dataset and assessed their performance as well as compared them with each other and with more conventional machine learning models such as decision tree, gradient boosting, multi-layer perceptron, naive bayes, and random forest which we used as the baselines.
Submitted 14 January, 2021;
originally announced January 2021.
-
Adapting and evaluating a deep learning language model for clinical why-question answering
Authors:
Andrew Wen,
Mohamed Y. Elwazir,
Sungrim Moon,
Jungwei Fan
Abstract:
Objectives: To adapt and evaluate a deep learning language model for answering why-questions based on patient-specific clinical text. Materials and Methods: Bidirectional encoder representations from transformers (BERT) models were trained with varying data sources to perform SQuAD 2.0 style why-question answering (why-QA) on clinical notes. The evaluation focused on: 1) comparing the merits from different training data, 2) error analysis. Results: The best model achieved an accuracy of 0.707 (or 0.760 by partial match). Training toward customization for the clinical language helped increase 6% in accuracy. Discussion: The error analysis suggested that the model did not really perform deep reasoning and that clinical why-QA might warrant more sophisticated solutions. Conclusion: The BERT model achieved moderate accuracy in clinical why-QA and should benefit from the rapidly evolving technology. Despite the identified limitations, it could serve as a competent proxy for question-driven clinical information extraction.
Submitted 13 November, 2019;
originally announced November 2019.
-
Clinical Concept Extraction: a Methodology Review
Authors:
Sunyang Fu,
David Chen,
Huan He,
Sijia Liu,
Sungrim Moon,
Kevin J Peterson,
Feichen Shen,
Liwei Wang,
Yanshan Wang,
Andrew Wen,
Yiqing Zhao,
Sunghwan Sohn,
Hongfang Liu
Abstract:
Background: Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement.
Objectives: In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications.
Methods: Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library.
Results: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.
Submitted 10 August, 2020; v1 submitted 24 October, 2019;
originally announced October 2019.
-
Cross-lingual Data Transformation and Combination for Text Classification
Authors:
Jun Jiang,
Shumao Pang,
Xia Zhao,
Liwei Wang,
Andrew Wen,
Hongfang Liu,
Qianjin Feng
Abstract:
Text classification is a fundamental task for text data mining. In order to train a generalizable model, a large volume of text must be collected. To address data insufficiency, cross-lingual data may occasionally be necessary. Cross-lingual data sources may however suffer from data incompatibility, as text written in different languages can hold distinct word sequences and semantic patterns. Machine translation and word embedding alignment provide an effective way to transform and combine data for cross-lingual data training. To the best of our knowledge, there has been little work done on evaluating how the methodology used to conduct semantic space transformation and data combination affects the performance of classification models trained from cross-lingual resources. In this paper, we systematically evaluated the performance of two commonly used CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network) text classifiers with differing data transformation and combination strategies. Monolingual models were trained from English and French alongside their translated and aligned embeddings. Our results suggested that semantic space transformation may conditionally promote the performance of monolingual models. Bilingual models were trained from a combination of both English and French. Our results indicate that a cross-lingual classification model can significantly benefit from cross-lingual data by learning from translated or aligned embedding spaces.
Submitted 22 June, 2019;
originally announced June 2019.
-
CREATE: Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records using OMOP Common Data Model
Authors:
Sijia Liu,
Yanshan Wang,
Andrew Wen,
Liwei Wang,
Na Hong,
Feichen Shen,
Steven Bedrick,
William Hersh,
Hongfang Liu
Abstract:
Background: Widespread adoption of electronic health records (EHRs) has enabled secondary use of EHR data for clinical research and healthcare delivery. Natural language processing (NLP) techniques have shown promise in their capability to extract the embedded information in unstructured clinical data, and information retrieval (IR) techniques provide flexible and scalable solutions that can augment the NLP systems for retrieving and ranking relevant records. Methods: In this paper, we present the implementation of Cohort Retrieval Enhanced by Analysis of Text from EHRs (CREATE), a cohort retrieval system that can execute textual cohort selection queries on both structured and unstructured EHR data. CREATE is a proof-of-concept system that leverages a combination of structured queries and IR techniques on NLP results to improve cohort retrieval performance while adopting the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to enhance model portability. The NLP component empowered by cTAKES is used to extract CDM concepts from textual queries. We design a hierarchical index in Elasticsearch to support CDM concept search utilizing IR techniques and frameworks. Results: Our case study on 5 cohort identification queries evaluated using the IR metric, P@5 (Precision at 5) at both the patient-level and document-level, demonstrates that CREATE achieves an average P@5 of 0.90, which outperforms systems using only structured data or only unstructured data with average P@5s of 0.54 and 0.74, respectively.
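For readers unfamiliar with the metric named in the abstract above, P@5 (Precision at 5) is the fraction of the top five ranked results that are relevant. The following is a minimal sketch of the standard definition only, not the CREATE implementation; the function names and the example identifiers are hypothetical and introduced purely for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Return the fraction of the top-k ranked items that are relevant.

    The denominator is always k (the standard P@k convention), so a
    result list shorter than k is penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def mean_precision_at_k(
    runs: Sequence[tuple[Sequence[str], Set[str]]], k: int = 5
) -> float:
    """Average P@k over several (ranking, relevant set) query pairs."""
    if not runs:
        raise ValueError("at least one query is required")
    return sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical example: one query whose top five results contain three relevant records.
example_ranking = ["p1", "p2", "p3", "p4", "p5", "p6"]
example_relevant = {"p1", "p3", "p5", "p9"}
example_score = precision_at_k(example_ranking, example_relevant, k=5)
```

An average P@5 such as the values reported in the abstract would correspond to `mean_precision_at_k` taken over the evaluated queries, under the assumption that each query's score is weighted equally.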
Submitted 22 January, 2019;
originally announced January 2019.