Search | arXiv e-print repository

arXiv:2501.00520 [pdf, other]

Innovative Silicosis and Pneumonia Classification: Leveraging Graph Transformer Post-hoc Modeling and Ensemble Techniques

Authors: Bao Q. Bui, Tien T. T. Nguyen, Duy M. Le, Cong Tran, Cuong Pham

Abstract: This paper presents a comprehensive study on the classification and detection of silicosis-related lung inflammation. Our main contributions include 1) the creation of a newly curated chest X-ray (CXR) image dataset named SVBCX that is tailored to the nuances of lung inflammation caused by distinct agents, providing a valuable resource for the silicosis and pneumonia research community; and 2) a novel deep-learning architecture that integrates graph transformer networks alongside a traditional deep neural network module for the effective classification of silicosis and pneumonia. Additionally, we employ the Balanced Cross-Entropy (BalCE) as a loss function to ensure more uniform learning across different classes, enhancing the model's ability to discern subtle differences in lung conditions. The proposed model architecture and loss function selection aim to improve the accuracy and reliability of inflammation detection, particularly in the context of silicosis. Furthermore, our research explores the efficacy of an ensemble approach that combines the strengths of diverse model architectures. Experimental results on the constructed dataset demonstrate promising outcomes, showcasing substantial enhancements compared to baseline models. The ensemble of models achieves a macro-F1 score of 0.9749 and AUC ROC scores exceeding 0.99 for each class, underscoring the effectiveness of our approach in accurate and robust lung inflammation classification.

Submitted 31 December, 2024; originally announced January 2025.

arXiv:2412.19606 [pdf, other]

Enhancing Fine-grained Image Classification through Attentive Batch Training

Authors: Duy M. Le, Bao Q. Bui, Anh Tran, Cong Tran, Cuong Pham

Abstract: Fine-grained image classification, which is a challenging task in computer vision, requires precise differentiation among visually similar object categories. In this paper, we propose 1) a novel module called Residual Relationship Attention (RRA) that leverages the relationships between images within each training batch to effectively integrate visual feature vectors of batch images and 2) a novel technique called Relationship Position Encoding (RPE), which encodes the positions of relationships between original images in a batch and effectively preserves the relationship information between images within the batch. Additionally, we design a novel framework, namely Relationship Batch Integration (RBI), which utilizes RRA in conjunction with RPE, allowing the discernment of vital visual features that may remain elusive when examining a singular image representative of a particular class. Through extensive experiments, our proposed method demonstrates significant improvements in the accuracy of different fine-grained classifiers, with an average increase of $(+2.78\%)$ and $(+3.83\%)$ on the CUB200-2011 and Stanford Dog datasets, respectively, while achieving a state-of-the-art result $(95.79\%)$ on the Stanford Dog dataset. Despite not achieving the same level of improvement as in fine-grained image classification, our method still demonstrates its prowess in general image classification by attaining a state-of-the-art result of $(93.71\%)$ on the Tiny-Imagenet dataset. Furthermore, our method serves as a plug-in refinement module and can be easily integrated into different networks.

Submitted 27 December, 2024; originally announced December 2024.

arXiv:2402.03131 [pdf, other]

Constrained Decoding for Cross-lingual Label Projection

Authors: Duong Minh Le, Yang Chen, Alan Ritter, Wei Xu

Abstract: Zero-shot cross-lingual transfer utilizing multilingual LLMs has become a popular learning paradigm for low-resource languages with no labeled training data. However, for NLP tasks that involve fine-grained predictions on words and phrases, the performance of zero-shot cross-lingual transfer learning lags far behind supervised fine-tuning methods. Therefore, it is common to exploit translation and label projection to further improve the performance by (1) translating training data that is available in a high-resource language (e.g., English) together with the gold labels into low-resource languages, and/or (2) translating test data in low-resource languages to a high-resource language to run inference on, then projecting the predicted span-level labels back onto the original test data. However, state-of-the-art marker-based label projection methods suffer from translation quality degradation due to the extra label markers injected in the input to the translation model. In this work, we explore a new direction that leverages constrained decoding for label projection to overcome the aforementioned issues. Our new method not only can preserve the quality of translated texts but also has the versatility of being applicable to both translating training and translating test data strategies. This versatility is crucial as our experiments reveal that translating test data can lead to a considerable boost in performance compared to translating only training data. We evaluate on two cross-lingual transfer tasks, namely Named Entity Recognition and Event Argument Extraction, spanning 20 languages. The results demonstrate that our approach outperforms the state-of-the-art marker-based method by a large margin and also shows better performance than other label projection methods that rely on external word alignment.

Submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted at ICLR 2024

arXiv:2305.17280 [pdf, other]

Improved Instruction Ordering in Recipe-Grounded Conversation

Authors: Duong Minh Le, Ruohao Guo, Wei Xu, Alan Ritter

Abstract: In this paper, we study the task of instructional dialogue and focus on the cooking domain. Analyzing the generated output of the GPT-J model, we reveal that the primary challenge for a recipe-grounded dialog system is how to provide the instructions in the correct order. We hypothesize that this is due to the model's lack of understanding of user intent and inability to track the instruction state (i.e., which step was last instructed). Therefore, we propose to explore two auxiliary subtasks, namely User Intent Detection and Instruction State Tracking, to support Response Generation with improved instruction grounding. Experimenting with our newly collected dataset, ChattyChef, shows that incorporating user intent and instruction state information helps the response generation model mitigate the incorrect order issue. Furthermore, to investigate whether ChatGPT has completely solved this task, we analyze its outputs and find that it also makes mistakes (10.7% of the responses), about half of which are out-of-order instructions. We will release ChattyChef to facilitate further research in this area at: https://github.com/octaviaguo/ChattyChef.

Submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023 main conference

arXiv:2202.07177 [pdf, other]

Tombo Propeller: Bio-Inspired Deformable Structure toward Collision-Accommodated Control for Drones

Authors: Son Tien Bui, Quan Khanh Luu, Dinh Quang Nguyen, Nhat Dinh Minh Le, Giuseppe Loianno, Van Anh Ho

Abstract: There is a growing need for vertical take-off and landing vehicles, including drones, which are safe to use and can adapt to collisions. The risks of damage by collision, to humans, obstacles in the environment, and drones themselves, are significant. This has prompted a search into nature for a highly resilient structure that can inform a design of propellers to reduce those risks and enhance safety. Inspired by the flexibility and resilience of dragonfly wings, we propose a novel design for a biomimetic drone propeller called Tombo propeller. Here, we report on the design and fabrication process of this biomimetic propeller that can accommodate collisions and recover quickly, while maintaining sufficient thrust force to hover and fly. We describe the development of an aerodynamic model and experiments conducted to investigate performance characteristics for various configurations of the propeller morphology, and related properties, such as generated thrust force, thrust force deviation, collision force, recovery time, lift-to-drag ratio, and noise. Finally, we design and showcase a control strategy for a drone equipped with Tombo propellers that collides in mid-air with an obstacle, recovers from the collision, and continues flying. The results show that the maximum collision force generated by the proposed Tombo propeller is less than two-thirds that of a traditional rigid propeller, which suggests the concrete possibility of employing deformable propellers for drones flying in a cluttered environment. This research can contribute to the morphological design of flying vehicles for agile and resilient performance.

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2109.09701 [pdf, other]

BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese

Authors: Nguyen Luong Tran, Duong Minh Le, Dat Quoc Nguyen

Abstract: We present BARTpho with two versions, BARTpho-syllable and BARTpho-word, which are the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese. BARTpho uses the "large" architecture and the pre-training scheme of the sequence-to-sequence denoising autoencoder BART, thus it is especially suitable for generative NLP tasks. We conduct experiments to compare our BARTpho with its competitor mBART on a downstream task of Vietnamese text summarization and show that: in both automatic and human evaluations, BARTpho outperforms the strong baseline mBART and improves the state-of-the-art. We further evaluate and compare BARTpho and mBART on the Vietnamese capitalization and punctuation restoration tasks and also find that BARTpho is more effective than mBART on these two tasks. We publicly release BARTpho to facilitate future research and applications of generative Vietnamese NLP tasks. Our BARTpho models are available at https://github.com/VinAIResearch/BARTpho

Submitted 27 June, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: In Proceedings of INTERSPEECH 2022 (to appear)

arXiv:2104.08432 [pdf, other]

Architectural Archipelagos: Technical Debt in Long-Lived Software Research Platforms

Authors: Marcelo Schmitt Laser, Duc Minh Le, Joshua Garcia, Nenad Medvidović

Abstract: This paper identifies a model of software evolution that is prevalent in large, long-lived academic research tool suites (3L-ARTS). This model results in an "archipelago" of related but haphazardly organized architectural "islands", and inherently induces technical debt. We illustrate the archipelago model with examples from two 3L-ARTS archipelagos identified in literature.

Submitted 16 April, 2021; originally announced April 2021.

arXiv:2102.09835 [pdf, other]

Architectural Decay as Predictor of Issue- and Change-Proneness

Authors: Duc Minh Le, Suhrid Karthik, Marcelo Schmitt Laser, Nenad Medvidovic

Abstract: Architectural decay imposes real costs in terms of developer effort, system correctness, and performance. Over time, those problems are likely to be revealed as explicit implementation issues (defects, feature changes, etc.). Recent empirical studies have demonstrated that there is a significant correlation between architectural "smells" -- manifestations of architectural decay -- and implementation issues. In this paper, we take a step further in exploring this phenomenon. We analyze the available development data from 10 open-source software systems and show that information regarding current architectural decay in these systems can be used to build models that accurately predict future issue-proneness and change-proneness of the systems' implementations. As a less intuitive result, we also show that, in cases where historical data for a system is unavailable, such data from other, unrelated systems can provide reasonably accurate issue- and change-proneness prediction capabilities.

Submitted 19 February, 2021; originally announced February 2021.

Showing 1–8 of 8 results for author: Le, D M