-
Vision-Language Models Can't See the Obvious
Authors:
Yasser Dahou,
Ngoc Dung Huynh,
Phuc H. Le-Khac,
Wamiq Reyaz Para,
Ankit Singh,
Sanath Narayan
Abstract:
We present Saliency Benchmark (SalBench), a novel benchmark designed to assess the capability of Large Vision-Language Models (LVLM) in detecting visually salient features that are readily apparent to humans, such as a large circle amidst a grid of smaller ones. This benchmark focuses on low-level features including color, intensity, and orientation, which are fundamental to human visual processin…
▽ More
We present Saliency Benchmark (SalBench), a novel benchmark designed to assess the capability of Large Vision-Language Models (LVLM) in detecting visually salient features that are readily apparent to humans, such as a large circle amidst a grid of smaller ones. This benchmark focuses on low-level features including color, intensity, and orientation, which are fundamental to human visual processing. Our SalBench consists of images that highlight rare, unusual, or unexpected elements within scenes, and naturally draw human attention. It comprises three novel tasks for evaluating the perceptual capabilities of LVLM: Odd-One-Out Detection, Referring Odd-One-Out, and Visual Referring Odd-One-Out. We perform a comprehensive evaluation of state-of-the-art LVLM using SalBench and our findings reveal a surprising limitation: LVLM struggle to identify seemingly obvious visual anomalies, with even the advanced GPT-4o achieving only 47.6\% accuracy on such a simple task. SalBench will be an important step in measuring the capabilities of LVLM that align with the subtle definition of human attention.
△ Less
Submitted 7 July, 2025;
originally announced July 2025.
-
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Authors:
Ta Duc Huy,
Duy Anh Huynh,
Yutong Xie,
Yuankai Qi,
Qi Chen,
Phi Le Nguyen,
Sen Kim Tran,
Son Lam Phung,
Anton van den Hengel,
Zhibin Liao,
Minh-Son To,
Johan W. Verjans,
Vu Minh Hieu Phan
Abstract:
Visual grounding (VG) is the capability to identify the specific regions in an image associated with a particular text description. In medical imaging, VG enhances interpretability by highlighting relevant pathological features corresponding to textual descriptions, improving model transparency and trustworthiness for wider adoption of deep learning models in clinical practice. Current models stru…
▽ More
Visual grounding (VG) is the capability to identify the specific regions in an image associated with a particular text description. In medical imaging, VG enhances interpretability by highlighting relevant pathological features corresponding to textual descriptions, improving model transparency and trustworthiness for wider adoption of deep learning models in clinical practice. Current models struggle to associate textual descriptions with disease regions due to inefficient attention mechanisms and a lack of fine-grained token representations. In this paper, we empirically demonstrate two key observations. First, current VLMs assign high norms to background tokens, diverting the model's attention from regions of disease. Second, the global tokens used for cross-modal learning are not representative of local disease tokens. This hampers identifying correlations between the text and disease tokens. To address this, we introduce simple, yet effective Disease-Aware Prompting (DAP) process, which uses the explainability map of a VLM to identify the appropriate image features. This simple strategy amplifies disease-relevant regions while suppressing background interference. Without any additional pixel-level annotations, DAP improves visual grounding accuracy by 20.74% compared to state-of-the-art methods across three major chest X-ray datasets.
△ Less
Submitted 21 May, 2025;
originally announced May 2025.
-
DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral
Authors:
Qiang Sun,
Sirui Li,
Tingting Bi,
Du Huynh,
Mark Reynolds,
Yuanyi Luo,
Wei Liu
Abstract:
Acquiring structured data from domain-specific, image-based documents such as scanned reports is crucial for many downstream tasks but remains challenging due to document variability. Many of these documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems. We present DocSpiral, the first Human-in-the-Spiral assistive docum…
▽ More
Acquiring structured data from domain-specific, image-based documents such as scanned reports is crucial for many downstream tasks but remains challenging due to document variability. Many of these documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems. We present DocSpiral, the first Human-in-the-Spiral assistive document annotation platform, designed to address the challenge of extracting structured information from domain-specific, image-based document collections. Our spiral design establishes an iterative cycle in which human annotations train models that progressively require less manual intervention. DocSpiral integrates document format normalization, comprehensive annotation interfaces, evaluation metrics dashboard, and API endpoints for the development of AI / ML models into a unified workflow. Experiments demonstrate that our framework reduces annotation time by at least 41\% while showing consistent performance gains across three iterations during model training. By making this annotation platform freely accessible, we aim to lower barriers to AI/ML models development in document processing, facilitating the adoption of large language models in image-based, document-intensive fields such as geoscience and healthcare. The system is freely available at: https://app.ai4wa.com. The demonstration video is available: https://app.ai4wa.com/docs/docspiral/demo.
△ Less
Submitted 6 May, 2025;
originally announced May 2025.
-
Enhancing Tropical Cyclone Path Forecasting with an Improved Transformer Network
Authors:
Nguyen Van Thanh,
Nguyen Dang Huynh,
Nguyen Ngoc Tan,
Nguyen Thai Minh,
Nguyen Nam Hoang
Abstract:
A storm is a type of extreme weather. Therefore, forecasting the path of a storm is extremely important for protecting human life and property. However, storm forecasting is very challenging because storm trajectories frequently change. In this study, we propose an improved deep learning method using a Transformer network to predict the movement trajectory of a storm over the next 6 hours. The sto…
▽ More
A storm is a type of extreme weather. Therefore, forecasting the path of a storm is extremely important for protecting human life and property. However, storm forecasting is very challenging because storm trajectories frequently change. In this study, we propose an improved deep learning method using a Transformer network to predict the movement trajectory of a storm over the next 6 hours. The storm data used to train the model was obtained from the National Oceanic and Atmospheric Administration (NOAA) [1]. Simulation results show that the proposed method is more accurate than traditional methods. Moreover, the proposed method is faster and more cost-effective
△ Less
Submitted 1 May, 2025;
originally announced May 2025.
-
SVLA: A Unified Speech-Vision-Language Assistant with Multimodal Reasoning and Speech Generation
Authors:
Ngoc Dung Huynh,
Mohamed Reda Bouadjenek,
Imran Razzak,
Hakim Hacid,
Sunil Aryal
Abstract:
Large vision and language models show strong performance in tasks like image captioning, visual question answering, and retrieval. However, challenges remain in integrating speech, text, and vision into a unified model, especially for spoken tasks. Speech generation methods vary (some produce speech directly), others through text (but their impact on quality is unclear). Evaluation often relies on…
▽ More
Large vision and language models show strong performance in tasks like image captioning, visual question answering, and retrieval. However, challenges remain in integrating speech, text, and vision into a unified model, especially for spoken tasks. Speech generation methods vary (some produce speech directly), others through text (but their impact on quality is unclear). Evaluation often relies on automatic speech recognition, which may introduce bias. We propose SVLA, a unified speech vision language model based on a transformer architecture that handles multimodal inputs and outputs. We train it on 38.2 million speech text image examples, including 64.1 hours of synthetic speech. We also introduce Speech VQA Accuracy, a new metric for evaluating spoken responses. SVLA improves multimodal understanding and generation by better combining speech, vision, and language.
△ Less
Submitted 7 July, 2025; v1 submitted 31 March, 2025;
originally announced March 2025.
-
Knowledge Consultation for Semi-Supervised Semantic Segmentation
Authors:
Thuan Than,
Nhat-Anh Nguyen-Dang,
Dung Nguyen,
Salwa K. Al Khatib,
Ahmed Elhagry,
Hai Phan,
Yihui He,
Zhiqiang Shen,
Marios Savvides,
Dang Huynh
Abstract:
Semi-Supervised Semantic Segmentation reduces reliance on extensive annotations by using unlabeled data and state-of-the-art models to improve overall performance. Despite the success of deep co-training methods, their underlying mechanisms remain underexplored. This work revisits Cross Pseudo Supervision with dual heterogeneous backbones and introduces Knowledge Consultation (SegKC) to further en…
▽ More
Semi-Supervised Semantic Segmentation reduces reliance on extensive annotations by using unlabeled data and state-of-the-art models to improve overall performance. Despite the success of deep co-training methods, their underlying mechanisms remain underexplored. This work revisits Cross Pseudo Supervision with dual heterogeneous backbones and introduces Knowledge Consultation (SegKC) to further enhance segmentation performance. The proposed SegKC achieves significant improvements on Pascal and Cityscapes benchmarks, with mIoU scores of 87.1%, 89.2%, and 89.8% on Pascal VOC with the 1/4, 1/2, and full split partition, respectively, while maintaining a compact model architecture.
△ Less
Submitted 12 March, 2025;
originally announced March 2025.
-
TimelineKGQA: A Comprehensive Question-Answer Pair Generator for Temporal Knowledge Graphs
Authors:
Qiang Sun,
Sirui Li,
Du Huynh,
Mark Reynolds,
Wei Liu
Abstract:
Question answering over temporal knowledge graphs (TKGs) is crucial for understanding evolving facts and relationships, yet its development is hindered by limited datasets and difficulties in generating custom QA pairs. We propose a novel categorization framework based on timeline-context relationships, along with \textbf{TimelineKGQA}, a universal temporal QA generator applicable to any TKGs. The…
▽ More
Question answering over temporal knowledge graphs (TKGs) is crucial for understanding evolving facts and relationships, yet its development is hindered by limited datasets and difficulties in generating custom QA pairs. We propose a novel categorization framework based on timeline-context relationships, along with \textbf{TimelineKGQA}, a universal temporal QA generator applicable to any TKGs. The code is available at: \url{https://github.com/PascalSun/TimelineKGQA} as an open source Python package.
△ Less
Submitted 8 January, 2025;
originally announced January 2025.
-
Visual question answering: from early developments to recent advances -- a survey
Authors:
Ngoc Dung Huynh,
Mohamed Reda Bouadjenek,
Sunil Aryal,
Imran Razzak,
Hakim Hacid
Abstract:
Visual Question Answering (VQA) is an evolving research field aimed at enabling machines to answer questions about visual content by integrating image and language processing techniques such as feature extraction, object detection, text embedding, natural language understanding, and language generation. With the growth of multimodal data research, VQA has gained significant attention due to its br…
▽ More
Visual Question Answering (VQA) is an evolving research field aimed at enabling machines to answer questions about visual content by integrating image and language processing techniques such as feature extraction, object detection, text embedding, natural language understanding, and language generation. With the growth of multimodal data research, VQA has gained significant attention due to its broad applications, including interactive educational tools, medical image diagnosis, customer service, entertainment, and social media captioning. Additionally, VQA plays a vital role in assisting visually impaired individuals by generating descriptive content from images. This survey introduces a taxonomy of VQA architectures, categorizing them based on design choices and key components to facilitate comparative analysis and evaluation. We review major VQA approaches, focusing on deep learning-based methods, and explore the emerging field of Large Visual Language Models (LVLMs) that have demonstrated success in multimodal tasks like VQA. The paper further examines available datasets and evaluation metrics essential for measuring VQA system performance, followed by an exploration of real-world VQA applications. Finally, we highlight ongoing challenges and future directions in VQA research, presenting open questions and potential areas for further development. This survey serves as a comprehensive resource for researchers and practitioners interested in the latest advancements and future
△ Less
Submitted 11 January, 2025; v1 submitted 7 January, 2025;
originally announced January 2025.
-
Serial Scammers and Attack of the Clones: How Scammers Coordinate Multiple Rug Pulls on Decentralized Exchanges
Authors:
Phuong Duy Huynh,
Son Hoang Dau,
Nicholas Huppert,
Joshua Cervenjak,
Hoonie Sun,
Hong Yen Tran,
Xiaodong Li,
Emanuele Viterbo
Abstract:
We explored the ubiquitous phenomenon of serial scammers, each of whom deployed dozens to thousands of addresses to conduct a series of similar Rug Pulls on popular decentralized exchanges. We first constructed two datasets of around 384,000 scammer addresses behind all one-day Simple Rug Pulls on Uniswap (Ethereum) and Pancakeswap (BSC), and identified distinctive scam patterns including star, ch…
▽ More
We explored the ubiquitous phenomenon of serial scammers, each of whom deployed dozens to thousands of addresses to conduct a series of similar Rug Pulls on popular decentralized exchanges. We first constructed two datasets of around 384,000 scammer addresses behind all one-day Simple Rug Pulls on Uniswap (Ethereum) and Pancakeswap (BSC), and identified distinctive scam patterns including star, chain, and major (scam-funding) flow. These patterns, which collectively cover about $40\%$ of all scammer addresses in our datasets, reveal typical ways scammers run multiple Rug Pulls and organize the money flow among different addresses. We then studied the more general concept of scam cluster, which comprises scammer addresses linked together via direct ETH/BNB transfers or behind the same scam pools. We found that scam token contracts are highly similar within each cluster (average similarities $>70\%$) and dissimilar across different clusters (average similarities $<30\%$), corroborating our view that each cluster belongs to the same scammer/scam organization. Lastly, we analyze the scam profit of individual scam pools and clusters, employing a novel cluster-aware profit formula that takes into account the important role of wash traders. The analysis shows that the existing formula inflates the profit by at least $32\%$ on Uniswap and $24\%$ on Pancakeswap.
△ Less
Submitted 10 February, 2025; v1 submitted 14 December, 2024;
originally announced December 2024.
-
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Authors:
Ngoc Dung Huynh,
Mohamed Reda Bouadjenek,
Sunil Aryal,
Imran Razzak,
Hakim Hacid
Abstract:
Visual Question Answering (VQA) has emerged as a promising area of research to develop AI-based systems for enabling interactive and immersive learning. Numerous VQA datasets have been introduced to facilitate various tasks, such as answering questions or identifying unanswerable ones. However, most of these datasets are constructed using real-world images, leaving the performance of existing mode…
▽ More
Visual Question Answering (VQA) has emerged as a promising area of research to develop AI-based systems for enabling interactive and immersive learning. Numerous VQA datasets have been introduced to facilitate various tasks, such as answering questions or identifying unanswerable ones. However, most of these datasets are constructed using real-world images, leaving the performance of existing models on cartoon images largely unexplored. Hence, in this paper, we present "SimpsonsVQA", a novel dataset for VQA derived from The Simpsons TV show, designed to promote inquiry-based learning. Our dataset is specifically designed to address not only the traditional VQA task but also to identify irrelevant questions related to images, as well as the reverse scenario where a user provides an answer to a question that the system must evaluate (e.g., as correct, incorrect, or ambiguous). It aims to cater to various visual applications, harnessing the visual content of "The Simpsons" to create engaging and informative interactive systems. SimpsonsVQA contains approximately 23K images, 166K QA pairs, and 500K judgments (https://simpsonsvqa.org). Our experiments show that current large vision-language models like ChatGPT4o underperform in zero-shot settings across all three tasks, highlighting the dataset's value for improving model performance on cartoon images. We anticipate that SimpsonsVQA will inspire further research, innovation, and advancements in inquiry-based learning VQA.
△ Less
Submitted 29 October, 2024;
originally announced October 2024.
-
Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
Authors:
Young Kyun Jang,
Dat Huynh,
Ashish Shah,
Wen-Kai Chen,
Ser-Nam Lim
Abstract:
Composed Image Retrieval (CIR) is a complex task that retrieves images using a query, which is configured with an image and a caption that describes desired modifications to that image. Supervised CIR approaches have shown strong performance, but their reliance on expensive manually-annotated datasets restricts their scalability and broader applicability. To address these issues, previous studies…
▽ More
Composed Image Retrieval (CIR) is a complex task that retrieves images using a query, which is configured with an image and a caption that describes desired modifications to that image. Supervised CIR approaches have shown strong performance, but their reliance on expensive manually-annotated datasets restricts their scalability and broader applicability. To address these issues, previous studies have proposed pseudo-word token-based Zero-Shot CIR (ZS-CIR) methods, which utilize a projection module to map images to word tokens. However, we conjecture that this approach has a downside: the projection module distorts the original image representation and confines the resulting composed embeddings to the text-side. In order to resolve this, we introduce a novel ZS-CIR method that uses Spherical Linear Interpolation (Slerp) to directly merge image and text representations by identifying an intermediate embedding of both. Furthermore, we introduce Text-Anchored-Tuning (TAT), a method that fine-tunes the image encoder while keeping the text encoder fixed. TAT closes the modality gap between images and text, making the Slerp process much more effective. Notably, the TAT method is not only efficient in terms of the scale of the training dataset and training time, but it also serves as an excellent initial checkpoint for training supervised CIR models, thereby highlighting its wider potential. The integration of the Slerp-based ZS-CIR with a TAT-tuned model enables our approach to deliver state-of-the-art retrieval performance across CIR benchmarks.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Authors:
Young Kyun Jang,
Donghyun Kim,
Zihang Meng,
Dat Huynh,
Ser-Nam Lim
Abstract:
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification. Current techniques rely on supervised learning for CIR models using labeled triplets of the reference image, text, target image. These specific triplets are not as commonly available as simple image-text pairs, limiting the widespread use of CIR and its scalability. On the o…
▽ More
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification. Current techniques rely on supervised learning for CIR models using labeled triplets of the reference image, text, target image. These specific triplets are not as commonly available as simple image-text pairs, limiting the widespread use of CIR and its scalability. On the other hand, zero-shot CIR can be relatively easily trained with image-caption pairs without considering the image-to-image relation, but this approach tends to yield lower accuracy. We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data and learn our large language model-based Visual Delta Generator (VDG) to generate text describing the visual difference (i.e., visual delta) between the two. VDG, equipped with fluent language knowledge and being model agnostic, can generate pseudo triplets to boost the performance of CIR models. Our approach significantly improves the existing supervised learning approaches and achieves state-of-the-art results on the CIR benchmarks.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Unveiling Comparative Sentiments in Vietnamese Product Reviews: A Sequential Classification Framework
Authors:
Ha Le,
Bao Tran,
Phuong Le,
Tan Nguyen,
Dac Nguyen,
Ngoan Pham,
Dang Huynh
Abstract:
Comparative opinion mining is a specialized field of sentiment analysis that aims to identify and extract sentiments expressed comparatively. To address this task, we propose an approach that consists of solving three sequential sub-tasks: (i) identifying comparative sentence, i.e., if a sentence has a comparative meaning, (ii) extracting comparative elements, i.e., what are comparison subjects, o…
▽ More
Comparative opinion mining is a specialized field of sentiment analysis that aims to identify and extract sentiments expressed comparatively. To address this task, we propose an approach that consists of solving three sequential sub-tasks: (i) identifying comparative sentence, i.e., if a sentence has a comparative meaning, (ii) extracting comparative elements, i.e., what are comparison subjects, objects, aspects, predicates, and (iii) classifying comparison types which contribute to a deeper comprehension of user sentiments in Vietnamese product reviews. Our method is ranked fifth at the Vietnamese Language and Speech Processing (VLSP) 2023 challenge on Comparative Opinion Mining (ComOM) from Vietnamese Product Reviews.
△ Less
Submitted 2 January, 2024;
originally announced January 2024.
-
The Impact of COVID-19 on Chronic Pain: Multidimensional Clustering Reveals Deep Insights into Spinal Cord Stimulation Patients
Authors:
Sara Berger,
Carla Agurto,
Guillermo Cecchi,
Elif Eyigoz,
Brad Hershey,
Kristen Lechleiter,
NAVITAS/ENVISION studies physician author group,
Dat Huynh,
Matt McDonald,
Jeffrey L Rogers
Abstract:
The emergence of COVID-19 offered a unique opportunity to study chronic pain patients as they responded to sudden changes in social environments, increased community stress, and reduced access to care. We report findings from n=70 Spinal Cord Stimulation (SCS) patients before and during initial pandemic stages resulting from advances in home monitoring and artificial intelligence that produced nov…
▽ More
The emergence of COVID-19 offered a unique opportunity to study chronic pain patients as they responded to sudden changes in social environments, increased community stress, and reduced access to care. We report findings from n=70 Spinal Cord Stimulation (SCS) patients before and during initial pandemic stages resulting from advances in home monitoring and artificial intelligence that produced novel insights despite pandemic-related disruptions. From a multi-dimensional array of frequently monitored signals-including mobility, sleep, voice, and psychological assessments-we found that while the overall patient cohort appeared unaffected by the pandemic onset, patients had significantly different individual experiences. Three distinct patient responses (sub-cohorts) were revealed, those with: worsened pain, reduced activities, or improved quality of life. Remarkably, none of the specific measures by themselves were significantly affected; instead, it was their synergy that exposed the effects elicited by the pandemic onset. Partial correlations illustrating linked dimensions by sub-cohort during the pandemic and those associations were different for each sub-cohort before COVID-19, suggesting that daily at-home tele-monitoring of chronic conditions may reveal novel patient types. This work highlights the opportunities afforded by applying modern analytic techniques to more holistic and longitudinal patient outcomes, which might aid clinicians in making more informed treatment decisions in the future.
△ Less
Submitted 2 October, 2023;
originally announced October 2023.
-
Polytopal composite finite elements for modeling concrete fracture based on nonlocal damage models
Authors:
Hai D. Huynh,
S. Natarajan,
H. Nguyen-Xuan,
Xiaoying Zhuang
Abstract:
The paper presents an assumed strain formulation over polygonal meshes to accurately evaluate the strain fields in nonlocal damage models. An assume strained technique based on the Hu-Washizu variational principle is employed to generate a new strain approximation instead of direct derivation from the basis functions and the displacement fields. The underlying idea embedded in arbitrary finite pol…
▽ More
The paper presents an assumed strain formulation over polygonal meshes to accurately evaluate the strain fields in nonlocal damage models. An assume strained technique based on the Hu-Washizu variational principle is employed to generate a new strain approximation instead of direct derivation from the basis functions and the displacement fields. The underlying idea embedded in arbitrary finite polygons is named as Polytopal composite finite elements (PCFEM). The PCFEM is accordingly applied within the framework of the nonlocal model of continuum damage mechanics to enhance the description of damage behaviours in which highly localized deformations must be captured accurately. This application is helpful to reduce the mesh-sensitivity and elaborate the process-zone of damage models. Several numerical examples are designed for various cases of fracture to discuss and validate the computational capability of the present method through comparison with published numerical results and experimental data from the literature.
△ Less
Submitted 11 July, 2023;
originally announced September 2023.
-
From Programming Bugs to Multimillion-Dollar Scams: An Analysis of Trapdoor Tokens on Uniswap
Authors:
Phuong Duy Huynh,
Thisal De Silva,
Son Hoang Dau,
Xiaodong Li,
Iqbal Gondal,
Emanuele Viterbo
Abstract:
We investigate in this work a recently emerged type of scam ERC-20 token called Trapdoor, which has cost investors billions of US dollars on Uniswap, the largest decentralised exchange on Ethereum, from 2020 to 2023. In essence, Trapdoor tokens allow users to buy but preventing them from selling by embedding logical bugs and/or owner-only features in their smart contracts. By manually inspecting a…
▽ More
We investigate in this work a recently emerged type of scam ERC-20 token called Trapdoor, which has cost investors billions of US dollars on Uniswap, the largest decentralised exchange on Ethereum, from 2020 to 2023. In essence, Trapdoor tokens allow users to buy but preventing them from selling by embedding logical bugs and/or owner-only features in their smart contracts. By manually inspecting a number of Trapdoor samples, we established the first systematic classification of Trapdoor tokens and a comprehensive list of techniques that scammers used to embed and conceal malicious codes, accompanied by a detailed analysis of representative scam contracts. In particular, we developed TrapdoorAnalyser, a fine-grained detection tool that generates and crosschecks the error-log of a buy-and-sell test and the list of embedded Trapdoor indicators from a contract-semantic check to reliably identify a Trapdoor token. TrapdoorAnalyser not only outperforms the state-of-the-art commercial tool GoPlus in accuracy, but also provides traces of malicious code with a full explanation, which most of the existing tools lack. Using TrapdoorAnalyser, we constructed the very first dataset of about 30,000 Trapdoor and non-Trapdoor tokens on UniswapV2, which allows us to train several machine learning algorithms that can detect with very high accuracy even Trapdoor tokens with no available Solidity source codes.
△ Less
Submitted 19 December, 2024; v1 submitted 9 September, 2023;
originally announced September 2023.
-
A recommender for the management of chronic pain in patients undergoing spinal cord stimulation
Authors:
Tigran Tchrakian,
Mykhaylo Zayats,
Alessandra Pascale,
Dat Huynh,
Pritish Parida,
Carla Agurto Rios,
Sergiy Zhuk,
Jeffrey L. Rogers,
ENVISION Studies Physician Author Group,
Boston Scientific Research Scientists Consortium
Abstract:
Spinal cord stimulation (SCS) is a therapeutic approach used for the management of chronic pain. It involves the delivery of electrical impulses to the spinal cord via an implanted device, which when given suitable stimulus parameters can mask or block pain signals. Selection of optimal stimulation parameters usually happens in the clinic under the care of a provider whereas at-home SCS optimizati…
▽ More
Spinal cord stimulation (SCS) is a therapeutic approach used for the management of chronic pain. It involves the delivery of electrical impulses to the spinal cord via an implanted device, which when given suitable stimulus parameters can mask or block pain signals. Selection of optimal stimulation parameters usually happens in the clinic under the care of a provider whereas at-home SCS optimization is managed by the patient. In this paper, we propose a recommender system for the management of pain in chronic pain patients undergoing SCS. In particular, we use a contextual multi-armed bandit (CMAB) approach to develop a system that recommends SCS settings to patients with the aim of improving their condition. These recommendations, sent directly to patients though a digital health ecosystem, combined with a patient monitoring system closes the therapeutic loop around a chronic pain patient over their entire patient journey. We evaluated the system in a cohort of SCS-implanted ENVISION study subjects (Clinicaltrials.gov ID: NCT03240588) using a combination of quality of life metrics and Patient States (PS), a novel measure of holistic outcomes. SCS recommendations provided statistically significant improvement in clinical outcomes (pain and/or QoL) in 85\% of all subjects (N=21). Among subjects in moderate PS (N=7) prior to receiving recommendations, 100\% showed statistically significant improvements and 5/7 had improved PS dwell time. This analysis suggests SCS patients may benefit from SCS recommendations, resulting in additional clinical improvement on top of benefits already received from SCS therapy.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
Improving the Accuracy of Transaction-Based Ponzi Detection on Ethereum
Authors:
Phuong Duy Huynh,
Son Hoang Dau,
Xiaodong Li,
Phuc Luong,
Emanuele Viterbo
Abstract:
The Ponzi scheme, an old-fashioned fraud, is now popular on the Ethereum blockchain, causing considerable financial losses to many crypto investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code. This contract-code-based approach, while achieving very high accuracy, is not robust because a Ponzi devel…
▽ More
The Ponzi scheme, an old-fashioned fraud, is now popular on the Ethereum blockchain, causing considerable financial losses to many crypto investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code. This contract-code-based approach, while achieving very high accuracy, is not robust because a Ponzi developer can fool a detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected. On the contrary, a transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to be manipulated. However, the current transaction-based detection models achieve fairly low accuracy. In this paper, we aim to improve the accuracy of the transaction-based models by employing time-series features, which turn out to be crucial in capturing the life-time behaviour a Ponzi application but were completely overlooked in previous works. We propose a new set of 85 features (22 known account-based and 63 new time-series features), which allows off-the-shelf machine learning algorithms to achieve up to 30% higher F1-scores compared to existing works.
△ Less
Submitted 17 July, 2024; v1 submitted 30 August, 2023;
originally announced August 2023.
-
HUMS2023 Data Challenge Result Submission
Authors:
Dhiraj Neupane,
Lakpa Dorje Tamang,
Ngoc Dung Huynh,
Mohamed Reda Bouadjenek,
Sunil Aryal
Abstract:
We implemented a simple method for early detection in this research. The implemented methods are plotting the given mat files and analyzing scalogram images generated by performing Continuous Wavelet Transform (CWT) on the samples. Also, finding the mean, standard deviation (STD), and peak-to-peak (P2P) values from each signal also helped detect faulty signs. We have implemented the autoregressive…
▽ More
We implemented a simple method for early detection in this research. The implemented methods are plotting the given mat files and analyzing scalogram images generated by performing Continuous Wavelet Transform (CWT) on the samples. Also, finding the mean, standard deviation (STD), and peak-to-peak (P2P) values from each signal also helped detect faulty signs. We have implemented the autoregressive integrated moving average (ARIMA) method to track the progression.
△ Less
Submitted 14 July, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data
Authors:
Luuk H. Boulogne,
Julian Lorenz,
Daniel Kienzle,
Robin Schon,
Katja Ludwig,
Rainer Lienhart,
Simon Jegou,
Guang Li,
Cong Chen,
Qi Wang,
Derik Shi,
Mayug Maniparambil,
Dominik Muller,
Silvan Mertes,
Niklas Schroter,
Fabio Hellmann,
Miriam Elia,
Ine Dirks,
Matias Nicolas Bossa,
Abel Diaz Berenguer,
Tanmoy Mukherjee,
Jef Vandemeulebroucke,
Hichem Sahli,
Nikos Deligiannis,
Panagiotis Gonidakis
, et al. (13 additional authors not shown)
Abstract:
Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training m…
▽ More
Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve for discerning between severe and non-severe COVID-19 of 0.815. The Final phase solutions of all finalists improved upon their Qualification phase solutions.HSUXJM-TNZF9CHSUXJM-TNZF9C
△ Less
Submitted 25 June, 2023; v1 submitted 18 June, 2023;
originally announced June 2023.
-
Why Are Conversational Assistants Still Black Boxes? The Case For Transparency
Authors:
Trung Dong Huynh,
William Seymour,
Luc Moreau,
Jose Such
Abstract:
Much has been written about privacy in the context of conversational and voice assistants. Yet, there have been remarkably few developments in terms of the actual privacy offered by these devices. But how much of this is due to the technical and design limitations of speech as an interaction modality? In this paper, we set out to reframe the discussion on why commercial conversational assistants d…
▽ More
Much has been written about privacy in the context of conversational and voice assistants. Yet, there have been remarkably few developments in terms of the actual privacy offered by these devices. But how much of this is due to the technical and design limitations of speech as an interaction modality? In this paper, we set out to reframe the discussion on why commercial conversational assistants do not offer meaningful privacy and transparency by demonstrating how they \emph{could}. By instrumenting the open-source voice assistant Mycroft to capture audit trails for data access, we demonstrate how such functionality could be integrated into big players in the sector like Alexa and Google Assistant. We show that this problem can be solved with existing technology and open standards and is thus fundamentally a business decision rather than a technical limitation.
△ Less
Submitted 8 June, 2023;
originally announced June 2023.
-
#REVAL: a semantic evaluation framework for hashtag recommendation
Authors:
Areej Alsini,
Du Q. Huynh,
Amitava Datta
Abstract:
Automatic evaluation of hashtag recommendation models is a fundamental task in many online social network systems. In the traditional evaluation method, the recommended hashtags from an algorithm are firstly compared with the ground truth hashtags for exact correspondences. The number of exact matches is then used to calculate the hit rate, hit ratio, precision, recall, or F1-score. This way of ev…
▽ More
Automatic evaluation of hashtag recommendation models is a fundamental task in many online social network systems. In the traditional evaluation method, the recommended hashtags from an algorithm are firstly compared with the ground truth hashtags for exact correspondences. The number of exact matches is then used to calculate the hit rate, hit ratio, precision, recall, or F1-score. This way of evaluating hashtag similarities is inadequate as it ignores the semantic correlation between the recommended and ground truth hashtags. To tackle this problem, we propose a novel semantic evaluation framework for hashtag recommendation, called #REval. This framework includes an internal module referred to as BERTag, which automatically learns the hashtag embeddings. We investigate on how the #REval framework performs under different word embedding methods and different numbers of synonyms and hashtags in the recommendation using our proposed #REval-hit-ratio measure. Our experiments of the proposed framework on three large datasets show that #REval gave more meaningful hashtag synonyms for hashtag recommendation evaluation. Our analysis also highlights the sensitivity of the framework to the word embedding technique, with #REval based on BERTag more superior over #REval based on FastText and Word2Vec.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
A Methodology and Software Architecture to Support Explainability-by-Design
Authors:
Trung Dong Huynh,
Niko Tsakalakis,
Ayah Helal,
Sophie Stalla-Bourdillon,
Luc Moreau
Abstract:
Algorithms play a crucial role in many technological systems that control or affect various aspects of our lives. As a result, providing explanations for their decisions to address the needs of users and organisations is increasingly expected by laws, regulations, codes of conduct, and the public. However, as laws and regulations do not prescribe how to meet such expectations, organisations are of…
▽ More
Algorithms play a crucial role in many technological systems that control or affect various aspects of our lives. As a result, providing explanations for their decisions to address the needs of users and organisations is increasingly expected by laws, regulations, codes of conduct, and the public. However, as laws and regulations do not prescribe how to meet such expectations, organisations are often left to devise their own approaches to explainability, inevitably increasing the cost of compliance and good governance. Hence, we envision Explainability-by-Design, a holistic methodology characterised by proactive measures to include explanation capability in the design of decision-making systems. The methodology consists of three phases: (A) Explanation Requirement Analysis, (B) Explanation Technical Design, and (C) Explanation Validation. This paper describes phase (B), a technical workflow to implement explanation capability from requirements elicited by domain experts for a specific application context. Outputs of this phase are a set of configurations, allowing a reusable explanation service to exploit logs provided by the target application to create provenance traces of the application's decisions. The provenance then can be queried to extract relevant data points, which can be used in explanation plans to construct explanations personalised to their consumers. Following the workflow, organisations can design their decision-making systems to produce explanations that meet the specified requirements. To facilitate the process, we present a software architecture with reusable components to incorporate the resulting explanation capability into an application. Finally, we applied the workflow to two application scenarios and measured the associated development costs. It was shown that the approach is tractable in terms of development time, which can be as low as two hours per sentence.
△ Less
Submitted 25 May, 2023; v1 submitted 13 June, 2022;
originally announced June 2022.
-
A taxonomy of explanations to support Explainability-by-Design
Authors:
Niko Tsakalakis,
Sophie Stalla-Bourdillon,
Trung Dong Huynh,
Luc Moreau
Abstract:
As automated decision-making solutions are increasingly applied to all aspects of everyday life, capabilities to generate meaningful explanations for a variety of stakeholders (i.e., decision-makers, recipients of decisions, auditors, regulators...) become crucial. In this paper, we present a taxonomy of explanations that was developed as part of a holistic 'Explainability-by-Design' approach for…
▽ More
As automated decision-making solutions are increasingly applied to all aspects of everyday life, capabilities to generate meaningful explanations for a variety of stakeholders (i.e., decision-makers, recipients of decisions, auditors, regulators...) become crucial. In this paper, we present a taxonomy of explanations that was developed as part of a holistic 'Explainability-by-Design' approach for the purposes of the project PLEAD. The taxonomy was built with a view to produce explanations for a wide range of requirements stemming from a variety of regulatory frameworks or policies set at the organizational level either to translate high-level compliance requirements or to meet business needs. The taxonomy comprises nine dimensions. It is used as a stand-alone classifier of explanations conceived as detective controls, in order to aid supportive automated compliance strategies. A machinereadable format of the taxonomy is provided in the form of a light ontology and the benefits of starting the Explainability-by-Design journey with such a taxonomy are demonstrated through a series of examples.
△ Less
Submitted 14 November, 2024; v1 submitted 9 June, 2022;
originally announced June 2022.
-
TreePIR: Efficient Private Retrieval of Merkle Proofs via Tree Colorings with Fast Indexing and Zero Storage Overhead
Authors:
Son Hoang Dau,
Quang Cao,
Rinaldo Gagiano,
Duy Huynh,
Xun Yi,
Phuc Lu Le,
Quang-Hung Luu,
Emanuele Viterbo,
Yu-Chih Huang,
Jingge Zhu,
Mohammad M. Jalalzai,
Chen Feng
Abstract:
A Batch Private Information Retrieval (batch-PIR) scheme allows a client to retrieve multiple data items from a database without revealing them to the storage server(s). Most existing approaches for batch-PIR are based on batch codes, in particular, probabilistic batch codes (PBC) (Angel et al. S&P'18), which incur large storage overheads. In this work, we show that \textit{zero} storage overhead…
▽ More
A Batch Private Information Retrieval (batch-PIR) scheme allows a client to retrieve multiple data items from a database without revealing them to the storage server(s). Most existing approaches for batch-PIR are based on batch codes, in particular, probabilistic batch codes (PBC) (Angel et al. S&P'18), which incur large storage overheads. In this work, we show that \textit{zero} storage overhead is achievable for tree-shaped databases. In particular, we develop TreePIR, a novel approach tailored made for private retrieval of the set of nodes along an arbitrary root-to-leaf path in a Merkle tree with no storage redundancy. This type of trees has been widely implemented in many real-world systems such as Amazon DynamoDB, Google's Certificate Transparency, and blockchains. Tree nodes along a root-to-leaf path forms the well-known Merkle proof. TreePIR, which employs a novel tree coloring, outperforms PBC, a fundamental component in state-of-the-art batch-PIR schemes (Angel et al. S&P'18, Mughees-Ren S&P'23, Liu et al. S&P'24), in all metrics, achieving $3\times$ lower total storage and $1.5$-$2\times$ lower computation and communication costs. Most notably, TreePIR has $8$-$160\times$ lower setup time and its polylog-complexity indexing algorithm is $19$-$160\times$ faster than PBC for trees of $2^{10}$-$2^{24}$ leaves.
△ Less
Submitted 4 June, 2024; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey
Authors:
Ngoc Dung Huynh,
Mohamed Reda Bouadjenek,
Imran Razzak,
Kevin Lee,
Chetan Arora,
Ali Hassani,
Arkady Zaslavsky
Abstract:
A Machine-Critical Application is a system that is fundamentally necessary to the success of specific and sensitive operations such as search and recovery, rescue, military, and emergency management actions. Recent advances in Machine Learning, Natural Language Processing, voice recognition, and speech processing technologies have naturally allowed the development and deployment of speech-based co…
▽ More
A Machine-Critical Application is a system that is fundamentally necessary to the success of specific and sensitive operations such as search and recovery, rescue, military, and emergency management actions. Recent advances in Machine Learning, Natural Language Processing, voice recognition, and speech processing technologies have naturally allowed the development and deployment of speech-based conversational interfaces to interact with various machine-critical applications. While these conversational interfaces have allowed users to give voice commands to carry out strategic and critical activities, their robustness to adversarial attacks remains uncertain and unclear. Indeed, Adversarial Artificial Intelligence (AI) which refers to a set of techniques that attempt to fool machine learning models with deceptive data, is a growing threat in the AI and machine learning research community, in particular for machine-critical applications. The most common reason of adversarial attacks is to cause a malfunction in a machine learning model. An adversarial attack might entail presenting a model with inaccurate or fabricated samples as it's training data, or introducing maliciously designed data to deceive an already trained model. While focusing on speech recognition for machine-critical applications, in this paper, we first review existing speech recognition techniques, then, we investigate the effectiveness of adversarial attacks and defenses against these systems, before outlining research challenges, defense recommendations, and future work. This paper is expected to serve researchers and practitioners as a reference to help them in understanding the challenges, position themselves and, ultimately, help them to improve existing models of speech recognition for mission-critical applications. Keywords: Mission-Critical Applications, Adversarial AI, Speech Recognition Systems.
△ Less
Submitted 21 February, 2022;
originally announced February 2022.
-
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
Authors:
Dat Huynh,
Jason Kuen,
Zhe Lin,
Jiuxiang Gu,
Ehsan Elhamifar
Abstract:
Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations. It is an important step toward reducing laborious human supervision. Most existing works first pretrain a model on captioned images covering many novel classes and then finetune it on limited base classes with mask annotations. However, the high-level textual information learned from caption pretrainin…
▽ More
Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations. It is an important step toward reducing laborious human supervision. Most existing works first pretrain a model on captioned images covering many novel classes and then finetune it on limited base classes with mask annotations. However, the high-level textual information learned from caption pretraining alone cannot effectively encode the details required for pixel-wise segmentation. To address this, we propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images. Thus, our framework is capable of labeling novel classes in captions via their word semantics to self-train a student model. To account for noises in pseudo masks, we design a robust student model that selectively distills mask knowledge by estimating the mask noise levels, hence mitigating the adverse impact of noisy pseudo masks. By extensive experiments, we show the effectiveness of our framework, where we significantly improve mAP score by 4.5% on MS-COCO and 5.1% on the large-scale Open Images & Conceptual Captions datasets compared to the state-of-the-art.
△ Less
Submitted 19 April, 2022; v1 submitted 24 November, 2021;
originally announced November 2021.
-
Leaf Recognition Using Convolutional Neural Networks Based Features
Authors:
Boi M. Quach,
Dinh V. Cuong,
Nhung Pham,
Dang Huynh,
Binh T. Nguyen
Abstract:
There is a warning light for the loss of plant habitats worldwide that entails concerted efforts to conserve plant biodiversity. Thus, plant species classification is of crucial importance to address this environmental challenge. In recent years, there is a considerable increase in the number of studies related to plant taxonomy. While some researchers try to improve their recognition performance…
▽ More
There is a warning light for the loss of plant habitats worldwide that entails concerted efforts to conserve plant biodiversity. Thus, plant species classification is of crucial importance to address this environmental challenge. In recent years, there is a considerable increase in the number of studies related to plant taxonomy. While some researchers try to improve their recognition performance using novel approaches, others concentrate on computational optimization for their framework. In addition, a few studies are diving into feature extraction to gain significantly in terms of accuracy. In this paper, we propose an effective method for the leaf recognition problem. In our proposed approach, a leaf goes through some pre-processing to extract its refined color image, vein image, xy-projection histogram, handcrafted shape, texture features, and Fourier descriptors. These attributes are then transformed into a better representation by neural network-based encoders before a support vector machine (SVM) model is utilized to classify different leaves. Overall, our approach performs a state-of-the-art result on the Flavia leaf dataset, achieving the accuracy of 99.58\% on test sets under random 10-fold cross-validation and bypassing the previous methods. We also release our codes (Scripts are available at https://github.com/dinhvietcuong1996/LeafRecognition) for contributing to the research community in the leaf classification problem.
△ Less
Submitted 2 September, 2021; v1 submitted 3 August, 2021;
originally announced August 2021.
-
Compositional Fine-Grained Low-Shot Learning
Authors:
Dat Huynh,
Ehsan Elhamifar
Abstract:
We develop a novel compositional generative model for zero- and few-shot learning to recognize fine-grained classes with a few or no training samples. Our key observation is that generating holistic features for fine-grained classes fails to capture small attribute differences between classes. Therefore, we propose a feature composition framework that learns to extract attribute features from trai…
▽ More
We develop a novel compositional generative model for zero- and few-shot learning to recognize fine-grained classes with a few or no training samples. Our key observation is that generating holistic features for fine-grained classes fails to capture small attribute differences between classes. Therefore, we propose a feature composition framework that learns to extract attribute features from training samples and combines them to construct fine-grained features for rare and unseen classes. Feature composition allows us to not only selectively compose features of every class from only relevant training samples, but also obtain diversity among composed features via changing samples used for the composition. In addition, instead of building holistic features for classes, we use our attribute features to form dense representations capable of capturing fine-grained attribute details of classes. We propose a training scheme that uses a discriminative model to construct features that are subsequently used to train the model itself. Therefore, we directly train the discriminative model on the composed features without learning a separate generative model. We conduct experiments on four popular datasets of DeepFashion, AWA2, CUB, and SUN, showing the effectiveness of our method.
△ Less
Submitted 21 May, 2021;
originally announced May 2021.
-
Provenance Graph Kernel
Authors:
David Kohan Marzagão,
Trung Dong Huynh,
Ayah Helal,
Sean Baccas,
Luc Moreau
Abstract:
Provenance is a record that describes how entities, activities, and agents have influenced a piece of data; it is commonly represented as graphs with relevant labels on both their nodes and edges. With the growing adoption of provenance in a wide range of application domains, users are increasingly confronted with an abundance of graph data, which may prove challenging to process. Graph kernels, o…
▽ More
Provenance is a record that describes how entities, activities, and agents have influenced a piece of data; it is commonly represented as graphs with relevant labels on both their nodes and edges. With the growing adoption of provenance in a wide range of application domains, users are increasingly confronted with an abundance of graph data, which may prove challenging to process. Graph kernels, on the other hand, have been successfully used to efficiently analyse graphs. In this paper, we introduce a novel graph kernel called provenance kernel, which is inspired by and tailored for provenance data. It decomposes a provenance graph into tree-patterns rooted at a given node and considers the labels of edges and nodes up to a certain distance from the root. We employ provenance kernels to classify provenance graphs from three application domains. Our evaluation shows that they perform well in terms of classification accuracy and yield competitive results when compared against existing graph kernel methods and the provenance network analytics method while more efficient in computing time. Moreover, the provenance types used by provenance kernels also help improve the explainability of predictive models built on them.
△ Less
Submitted 14 September, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy
Authors:
Sharib Ali,
Mariia Dmitrieva,
Noha Ghatwary,
Sophia Bano,
Gorkem Polat,
Alptekin Temizel,
Adrian Krenzer,
Amar Hekalo,
Yun Bo Guo,
Bogdan Matuszewski,
Mourad Gridach,
Irina Voiculescu,
Vishnusai Yoganand,
Arnav Chavan,
Aryan Raj,
Nhan T. Nguyen,
Dat Q. Tran,
Le Duy Huynh,
Nicolas Boutry,
Shahadate Rezvy,
Haijian Chen,
Yoon Ho Choi,
Anand Subramanian,
Velmurugan Balasubramanian,
Xiaohong W. Gao
, et al. (12 additional authors not shown)
Abstract:
The Endoscopy Computer Vision Challenge (EndoCV) is a crowd-sourcing initiative to address eminent problems in developing reliable computer aided detection and diagnosis endoscopy systems and suggest a pathway for clinical translation of technologies. Whilst endoscopy is a widely used diagnostic and treatment tool for hollow-organs, there are several core challenges often faced by endoscopists, ma…
▽ More
The Endoscopy Computer Vision Challenge (EndoCV) is a crowd-sourcing initiative to address eminent problems in developing reliable computer aided detection and diagnosis endoscopy systems and suggest a pathway for clinical translation of technologies. Whilst endoscopy is a widely used diagnostic and treatment tool for hollow-organs, there are several core challenges often faced by endoscopists, mainly: 1) presence of multi-class artefacts that hinder their visual interpretation, and 2) difficulty in identifying subtle precancerous precursors and cancer abnormalities. Artefacts often affect the robustness of deep learning methods applied to the gastrointestinal tract organs as they can be confused with tissue of interest. EndoCV2020 challenges are designed to address research questions in these remits. In this paper, we present a summary of methods developed by the top 17 teams and provide an objective comparison of state-of-the-art methods and methods designed by the participants for two sub-challenges: i) artefact detection and segmentation (EAD2020), and ii) disease detection and segmentation (EDD2020). Multi-center, multi-organ, multi-class, and multi-modal clinical endoscopy datasets were compiled for both EAD2020 and EDD2020 sub-challenges. The out-of-sample generalization ability of detection algorithms was also evaluated. Whilst most teams focused on accuracy improvements, only a few methods hold credibility for clinical usability. The best performing teams provided solutions to tackle class imbalance, and variabilities in size, origin, modality and occurrences by exploring data augmentation, data fusion, and optimal class thresholding techniques.
△ Less
Submitted 17 February, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Scene Gated Social Graph: Pedestrian Trajectory Prediction Based on Dynamic Social Graphs and Scene Constraints
Authors:
Hao Xue,
Du Q. Huynh,
Mark Reynolds
Abstract:
Pedestrian trajectory prediction is valuable for understanding human motion behaviors and it is challenging because of the social influence from other pedestrians, the scene constraints and the multimodal possibilities of predicted trajectories. Most existing methods only focus on two of the above three key elements. In order to jointly consider all these elements, we propose a novel trajectory pr…
▽ More
Pedestrian trajectory prediction is valuable for understanding human motion behaviors and it is challenging because of the social influence from other pedestrians, the scene constraints and the multimodal possibilities of predicted trajectories. Most existing methods only focus on two of the above three key elements. In order to jointly consider all these elements, we propose a novel trajectory prediction method named Scene Gated Social Graph (SGSG). In the proposed SGSG, dynamic graphs are used to describe the social relationship among pedestrians. The social and scene influences are taken into account through the scene gated social graph features which combine the encoded social graph features and semantic scene features. In addition, a VAE module is incorporated to learn the scene gated social feature and sample latent variables for generating multiple trajectories that are socially and environmentally acceptable. We compare our SGSG against twenty state-of-the-art pedestrian trajectory prediction methods and the results show that the proposed method achieves superior performance on two widely used trajectory prediction benchmarks.
△ Less
Submitted 12 October, 2020;
originally announced October 2020.
-
Hit ratio: An Evaluation Metric for Hashtag Recommendation
Authors:
Areej Alsini,
Du Q. Huynh,
Amitava Datta
Abstract:
Hashtag recommendation is a crucial task, especially with an increase of interest in using social media platforms such as Twitter in the last decade. Hashtag recommendation systems automatically suggest hashtags to a user while writing a tweet. Most of the research in the area of hashtag recommendation have used classical metrics such as hit rate, precision, recall, and F1-score to measure the acc…
▽ More
Hashtag recommendation is a crucial task, especially with an increase of interest in using social media platforms such as Twitter in the last decade. Hashtag recommendation systems automatically suggest hashtags to a user while writing a tweet. Most of the research in the area of hashtag recommendation have used classical metrics such as hit rate, precision, recall, and F1-score to measure the accuracy of hashtag recommendation systems. These metrics are based on the exact match of the recommended hashtags with their corresponding ground truth. However, it is not clear how adequate these metrics to evaluate hashtag recommendation. The research question that we are interested in seeking an answer is: are these metrics adequate for evaluating hashtag recommendation systems when the numbers of ground truth hashtags in tweets are highly variable? In this paper, we propose a new metric which we call hit ratio for hashtag recommendation. Extensive evaluation through hypothetical examples and real-world application across a range of hashtag recommendation models indicate that the hit ratio is a useful metric. A comparison of hit ratio with the classical evaluation metrics reveals their limitations.
△ Less
Submitted 2 October, 2020;
originally announced October 2020.
-
A Simple and Efficient Ensemble Classifier Combining Multiple Neural Network Models on Social Media Datasets in Vietnamese
Authors:
Huy Duc Huynh,
Hang Thi-Thuy Do,
Kiet Van Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
Text classification is a popular topic of natural language processing, which has currently attracted numerous research efforts worldwide. The significant increase of data in social media requires the vast attention of researchers to analyze such data. There are various studies in this field in many languages but limited to the Vietnamese language. Therefore, this study aims to classify Vietnamese…
▽ More
Text classification is a popular topic of natural language processing, which has currently attracted numerous research efforts worldwide. The significant increase of data in social media requires the vast attention of researchers to analyze such data. There are various studies in this field in many languages but limited to the Vietnamese language. Therefore, this study aims to classify Vietnamese texts on social media from three different Vietnamese benchmark datasets. Advanced deep learning models are used and optimized in this study, including CNN, LSTM, and their variants. We also implement the BERT, which has never been applied to the datasets. Our experiments find a suitable model for classification tasks on each specific dataset. To take advantage of single models, we propose an ensemble model, combining the highest-performance models. Our single models reach positive results on each dataset. Moreover, our ensemble model achieves the best performance on all three datasets. We reach 86.96% of F1- score for the HSD-VLSP dataset, 65.79% of F1-score for the UIT-VSMEC dataset, 92.79% and 89.70% for sentiments and topics on the UIT-VSFC dataset, respectively. Therefore, our models achieve better performances as compared to previous studies on these datasets.
△ Less
Submitted 28 September, 2020; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Cryptotree: fast and accurate predictions on encrypted structured data
Authors:
Daniel Huynh
Abstract:
Applying machine learning algorithms to private data, such as financial or medical data, while preserving their confidentiality, is a difficult task. Homomorphic Encryption (HE) is acknowledged for its ability to allow computation on encrypted data, where both the input and output are encrypted, which therefore enables secure inference on private data. Nonetheless, because of the constraints of HE…
▽ More
Applying machine learning algorithms to private data, such as financial or medical data, while preserving their confidentiality, is a difficult task. Homomorphic Encryption (HE) is acknowledged for its ability to allow computation on encrypted data, where both the input and output are encrypted, which therefore enables secure inference on private data. Nonetheless, because of the constraints of HE, such as its inability to evaluate non-polynomial functions or to perform arbitrary matrix multiplication efficiently, only inference of linear models seem usable in practice in the HE paradigm so far.
In this paper, we propose Cryptotree, a framework that enables the use of Random Forests (RF), a very powerful learning procedure compared to linear regression, in the context of HE. To this aim, we first convert a regular RF to a Neural RF, then adapt this to fit the HE scheme CKKS, which allows HE operations on real values. Through SIMD operations, we are able to have quick inference and prediction results better than the original RF on encrypted data.
△ Less
Submitted 15 June, 2020;
originally announced June 2020.
-
Binarizing MobileNet via Evolution-based Searching
Authors:
Hai Phan,
Zechun Liu,
Dang Huynh,
Marios Savvides,
Kwang-Ting Cheng,
Zhiqiang Shen
Abstract:
Binary Neural Networks (BNNs), known to be one among the effectively compact network architectures, have achieved great outcomes in the visual tasks. Designing efficient binary architectures is not trivial due to the binary nature of the network. In this paper, we propose a use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet, a compact network wi…
▽ More
Binary Neural Networks (BNNs), known to be one among the effectively compact network architectures, have achieved great outcomes in the visual tasks. Designing efficient binary architectures is not trivial due to the binary nature of the network. In this paper, we propose a use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet, a compact network with separable depth-wise convolution. Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs), assuming an approximately optimal trade-off between computational cost and model accuracy. Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution while optimizing the model performance in terms of complexity and latency. The approach is threefold. First, we train strong baseline binary networks with a wide range of random group combinations at each convolutional layer. This set-up gives the binary neural networks a capability of preserving essential information through layers. Second, to find a good set of hyperparameters for group convolutions we make use of the evolutionary search which leverages the exploration of efficient 1-bit models. Lastly, these binary models are trained from scratch in a usual manner to achieve the final binary model. Various experiments on ImageNet are conducted to show that following our construction guideline, the final model achieves 60.09% Top-1 accuracy and outperforms the state-of-the-art CI-BCNN with the same computational cost.
△ Less
Submitted 15 May, 2020; v1 submitted 13 May, 2020;
originally announced May 2020.
-
Take a NAP: Non-Autoregressive Prediction for Pedestrian Trajectories
Authors:
Hao Xue,
Du. Q. Huynh,
Mark Reynolds
Abstract:
Pedestrian trajectory prediction is a challenging task as there are three properties of human movement behaviors which need to be addressed, namely, the social influence from other pedestrians, the scene constraints, and the multimodal (multiroute) nature of predictions. Although existing methods have explored these key properties, the prediction process of these methods is autoregressive. This me…
▽ More
Pedestrian trajectory prediction is a challenging task as there are three properties of human movement behaviors which need to be addressed, namely, the social influence from other pedestrians, the scene constraints, and the multimodal (multiroute) nature of predictions. Although existing methods have explored these key properties, the prediction process of these methods is autoregressive. This means they can only predict future locations sequentially. In this paper, we present NAP, a non-autoregressive method for trajectory prediction. Our method comprises specifically designed feature encoders and a latent variable generator to handle the three properties above. It also has a time-agnostic context generator and a time-specific context generator for non-autoregressive prediction. Through extensive experiments that compare NAP against several recent methods, we show that NAP has state-of-the-art trajectory prediction performance.
△ Less
Submitted 21 April, 2020;
originally announced April 2020.
-
VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images
Authors:
Anjany Sekuboyina,
Malek E. Husseini,
Amirhossein Bayat,
Maximilian Löffler,
Hans Liebl,
Hongwei Li,
Giles Tetteh,
Jan Kukačka,
Christian Payer,
Darko Štern,
Martin Urschler,
Maodong Chen,
Dalong Cheng,
Nikolas Lessmann,
Yujin Hu,
Tianfu Wang,
Dong Yang,
Daguang Xu,
Felix Ambellan,
Tamaz Amiranashvili,
Moritz Ehlke,
Hans Lamecker,
Sebastian Lehnert,
Marilia Lirio,
Nicolás Pérez de Olaguer
, et al. (44 additional authors not shown)
Abstract:
Vertebral labelling and segmentation are two fundamental tasks in an automated spine processing pipeline. Reliable and accurate processing of spine images is expected to benefit clinical decision-support systems for diagnosis, surgery planning, and population-based analysis on spine and bone health. However, designing automated algorithms for spine processing is challenging predominantly due to co…
▽ More
Vertebral labelling and segmentation are two fundamental tasks in an automated spine processing pipeline. Reliable and accurate processing of spine images is expected to benefit clinical decision-support systems for diagnosis, surgery planning, and population-based analysis on spine and bone health. However, designing automated algorithms for spine processing is challenging predominantly due to considerable variations in anatomy and acquisition protocols and due to a severe shortage of publicly available data. Addressing these limitations, the Large Scale Vertebrae Segmentation Challenge (VerSe) was organised in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2019 and 2020, with a call for algorithms towards labelling and segmentation of vertebrae. Two datasets containing a total of 374 multi-detector CT scans from 355 patients were prepared and 4505 vertebrae have individually been annotated at voxel-level by a human-machine hybrid algorithm (https://osf.io/nqjyw/, https://osf.io/t98fz/). A total of 25 algorithms were benchmarked on these datasets. In this work, we present the the results of this evaluation and further investigate the performance-variation at vertebra-level, scan-level, and at different fields-of-view. We also evaluate the generalisability of the approaches to an implicit domain shift in data by evaluating the top performing algorithms of one challenge iteration on data from the other iteration. The principal takeaway from VerSe: the performance of an algorithm in labelling and segmenting a spine scan hinges on its ability to correctly identify vertebrae in cases of rare anatomical variations. The content and code concerning VerSe can be accessed at: https://github.com/anjany/verse.
△ Less
Submitted 5 April, 2022; v1 submitted 24 January, 2020;
originally announced January 2020.
-
Hate Speech Detection on Vietnamese Social Media Text using the Bidirectional-LSTM Model
Authors:
Hang Thi-Thuy Do,
Huy Duc Huynh,
Kiet Van Nguyen,
Ngan Luu-Thuy Nguyen,
Anh Gia-Tuan Nguyen
Abstract:
In this paper, we describe our system which participates in the shared task of Hate Speech Detection on Social Networks of VLSP 2019 evaluation campaign. We are provided with the pre-labeled dataset and an unlabeled dataset for social media comments or posts. Our mission is to pre-process and build machine learning models to classify comments/posts. In this report, we use Bidirectional Long Short-…
▽ More
In this paper, we describe our system which participates in the shared task of Hate Speech Detection on Social Networks of VLSP 2019 evaluation campaign. We are provided with the pre-labeled dataset and an unlabeled dataset for social media comments or posts. Our mission is to pre-process and build machine learning models to classify comments/posts. In this report, we use Bidirectional Long Short-Term Memory to build the model that can predict labels for social media text according to Clean, Offensive, Hate. With this system, we achieve comparative results with 71.43% on the public standard test set of VLSP 2019.
△ Less
Submitted 9 November, 2019;
originally announced November 2019.
-
MoBiNet: A Mobile Binary Network for Image Classification
Authors:
Hai Phan,
Dang Huynh,
Yihui He,
Marios Savvides,
Zhiqiang Shen
Abstract:
MobileNet and Binary Neural Networks are two among the most widely used techniques to construct deep learning models for performing a variety of tasks on mobile and embedded platforms.In this paper, we present a simple yet efficient scheme to exploit MobileNet binarization at activation function and model weights. However, training a binary network from scratch with separable depth-wise and point-…
▽ More
MobileNet and Binary Neural Networks are two among the most widely used techniques to construct deep learning models for performing a variety of tasks on mobile and embedded platforms.In this paper, we present a simple yet efficient scheme to exploit MobileNet binarization at activation function and model weights. However, training a binary network from scratch with separable depth-wise and point-wise convolutions in case of MobileNet is not trivial and prone to divergence. To tackle this training issue, we propose a novel neural network architecture, namely MoBiNet - Mobile Binary Network in which skip connections are manipulated to prevent information loss and vanishing gradient, thus facilitate the training process. More importantly, while existing binary neural networks often make use of cumbersome backbones such as Alex-Net, ResNet, VGG-16 with float-type pre-trained weights initialization, our MoBiNet focuses on binarizing the already-compressed neural networks like MobileNet without the need of a pre-trained model to start with. Therefore, our proposal results in an effectively small model while keeping the accuracy comparable to existing ones. Experiments on ImageNet dataset show the potential of the MoBiNet as it achieves 54.40% top-1 accuracy and dramatically reduces the computational cost with binary operators.
△ Less
Submitted 30 July, 2019; v1 submitted 29 July, 2019;
originally announced July 2019.
-
Loss Switching Fusion with Similarity Search for Video Classification
Authors:
Lei Wang,
Du Q. Huynh,
Moussa Reda Mansour
Abstract:
From video streaming to security and surveillance applications, video data play an important role in our daily living today. However, managing a large amount of video data and retrieving the most useful information for the user remain a challenging task. In this paper, we propose a novel video classification system that would benefit the scene understanding task. We define our classification probl…
▽ More
From video streaming to security and surveillance applications, video data play an important role in our daily living today. However, managing a large amount of video data and retrieving the most useful information for the user remain a challenging task. In this paper, we propose a novel video classification system that would benefit the scene understanding task. We define our classification problem as classifying background and foreground motions using the same feature representation for outdoor scenes. This means that the feature representation needs to be robust enough and adaptable to different classification tasks. We propose a lightweight Loss Switching Fusion Network (LSFNet) for the fusion of spatiotemporal descriptors and a similarity search scheme with soft voting to boost the classification performance. The proposed system has a variety of potential applications such as content-based video clustering, video filtering, etc. Evaluation results on two private industry datasets show that our system is robust in both classifying different background motions and detecting human motions from these background motions.
△ Less
Submitted 27 June, 2019;
originally announced June 2019.
-
A Comparative Review of Recent Kinect-based Action Recognition Algorithms
Authors:
Lei Wang,
Du Q. Huynh,
Piotr Koniusz
Abstract:
Video-based human action recognition is currently one of the most active research areas in computer vision. Various research studies indicate that the performance of action recognition is highly dependent on the type of features being extracted and how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been p…
▽ More
Video-based human action recognition is currently one of the most active research areas in computer vision. Various research studies indicate that the performance of action recognition is highly dependent on the type of features being extracted and how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been proposed in the literature. However, there still does not exist a thorough comparison of these Kinect-based techniques under the grouping of feature types, such as handcrafted versus deep learning features and depth-based versus skeleton-based features. In this paper, we analyze and compare ten recent Kinect-based algorithms for both cross-subject action recognition and cross-view action recognition using six benchmark datasets. In addition, we have implemented and improved some of these techniques and included their variants in the comparison. Our experiments show that the majority of methods perform better on cross-subject action recognition than cross-view action recognition, that skeleton-based features are more robust for cross-view recognition than depth-based features, and that deep learning features are suitable for large datasets.
△ Less
Submitted 24 June, 2019;
originally announced June 2019.
-
Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs
Authors:
Lei Wang,
Piotr Koniusz,
Du Q. Huynh
Abstract:
In this paper, we revive the use of old-fashioned handcrafted video representations for action recognition and put new life into these techniques via a CNN-based hallucination step. Despite of the use of RGB and optical flow frames, the I3D model (amongst others) thrives on combining its output with the Improved Dense Trajectory (IDT) and extracted with its low-level video descriptors encoded via…
▽ More
In this paper, we revive the use of old-fashioned handcrafted video representations for action recognition and put new life into these techniques via a CNN-based hallucination step. Despite of the use of RGB and optical flow frames, the I3D model (amongst others) thrives on combining its output with the Improved Dense Trajectory (IDT) and extracted with its low-level video descriptors encoded via Bag-of-Words (BoW) and Fisher Vectors (FV). Such a fusion of CNNs and handcrafted representations is time-consuming due to pre-processing, descriptor extraction, encoding and tuning parameters. Thus, we propose an end-to-end trainable network with streams which learn the IDT-based BoW/FV representations at the training stage and are simple to integrate with the I3D model. Specifically, each stream takes I3D feature maps ahead of the last 1D conv. layer and learns to `translate' these maps to BoW/FV representations. Thus, our model can hallucinate and use such synthesized BoW/FV representations at the testing stage. We show that even features of the entire I3D optical flow stream can be hallucinated thus simplifying the pipeline. Our model saves 20-55h of computations and yields state-of-the-art results on four publicly available datasets.
△ Less
Submitted 18 August, 2019; v1 submitted 13 June, 2019;
originally announced June 2019.
-
An extended polygonal finite element method for large deformation fracture analysis
Authors:
Hai D. Huynh,
Phuong Tran,
Xiaoying Zhuang,
H. Nguyen-Xuan
Abstract:
The modeling of large deformation fracture mechanics has been a challenging problem regarding the accuracy of numerical methods and their ability to deal with considerable changes in deformations of meshes where having the presence of cracks. This paper further investigates the extended finite element method (XFEM) for the simulation of large strain fracture for hyper-elastic materials, in particu…
▽ More
The modeling of large deformation fracture mechanics has been a challenging problem regarding the accuracy of numerical methods and their ability to deal with considerable changes in deformations of meshes where having the presence of cracks. This paper further investigates the extended finite element method (XFEM) for the simulation of large strain fracture for hyper-elastic materials, in particular rubber ones. A crucial idea is to use a polygonal mesh to represent space of the present numerical technique in advance, and then a local refinement of structured meshes at the vicinity of the discontinuities is additionally established. Due to differences in the size and type of elements at the boundaries of those two regions, hanging nodes produced in the modified mesh are considered as normal nodes in an arbitrarily polygonal element. Conforming these special elements becomes straightforward by the flexible use of basis functions over polygonal elements. Results of this study are shown through several numerical examples to prove its efficiency and accuracy through comparison with former achievements.
△ Less
Submitted 20 March, 2019; v1 submitted 9 March, 2019;
originally announced March 2019.
-
Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling
Authors:
Choongyeun Cho,
Benjamin Antin,
Sanchit Arora,
Shwan Ashrafi,
Peilin Duan,
Dang The Huynh,
Lee James,
Hang Tuan Nguyen,
Mojtaba Solgi,
Cuong Van Than
Abstract:
This paper presents the Axon AI's solution to the 2nd YouTube-8M Video Understanding Challenge, achieving the final global average precision (GAP) of 88.733% on the private test set (ranked 3rd among 394 teams, not considering the model size constraint), and 87.287% using a model that meets size requirement. Two sets of 7 individual models belonging to 3 different families were trained separately.…
▽ More
This paper presents the Axon AI's solution to the 2nd YouTube-8M Video Understanding Challenge, achieving the final global average precision (GAP) of 88.733% on the private test set (ranked 3rd among 394 teams, not considering the model size constraint), and 87.287% using a model that meets size requirement. Two sets of 7 individual models belonging to 3 different families were trained separately. Then, the inference results on a training data were aggregated from these multiple models and fed to train a compact model that meets the model size requirement. In order to further improve performance we explored and employed data over/sub-sampling in feature space, an additional regularization term during training exploiting label relationship, and learned weights for ensembling different individual models.
△ Less
Submitted 20 September, 2018;
originally announced September 2018.
-
LiveRank: How to Refresh Old Datasets
Authors:
The Dang Huynh,
Fabien Mathieu,
Laurent Viennot
Abstract:
This paper considers the problem of refreshing a dataset. More precisely , given a collection of nodes gathered at some time (Web pages, users from an online social network) along with some structure (hyperlinks, social relationships), we want to identify a significant fraction of the nodes that still exist at present time. The liveness of an old node can be tested through an online query at prese…
▽ More
This paper considers the problem of refreshing a dataset. More precisely , given a collection of nodes gathered at some time (Web pages, users from an online social network) along with some structure (hyperlinks, social relationships), we want to identify a significant fraction of the nodes that still exist at present time. The liveness of an old node can be tested through an online query at present time. We call LiveRank a ranking of the old pages so that active nodes are more likely to appear first. The quality of a LiveRank is measured by the number of queries necessary to identify a given fraction of the active nodes when using the LiveRank order. We study different scenarios from a static setting where the Liv-eRank is computed before any query is made, to dynamic settings where the LiveRank can be updated as queries are processed. Our results show that building on the PageRank can lead to efficient LiveRanks, for Web graphs as well as for online social networks.
△ Less
Submitted 6 January, 2016;
originally announced January 2016.
-
D-Iteration: diffusion approach for solving PageRank
Authors:
Dohy Hong,
The Dang Huynh,
Fabien Mathieu
Abstract:
In this paper we present a new method that can accelerate the computation of the PageRank importance vector. Our method, called D-Iteration (DI), is based on the decomposition of the matrix-vector product that can be seen as a fluid diffusion model and is potentially adapted to asynchronous implementation. We give theoretical results about the convergence of our algorithm and we show through exper…
▽ More
In this paper we present a new method that can accelerate the computation of the PageRank importance vector. Our method, called D-Iteration (DI), is based on the decomposition of the matrix-vector product that can be seen as a fluid diffusion model and is potentially adapted to asynchronous implementation. We give theoretical results about the convergence of our algorithm and we show through experimentations on a real Web graph that DI can improve the computation efficiency compared to other classical algorithm like Power Iteration, Gauss-Seidel or OPIC.
△ Less
Submitted 6 May, 2015; v1 submitted 26 January, 2015;
originally announced January 2015.
-
Context-Dependent Fine-Grained Entity Type Tagging
Authors:
Dan Gillick,
Nevena Lazic,
Kuzman Ganchev,
Jesse Kirchner,
David Huynh
Abstract:
Entity type tagging is the task of assigning category labels to each mention of an entity in a document. While standard systems focus on a small set of types, recent work (Ling and Weld, 2012) suggests that using a large fine-grained label set can lead to dramatic improvements in downstream tasks. In the absence of labeled training data, existing fine-grained tagging systems obtain examples automa…
▽ More
Entity type tagging is the task of assigning category labels to each mention of an entity in a document. While standard systems focus on a small set of types, recent work (Ling and Weld, 2012) suggests that using a large fine-grained label set can lead to dramatic improvements in downstream tasks. In the absence of labeled training data, existing fine-grained tagging systems obtain examples automatically, using resolved entities and their types extracted from a knowledge base. However, since the appropriate type often depends on context (e.g. Washington could be tagged either as city or government), this procedure can result in spurious labels, leading to poorer generalization. We propose the task of context-dependent fine type tagging, where the set of acceptable labels for a mention is restricted to only those deducible from the local context (e.g. sentence or document). We introduce new resources for this task: 12,017 mentions annotated with their context-dependent fine types, and we provide baseline experimental results on this data.
△ Less
Submitted 1 August, 2016; v1 submitted 3 December, 2014;
originally announced December 2014.
-
Histogram of Oriented Principal Components for Cross-View Action Recognition
Authors:
Hossein Rahmani,
Arif Mahmood,
Du Huynh,
Ajmal Mian
Abstract:
Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images which are viewpoint dependent. In contrast, we directly process pointclouds for cross-view action recognition from unknown and unseen views. We propose the Histogram of Oriented Principal Components (HOPC) descriptor that is robust to noise, viewpoint, scale and actio…
▽ More
Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images which are viewpoint dependent. In contrast, we directly process pointclouds for cross-view action recognition from unknown and unseen views. We propose the Histogram of Oriented Principal Components (HOPC) descriptor that is robust to noise, viewpoint, scale and action speed variations. At a 3D point, HOPC is computed by projecting the three scaled eigenvectors of the pointcloud within its local spatio-temporal support volume onto the vertices of a regular dodecahedron. HOPC is also used for the detection of Spatio-Temporal Keypoints (STK) in 3D pointcloud sequences so that view-invariant STK descriptors (or Local HOPC descriptors) at these key locations only are used for action recognition. We also propose a global descriptor computed from the normalized spatio-temporal distribution of STKs in 4-D, which we refer to as STK-D. We have evaluated the performance of our proposed descriptors against nine existing techniques on two cross-view and three single-view human action recognition datasets. The Experimental results show that our techniques provide significant improvement over state-of-the-art methods.
△ Less
Submitted 3 September, 2015; v1 submitted 23 September, 2014;
originally announced September 2014.
-
Action Classification with Locality-constrained Linear Coding
Authors:
Hossein Rahmani,
Arif Mahmood,
Du Huynh,
Ajmal Mian
Abstract:
We propose an action classification algorithm which uses Locality-constrained Linear Coding (LLC) to capture discriminative information of human body variations in each spatiotemporal subsequence of a video sequence. Our proposed method divides the input video into equally spaced overlapping spatiotemporal subsequences, each of which is decomposed into blocks and then cells. We use the Histogram o…
▽ More
We propose an action classification algorithm which uses Locality-constrained Linear Coding (LLC) to capture discriminative information of human body variations in each spatiotemporal subsequence of a video sequence. Our proposed method divides the input video into equally spaced overlapping spatiotemporal subsequences, each of which is decomposed into blocks and then cells. We use the Histogram of Oriented Gradient (HOG3D) feature to encode the information in each cell. We justify the use of LLC for encoding the block descriptor by demonstrating its superiority over Sparse Coding (SC). Our sequence descriptor is obtained via a logistic regression classifier with L2 regularization. We evaluate and compare our algorithm with ten state-of-the-art algorithms on five benchmark datasets. Experimental results show that, on average, our algorithm gives better accuracy than these ten algorithms.
△ Less
Submitted 22 September, 2014; v1 submitted 17 August, 2014;
originally announced August 2014.