-
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation?
Authors:
Xuanming Cui,
Jaiminkumar Ashokbhai Bhoi,
Chionh Wei Peng,
Adriel Kuek,
Ser Nam Lim
Abstract:
Dynamic Scene Graph Generation (DSGG) for videos is a challenging task in computer vision. While existing approaches often focus on sophisticated architectural design and solely use recall during evaluation, we take a closer look at their predicted scene graphs and discover three critical issues with existing DSGG methods: severe precision-recall trade-off, lack of awareness on triplet importance,…
▽ More
Dynamic Scene Graph Generation (DSGG) for videos is a challenging task in computer vision. While existing approaches often focus on sophisticated architectural design and solely use recall during evaluation, we take a closer look at their predicted scene graphs and discover three critical issues with existing DSGG methods: severe precision-recall trade-off, lack of awareness on triplet importance, and inappropriate evaluation protocols. On the other hand, recent advances of Large Multimodal Models (LMMs) have shown great capabilities in video understanding, yet they have not been tested on fine-grained, frame-wise understanding tasks like DSGG. In this work, we conduct the first systematic analysis of Video LMMs for performing DSGG. Without relying on sophisticated architectural design, we show that LMMs with simple decoder-only structure can be turned into State-of-the-Art scene graph generators that effectively overcome the aforementioned issues, while requiring little finetuning (5-10% training data).
△ Less
Submitted 20 March, 2025;
originally announced March 2025.
-
CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration
Authors:
Yizhe Yang,
Palakorn Achananuparp,
Heyan Huang,
Jing Jiang,
Kit Phey Leng,
Nicholas Gabriel Lim,
Cameron Tan Shi Ern,
Ee-peng Lim
Abstract:
Conversational counselor agents have become essential tools for addressing the rising demand for scalable and accessible mental health support. This paper introduces CAMI, a novel automated counselor agent grounded in Motivational Interviewing (MI) -- a client-centered counseling approach designed to address ambivalence and facilitate behavior change. CAMI employs a novel STAR framework, consistin…
▽ More
Conversational counselor agents have become essential tools for addressing the rising demand for scalable and accessible mental health support. This paper introduces CAMI, a novel automated counselor agent grounded in Motivational Interviewing (MI) -- a client-centered counseling approach designed to address ambivalence and facilitate behavior change. CAMI employs a novel STAR framework, consisting of client's state inference, motivation topic exploration, and response generation modules, leveraging large language models (LLMs). These components work together to evoke change talk, aligning with MI principles and improving counseling outcomes for clients from diverse backgrounds. We evaluate CAMI's performance through both automated and manual evaluations, utilizing simulated clients to assess MI skill competency, client's state inference accuracy, topic exploration proficiency, and overall counseling success. Results show that CAMI not only outperforms several state-of-the-art methods but also shows more realistic counselor-like behavior. Additionally, our ablation study underscores the critical roles of state inference and topic exploration in achieving this performance.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Consistent Client Simulation for Motivational Interviewing-based Counseling
Authors:
Yizhe Yang,
Palakorn Achananuparp,
Heyan Huang,
Jing Jiang,
John Pinto,
Jenny Giam,
Kit Phey Leng,
Nicholas Gabriel Lim,
Cameron Tan Shi Ern,
Ee-peng Lim
Abstract:
Simulating human clients in mental health counseling is crucial for training and evaluating counselors (both human or simulated) in a scalable manner. Nevertheless, past research on client simulation did not focus on complex conversation tasks such as mental health counseling. In these tasks, the challenge is to ensure that the client's actions (i.e., interactions with the counselor) are consisten…
▽ More
Simulating human clients in mental health counseling is crucial for training and evaluating counselors (both human or simulated) in a scalable manner. Nevertheless, past research on client simulation did not focus on complex conversation tasks such as mental health counseling. In these tasks, the challenge is to ensure that the client's actions (i.e., interactions with the counselor) are consistent with with its stipulated profiles and negative behavior settings. In this paper, we propose a novel framework that supports consistent client simulation for mental health counseling. Our framework tracks the mental state of a simulated client, controls its state transitions, and generates for each state behaviors consistent with the client's motivation, beliefs, preferred plan to change, and receptivity. By varying the client profile and receptivity, we demonstrate that consistent simulated clients for different counseling scenarios can be effectively created. Both our automatic and expert evaluations on the generated counseling sessions also show that our client simulation method achieves higher consistency than previous methods.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Exploring Complex Mental Health Symptoms via Classifying Social Media Data with Explainable LLMs
Authors:
Kexin Chen,
Noelle Lim,
Claire Lee,
Michael Guerzhoy
Abstract:
We propose a pipeline for gaining insights into complex diseases by training LLMs on challenging social media text data classification tasks, obtaining explanations for the classification outputs, and performing qualitative and quantitative analysis on the explanations. We report initial results on predicting, explaining, and systematizing the explanations of predicted reports on mental health con…
▽ More
We propose a pipeline for gaining insights into complex diseases by training LLMs on challenging social media text data classification tasks, obtaining explanations for the classification outputs, and performing qualitative and quantitative analysis on the explanations. We report initial results on predicting, explaining, and systematizing the explanations of predicted reports on mental health concerns in people reporting Lyme disease concerns. We report initial results on predicting future ADHD concerns for people reporting anxiety disorder concerns, and demonstrate preliminary results on visualizing the explanations for predicting that a person with anxiety concerns will in the future have ADHD concerns.
△ Less
Submitted 8 December, 2024;
originally announced December 2024.
-
Innovative Weight Simulation in Virtual Reality Cube Games: A Pseudo-Haptic Approach
Authors:
Woan Ning Lim,
Edric Yi Junn Leong,
Yun Li Lee,
Kian Meng Yap
Abstract:
This paper presents an innovative pseudo-haptic model for weight simulation in virtual reality (VR) environments. By integrating visual feedback with voluntary exerted force through a passive haptic glove, the model creates haptic illusions of weight perception. Two VR cube games were developed to evaluate the model's effectiveness. The first game assesses participants' ability to discriminate rel…
▽ More
This paper presents an innovative pseudo-haptic model for weight simulation in virtual reality (VR) environments. By integrating visual feedback with voluntary exerted force through a passive haptic glove, the model creates haptic illusions of weight perception. Two VR cube games were developed to evaluate the model's effectiveness. The first game assesses participants' ability to discriminate relative weights, while the second evaluates their capability to estimate absolute weights. Twelve participants, aged 18 to 59, tested the games. Results suggest that the pseudo-haptic model is effective for relative weight discrimination tasks and holds potential for various VR applications. Further research with a larger participant group and more complex scenarios is recommended to refine and validate the model.
△ Less
Submitted 7 November, 2024;
originally announced November 2024.
-
Particle Semi-Implicit Variational Inference
Authors:
Jen Ning Lim,
Adam M. Johansen
Abstract:
Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is…
▽ More
Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is not possible, so they resort to one of the following: optimizing bounds on the ELBO, employing costly inner-loop Markov chain Monte Carlo runs, or solving minimax objectives. In this paper, we propose a novel method for SIVI called Particle Variational Inference (PVI) which employs empirical measures to approximate the optimal mixing distributions characterized as the minimizer of a free energy functional. PVI arises naturally as a particle approximation of a Euclidean--Wasserstein gradient flow and, unlike prior works, it directly optimizes the ELBO whilst making no parametric assumption about the mixing distribution. Our empirical results demonstrate that PVI performs favourably compared to other SIVI methods across various tasks. Moreover, we provide a theoretical analysis of the behaviour of the gradient flow of a related free energy functional: establishing the existence and uniqueness of solutions as well as propagation of chaos results.
△ Less
Submitted 14 January, 2025; v1 submitted 30 June, 2024;
originally announced July 2024.
-
FSViewFusion: Few-Shots View Generation of Novel Objects
Authors:
Rukhshanda Hussain,
Hui Xian Grace Lim,
Borchun Chen,
Mubarak Shah,
Ser Nam Lim
Abstract:
Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, Nerf models overfit on a single scene, lacking generalization to out of distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion mode…
▽ More
Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, Nerf models overfit on a single scene, lacking generalization to out of distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion model for view synthesis without explicit 3D priors. Specifically, we base our method on a personalized text to image model, Dreambooth, given its strong ability to adapt to specific novel objects with a few shots. Our research reveals two interesting findings. First, we observe that Dreambooth can learn the high level concept of a view, compared to arguably more complex strategies which involve finetuning diffusions on large amounts of multi-view data. Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the original object's identify from which the views are learnt. Motivated by this, we introduce a learning strategy, FSViewFusion, which inherits a specific view through only one image sample of a single scene, and transfers the knowledge to a novel object, learnt from few shots, using low rank adapters. Through extensive experiments we demonstrate that our method, albeit simple, is efficient in generating reliable view samples for in the wild images. Code and models will be released.
△ Less
Submitted 12 March, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Detecting a Proxy for Potential Comorbid ADHD in People Reporting Anxiety Symptoms from Social Media Data
Authors:
Claire S. Lee,
Noelle Lim,
Michael Guerzhoy
Abstract:
We present a novel task that can elucidate the connection between anxiety and ADHD; use Transformers to make progress toward solving a task that is not solvable by keyword-based classifiers; and discuss a method for visualization of our classifier illuminating the connection between anxiety and ADHD presentations.
Up to approximately 50% of adults with ADHD may also have an anxiety disorder and…
▽ More
We present a novel task that can elucidate the connection between anxiety and ADHD; use Transformers to make progress toward solving a task that is not solvable by keyword-based classifiers; and discuss a method for visualization of our classifier illuminating the connection between anxiety and ADHD presentations.
Up to approximately 50% of adults with ADHD may also have an anxiety disorder and approximately 30\% of adults with anxiety may also have ADHD. Patients presenting with anxiety may be treated for anxiety without ADHD ever being considered, possibly affecting treatment. We show how data that bears on ADHD that is comorbid with anxiety can be obtained from social media data, and show that Transformers can be used to detect a proxy for possible comorbid ADHD in people with anxiety symptoms.
We collected data from anxiety and ADHD online forums (subreddits). We identified posters who first started posting in the Anxiety subreddit and later started posting in the ADHD subreddit as well. We use this subset of the posters as a proxy for people who presented with anxiety symptoms and then became aware that they might have ADHD. We fine-tune a Transformer architecture-based classifier to classify people who started posting in the Anxiety subreddit and then started posting in the ADHD subreddit vs. people who posted in the Anxiety subreddit without later posting in the ADHD subreddit. We show that a Transformer architecture is capable of achieving reasonable results (76% correct for RoBERTa vs. under 60% correct for the best keyword-based model, both with 50% base rate).
△ Less
Submitted 17 February, 2024;
originally announced March 2024.
-
Momentum Particle Maximum Likelihood
Authors:
Jen Ning Lim,
Juan Kuntz,
Samuel Power,
Adam M. Johansen
Abstract:
Maximum likelihood estimation (MLE) of latent variable models is often recast as the minimization of a free energy functional over an extended space of parameters and probability distributions. This perspective was recently combined with insights from optimal transport to obtain novel particle-based algorithms for fitting latent variable models to data. Drawing inspiration from prior works which i…
▽ More
Maximum likelihood estimation (MLE) of latent variable models is often recast as the minimization of a free energy functional over an extended space of parameters and probability distributions. This perspective was recently combined with insights from optimal transport to obtain novel particle-based algorithms for fitting latent variable models to data. Drawing inspiration from prior works which interpret `momentum-enriched' optimization algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical-systems-inspired approach to minimizing the free energy functional. The result is a dynamical system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we prove that the continuous-time system minimizes the functional. By discretizing the system, we obtain a practical algorithm for MLE in latent variable models. The algorithm outperforms existing particle methods in numerical experiments and compares favourably with other MLE algorithms.
△ Less
Submitted 4 June, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Energy Discrepancies: A Score-Independent Loss for Energy-Based Models
Authors:
Tobias Schröder,
Zijing Ou,
Jen Ning Lim,
Yingzhen Li,
Sebastian J. Vollmer,
Andrew B. Duncan
Abstract:
Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likeli…
▽ More
Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likelihood loss under different limits, effectively interpolating between both. Consequently, minimum ED estimation overcomes the problem of nearsightedness encountered in score-based estimation methods, while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as a prior of a variational decoder model.
△ Less
Submitted 27 November, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
A simple but strong baseline for online continual learning: Repeated Augmented Rehearsal
Authors:
Yaqian Zhang,
Bernhard Pfahringer,
Eibe Frank,
Albert Bifet,
Nick Jin Sean Lim,
Yunzhe Jia
Abstract:
Online continual learning (OCL) aims to train neural networks incrementally from a non-stationary data stream with a single pass through data. Rehearsal-based methods attempt to approximate the observed input distributions over time with a small memory and revisit them later to avoid forgetting. Despite its strong empirical performance, rehearsal methods still suffer from a poor approximation of t…
▽ More
Online continual learning (OCL) aims to train neural networks incrementally from a non-stationary data stream with a single pass through data. Rehearsal-based methods attempt to approximate the observed input distributions over time with a small memory and revisit them later to avoid forgetting. Despite its strong empirical performance, rehearsal methods still suffer from a poor approximation of the loss landscape of past data with memory samples. This paper revisits the rehearsal dynamics in online settings. We provide theoretical insights on the inherent memory overfitting risk from the viewpoint of biased and dynamic empirical risk minimization, and examine the merits and limits of repeated rehearsal. Inspired by our analysis, a simple and intuitive baseline, Repeated Augmented Rehearsal (RAR), is designed to address the underfitting-overfitting dilemma of online rehearsal. Surprisingly, across four rather different OCL benchmarks, this simple baseline outperforms vanilla rehearsal by 9%-17% and also significantly improves state-of-the-art rehearsal-based methods MIR, ASER, and SCR. We also demonstrate that RAR successfully achieves an accurate approximation of the loss landscape of past data and high-loss ridge aversion in its learning trajectory. Extensive ablation studies are conducted to study the interplay between repeated and augmented rehearsal and reinforcement learning (RL) is applied to dynamically adjust the hyperparameters of RAR to balance the stability-plasticity trade-off online. Code is available at https://github.com/YaqianZhang/RepeatedAugmentedRehearsal
△ Less
Submitted 13 November, 2022; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Joint Triplet Loss Learning for Next New POI Recommendation
Authors:
Nicholas Lim,
Bryan Hooi,
See-Kiong Ng,
Yong Liang Goh
Abstract:
Sparsity of the User-POI matrix is a well established problem for next POI recommendation, which hinders effective learning of user preferences. Focusing on a more granular extension of the problem, we propose a Joint Triplet Loss Learning (JTLL) module for the Next New ($N^2$) POI recommendation task, which is more challenging. Our JTLL module first computes additional training samples from the u…
▽ More
Sparsity of the User-POI matrix is a well established problem for next POI recommendation, which hinders effective learning of user preferences. Focusing on a more granular extension of the problem, we propose a Joint Triplet Loss Learning (JTLL) module for the Next New ($N^2$) POI recommendation task, which is more challenging. Our JTLL module first computes additional training samples from the users' historical POI visit sequence, then, a designed triplet loss function is proposed to decrease and increase distances of POI and user embeddings based on their respective relations. Next, the JTLL module is jointly trained with recent approaches to additionally learn unvisited relations for the recommendation task. Experiments conducted on two known real-world LBSN datasets show that our joint training module was able to improve the performances of recent existing works.
△ Less
Submitted 25 September, 2022;
originally announced September 2022.
-
Recursive Deformable Image Registration Network with Mutual Attention
Authors:
Jian-Qing Zheng,
Ziyang Wang,
Baoru Huang,
Ngee Han Lim,
Tonia Vincent,
Bartlomiej W. Papiez
Abstract:
Deformable image registration, estimating the spatial transformation between different images, is an important task in medical imaging. Many previous studies have used learning-based methods for multi-stage registration to perform 3D image registration to improve performance. The performance of the multi-stage approach, however, is limited by the size of the receptive field where complex motion do…
▽ More
Deformable image registration, estimating the spatial transformation between different images, is an important task in medical imaging. Many previous studies have used learning-based methods for multi-stage registration to perform 3D image registration to improve performance. The performance of the multi-stage approach, however, is limited by the size of the receptive field where complex motion does not occur at a single spatial scale. We propose a new registration network combining recursive network architecture and mutual attention mechanism to overcome these limitations. Compared with the state-of-the-art deep learning methods, our network based on the recursive structure achieves the highest accuracy in lung Computed Tomography (CT) data set (Dice score of 92\% and average surface distance of 3.8mm for lungs) and one of the most accurate results in abdominal CT data set with 9 organs of various sizes (Dice score of 55\% and average surface distance of 7.8mm). We also showed that adding 3 recursive networks is sufficient to achieve the state-of-the-art results without a significant increase in the inference time.
△ Less
Submitted 30 June, 2022; v1 submitted 3 June, 2022;
originally announced June 2022.
-
Particle algorithms for maximum likelihood training of latent variable models
Authors:
Juan Kuntz,
Jen Ning Lim,
Adam M. Johansen
Abstract:
(Neal and Hinton, 1998) recast maximum likelihood estimation of any given latent variable model as the minimization of a free energy functional $F$, and the EM algorithm as coordinate descent applied to $F$. Here, we explore alternative ways to optimize the functional. In particular, we identify various gradient flows associated with $F$ and show that their limits coincide with $F$'s stationary po…
▽ More
(Neal and Hinton, 1998) recast maximum likelihood estimation of any given latent variable model as the minimization of a free energy functional $F$, and the EM algorithm as coordinate descent applied to $F$. Here, we explore alternative ways to optimize the functional. In particular, we identify various gradient flows associated with $F$ and show that their limits coincide with $F$'s stationary points. By discretizing the flows, we obtain practical particle-based algorithms for maximum likelihood estimation in broad classes of latent variable models. The novel algorithms scale to high-dimensional settings and perform well in numerical experiments.
△ Less
Submitted 18 February, 2023; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Metamorphic Testing-based Adversarial Attack to Fool Deepfake Detectors
Authors:
Nyee Thoang Lim,
Meng Yi Kuan,
Muxin Pu,
Mei Kuan Lim,
Chun Yong Chong
Abstract:
Deepfakes utilise Artificial Intelligence (AI) techniques to create synthetic media where the likeness of one person is replaced with another. There are growing concerns that deepfakes can be maliciously used to create misleading and harmful digital contents. As deepfakes become more common, there is a dire need for deepfake detection technology to help spot deepfake media. Present deepfake detect…
▽ More
Deepfakes utilise Artificial Intelligence (AI) techniques to create synthetic media where the likeness of one person is replaced with another. There are growing concerns that deepfakes can be maliciously used to create misleading and harmful digital contents. As deepfakes become more common, there is a dire need for deepfake detection technology to help spot deepfake media. Present deepfake detection models are able to achieve outstanding accuracy (>90%). However, most of them are limited to within-dataset scenario, where the same dataset is used for training and testing. Most models do not generalise well enough in cross-dataset scenario, where models are tested on unseen datasets from another source. Furthermore, state-of-the-art deepfake detection models rely on neural network-based classification models that are known to be vulnerable to adversarial attacks. Motivated by the need for a robust deepfake detection model, this study adapts metamorphic testing (MT) principles to help identify potential factors that could influence the robustness of the examined model, while overcoming the test oracle problem in this domain. Metamorphic testing is specifically chosen as the testing technique as it fits our demand to address learning-based system testing with probabilistic outcomes from largely black-box components, based on potentially large input domains. We performed our evaluations on MesoInception-4 and TwoStreamNet models, which are the state-of-the-art deepfake detection models. This study identified makeup application as an adversarial attack that could fool deepfake detectors. Our experimental results demonstrate that both the MesoInception-4 and TwoStreamNet models degrade in their performance by up to 30\% when the input data is perturbed with makeup.
△ Less
Submitted 31 May, 2022; v1 submitted 18 April, 2022;
originally announced April 2022.
-
Task-Agnostic Robust Representation Learning
Authors:
A. Tuan Nguyen,
Ser Nam Lim,
Philip Torr
Abstract:
It has been reported that deep learning models are extremely vulnerable to small but intentionally chosen perturbations of its input. In particular, a deep network, despite its near-optimal accuracy on the clean images, often mis-classifies an image with a worst-case but humanly imperceptible perturbation (so-called adversarial examples). To tackle this problem, a great amount of research has been…
▽ More
It has been reported that deep learning models are extremely vulnerable to small but intentionally chosen perturbations of its input. In particular, a deep network, despite its near-optimal accuracy on the clean images, often mis-classifies an image with a worst-case but humanly imperceptible perturbation (so-called adversarial examples). To tackle this problem, a great amount of research has been done to study the training procedure of a network to improve its robustness. However, most of the research so far has focused on the case of supervised learning. With the increasing popularity of self-supervised learning methods, it is also important to study and improve the robustness of their resulting representation on the downstream tasks. In this paper, we study the problem of robust representation learning with unlabeled data in a task-agnostic manner. Specifically, we first derive an upper bound on the adversarial loss of a prediction model (which is based on the learned representation) on any downstream task, using its loss on the clean data and a robustness regularizer. Moreover, the regularizer is task-independent, thus we propose to minimize it directly during the representation learning phase to make the downstream prediction model more robust. Extensive experiments show that our method achieves preferable adversarial performance compared to relevant baselines.
△ Less
Submitted 14 March, 2022;
originally announced March 2022.
-
Fairness Evaluation in Deepfake Detection Models using Metamorphic Testing
Authors:
Muxin Pu,
Meng Yi Kuan,
Nyee Thoang Lim,
Chun Yong Chong,
Mei Kuan Lim
Abstract:
Fairness of deepfake detectors in the presence of anomalies are not well investigated, especially if those anomalies are more prominent in either male or female subjects. The primary motivation for this work is to evaluate how deepfake detection model behaves under such anomalies. However, due to the black-box nature of deep learning (DL) and artificial intelligence (AI) systems, it is hard to pre…
▽ More
Fairness of deepfake detectors in the presence of anomalies are not well investigated, especially if those anomalies are more prominent in either male or female subjects. The primary motivation for this work is to evaluate how deepfake detection model behaves under such anomalies. However, due to the black-box nature of deep learning (DL) and artificial intelligence (AI) systems, it is hard to predict the performance of a model when the input data is modified. Crucially, if this defect is not addressed properly, it will adversely affect the fairness of the model and result in discrimination of certain sub-population unintentionally. Therefore, the objective of this work is to adopt metamorphic testing to examine the reliability of the selected deepfake detection model, and how the transformation of input variation places influence on the output. We have chosen MesoInception-4, a state-of-the-art deepfake detection model, as the target model and makeup as the anomalies. Makeups are applied through utilizing the Dlib library to obtain the 68 facial landmarks prior to filling in the RGB values. Metamorphic relations are derived based on the notion that realistic perturbations of the input images, such as makeup, involving eyeliners, eyeshadows, blushes, and lipsticks (which are common cosmetic appearance) applied to male and female images, should not alter the output of the model by a huge margin. Furthermore, we narrow down the scope to focus on revealing potential gender biases in DL and AI systems. Specifically, we are interested to examine whether MesoInception-4 model produces unfair decisions, which should be considered as a consequence of robustness issues. The findings from our work have the potential to pave the way for new research directions in the quality assurance and fairness in DL and AI systems.
△ Less
Submitted 13 March, 2022;
originally announced March 2022.
-
Residual Aligner Network
Authors:
Jian-Qing Zheng,
Ziyang Wang,
Baoru Huang,
Ngee Han Lim,
Bartlomiej W. Papiez
Abstract:
Image registration is important for medical imaging, the estimation of the spatial transformation between different images. Many previous studies have used learning-based methods for coarse-to-fine registration to efficiently perform 3D image registration. The coarse-to-fine approach, however, is limited when dealing with the different motions of nearby objects. Here we propose a novel Motion-Awar…
▽ More
Image registration is important for medical imaging, the estimation of the spatial transformation between different images. Many previous studies have used learning-based methods for coarse-to-fine registration to efficiently perform 3D image registration. The coarse-to-fine approach, however, is limited when dealing with the different motions of nearby objects. Here we propose a novel Motion-Aware (MA) structure that captures the different motions in a region. The MA structure incorporates a novel Residual Aligner (RA) module which predicts the multi-head displacement field used to disentangle the different motions of multiple neighbouring objects. Compared with other deep learning methods, the network based on the MA structure and RA module achieve one of the most accurate unsupervised inter-subject registration on the 9 organs of assorted sizes in abdominal CT scans, with the highest-ranked registration of the veins (Dice Similarity Coefficient / Average surface distance: 62\%/4.9mm for the vena cava and 34\%/7.9mm for the portal and splenic vein), with a half-sized structure and more efficient computation. Applied to the segmentation of lungs in chest CT scans, the new network achieves results which were indistinguishable from the best-ranked networks (94\%/3.0mm). Additionally, the theorem on predicted motion pattern and the design of MA structure are validated by further analysis.
△ Less
Submitted 7 March, 2022;
originally announced March 2022.
-
Faking feature importance: A cautionary tale on the use of differentially-private synthetic data
Authors:
Oscar Giles,
Kasra Hosseini,
Grigorios Mingas,
Oliver Strickson,
Louise Bowler,
Camila Rangel Smith,
Harrison Wilde,
Jen Ning Lim,
Bilal Mateen,
Kasun Amarasinghe,
Rayid Ghani,
Alison Heppenstall,
Nik Lomax,
Nick Malleson,
Martin O'Reilly,
Sebastian Vollmerteke
Abstract:
Synthetic datasets are often presented as a silver-bullet solution to the problem of privacy-preserving data publishing. However, for many applications, synthetic data has been shown to have limited utility when used to train predictive models. One promising potential application of these data is in the exploratory phase of the machine learning workflow, which involves understanding, engineering a…
▽ More
Synthetic datasets are often presented as a silver-bullet solution to the problem of privacy-preserving data publishing. However, for many applications, synthetic data has been shown to have limited utility when used to train predictive models. One promising potential application of these data is in the exploratory phase of the machine learning workflow, which involves understanding, engineering and selecting features. This phase often involves considerable time, and depends on the availability of data. There would be substantial value in synthetic data that permitted these steps to be carried out while, for example, data access was being negotiated, or with fewer information governance restrictions. This paper presents an empirical analysis of the agreement between the feature importance obtained from raw and from synthetic data, on a range of artificially generated and real-world datasets (where feature importance represents how useful each feature is when predicting a the outcome). We employ two differentially-private methods to produce synthetic data, and apply various utility measures to quantify the agreement in feature importance as this varies with the level of privacy. Our results indicate that synthetic data can sometimes preserve several representations of the ranking of feature importance in simple settings but their performance is not consistent and depends upon a number of factors. Particular caution should be exercised in more nuanced real-world settings, where synthetic data can lead to differences in ranked feature importance that could alter key modelling decisions. This work has important implications for developing synthetic versions of highly sensitive data sets in fields such as finance and healthcare.
△ Less
Submitted 2 March, 2022;
originally announced March 2022.
-
Combining Reinforcement Learning and Optimal Transport for the Traveling Salesman Problem
Authors:
Yong Liang Goh,
Wee Sun Lee,
Xavier Bresson,
Thomas Laurent,
Nicholas Lim
Abstract:
The traveling salesman problem is a fundamental combinatorial optimization problem with strong exact algorithms. However, as problems scale up, these exact algorithms fail to provide a solution in a reasonable time. To resolve this, current works look at utilizing deep learning to construct reasonable solutions. Such efforts have been very successful, but tend to be slow and compute intensive. Thi…
▽ More
The traveling salesman problem is a fundamental combinatorial optimization problem with strong exact algorithms. However, as problems scale up, these exact algorithms fail to provide a solution in a reasonable time. To resolve this, current works look at utilizing deep learning to construct reasonable solutions. Such efforts have been very successful, but tend to be slow and compute intensive. This paper exemplifies the integration of entropic regularized optimal transport techniques as a layer in a deep reinforcement learning network. We show that we can construct a model capable of learning without supervision and inferences significantly faster than current autoregressive approaches. We also empirically evaluate the benefits of including optimal transport algorithms within deep learning models to enforce assignment constraints during end-to-end training.
△ Less
Submitted 2 March, 2022;
originally announced March 2022.
-
Energy-Based Models for Functional Data using Path Measure Tilting
Authors:
Jen Ning Lim,
Sebastian Vollmer,
Lorenz Wolf,
Andrew Duncan
Abstract:
Energy-Based Models (EBMs) have proven to be a highly effective approach for modelling densities on finite-dimensional spaces. Their ability to incorporate domain-specific choices and constraints into the structure of the model through composition make EBMs an appealing candidate for applications in physics, biology and computer vision and various other fields. Recently, Energy-Based Processes (EB…
▽ More
Energy-Based Models (EBMs) have proven to be a highly effective approach for modelling densities on finite-dimensional spaces. Their ability to incorporate domain-specific choices and constraints into the structure of the model through composition make EBMs an appealing candidate for applications in physics, biology and computer vision and various other fields. Recently, Energy-Based Processes (EBP) for modelling stochastic processes was proposed for \textit{unconditional} exchangeable data (e.g., point clouds). In this work, we present a novel subclass of EBPs, called $\mathcal{F}$-EBM for \textit{conditional} exchangeable data, which is able to learn distributions of functions (such as curves or surfaces) from functional samples evaluated at finitely many points. Two unique challenges arise in the functional context. Firstly, training data is often not evaluated along a fixed set of points. Secondly, steps must be taken to control the behaviour of the model between evaluation points, to mitigate overfitting. The proposed model is an energy based model on function space that is decomposed spectrally, where a Gaussian Process path measure is used to reweight the distribution to capture smoothness properties of the underlying process being modelled. The resulting model has the ability to utilize irregularly sampled training data and can output predictions at any resolution, providing an effective approach to up-scaling functional data. We demonstrate the efficacy of our proposed approach for modelling a range of datasets, including data collected from Standard and Poor's 500 (S\&P) and UK National grid.
△ Less
Submitted 22 February, 2023; v1 submitted 3 February, 2022;
originally announced February 2022.
-
D-Net: Siamese based Network with Mutual Attention for Volume Alignment
Authors:
Jian-Qing Zheng,
Ngee Han Lim,
Bartlomiej W. Papiez
Abstract:
Alignment of contrast and non-contrast-enhanced imaging is essential for the quantification of changes in several biomedical applications. In particular, the extraction of cartilage shape from contrast-enhanced Computed Tomography (CT) of tibiae requires accurate alignment of the bone, currently performed manually. Existing deep learning-based methods for alignment require a common template or are…
▽ More
Alignment of contrast and non-contrast-enhanced imaging is essential for the quantification of changes in several biomedical applications. In particular, the extraction of cartilage shape from contrast-enhanced Computed Tomography (CT) of tibiae requires accurate alignment of the bone, currently performed manually. Existing deep learning-based methods for alignment require a common template or are limited in rotation range. Therefore, we present a novel network, D-net, to estimate arbitrary rotation and translation between 3D CT scans that additionally does not require a prior standard template. D-net is an extension to the branched Siamese encoder-decoder structure connected by new mutual non-local links, which efficiently capture long-range connections of similar features between two branches. The 3D supervised network is trained and validated using preclinical CT scans of mouse tibiae with and without contrast enhancement in cartilage. The presented results show a significant improvement in the estimation of CT alignment, outperforming the current comparable methods.
△ Less
Submitted 25 January, 2021;
originally announced January 2021.
-
Origin-Aware Next Destination Recommendation with Personalized Preference Attention
Authors:
Nicholas Lim,
Bryan Hooi,
See-Kiong Ng,
Xueou Wang,
Yong Liang Goh,
Renrong Weng,
Rui Tan
Abstract:
Next destination recommendation is an important task in the transportation domain of taxi and ride-hailing services, where users are recommended with personalized destinations given their current origin location. However, recent recommendation works do not satisfy this origin-awareness property, and only consider learning from historical destination locations, without origin information. Thus, the…
▽ More
Next destination recommendation is an important task in the transportation domain of taxi and ride-hailing services, where users are recommended with personalized destinations given their current origin location. However, recent recommendation works do not satisfy this origin-awareness property, and only consider learning from historical destination locations, without origin information. Thus, the resulting approaches are unable to learn and predict origin-aware recommendations based on the user's current location, leading to sub-optimal performance and poor real-world practicality. Hence, in this work, we study the origin-aware next destination recommendation task. We propose the Spatial-Temporal Origin-Destination Personalized Preference Attention (STOD-PPA) encoder-decoder model to learn origin-origin (OO), destination-destination (DD), and origin-destination (OD) relationships by first encoding both origin and destination sequences with spatial and temporal factors in local and global views, then decoding them through personalized preference attention to predict the next destination. Experimental results on seven real-world user trajectory taxi datasets show that our model significantly outperforms baseline and state-of-the-art methods.
△ Less
Submitted 11 January, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
STP-UDGAT: Spatial-Temporal-Preference User Dimensional Graph Attention Network for Next POI Recommendation
Authors:
Nicholas Lim,
Bryan Hooi,
See-Kiong Ng,
Xueou Wang,
Yong Liang Goh,
Renrong Weng,
Jagannadan Varadarajan
Abstract:
Next Point-of-Interest (POI) recommendation is a longstanding problem across the domains of Location-Based Social Networks (LBSN) and transportation. Recent Recurrent Neural Network (RNN) based approaches learn POI-POI relationships in a local view based on independent user visit sequences. This limits the model's ability to directly connect and learn across users in a global view to recommend sem…
▽ More
Next Point-of-Interest (POI) recommendation is a longstanding problem across the domains of Location-Based Social Networks (LBSN) and transportation. Recent Recurrent Neural Network (RNN) based approaches learn POI-POI relationships in a local view based on independent user visit sequences. This limits the model's ability to directly connect and learn across users in a global view to recommend semantically trained POIs. In this work, we propose a Spatial-Temporal-Preference User Dimensional Graph Attention Network (STP-UDGAT), a novel explore-exploit model that concurrently exploits personalized user preferences and explores new POIs in global spatial-temporal-preference (STP) neighbourhoods, while allowing users to selectively learn from other users. In addition, we propose random walks as a masked self-attention option to leverage the STP graphs' structures and find new higher-order POI neighbours during exploration. Experimental results on six real-world datasets show that our model significantly outperforms baseline and state-of-the-art methods.
△ Less
Submitted 6 October, 2020;
originally announced October 2020.
-
Bayesian optimization for backpropagation in Monte-Carlo tree search
Authors:
Yueqin Li,
Nengli Lim
Abstract:
In large domains, Monte-Carlo tree search (MCTS) is required to estimate the values of the states as efficiently and accurately as possible. However, the standard update rule in backpropagation assumes a stationary distribution for the returns, and particularly in min-max trees, convergence to the true value can be slow because of averaging. We present two methods, Softmax MCTS and Monotone MCTS,…
▽ More
In large domains, Monte-Carlo tree search (MCTS) is required to estimate the values of the states as efficiently and accurately as possible. However, the standard update rule in backpropagation assumes a stationary distribution for the returns, and particularly in min-max trees, convergence to the true value can be slow because of averaging. We present two methods, Softmax MCTS and Monotone MCTS, which generalize previous attempts to improve upon the backpropagation strategy. We demonstrate that both methods reduce to finding optimal monotone functions, which we do so by performing Bayesian optimization with a Gaussian process (GP) prior. We conduct experiments on computer Go, where the returns are given by a deep value neural network, and show that our proposed framework outperforms previous methods.
△ Less
Submitted 25 January, 2020;
originally announced January 2020.
-
Disentangling Multiple Features in Video Sequences using Gaussian Processes in Variational Autoencoders
Authors:
Sarthak Bhagat,
Shagun Uppal,
Zhuyun Yin,
Nengli Lim
Abstract:
We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes Variational AutoEncoder), a variational autoencoder which uses Gaussian processes (GP) to model the latent space for the unsupervised learning of disentangled representations in video sequences. We improve upon previous work by establishing a framework by which multiple features, static or dynamic, can be disentangled. Specifical…
▽ More
We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes Variational AutoEncoder), a variational autoencoder which uses Gaussian processes (GP) to model the latent space for the unsupervised learning of disentangled representations in video sequences. We improve upon previous work by establishing a framework by which multiple features, static or dynamic, can be disentangled. Specifically we use fractional Brownian motions (fBM) and Brownian bridges (BB) to enforce an inter-frame correlation structure in each independent channel, and show that varying this structure enables one to capture different factors of variation in the data. We demonstrate the quality of our representations with experiments on three publicly available datasets, and also quantify the improvement using a video prediction task. Moreover, we introduce a novel geodesic loss function which takes into account the curvature of the data manifold to improve learning. Our experiments show that the combination of the improved representations with the novel loss function enable MGP-VAE to outperform the baselines in video prediction.
△ Less
Submitted 19 July, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
-
Kernel Stein Tests for Multiple Model Comparison
Authors:
Jen Ning Lim,
Makoto Yamada,
Bernhard Schölkopf,
Wittawat Jitkrittum
Abstract:
We address the problem of non-parametric multiple model comparison: given $l$ candidate models, decide whether each candidate is as good as the best one(s) or worse than it. We propose two statistical tests, each controlling a different notion of decision errors. The first test, building on the post selection inference framework, provably controls the number of best models that are wrongly declare…
▽ More
We address the problem of non-parametric multiple model comparison: given $l$ candidate models, decide whether each candidate is as good as the best one(s) or worse than it. We propose two statistical tests, each controlling a different notion of decision errors. The first test, building on the post selection inference framework, provably controls the number of best models that are wrongly declared worse (false positive rate). The second test is based on multiple correction, and controls the proportion of the models declared worse but are in fact as good as the best (false discovery rate). We prove that under appropriate conditions the first test can yield a higher true positive rate than the second. Experimental results on toy and real (CelebA, Chicago Crime data) problems show that the two tests have high true positive rates with well-controlled error rates. By contrast, the naive approach of choosing the model with the lowest score without correction leads to more false positives.
△ Less
Submitted 27 October, 2019;
originally announced October 2019.
-
More Powerful Selective Kernel Tests for Feature Selection
Authors:
Jen Ning Lim,
Makoto Yamada,
Wittawat Jitkrittum,
Yoshikazu Terada,
Shigeyuki Matsui,
Hidetoshi Shimodaira
Abstract:
Refining one's hypotheses in the light of data is a common scientific practice; however, the dependency on the data introduces selection bias and can lead to specious statistical analysis. An approach for addressing this is via conditioning on the selection procedure to account for how we have used the data to generate our hypotheses, and prevent information to be used again after selection. Many…
▽ More
Refining one's hypotheses in the light of data is a common scientific practice; however, the dependency on the data introduces selection bias and can lead to specious statistical analysis. An approach for addressing this is via conditioning on the selection procedure to account for how we have used the data to generate our hypotheses, and prevent information to be used again after selection. Many selective inference (a.k.a. post-selection inference) algorithms typically take this approach but will "over-condition" for sake of tractability. While this practice yields well calibrated statistic tests with controlled false positive rates (FPR), it can incur a major loss in power. In our work, we extend two recent proposals for selecting features using the Maximum Mean Discrepancy and Hilbert Schmidt Independence Criterion to condition on the minimal conditioning event. We show how recent advances in multiscale bootstrap makes conditioning on the minimal selection event possible and demonstrate our proposal over a range of synthetic and real world experiments. Our results show that our proposed test is indeed more powerful in most scenarios.
△ Less
Submitted 29 February, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Generate, Segment and Refine: Towards Generic Manipulation Segmentation
Authors:
Peng Zhou,
Bor-Chun Chen,
Xintong Han,
Mahyar Najibi,
Abhinav Shrivastava,
Ser Nam Lim,
Larry S. Davis
Abstract:
Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in a large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concerns on the spread of fake news and misinformation is growing. Current state…
▽ More
Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in a large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concerns on the spread of fake news and misinformation is growing. Current state of the art methods for detecting these manipulated images suffers from the lack of training data due to the laborious labeling process. We address this problem in this paper, for which we introduce a manipulated image generation process that creates true positives using currently available datasets. Drawing from traditional work on image blending, we propose a novel generator for creating such examples. In addition, we also propose to further create examples that force the algorithm to focus on boundary artifacts during training. Strong experimental results validate our proposal.
△ Less
Submitted 11 September, 2019; v1 submitted 23 November, 2018;
originally announced November 2018.
-
DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation
Authors:
Zuxuan Wu,
Xintong Han,
Yen-Liang Lin,
Mustafa Gkhan Uzunbas,
Tom Goldstein,
Ser Nam Lim,
Larry S. Davis
Abstract:
Harvesting dense pixel-level annotations to train deep neural networks for semantic segmentation is extremely expensive and unwieldy at scale. While learning from synthetic data where labels are readily available sounds promising, performance degrades significantly when testing on novel realistic data due to domain discrepancies. We present Dual Channel-wise Alignment Networks (DCAN), a simple yet…
▽ More
Harvesting dense pixel-level annotations to train deep neural networks for semantic segmentation is extremely expensive and unwieldy at scale. While learning from synthetic data where labels are readily available sounds promising, performance degrades significantly when testing on novel realistic data due to domain discrepancies. We present Dual Channel-wise Alignment Networks (DCAN), a simple yet effective approach to reduce domain shift at both pixel-level and feature-level. Exploring statistics in each channel of CNN feature maps, our framework performs channel-wise feature alignment, which preserves spatial structures and semantic information, in both an image generator and a segmentation network. In particular, given an image from the source domain and unlabeled samples from the target domain, the generator synthesizes new images on-the-fly to resemble samples from the target domain in appearance and the segmentation network further refines high-level features before predicting semantic maps, both of which leverage feature statistics of sampled images from the target domain. Unlike much recent and concurrent work relying on adversarial training, our framework is lightweight and easy to train. Extensive experiments on adapting models trained on synthetic segmentation benchmarks to real urban scenes demonstrate the effectiveness of the proposed framework.
△ Less
Submitted 16 April, 2018;
originally announced April 2018.
-
Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation
Authors:
Swami Sankaranarayanan,
Yogesh Balaji,
Arpit Jain,
Ser Nam Lim,
Rama Chellappa
Abstract:
Visual Domain Adaptation is a problem of immense importance in computer vision. Previous approaches showcase the inability of even deep neural networks to learn informative representations across domain shift. This problem is more severe for tasks where acquiring hand labeled data is extremely hard and tedious. In this work, we focus on adapting the representations learned by segmentation networks…
▽ More
Visual Domain Adaptation is a problem of immense importance in computer vision. Previous approaches showcase the inability of even deep neural networks to learn informative representations across domain shift. This problem is more severe for tasks where acquiring hand labeled data is extremely hard and tedious. In this work, we focus on adapting the representations learned by segmentation networks across synthetic and real domains. Contrary to previous approaches that use a simple adversarial objective or superpixel information to aid the process, we propose an approach based on Generative Adversarial Networks (GANs) that brings the embeddings closer in the learned feature space. To showcase the generality and scalability of our approach, we show that we can achieve state of the art results on two challenging scenarios of synthetic to real domain adaptation. Additional exploratory experiments show that our approach: (1) generalizes to unseen domains and (2) results in improved alignment of source and target distributions.
△ Less
Submitted 1 April, 2018; v1 submitted 19 November, 2017;
originally announced November 2017.
-
Regularizing deep networks using efficient layerwise adversarial training
Authors:
Swami Sankaranarayanan,
Arpit Jain,
Rama Chellappa,
Ser Nam Lim
Abstract:
Adversarial training has been shown to regularize deep neural networks in addition to increasing their robustness to adversarial examples. However, its impact on very deep state of the art networks has not been fully investigated. In this paper, we present an efficient approach to perform adversarial training by perturbing intermediate layer activations and study the use of such perturbations as a…
▽ More
Adversarial training has been shown to regularize deep neural networks in addition to increasing their robustness to adversarial examples. However, its impact on very deep state of the art networks has not been fully investigated. In this paper, we present an efficient approach to perform adversarial training by perturbing intermediate layer activations and study the use of such perturbations as a regularizer during training. We use these perturbations to train very deep models such as ResNets and show improvement in performance both on adversarial and original test data. Our experiments highlight the benefits of perturbing intermediate layer activations compared to perturbing only the inputs. The results on CIFAR-10 and CIFAR-100 datasets show the merits of the proposed adversarial training approach. Additional results on WideResNets show that our approach provides significant improvement in classification accuracy for a given base model, outperforming dropout and other base models of larger size.
△ Less
Submitted 28 May, 2018; v1 submitted 22 May, 2017;
originally announced May 2017.
-
Self corrective Perturbations for Semantic Segmentation and Classification
Authors:
Swami Sankaranarayanan,
Arpit Jain,
Ser Nam Lim
Abstract:
Convolutional Neural Networks have been a subject of great importance over the past decade and great strides have been made in their utility for producing state of the art performance in many computer vision problems. However, the behavior of deep networks is yet to be fully understood and is still an active area of research. In this work, we present an intriguing behavior: pre-trained CNNs can be…
▽ More
Convolutional Neural Networks have been a subject of great importance over the past decade and great strides have been made in their utility for producing state of the art performance in many computer vision problems. However, the behavior of deep networks is yet to be fully understood and is still an active area of research. In this work, we present an intriguing behavior: pre-trained CNNs can be made to improve their predictions by structurally perturbing the input. We observe that these perturbations - referred as Guided Perturbations - enable a trained network to improve its prediction performance without any learning or change in network weights. We perform various ablative experiments to understand how these perturbations affect the local context and feature representations. Furthermore, we demonstrate that this idea can improve performance of several existing approaches on semantic segmentation and scene labeling tasks on the PASCAL VOC dataset and supervised classification tasks on MNIST and CIFAR10 datasets.
△ Less
Submitted 3 August, 2017; v1 submitted 23 March, 2017;
originally announced March 2017.
-
A Reinforcement Learning Approach to the View Planning Problem
Authors:
Mustafa Devrim Kaba,
Mustafa Gokhan Uzunbas,
Ser Nam Lim
Abstract:
We present a Reinforcement Learning (RL) solution to the view planning problem (VPP), which generates a sequence of view points that are capable of sensing all accessible area of a given object represented as a 3D model. In doing so, the goal is to minimize the number of view points, making the VPP a class of set covering optimization problem (SCOP). The SCOP is NP-hard, and the inapproximability…
▽ More
We present a Reinforcement Learning (RL) solution to the view planning problem (VPP), which generates a sequence of view points that are capable of sensing all accessible area of a given object represented as a 3D model. In doing so, the goal is to minimize the number of view points, making the VPP a class of set covering optimization problem (SCOP). The SCOP is NP-hard, and the inapproximability results tell us that the greedy algorithm provides the best approximation that runs in polynomial time. In order to find a solution that is better than the greedy algorithm, (i) we introduce a novel score function by exploiting the geometry of the 3D model, (ii) we model an intuitive human approach to VPP using this score function, and (iii) we cast VPP as a Markovian Decision Process (MDP), and solve the MDP in RL framework using well-known RL algorithms. In particular, we use SARSA, Watkins-Q and TD with function approximation to solve the MDP. We compare the results of our method with the baseline greedy algorithm in an extensive set of test objects, and show that we can out-perform the baseline in almost all cases.
△ Less
Submitted 18 November, 2016; v1 submitted 19 October, 2016;
originally announced October 2016.
-
Development of a Java Package for Matrix Programming
Authors:
Ngee-Peng Lim,
Maurice HT Ling,
Shawn YC Lim,
Ji-Hee Choi,
Henry BK Teo
Abstract:
We had assembled a Java package, known as MatrixPak, of four classes for the purpose of numerical matrix computation. The classes are matrix, matrix_operations, StrToMatrix, and MatrixToStr; all of which are inherited from java.lang.Object class. Class matrix defines a matrix as a two-dimensional array of float types, and contains the following mathematical methods: transpose, adjoint, determina…
▽ More
We had assembled a Java package, known as MatrixPak, of four classes for the purpose of numerical matrix computation. The classes are matrix, matrix_operations, StrToMatrix, and MatrixToStr; all of which are inherited from java.lang.Object class. Class matrix defines a matrix as a two-dimensional array of float types, and contains the following mathematical methods: transpose, adjoint, determinant, inverse, minor and cofactor. Class matrix_operations contains the following mathematical methods: matrix addition, matrix subtraction, matrix multiplication, and matrix exponential. Class StrToMatrix contains methods necessary to parse a string representation (for example, [[2 3 4]-[5 6 7]]) of a matrix into a matrix definition, whereas class MatrixToStr does the reverse.
△ Less
Submitted 24 June, 2003;
originally announced June 2003.