-
CRISP: A Framework for Cryo-EM Image Segmentation and Processing with Conditional Random Field
Authors:
Szu-Chi Chung,
Po-Cheng Chou
Abstract:
Differentiating signals from the background in micrographs is a critical initial step for cryogenic electron microscopy (cryo-EM), yet it remains laborious due to low signal-to-noise ratio (SNR), the presence of contaminants and densely packed particles of varying sizes. Although image segmentation has recently been introduced to distinguish particles at the pixel level, the low SNR complicates the automated generation of accurate annotations for training supervised models. Moreover, platforms for systematically comparing different design choices in pipeline construction are lacking. Thus, a modular framework is essential to understand the advantages and limitations of this approach and drive further development. To address these challenges, we present a pipeline that automatically generates high-quality segmentation maps from cryo-EM data to serve as ground truth labels. Our modular framework enables the selection of various segmentation models and loss functions. We also integrate Conditional Random Fields (CRFs) with different solvers and feature sets to refine coarse predictions, thereby producing fine-grained segmentation. This flexibility facilitates optimal configurations tailored to cryo-EM datasets. When trained on a limited set of micrographs, our approach achieves over 90% accuracy, recall, precision, Intersection over Union (IoU), and F1-score on synthetic data. Furthermore, to demonstrate our framework's efficacy in downstream analyses, we show that the particles extracted by our pipeline produce 3D density maps with higher resolution than those generated by existing particle pickers on real experimental datasets, while achieving performance comparable to that of manually curated datasets from experts.
Submitted 12 February, 2025;
originally announced February 2025.
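The five scores named in the CRISP abstract above (accuracy, recall, precision, IoU, F1) can all be derived from the same pixel-wise confusion counts. The following is a minimal sketch, assuming binary masks flattened to sequences of 0/1 labels; the function name `segmentation_metrics` is hypothetical and this is not the authors' evaluation code.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Pixel-wise metrics for two binary masks given as flat 0/1 sequences.

    Illustrative helper only (hypothetical name, not from the CRISP code base).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Confusion counts over all pixels.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero (e.g. empty masks).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 0, 1, 1]
    print(segmentation_metrics(predicted, reference))
```

Note that IoU and F1 are monotonically related (F1 = 2·IoU / (1 + IoU)), so the two always rank a pair of segmentations the same way.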
-
Accelerated AI Inference via Dynamic Execution Methods
Authors:
Haim Barad,
Jascha Achterberg,
Tien Pei Chou,
Jean Yu
Abstract:
In this paper, we focus on Dynamic Execution techniques that optimize the computation flow based on input. This aims to identify simpler problems that can be solved using fewer resources, similar to human cognition. The techniques discussed include early exit from deep networks, speculative sampling for language models, and adaptive steps for diffusion models. Experimental results demonstrate that these dynamic approaches can significantly improve latency and throughput without compromising quality. When combined with model-based optimizations, such as quantization, dynamic execution provides a powerful multi-pronged strategy to optimize AI inference.
Generative AI requires a large amount of compute resources. This is expected to grow, and demand for resources in data centers through to the edge is expected to continue to increase at high rates. We take advantage of existing research and provide additional innovations for some generative optimizations. In the case of LLMs, we provide more efficient sampling methods that depend on the complexity of the data. In the case of diffusion model generation, we provide a new method that also leverages the difficulty of the input prompt to predict an optimal early stopping point.
Therefore, dynamic execution methods are relevant because they add another dimension of performance optimizations. Performance is critical from a competitive point of view, but increasing capacity can result in significant power savings and cost savings. We have provided several integrations of these techniques into several Intel performance libraries and Huggingface Optimum. These integrations will make them easier to use and increase the adoption of these techniques.
Submitted 30 October, 2024;
originally announced November 2024.
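Of the dynamic execution techniques listed in the abstract above, early exit is the simplest to illustrate. The following is a minimal sketch, assuming a network split into sequential blocks with one classifier head per block and a fixed softmax-confidence threshold; the names `early_exit_forward`, `blocks`, `exit_heads`, and `threshold` are hypothetical and do not come from the paper or from any Intel or Hugging Face library.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_forward(
    x: Vector,
    blocks: Sequence[Callable[[Vector], Vector]],
    exit_heads: Sequence[Callable[[Vector], Vector]],
    threshold: float = 0.9,
) -> Tuple[int, float, int]:
    """Run blocks in order and stop at the first sufficiently confident exit head.

    Returns (predicted label, confidence, number of blocks executed).
    If no head reaches the threshold, the last head's prediction is returned.
    """
    if not blocks or len(blocks) != len(exit_heads):
        raise ValueError("need one exit head per block, and at least one block")

    hidden = x
    for depth, (block, head) in enumerate(zip(blocks, exit_heads), start=1):
        hidden = block(hidden)            # compute the next hidden state
        probs = softmax(head(hidden))     # cheap classifier on that state
        confidence = max(probs)
        if confidence >= threshold:       # easy input: skip remaining blocks
            break
    return probs.index(confidence), confidence, depth


if __name__ == "__main__":
    # Toy model: each block doubles the features; each head uses them as logits.
    double = lambda h: [2.0 * v for v in h]
    identity = lambda h: list(h)
    print(early_exit_forward([0.5, 0.0], [double] * 3, [identity] * 3, threshold=0.8))
```

A lower threshold trades accuracy for latency: inputs whose intermediate predictions are already confident leave the network after fewer blocks.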
-
SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
Authors:
Xin Su,
Man Luo,
Kris W Pan,
Tien Pei Chou,
Vasudev Lal,
Phillip Howard
Abstract:
Synthetic data generation has gained significant attention recently for its utility in training large vision and language models. However, the application of synthetic data to the training of multimodal context-augmented generation systems has been relatively unexplored. This gap in existing work is important because existing vision and language models (VLMs) are not trained specifically for context-augmented generation. Resources for adapting such models are therefore crucial for enabling their use in retrieval-augmented generation (RAG) settings, where a retriever is used to gather relevant information that is then subsequently provided to a generative model via context augmentation. To address this challenging problem, we generate SK-VQA: a large synthetic multimodal dataset containing over 2 million question-answer pairs which require external knowledge to determine the final answer. Our dataset is both larger and significantly more diverse than existing resources of its kind, possessing over 11x more unique questions and containing images from a greater variety of sources than previously-proposed datasets. Through extensive experiments, we demonstrate that our synthetic dataset can not only serve as a challenging benchmark, but is also highly effective for adapting existing generative multimodal models for context-augmented generation.
Submitted 27 June, 2024;
originally announced June 2024.
-
Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors
Authors:
Tam Thuc Do,
Parham Eftekhar,
Seyed Alireza Hosseini,
Gene Cheung,
Philip Chou
Abstract:
We build interpretable and lightweight transformer-like neural networks by unrolling iterative optimization algorithms that minimize graph smoothness priors -- the quadratic graph Laplacian regularizer (GLR) and the $\ell_1$-norm graph total variation (GTV) -- subject to an interpolation constraint. The crucial insight is that a normalized signal-dependent graph learning module amounts to a variant of the basic self-attention mechanism in conventional transformers. Unlike "black-box" transformers that require learning of large key, query and value matrices to compute scaled dot products as affinities and subsequent output embeddings, resulting in huge parameter sets, our unrolled networks employ shallow CNNs to learn low-dimensional features per node to establish pairwise Mahalanobis distances and construct sparse similarity graphs. At each layer, given a learned graph, the target interpolated signal is simply a low-pass filtered output derived from the minimization of an assumed graph smoothness prior, leading to a dramatic reduction in parameter count. Experiments for two image interpolation applications verify the restoration performance, parameter efficiency and robustness to covariate shift of our graph-based unrolled networks compared to conventional transformers.
Submitted 5 November, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images
Authors:
Ke-Lei Wang,
Pin-Hsuan Chou,
Young-Ching Chou,
Chia-Jen Liu,
Cheng-Kuan Lin,
Yu-Chee Tseng
Abstract:
While there are many models for instance segmentation, PolarMask stands out as a unique one that represents an object in a Polar coordinate system. With an anchor-box-free design and a single-stage framework that conducts detection and segmentation at the same time, PolarMask has been shown to balance efficiency and accuracy. Hence, it can be easily connected with other downstream real-time applications. In this work, we observe that there are two deficiencies associated with PolarMask: (i) an inability to represent concave objects and (ii) inefficiency in using ray regression. We propose MP-PolarMask (Multi-Point PolarMask) by taking advantage of multiple Polar systems. The main idea is to extend from one main Polar system to four auxiliary Polar systems, making it capable of representing more complicated convex-and-concave-mixed shapes. We validate MP-PolarMask on both general objects and food objects of the COCO dataset, and the results demonstrate a significant improvement of 13.69% in AP_L and 7.23% in AP over PolarMask with 36 rays.
Submitted 3 June, 2024;
originally announced June 2024.
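The polar representation mentioned in the MP-PolarMask abstract above can be sketched as a center point plus one distance per evenly spaced ray. The snippet below is an illustration under that assumption (uniform angular spacing, one length per ray); the function name `polar_to_contour` is hypothetical and the code is not taken from PolarMask or MP-PolarMask.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polar_to_contour(center: Point, ray_lengths: Sequence[float]) -> List[Point]:
    """Decode a single-center polar mask into contour vertices.

    Ray i points at angle 2*pi*i/n from the center; its length is the
    distance to the object boundary along that direction.
    """
    n = len(ray_lengths)
    if n < 3:
        raise ValueError("at least three rays are needed to form a polygon")
    cx, cy = center
    step = 2.0 * math.pi / n
    return [
        (cx + r * math.cos(i * step), cy + r * math.sin(i * step))
        for i, r in enumerate(ray_lengths)
    ]


if __name__ == "__main__":
    # 36 rays of equal length approximate a circle of radius 5.
    contour = polar_to_contour((10.0, 20.0), [5.0] * 36)
    print(contour[:3])
```

Because each angle carries exactly one distance, a single-center contour of this kind can only describe shapes whose boundary is crossed once by every ray from the center. A boundary that a ray crosses several times, as in many concave objects, cannot be encoded, which is consistent with the first deficiency the abstract lists.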
-
One-Click Upgrade from 2D to 3D: Sandwiched RGB-D Video Compression for Stereoscopic Teleconferencing
Authors:
Yueyu Hu,
Onur G. Guleryuz,
Philip A. Chou,
Danhang Tang,
Jonathan Taylor,
Rus Maxham,
Yao Wang
Abstract:
Stereoscopic video conferencing is still challenging due to the need to compress stereo RGB-D video in real-time. Though hardware implementations of standard video codecs such as H.264 / AVC and HEVC are widely available, they are not designed for stereoscopic videos and suffer from reduced quality and performance. Specific multiview or 3D extensions of these codecs are complex and lack efficient implementations. In this paper, we propose a new approach to upgrade a 2D video codec to support stereo RGB-D video compression, by wrapping it with a neural pre- and post-processor pair. The neural networks are end-to-end trained with an image codec proxy, and shown to work with a more sophisticated video codec. We also propose a geometry-aware loss function to improve rendering quality. We train the neural pre- and post-processors on a synthetic 4D people dataset, and evaluate them on both synthetic and real-captured stereo RGB-D videos. Experimental results show that the neural networks generalize well to unseen data and work out of the box with various video codecs. Our approach saves about 30% bit-rate compared to a conventional video coding scheme and MV-HEVC at the same level of rendering quality from a novel view, without the need for a task-specific hardware upgrade.
Submitted 15 April, 2024;
originally announced April 2024.
-
Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers
Authors:
Onur G. Guleryuz,
Philip A. Chou,
Berivan Isik,
Hugues Hoppe,
Danhang Tang,
Ruofei Du,
Jonathan Taylor,
Philip Davidson,
Sean Fanello
Abstract:
We propose sandwiching standard image and video codecs between pre- and post-processing neural networks. The networks are jointly trained through a differentiable codec proxy to minimize a given rate-distortion loss. This sandwich architecture not only improves the standard codec's performance on its intended content, but more importantly, adapts the codec to other types of image/video content and to other distortion measures. The sandwich learns to transmit "neural code images" that optimize and improve overall rate-distortion performance, with the improvements becoming significant especially when the overall problem is well outside of the scope of the codec's design. We apply the sandwich architecture to standard codecs with mismatched sources transporting different numbers of channels, higher resolution, higher dynamic range, computer graphics, and with perceptual distortion measures. The results demonstrate substantial improvements (up to 9 dB gains or up to 30% bitrate reductions) compared to alternative adaptations. We establish optimality properties for sandwiched compression and design differentiable codec proxies approximating current standard codecs. We further analyze model complexity, visual quality under perceptual metrics, as well as sandwich configurations that offer interesting potentials in video compression and streaming.
Submitted 20 February, 2025; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Learned Nonlinear Predictor for Critically Sampled 3D Point Cloud Attribute Compression
Authors:
Tam Thuc Do,
Philip A. Chou,
Gene Cheung
Abstract:
We study 3D point cloud attribute compression via a volumetric approach: assuming point cloud geometry is known at both encoder and decoder, parameters $θ$ of a continuous attribute function $f: \mathbb{R}^3 \mapsto \mathbb{R}$ are quantized to $\hatθ$ and encoded, so that discrete samples $f_{\hatθ}(\mathbf{x}_i)$ can be recovered at known 3D points $\mathbf{x}_i \in \mathbb{R}^3$ at the decoder. Specifically, we consider a nested sequence of function subspaces $\mathcal{F}^{(p)}_{l_0} \subseteq \cdots \subseteq \mathcal{F}^{(p)}_L$, where $\mathcal{F}_l^{(p)}$ is a family of functions spanned by B-spline basis functions of order $p$, $f_l^*$ is the projection of $f$ on $\mathcal{F}_l^{(p)}$ represented as low-pass coefficients $F_l^*$, and $g_l^*$ is the residual function in an orthogonal subspace $\mathcal{G}_l^{(p)}$ (where $\mathcal{G}_l^{(p)} \oplus \mathcal{F}_l^{(p)} = \mathcal{F}_{l+1}^{(p)}$) represented as high-pass coefficients $G_l^*$. In this paper, to improve coding performance over \cite{do2023volumetric}, we study predicting $f_{l+1}^*$ at level $l+1$ given $f_l^*$ at level $l$ and encoding of $G_l^*$ for the $p=1$ case (RAHT($1$)). For the prediction, we formalize RAHT(1) linear prediction in MPEG-PCC in a theoretical framework, and propose a new nonlinear predictor using a polynomial of a bilateral filter. We derive equations to efficiently compute the critically sampled high-pass coefficients $G_l^*$ amenable to encoding. We optimize parameters in our resulting feed-forward network on a large training set of point clouds by minimizing a rate-distortion Lagrangian. Experimental results show that our improved framework outperforms the MPEG G-PCC predictor by $11\%$--$12\%$ in bit rate.
Submitted 20 September, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
An All Deep System for Badminton Game Analysis
Authors:
Po-Yung Chou,
Yu-Chun Lo,
Bo-Zheng Xie,
Cheng-Hung Lin,
Yu-Yung Kao
Abstract:
The CoachAI Badminton 2023 Track1 initiative aims to automatically detect events within badminton match videos. Detecting small objects, especially the shuttlecock, is of great importance and demands high precision within the challenge. Such detection is crucial for tasks like hit count, hitting time, and hitting location. However, even after revising the well-regarded shuttlecock detecting model, TrackNet, our object detection models still fall short of the desired accuracy. To address this issue, we've implemented various deep learning methods to tackle the problems arising from noisy detected data, leveraging diverse data types to improve precision. In this report, we detail the detection model modifications we've made and our approach to the 11 tasks. Notably, our system garnered a score of 0.78 out of 1.0 in the challenge. We have released our source code on GitHub: https://github.com/jean50621/Badminton_Challenge
Submitted 14 February, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Team Intro to AI team8 at CoachAI Badminton Challenge 2023: Advanced ShuttleNet for Shot Predictions
Authors:
Shih-Hong Chen,
Pin-Hsuan Chou,
Yong-Fu Liu,
Chien-An Han
Abstract:
In this paper, our objective is to improve the performance of the existing framework ShuttleNet in predicting badminton shot types and locations by leveraging past strokes. We participated in the CoachAI Badminton Challenge at IJCAI 2023 and achieved significantly better results compared to the baseline. Ultimately, our team achieved the first position in the competition and we made our code available.
Submitted 25 July, 2023;
originally announced July 2023.
-
Volumetric Attribute Compression for 3D Point Clouds using Feedforward Network with Geometric Attention
Authors:
Tam Thuc Do,
Philip A. Chou,
Gene Cheung
Abstract:
We study 3D point cloud attribute compression using a volumetric approach: given a target volumetric attribute function $f : \mathbb{R}^3 \rightarrow \mathbb{R}$, we quantize and encode parameter vector $θ$ that characterizes $f$ at the encoder, for reconstruction $f_{\hatθ}(\mathbf{x})$ at known 3D points $\mathbf{x}$'s at the decoder. Extending a previous work Region Adaptive Hierarchical Transform (RAHT) that employs piecewise constant functions to span a nested sequence of function spaces, we propose a feedforward linear network that implements higher-order B-spline bases spanning function spaces without eigen-decomposition. Feedforward network architecture means that the system is amenable to end-to-end neural learning. The key to our network is space-varying convolution, similar to a graph operator, whose weights are computed from the known 3D geometry for normalization. We show that the number of layers in the normalization at the encoder is equivalent to the number of terms in a matrix inverse Taylor series. Experimental results on real-world 3D point clouds show up to 2-3 dB gain over RAHT in energy compaction and 20-30% bitrate reduction.
Submitted 1 April, 2023;
originally announced April 2023.
-
Sandwiched Video Compression: Efficiently Extending the Reach of Standard Codecs with Neural Wrappers
Authors:
Berivan Isik,
Onur G. Guleryuz,
Danhang Tang,
Jonathan Taylor,
Philip A. Chou
Abstract:
We propose sandwiched video compression -- a video compression system that wraps neural networks around a standard video codec. The sandwich framework consists of a neural pre- and post-processor with a standard video codec between them. The networks are trained jointly to optimize a rate-distortion loss function with the goal of significantly improving over the standard codec in various compression scenarios. End-to-end training in this setting requires a differentiable proxy for the standard video codec, which incorporates temporal processing with motion compensation, inter/intra mode decisions, and in-loop filtering. We propose differentiable approximations to key video codec components and demonstrate that, in addition to providing meaningful compression improvements over the standard codec, the neural codes of the sandwich lead to significantly better rate-distortion performance in two important scenarios. When transporting high-resolution video via low-resolution HEVC, the sandwich system obtains 6.5 dB improvements over standard HEVC. More importantly, using the well-known perceptual similarity metric, LPIPS, we observe 30% improvements in rate at the same quality over HEVC. Last but not least, we show that pre- and post-processors formed by very modestly-parameterized, light-weight networks can closely approximate these results.
Submitted 5 July, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Fine-grained Visual Classification with High-temperature Refinement and Background Suppression
Authors:
Po-Yung Chou,
Yu-Yung Kao,
Cheng-Hung Lin
Abstract:
Fine-grained visual classification is a challenging task due to the high similarity between categories and distinct differences among data within one single category. To address the challenges, previous strategies have focused on localizing subtle discrepancies between categories and enhancing the discriminative features in them. However, the background also provides important information that can tell the model which features are unnecessary or even harmful for classification, and models that rely too heavily on subtle features may overlook global features and contextual information. In this paper, we propose a novel network called "High-temperaturE Refinement and Background Suppression" (HERBS), which consists of two modules, namely, the high-temperature refinement module and the background suppression module, for extracting discriminative features and suppressing background noise, respectively. The high-temperature refinement module allows the model to learn the appropriate feature scales by refining the feature map at different scales and improving the learning of diverse features. The background suppression module first splits the feature map into foreground and background using classification confidence scores and suppresses feature values in low-confidence areas while enhancing discriminative features. The experimental results show that the proposed HERBS effectively fuses features of varying scales, suppresses background noise, and extracts discriminative features at appropriate scales for fine-grained visual classification. The proposed method achieves state-of-the-art performance on the CUB-200-2011 and NABirds benchmarks, surpassing 93% accuracy on both datasets. Thus, HERBS presents a promising solution for improving the performance of fine-grained visual classification tasks. Code: https://github.com/chou141253/FGVC-HERBS
Submitted 24 April, 2023; v1 submitted 11 March, 2023;
originally announced March 2023.
-
Ontological Learning from Weak Labels
Authors:
Larry Tang,
Po Hao Chou,
Yi Yu Zheng,
Ziqian Ge,
Ankit Shah,
Bhiksha Raj
Abstract:
Ontologies encompass a formal representation of knowledge through the definition of concepts or properties of a domain, and the relationships between those concepts. In this work, we seek to investigate whether using this ontological information will improve learning from weakly labeled data, which is easier to collect since it requires only the presence or absence of an event to be known. We use the AudioSet ontology and dataset, which contains audio clips weakly labeled with the ontology concepts, with the ontology providing the "Is A" relations between the concepts. We first re-implemented the model proposed by soundevent_ontology with modifications to fit the multi-label scenario and then expanded on that idea by using a Graph Convolutional Network (GCN) to model the ontology information to learn the concepts. We find that the baseline Siamese does not perform better by incorporating ontology information in the weak and multi-label scenario, but that the GCN does capture the ontology knowledge better for weak, multi-labeled data. In our experiments, we also investigate how different modules can tolerate noise introduced from weak labels and better incorporate ontology information. Our best Siamese-GCN model achieves mAP=0.45 and AUC=0.87 for lower-level concepts and mAP=0.72 and AUC=0.86 for higher-level concepts, which is an improvement over the baseline Siamese but about the same as our models that do not use ontology information.
Submitted 4 March, 2022;
originally announced March 2022.
-
A Novel Plug-in Module for Fine-Grained Visual Classification
Authors:
Po-Yung Chou,
Cheng-Hung Lin,
Wen-Chung Kao
Abstract:
Visual classification can be divided into coarse-grained and fine-grained classification. Coarse-grained classification represents categories with a large degree of dissimilarity, such as the classification of cats and dogs, while fine-grained classification represents classifications with a large degree of similarity, such as cat species, bird species, and the makes or models of vehicles. Unlike coarse-grained visual classification, fine-grained visual classification often requires professional experts to label data, which makes data more expensive. To meet this challenge, many approaches propose to automatically find the most discriminative regions and use local features to provide more precise features. These approaches only require image-level annotations, thereby reducing the cost of annotation. However, most of these methods require two- or multi-stage architectures and cannot be trained end-to-end. Therefore, we propose a novel plug-in module that can be integrated into many common backbones, including CNN-based or Transformer-based networks, to provide strongly discriminative regions. The plug-in module can output pixel-level feature maps and fuse filtered features to enhance fine-grained visual classification. Experimental results show that the proposed plug-in module outperforms state-of-the-art approaches and significantly improves the accuracy to 92.77% and 92.83% on CUB200-2011 and NABirds, respectively. We have released our source code on GitHub: https://github.com/chou141253/FGVC-PIM.git
Submitted 8 February, 2022;
originally announced February 2022.
-
In-storage Processing of I/O Intensive Applications on Computational Storage Drives
Authors:
Ali HeydariGorji,
Mahdi Torabzadehkashi,
Siavash Rezaei,
Hossein Bobarshad,
Vladimir Alves,
Pai H. Chou
Abstract:
Computational storage drives (CSD) are solid-state drives (SSD) empowered by general-purpose processors that can perform in-storage processing. They have the potential to improve both performance and energy significantly for big-data analytics by bringing compute to data, thereby eliminating costly data transfer while offering better privacy. In this work, we introduce Solana, the first-ever high-capacity (12-TB) CSD in E1.S form factor, and present an actual prototype for evaluation. To demonstrate the benefits of in-storage processing on CSD, we deploy several natural language processing (NLP) applications on datacenter-grade storage servers composed of clusters of Solana drives. Experimental results show up to 3.1x speedup in processing while reducing the energy consumption and data transfer by 67% and 68%, respectively, compared to regular enterprise SSDs.
Submitted 23 December, 2021;
originally announced December 2021.
-
LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks
Authors:
Berivan Isik,
Philip A. Chou,
Sung Jin Hwang,
Nick Johnston,
George Toderici
Abstract:
We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordinate-based, or implicit, neural network. Inputs to the network include both spatial coordinates and a latent vector per block. We represent the latent vectors using coefficients of the region-adaptive hierarchical transform (RAHT) used in the MPEG geometry-based point cloud codec G-PCC. The coefficients, which are highly compressible, are rate-distortion optimized by back-propagation through a rate-distortion Lagrangian loss in an auto-decoder configuration. The result outperforms RAHT by 2--4 dB. This is the first work to compress volumetric functions represented by local coordinate-based neural networks. As such, we expect it to be applicable beyond point clouds, for example to compression of high-resolution neural radiance fields.
Submitted 17 November, 2021;
originally announced November 2021.
-
PyTouch: A Machine Learning Library for Touch Processing
Authors:
Mike Lambeta,
Huazhe Xu,
Jingwei Xu,
Po-Wei Chou,
Shaoxiong Wang,
Trevor Darrell,
Roberto Calandra
Abstract:
With the increased availability of rich tactile sensors, there is a correspondingly growing need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can be used for control and decision-making. In this paper, we present PyTouch -- the first machine learning library dedicated to the processing of touch sensing signals. PyTouch is designed to be modular and easy to use, and provides state-of-the-art touch processing capabilities as a service, with the goal of unifying the tactile sensing community by providing a library of scalable, proven, and performance-validated modules upon which applications and research can be built. We evaluate PyTouch on real-world data from several tactile sensors on touch processing tasks such as touch detection, slip, and object pose estimation. PyTouch is open-sourced at https://github.com/facebookresearch/pytouch .
Submitted 26 May, 2021;
originally announced May 2021.
-
3D Scene Compression through Entropy Penalized Neural Representation Functions
Authors:
Thomas Bird,
Johannes Ballé,
Saurabh Singh,
Philip A. Chou
Abstract:
Some forms of novel visual media enable the viewer to explore a 3D scene from arbitrary viewpoints, by interpolating between a discrete set of original views. Compared to 2D imagery, these types of applications require much larger amounts of storage space, which we seek to reduce. Existing approaches for compressing 3D scenes are based on a separation of compression and rendering: each of the original views is compressed using traditional 2D image formats; the receiver decompresses the views and then performs the rendering. We unify these steps by directly compressing an implicit representation of the scene, a function that maps spatial coordinates to a radiance vector field, which can then be queried to render arbitrary viewpoints. The function is implemented as a neural network and jointly trained for reconstruction as well as compressibility, in an end-to-end manner, with the use of an entropy penalty on the parameters. Our method significantly outperforms a state-of-the-art conventional approach for scene compression, achieving simultaneously higher quality reconstructions and lower bitrates. Furthermore, we show that the performance at lower bitrates can be improved by jointly representing multiple scenes using a soft form of parameter sharing.
Submitted 26 April, 2021;
originally announced April 2021.
-
TACTO: A Fast, Flexible, and Open-source Simulator for High-Resolution Vision-based Tactile Sensors
Authors:
Shaoxiong Wang,
Mike Lambeta,
Po-Wei Chou,
Roberto Calandra
Abstract:
Simulators play an important role in prototyping, debugging, and benchmarking new advances in robotics and learning for control. Although many physics engines exist, some aspects of the real world are harder than others to simulate. One of the aspects that has so far eluded accurate simulation is touch sensing. To address this gap, we present TACTO - a fast, flexible, and open-source simulator for vision-based tactile sensors. This simulator renders realistic high-resolution touch readings at hundreds of frames per second, and can be easily configured to simulate different vision-based tactile sensors, including DIGIT and OmniTact. In this paper, we detail the principles that drove the implementation of TACTO and how they are reflected in its architecture. We demonstrate TACTO on a perceptual task, by learning to predict grasp stability using touch from 1 million grasps, and on a marble manipulation control task. Moreover, we provide a proof-of-concept that TACTO can be successfully used for Sim2Real applications. We believe that TACTO is a step towards the widespread adoption of touch sensing in robotic applications, and towards enabling machine learning practitioners interested in multi-modal learning and control. TACTO is open-source at https://github.com/facebookresearch/tacto.
Submitted 10 February, 2022; v1 submitted 15 December, 2020;
originally announced December 2020.
-
HyperTune: Dynamic Hyperparameter Tuning For Efficient Distribution of DNN Training Over Heterogeneous Systems
Authors:
Ali HeydariGorji,
Siavash Rezaei,
Mahdi Torabzadehkashi,
Hossein Bobarshad,
Vladimir Alves,
Pai H. Chou
Abstract:
Distributed training is a novel approach to accelerate Deep Neural Network (DNN) training, but common training libraries fall short of addressing the distributed cases with heterogeneous processors or the cases where the processing nodes get interrupted by other workloads. This paper describes distributed training of DNNs on computational storage devices (CSD), which are NAND flash-based, high-capacity data storage devices with internal processing engines. A CSD-based distributed architecture incorporates the advantages of federated learning in terms of performance scalability, resiliency, and data privacy by eliminating the unnecessary data movement between the storage device and the host processor. The paper also describes Stannis, a DNN training framework that improves on the shortcomings of existing distributed training frameworks by dynamically tuning the training hyperparameters in heterogeneous systems to maintain the maximum overall processing speed in terms of processed images per second and energy efficiency. Experimental results on image classification training benchmarks show up to 3.1x improvement in performance and 2.45x reduction in energy consumption when using Stannis plus CSD compared to generic systems.
Submitted 15 July, 2020;
originally announced July 2020.
-
Nonlinear Transform Coding
Authors:
Johannes Ballé,
Philip A. Chou,
David Minnen,
Saurabh Singh,
Nick Johnston,
Eirikur Agustsson,
Sung Jin Hwang,
George Toderici
Abstract:
We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate--distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate--distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate--distortion trade-off of nonlinear transforms, introducing a simplified one.
Submitted 23 October, 2020; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Head-mouse: A simple cursor controller based on optical measurement of head tilt
Authors:
Ali HeydariGorji,
Seyede Mahya Safavi,
Cheng-Ting Lee,
Pai H. Chou
Abstract:
This paper describes a wearable wireless mouse-cursor controller that optically tracks the tilt of the user's head and moves the mouse cursor by a relative distance according to the degree of tilt. The raw data can be processed locally on the wearable device before wirelessly transmitting the mouse-movement reports over the Bluetooth Low Energy (BLE) protocol to the host computer; but for exploration of algorithms, the raw data can also be processed on the host. The use of the standard Human-Interface Device (HID) profile enables plug-and-play of the proposed mouse device on modern computers without requiring separate driver installation. It can be used in two different modes to move the cursor: the joystick mode and the direct-mapped mode. Experimental results show this head-controlled mouse to be intuitive and effective in operating the mouse cursor, with fine-grained control of the cursor even by untrained users.
Submitted 24 June, 2020;
originally announced June 2020.
-
DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation
Authors:
Mike Lambeta,
Po-Wei Chou,
Stephen Tian,
Brian Yang,
Benjamin Maloon,
Victoria Rose Most,
Dave Stroud,
Raymond Santos,
Ahmad Byagowi,
Gregg Kammerer,
Dinesh Jayaraman,
Roberto Calandra
Abstract:
Despite decades of research, general purpose in-hand manipulation remains one of the unsolved challenges of robotics. One of the contributing factors that limit current robotic manipulation systems is the difficulty of precisely sensing contact forces -- sensing and reasoning about contact forces are crucial to accurately control interactions with the environment. As a step towards enabling better robotic manipulation, we introduce DIGIT, an inexpensive, compact, and high-resolution tactile sensor geared towards in-hand manipulation. DIGIT improves upon past vision-based tactile sensors by miniaturizing the form factor to be mountable on multi-fingered hands, and by providing several design improvements that result in an easier, more repeatable manufacturing process, and enhanced reliability. We demonstrate the capabilities of the DIGIT sensor by training deep neural network model-based controllers to manipulate glass marbles in-hand with a multi-finger robotic hand. To provide the robotic community access to reliable and low-cost tactile sensors, we open-source the DIGIT design at https://digit.ml/.
Submitted 29 May, 2020;
originally announced May 2020.
-
Deep Implicit Volume Compression
Authors:
Danhang Tang,
Saurabh Singh,
Philip A. Chou,
Christian Haene,
Mingsong Dou,
Sean Fanello,
Jonathan Taylor,
Philip Davidson,
Onur G. Guleryuz,
Yinda Zhang,
Shahram Izadi,
Andrea Tagliasacchi,
Sofien Bouaziz,
Cem Keskin
Abstract:
We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures. To compress the TSDF, our method relies on a block-based neural network architecture trained end-to-end, achieving state-of-the-art rate-distortion trade-off. To prevent topological errors, we losslessly compress the signs of the TSDF, which also upper bounds the reconstruction error by the voxel size. To compress the corresponding texture, we designed a fast block-based UV parameterization, generating coherent texture maps that can be effectively compressed using existing video compression algorithms. We demonstrate the performance of our algorithms on two 4D performance capture datasets, reducing bitrate by 66% for the same distortion, or alternatively reducing the distortion by 50% for the same bitrate, compared to the state-of-the-art.
Submitted 18 May, 2020;
originally announced May 2020.
-
Region adaptive graph fourier transform for 3d point clouds
Authors:
Eduardo Pavez,
Benjamin Girault,
Antonio Ortega,
Philip A. Chou
Abstract:
We introduce the Region Adaptive Graph Fourier Transform (RA-GFT) for compression of 3D point cloud attributes. The RA-GFT is a multiresolution transform, formed by combining spatially localized block transforms. We assume the points are organized by a family of nested partitions represented by a rooted tree. At each resolution level, attributes are processed in clusters using block transforms. Each block transform produces a single approximation (DC) coefficient, and various detail (AC) coefficients. The DC coefficients are promoted up the tree to the next (lower resolution) level, where the process can be repeated until reaching the root. Since clusters may have different numbers of points, each block transform must incorporate the relative importance of each coefficient. For this, we introduce the $\mathbf{Q}$-normalized graph Laplacian, and propose using its eigenvectors as the block transform. The RA-GFT achieves better complexity-performance trade-offs than previous approaches. In particular, it outperforms the Region Adaptive Haar Transform (RAHT) by up to 2.5 dB, with a small complexity overhead.
Submitted 27 May, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
-
STANNIS: Low-Power Acceleration of Deep Neural Network Training Using Computational Storage
Authors:
Ali HeydariGorji,
Mahdi Torabzadehkashi,
Siavash Rezaei,
Hossein Bobarshad,
Vladimir Alves,
Pai H. Chou
Abstract:
This paper proposes a framework for distributed, in-storage training of neural networks on clusters of computational storage devices. Such devices not only contain hardware accelerators but also eliminate data movement between the host and storage, resulting in both improved performance and power savings. More importantly, this in-storage processing style of training ensures that private data never leaves the storage while fully controlling the sharing of public data. Experimental results show up to 2.7x speedup and 69% reduction in energy consumption and no significant loss in accuracy.
Submitted 19 February, 2020; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Surface Light Field Compression using a Point Cloud Codec
Authors:
Xiang Zhang,
Philip A. Chou,
Ming-Ting Sun,
Maolong Tang,
Shanshe Wang,
Siwei Ma,
Wen Gao
Abstract:
Light field (LF) representations aim to provide photo-realistic, free-viewpoint viewing experiences. However, the most popular LF representations are images from multiple views. Multi-view image-based representations generally need to restrict the range or degrees of freedom of the viewing experience to what can be interpolated in the image domain, essentially because they lack explicit geometry information. We present a new surface light field (SLF) representation based on explicit geometry, and a method for SLF compression. First, we map the multi-view images of a scene onto a 3D geometric point cloud. The color of each point in the point cloud is a function of viewing direction known as a view map. We represent each view map efficiently in a B-Spline wavelet basis. This representation is capable of modeling diverse surface materials and complex lighting conditions in a highly scalable and adaptive manner. The coefficients of the B-Spline wavelet representation are then compressed spatially. To increase the spatial correlation and thus improve compression efficiency, we introduce a smoothing term to make the coefficients more similar across the 3D space. We compress the coefficients spatially using existing point cloud compression (PCC) methods. On the decoder side, the scene is rendered efficiently from any viewing direction by reconstructing the view map at each point. In contrast to multi-view image-based LF approaches, our method supports photo-realistic rendering of real-world scenes from arbitrary viewpoints, i.e., with an unlimited six degrees of freedom (6DOF). In terms of rate and distortion, experimental results show that our method achieves superior performance with lighter decoder complexity compared with a reference image-plus-geometry compression (IGC) scheme, indicating its potential in practical virtual and augmented reality applications.
Submitted 28 May, 2018;
originally announced May 2018.
-
Rate-Utility Optimized Streaming of Volumetric Media for Augmented Reality
Authors:
Jounsup Park,
Philip A. Chou,
Jenq-Neng Hwang
Abstract:
Volumetric media, popularly known as holograms, need to be delivered to users using both on-demand and live streaming, for new augmented reality (AR) and virtual reality (VR) experiences. As in video streaming, hologram streaming must support network adaptivity and fast startup, but must also moderate large bandwidths, multiple simultaneously streaming objects, and frequent user interaction, which requires low delay. In this paper, we introduce the first system to our knowledge designed specifically for streaming volumetric media. The system reduces bandwidth by introducing 3D tiles, and culling them or reducing their level of detail depending on their relation to the user's view frustum and distance to the user. Our system reduces latency by introducing a window-based buffer, which in contrast to a queue-based buffer allows insertions near the head of the buffer rather than only at the tail of the buffer, to respond quickly to user interaction. To allocate bits between different tiles across multiple objects, we introduce a simple greedy yet provably optimal algorithm for rate-utility optimization. We introduce utility measures based not only on the underlying quality of the representation, but on the level of detail relative to the user's viewpoint and device resolution. Simulation results show that the proposed algorithm provides superior quality compared to existing video-streaming approaches adapted to hologram streaming, in terms of utility and user experience over variable, throughput-constrained networks.
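As a rough illustration of the greedy rate-utility allocation mentioned in this abstract, the Python sketch below repeatedly applies the level-of-detail upgrade with the highest utility gain per bit until the bit budget is exhausted. This is not the authors' algorithm: the function name `allocate_bits`, the tile names, and the (bits, utility) tables are hypothetical, and the sketch assumes that each tile's levels are listed with strictly increasing bit cost and diminishing utility returns.

```python
import heapq


def allocate_bits(tiles, budget):
    """Greedy allocation sketch (hypothetical, not the paper's method).

    tiles:  dict mapping tile name -> list of (cumulative_bits, utility),
            one entry per level of detail, with strictly increasing bits.
    budget: total number of bits available.
    Returns (level, spent), where level[name] == -1 means the tile is culled.
    """
    level = {name: -1 for name in tiles}
    spent = 0
    heap = []  # entries: (-utility gain per bit, tile name, extra bits)

    def push_next_upgrade(name):
        nxt = level[name] + 1
        if nxt >= len(tiles[name]):
            return  # already at the finest level of detail
        prev_bits, prev_utility = tiles[name][level[name]] if level[name] >= 0 else (0, 0.0)
        bits, utility = tiles[name][nxt]
        extra_bits = bits - prev_bits
        gain = utility - prev_utility
        heapq.heappush(heap, (-gain / extra_bits, name, extra_bits))

    for name in tiles:
        push_next_upgrade(name)

    while heap:
        _, name, extra_bits = heapq.heappop(heap)
        if spent + extra_bits > budget:
            continue  # upgrade does not fit; stop refining this tile
        spent += extra_bits
        level[name] += 1
        push_next_upgrade(name)

    return level, spent


# Hypothetical example: a tile near the viewer is worth more than a far one.
example_tiles = {
    "near": [(100, 10.0), (300, 14.0)],
    "far": [(100, 3.0), (300, 3.5)],
}
example_level, example_spent = allocate_bits(example_tiles, budget=400)
```

With a 400-bit budget, the sketch refines the "near" tile to its finest level and leaves the "far" tile at its coarsest level, because the second "far" upgrade no longer fits.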
Submitted 25 April, 2018;
originally announced April 2018.
-
Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases
Authors:
Yuhang Zhang,
Kee Siong Ng,
Michael Walker,
Pauline Chou,
Tania Churchill,
Peter Christen
Abstract:
Accurate and efficient entity resolution is an open challenge of particular relevance to intelligence organisations that collect large datasets from disparate sources with differing levels of quality and standard. Starting from a first-principles formulation of entity resolution, this paper presents a novel Entity Resolution algorithm that introduces a data-driven blocking and record-linkage technique based on the probabilistic identification of entity signatures in data. The scalability and accuracy of the proposed algorithm are evaluated using benchmark datasets and shown to achieve state-of-the-art results. The proposed algorithm can be implemented simply on modern parallel databases, which allows it to be deployed with relative ease in large industrial applications.
Submitted 18 March, 2018; v1 submitted 27 December, 2017;
originally announced December 2017.
-
FML-based Dynamic Assessment Agent for Human-Machine Cooperative System on Game of Go
Authors:
Chang-Shing Lee,
Mei-Hui Wang,
Sheng-Chi Yang,
Pi-Hsia Hung,
Su-Wei Lin,
Nan Shuo,
Naoyuki Kubota,
Chun-Hsun Chou,
Ping-Chiang Chou,
Chia-Hsiu Kao
Abstract:
In this paper, we demonstrate the application of Fuzzy Markup Language (FML) to construct an FML-based Dynamic Assessment Agent (FDAA), and we present an FML-based Human-Machine Cooperative System (FHMCS) for the game of Go. The proposed FDAA comprises an intelligent decision-making and learning mechanism, an intelligent game bot, a proximal development agent, and an intelligent agent. The intelligent game bot is based on the open-source code of Facebook Darkforest, and it features a representational state transfer application programming interface mechanism. The proximal development agent contains a dynamic assessment mechanism, a GoSocket mechanism, and an FML engine with a fuzzy knowledge base and rule base. The intelligent agent contains a GoSocket engine and a summarization agent that is based on the estimated win rate, real-time simulation number, and matching degree of predicted moves. Additionally, the FML for player performance evaluation and linguistic descriptions for game results commentary are presented. We experimentally verify and validate the performance of the FDAA and variants of the FHMCS by testing five games in 2016 and 60 games of Google Master Go, a new version of the AlphaGo program, in January 2017. The experimental results demonstrate that the proposed FDAA can work effectively for Go applications.
Submitted 16 July, 2017;
originally announced July 2017.
-
Dynamic Polygon Clouds: Representation and Compression for VR/AR
Authors:
Philip A. Chou,
Eduardo Pavez,
Ricardo L. de Queiroz,
Antonio Ortega
Abstract:
We introduce the polygon cloud, also known as a polygon set or soup, as a compressible representation of 3D geometry (including its attributes, such as color texture) intermediate between polygonal meshes and point clouds. Dynamic or time-varying polygon clouds, like dynamic polygonal meshes and dynamic point clouds, can take advantage of temporal redundancy for compression, if certain challenges are addressed. In this paper, we propose methods for compressing both static and dynamic polygon clouds, specifically triangle clouds. We compare triangle clouds to both triangle meshes and point clouds in terms of compression, for live captured dynamic colored geometry. We find that triangle clouds can be compressed nearly as well as triangle meshes, while being far more robust to noise and other structures typically found in live captures, which violate the assumption of a smooth surface manifold, such as lines, points, and ragged boundaries. We also find that triangle clouds can be used to compress point clouds with significantly better performance than previously demonstrated point cloud compression methods. In particular, for intra-frame coding of geometry, our method improves upon octree-based intra-frame coding by a factor of 5-10 in bit rate. Inter-frame coding improves this by another factor of 2-5. Overall, our dynamic triangle cloud compression improves over the previous state-of-the-art in dynamic point cloud compression by 33% or more.
Submitted 8 March, 2017; v1 submitted 3 October, 2016;
originally announced October 2016.
-
Detection of money laundering groups using supervised learning in networks
Authors:
David Savage,
Qingmai Wang,
Pauline Chou,
Xiuzhen Zhang,
Xinghuo Yu
Abstract:
Money laundering is a major global problem, enabling criminal organisations to hide their ill-gotten gains and to finance further operations. Prevention of money laundering is seen as a high priority by many governments; however, detection of money laundering without prior knowledge of predicate crimes remains a significant challenge. Previous detection systems have tended to focus on individuals, considering transaction histories and applying anomaly detection to identify suspicious behaviour. However, money laundering involves groups of collaborating individuals, and evidence of money laundering may only be apparent when the collective behaviour of these groups is considered. In this paper we describe a detection system that is capable of analysing group behaviour, using a combination of network analysis and supervised learning. This system is designed for real-world application and operates on networks consisting of millions of interacting parties. Evaluation of the system using real-world data indicates that suspicious activity is successfully detected. Importantly, the system exhibits a low rate of false positives, and is therefore suitable for use in a live intelligence environment.
Submitted 2 August, 2016;
originally announced August 2016.
-
Detection of opinion spam based on anomalous rating deviation
Authors:
David Savage,
Xiuzhen Zhang,
Xinghuo Yu,
Pauline Chou,
Qingmai Wang
Abstract:
The publication of fake reviews by parties with vested interests has become a severe problem for consumers who use online product reviews in their decision making. To counter this problem, a number of methods for detecting these fake reviews, termed opinion spam, have been proposed. However, to date, many of these methods focus on analysis of review text, making them unsuitable for many review systems where accompanying text is optional, or not possible. Moreover, these approaches are often computationally expensive, requiring extensive resources to handle text analysis over the scale of data typically involved.
In this paper, we consider opinion spammers' manipulation of average ratings for products, focusing on differences between spammer ratings and the majority opinion of honest reviewers. We propose a lightweight, effective method for detecting opinion spammers based on these differences. This method uses binomial regression to identify reviewers having an anomalous proportion of ratings that deviate from the majority opinion. Experiments on real-world and synthetic data show that our approach is able to successfully identify opinion spammers. Comparison with the current state-of-the-art approach, also based only on ratings, shows that our method is able to achieve similar detection accuracy while removing the need for assumptions regarding probabilities of spam and non-spam reviews and reducing the heavy computation required for learning.
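To make the idea of an "anomalous proportion of deviating ratings" concrete, the Python sketch below flags reviewers whose number of deviating ratings is improbably high under a binomial model. It deliberately substitutes a plain binomial tail test for the binomial regression named in the abstract, so it is a simplified stand-in rather than the authors' method. The function names, the use of the per-product median as the "majority opinion", and the `min_gap` and `alpha` thresholds are all assumptions made for illustration.

```python
from collections import defaultdict
from math import comb
from statistics import median


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_reviewers(ratings, min_gap=2, alpha=0.01):
    """Flag reviewers with an unusually high share of deviating ratings.

    ratings: iterable of (reviewer, product, stars) tuples.
    A rating "deviates" if it is at least min_gap stars away from the
    product's median rating (an assumed proxy for the majority opinion).
    Returns {reviewer: tail probability} for reviewers below alpha.
    """
    ratings = list(ratings)
    if not ratings:
        return {}

    by_product = defaultdict(list)
    for _, product, stars in ratings:
        by_product[product].append(stars)
    majority = {product: median(stars) for product, stars in by_product.items()}

    counts = defaultdict(lambda: [0, 0])  # reviewer -> [deviating, total]
    for reviewer, product, stars in ratings:
        counts[reviewer][1] += 1
        if abs(stars - majority[product]) >= min_gap:
            counts[reviewer][0] += 1

    # Overall deviation rate across all ratings serves as the null probability.
    base_rate = sum(dev for dev, _ in counts.values()) / len(ratings)

    flagged = {}
    for reviewer, (dev, total) in counts.items():
        p_value = binomial_tail(dev, total, base_rate)
        if p_value < alpha:
            flagged[reviewer] = p_value
    return flagged


# Synthetic example: nine consistent reviewers and one who always rates low.
example_ratings = [(f"honest{i}", f"p{j}", 4) for i in range(9) for j in range(10)]
example_ratings += [("spammer", f"p{j}", 1) for j in range(10)]
example_flagged = flag_reviewers(example_ratings)
```

On this synthetic data only the reviewer named "spammer" is flagged. A regression model, as in the abstract, could additionally account for reviewer-level covariates, which this sketch does not attempt.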
Submitted 1 August, 2016;
originally announced August 2016.
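The rating-deviation idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the paper uses binomial regression, whereas this sketch swaps in a plain binomial tail test. The function names, the use of the per-product median as the "majority opinion", the deviation threshold of 2 stars, and the significance level are all assumptions made for illustration.

```python
from collections import defaultdict
from math import comb
from statistics import median


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_reviewers(ratings, deviation=2, alpha=0.01):
    """Flag reviewers whose share of deviating ratings is anomalously high.

    ratings: iterable of (reviewer, product, stars) tuples.
    Returns {reviewer: tail probability} for reviewers below alpha.
    """
    ratings = list(ratings)

    # Assumption: the per-product median stands in for the majority opinion.
    by_product = defaultdict(list)
    for _, product, stars in ratings:
        by_product[product].append(stars)
    majority = {prod: median(stars) for prod, stars in by_product.items()}

    # Per reviewer: [number of deviating ratings, total ratings].
    counts = defaultdict(lambda: [0, 0])
    for reviewer, product, stars in ratings:
        counts[reviewer][1] += 1
        if abs(stars - majority[product]) >= deviation:
            counts[reviewer][0] += 1

    # Pooled deviation rate across all reviewers serves as the baseline.
    total_dev = sum(k for k, _ in counts.values())
    total = sum(n for _, n in counts.values())
    baseline = total_dev / total

    flagged = {}
    for reviewer, (k, n) in counts.items():
        tail = binomial_tail(k, n, baseline)
        if tail < alpha:
            flagged[reviewer] = tail
    return flagged
```

Calling `flag_reviewers` on a list of `(reviewer, product, stars)` tuples returns only the reviewers whose deviation count would be very unlikely under the pooled rate. Because the baseline pools spammers and honest reviewers together, heavy spam inflates it, which is one reason a regression-based treatment such as the paper's may be preferable.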
-
Anomaly detection in online social networks
Authors:
David Savage,
Xiuzhen Zhang,
Xinghuo Yu,
Pauline Chou,
Qingmai Wang
Abstract:
Anomalies in online social networks can signify irregular, and often illegal behaviour. Detection of such anomalies has been used to identify malicious individuals, including spammers, sexual predators, and online fraudsters. In this paper we survey existing computational techniques for detecting anomalies in online social networks. We characterise anomalies as being either static or dynamic, and as being labelled or unlabelled, and survey methods for detecting these different types of anomalies. We suggest that the detection of anomalies in online social networks is composed of two sub-processes: the selection and calculation of network features, and the classification of observations from this feature space. In addition, this paper provides an overview of the types of problems that anomaly detection can address and identifies key areas of future research.
Submitted 31 July, 2016;
originally announced August 2016.
-
Human vs. Computer Go: Review and Prospect
Authors:
Chang-Shing Lee,
Mei-Hui Wang,
Shi-Jim Yen,
Ting-Han Wei,
I-Chen Wu,
Ping-Chiang Chou,
Chun-Hsun Chou,
Ming-Wan Wang,
Tai-Hsiung Yang
Abstract:
The Google DeepMind challenge match in March 2016 was a historic achievement for computer Go development. This article discusses the development of computational intelligence (CI) and its relative strength in comparison with human intelligence for the game of Go. We first summarize the milestones achieved for computer Go from 1998 to 2016. Then, the computer Go programs that have participated in previous IEEE CIS competitions as well as methods and techniques used in AlphaGo are briefly introduced. Commentaries from three high-level professional Go players on the five AlphaGo versus Lee Sedol games are also included. We conclude that AlphaGo beating Lee Sedol is a huge achievement in artificial intelligence (AI) based largely on CI methods. In the future, powerful computer Go programs such as AlphaGo are expected to be instrumental in promoting Go education and AI real-world applications.
Submitted 7 June, 2016;
originally announced June 2016.
-
Graph-based compression of dynamic 3D point cloud sequences
Authors:
Dorina Thanou,
Philip A. Chou,
Pascal Frossard
Abstract:
This paper addresses the problem of compression of 3D point cloud sequences that are characterized by moving 3D positions and color attributes. As temporally successive point cloud frames are similar, motion estimation is key to effective compression of these sequences. However, it remains a challenging problem as the point cloud frames have varying numbers of points without explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs, and consider 3D positions and color attributes of the point clouds as signals on the vertices of the graphs. We then cast motion estimation as a feature matching problem between successive graphs. The motion is estimated on a sparse set of representative vertices using new spectral graph wavelet descriptors. A dense motion field is eventually interpolated by solving a graph-based regularization problem. The estimated motion is finally used for removing the temporal redundancy in the predictive coding of the 3D positions and the color characteristics of the point cloud sequences. Experimental results demonstrate that our method is able to accurately estimate the motion between consecutive frames. Moreover, motion estimation is shown to bring significant improvement in terms of the overall compression performance of the sequence. To the best of our knowledge, this is the first paper that exploits both the spatial correlation inside each frame (through the graph) and the temporal correlation between the frames (through the motion estimation) to compress the color and the geometry of 3D point cloud sequences in an efficient way.
Submitted 19 June, 2015;
originally announced June 2015.
-
Precision Enhancement of 3D Surfaces from Multiple Compressed Depth Maps
Authors:
Pengfei Wan,
Gene Cheung,
Philip A. Chou,
Dinei Florencio,
Cha Zhang,
Oscar C. Au
Abstract:
In texture-plus-depth representation of a 3D scene, depth maps from different camera viewpoints are typically lossily compressed via the classical transform coding / coefficient quantization paradigm. In this paper we propose to reduce distortion of the decoded depth maps due to quantization. The key observation is that depth maps from different viewpoints constitute multiple descriptions (MD) of the same 3D scene. Considering the MD jointly, we perform a POCS-like iterative procedure to project a reconstructed signal from one depth map to the other and back, so that the converged depth maps have higher precision than the original quantized versions.
Submitted 24 February, 2014;
originally announced May 2014.
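The alternating-projection idea in the abstract above can be sketched in a heavily simplified setting. This is an illustration only, not the paper's procedure: it assumes both descriptions sample the same positions (so no warping between camera viewpoints is needed) and that each description is a uniform scalar quantization with a different offset. The names `bin_of` and `pocs_refine` are hypothetical.

```python
import math


def bin_of(x, step, offset):
    """Return the interval (lo, hi) of the uniform quantizer bin containing x."""
    idx = math.floor((x - offset) / step)
    lo = offset + idx * step
    return lo, lo + step


def clip(value, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return min(max(value, lo), hi)


def pocs_refine(bins_a, bins_b, iterations=10):
    """Alternately project an estimate onto the bin constraints of two descriptions.

    bins_a, bins_b: lists of (lo, hi) intervals, one pair per sample.
    Starts from the bin centres of description A, as a plain decoder would.
    """
    estimate = [(lo + hi) / 2 for lo, hi in bins_a]
    for _ in range(iterations):
        # Project onto description B's constraint set, then back onto A's.
        estimate = [clip(v, lo, hi) for v, (lo, hi) in zip(estimate, bins_b)]
        estimate = [clip(v, lo, hi) for v, (lo, hi) in zip(estimate, bins_a)]
    return estimate
```

Each bin is a convex set that contains the true value, so projecting onto it cannot move the estimate further from the truth. In this one-dimensional toy case the iteration settles after a single pass; the multi-view problem described in the abstract is harder because the constraint sets are coupled through the scene geometry.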
-
Efficient Parallel Estimation for Markov Random Fields
Authors:
Michael J. Swain,
Lambert E. Wixson,
Paul B. Chou
Abstract:
We present a new, deterministic, distributed MAP estimation algorithm for Markov Random Fields called Local Highest Confidence First (Local HCF). The algorithm has been applied to segmentation problems in computer vision and its performance compared with stochastic algorithms. The experiments show that Local HCF finds better estimates than stochastic algorithms with much less computation.
Submitted 27 March, 2013;
originally announced April 2013.
-
A Microcantilever-based Gas Flow Sensor for Flow Rate and Direction Detection
Authors:
Y. -H. Wang,
Tzu-Han Hsueh,
Rong-Hua Ma,
Chia-Yen Lee,
Lung-Ming Fu,
P. -Ch. Chou,
Chien-Hsiung Tsai
Abstract:
The purpose of this paper is to exploit the residual stress that causes cantilever beams to bend in order to manufacture a micro-structured gas flow sensor. This study uses a silicon wafer with deposited silicon nitride layers, assembles the gas flow sensor from four cantilever beams that are perpendicular to each other, and fabricates a piezoresistive structure on each micro-cantilever using MEMS technologies. When the cantilever beams are formed after etching the silicon wafer, they bend up slightly due to the release of residual stress induced in the previous fabrication process. As air flows through the sensor, the upstream and downstream beams deform, so the airflow direction can be determined by comparing the resistance variation between different cantilever beams. The flow rate can also be measured by calculating the total resistance variation across the four cantilevers.
Submitted 7 May, 2008;
originally announced May 2008.
-
Enhanced Sensing Characteristics in MEMS-based Formaldehyde Gas Sensor
Authors:
Yu-Hsiang Wang,
C. -C. Hsiao,
Chia-Yen Lee,
R. -H. Ma,
Po-Cheng Chou
Abstract:
This study has successfully demonstrated a novel self-heating formaldehyde gas sensor based on a thin-film NiO sensing layer. A new fabrication process has been developed in which the Pt micro heater and electrodes are deposited directly on the substrate and the NiO thin film is deposited on top of the micro heater to serve as the sensing layer. Pt electrodes are formed below the sensing layer to measure the electrical conductivity changes caused by formaldehyde oxidation at the oxide surface. Furthermore, the upper sensing layer and NiO/Al2O3 co-sputtering significantly increase the sensitivity of the gas sensor and improve its detection limit. The microfabricated formaldehyde gas sensor presented in this study is suitable not only for industrial process monitoring, but also for the detection of formaldehyde concentrations in buildings in order to safeguard human health.
Submitted 21 February, 2008;
originally announced February 2008.