-
A New Split Algorithm for 3D Gaussian Splatting
Authors:
Qiyuan Feng,
Gengchen Cao,
Haoxiang Chen,
Tai-Jiang Mu,
Ralph R. Martin,
Shi-Min Hu
Abstract:
3D Gaussian splatting models, as a novel explicit 3D representation, have been applied in many domains recently, such as explicit geometric editing and geometry generation. Progress has been rapid. However, due to their mixed scales and cluttered shapes, 3D Gaussian splatting models can produce a blurred or needle-like effect near the surface. At the same time, 3D Gaussian splatting models tend to flatten large untextured regions, yielding a very sparse point cloud. These problems are caused by the non-uniform nature of 3D Gaussian splatting models, so in this paper, we propose a new 3D Gaussian splitting algorithm, which can produce a more uniform and surface-bounded 3D Gaussian splatting model. Our algorithm splits an $N$-dimensional Gaussian into two $N$-dimensional Gaussians. It ensures consistency of mathematical characteristics and similarity of appearance, allowing resulting 3D Gaussian splatting models to be more uniform and a better fit to the underlying surface, and thus more suitable for explicit editing, point cloud extraction and other tasks. Meanwhile, our 3D Gaussian splitting approach has a very simple closed-form solution, making it readily applicable to any 3D Gaussian model.
Submitted 14 March, 2024;
originally announced March 2024.
-
Long Range Pooling for 3D Large-Scale Scene Understanding
Authors:
Xiang-Li Li,
Meng-Hao Guo,
Tai-Jiang Mu,
Ralph R. Martin,
Shi-Min Hu
Abstract:
Inspired by the success of recent vision transformers and large kernel design in convolutional neural networks (CNNs), in this paper, we analyze and explore essential reasons for their success. We claim two factors that are critical for 3D large-scale scene understanding: a larger receptive field and operations with greater non-linearity. The former is responsible for providing long range contexts and the latter can enhance the capacity of the network. To achieve the above properties, we propose a simple yet effective long range pooling (LRP) module using dilation max pooling, which provides a network with a large adaptive receptive field. LRP has few parameters, and can be readily added to current CNNs. Also, based on LRP, we present an entire network architecture, LRPNet, for 3D understanding. Ablation studies are presented to support our claims, and show that the LRP module achieves better results than large kernel convolution yet with reduced computation, due to its nonlinearity. We also demonstrate the superiority of LRPNet on various benchmarks: LRPNet performs the best on ScanNet and surpasses other CNN-based methods on S3DIS and Matterport3D. Code will be made publicly available.
Submitted 17 January, 2023;
originally announced January 2023.
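The long range pooling idea in the abstract above relies on max pooling with dilation, which widens the receptive field without adding parameters. The following is a minimal sketch of that general operation on a dense 1D signal, not the authors' LRP module (which targets 3D scenes). The function name, the odd-kernel requirement, and the choice to skip out-of-range taps are assumptions made for illustration.

```python
def dilated_max_pool_1d(values, kernel_size=3, dilation=2):
    """Stride-1 max pooling whose taps are spaced `dilation` apart.

    Hypothetical helper for illustration only. Taps that fall outside the
    input are skipped, so the output has the same length as the input.
    """
    if kernel_size % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel_size")
    half = kernel_size // 2
    n = len(values)
    pooled = []
    for i in range(n):
        # Collect the in-range taps centred on position i.
        taps = [
            values[i + k * dilation]
            for k in range(-half, half + 1)
            if 0 <= i + k * dilation < n
        ]
        pooled.append(max(taps))
    return pooled


signal = [1, 5, 2, 0, 3, 9, 4]
print(dilated_max_pool_1d(signal, kernel_size=3, dilation=2))
# → [2, 5, 3, 9, 4, 9, 4]
```

With `kernel_size=3` and `dilation=2`, each output looks at positions spanning five input samples while reading only three of them, which is how dilation trades density for reach.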
-
Attention Mechanisms in Computer Vision: A Survey
Authors:
Meng-Hao Guo,
Tian-Xing Xu,
Jiang-Jiang Liu,
Zheng-Ning Liu,
Peng-Tao Jiang,
Tai-Jiang Mu,
Song-Hai Zhang,
Ralph R. Martin,
Ming-Ming Cheng,
Shi-Min Hu
Abstract:
Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
Submitted 15 November, 2021;
originally announced November 2021.
-
Sampling Equivariant Self-attention Networks for Object Detection in Aerial Images
Authors:
Guo-Ye Yang,
Xiang-Li Li,
Ralph R. Martin,
Shi-Min Hu
Abstract:
Objects in aerial images have greater variations in scale and orientation than in typical images, so detection is more difficult. Convolutional neural networks use a variety of frequency- and orientation-specific kernels to identify objects subject to different transformations; these require many parameters. Sampling equivariant networks can adjust sampling from input feature maps according to the transformation of the object, allowing a kernel to extract features of an object under different transformations. Doing so requires fewer parameters, and makes the network more suitable for representing deformable objects, like those in aerial images. However, methods like deformable convolutional networks can only provide sampling equivariance under certain circumstances, because of the locations used for sampling. We propose sampling equivariant self-attention networks which consider self-attention restricted to a local image patch as convolution sampling with masks instead of locations, and design a transformation embedding module to further improve the equivariant sampling ability. We also use a novel randomized normalization module to tackle overfitting due to limited aerial image data. We show that our model (i) provides significantly better sampling equivariance than existing methods, without additional supervision, (ii) provides improved classification on ImageNet, and (iii) achieves state-of-the-art results on the DOTA dataset, without increased computation.
Submitted 5 November, 2021;
originally announced November 2021.
-
Subdivision-Based Mesh Convolution Networks
Authors:
Shi-Min Hu,
Zheng-Ning Liu,
Meng-Hao Guo,
Jun-Xiong Cai,
Jiahui Huang,
Tai-Jiang Mu,
Ralph R. Martin
Abstract:
Convolutional neural networks (CNNs) have made great breakthroughs in 2D computer vision. However, their irregular structure makes it hard to harness the potential of CNNs directly on meshes. A subdivision surface provides a hierarchical multi-resolution structure, in which each face in a closed 2-manifold triangle mesh is exactly adjacent to three faces. Motivated by these two observations, this paper presents SubdivNet, an innovative and versatile CNN framework for 3D triangle meshes with Loop subdivision sequence connectivity. Making an analogy between mesh faces and pixels in a 2D image allows us to present a mesh convolution operator to aggregate local features from nearby faces. By exploiting face neighborhoods, this convolution can support standard 2D convolutional network concepts, e.g. variable kernel size, stride, and dilation. Based on the multi-resolution hierarchy, we make use of pooling layers which uniformly merge four faces into one and an upsampling method which splits one face into four. Thereby, many popular 2D CNN architectures can be easily adapted to process 3D meshes. Meshes with arbitrary connectivity can be remeshed to have Loop subdivision sequence connectivity via self-parameterization, making SubdivNet a general approach. Extensive evaluation and various applications demonstrate SubdivNet's effectiveness and efficiency.
Submitted 29 December, 2021; v1 submitted 4 June, 2021;
originally announced June 2021.
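The SubdivNet abstract above describes pooling layers that merge four faces into one along the subdivision hierarchy. A minimal sketch of that step follows, under a strong assumption that is not stated in the abstract: per-face feature vectors are stored so that the four fine faces belonging to coarse face `j` sit at indices `4*j` to `4*j + 3`. The function name and the use of a per-channel maximum are also illustrative choices, not the authors' implementation.

```python
def pool_four_faces(face_features):
    """Merge every group of four fine-face feature vectors into one.

    Hypothetical layout: fine faces 4*j .. 4*j+3 are the children of coarse
    face j. Each coarse feature is the per-channel maximum of its children.
    """
    if len(face_features) % 4 != 0:
        raise ValueError("face count must be a multiple of four")
    coarse = []
    for j in range(0, len(face_features), 4):
        children = face_features[j:j + 4]
        # zip(*children) iterates channel by channel across the four faces.
        coarse.append([max(channel) for channel in zip(*children)])
    return coarse


fine = [
    [1, 2], [3, 0], [5, 1], [2, 2],   # children of coarse face 0
    [0, 0], [1, 1], [2, 2], [3, 3],   # children of coarse face 1
]
print(pool_four_faces(fine))
# → [[5, 2], [3, 3]]
```

Upsampling, as described in the abstract, would go the other way and produce four fine faces from each coarse face; it is omitted here.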
-
Can Attention Enable MLPs To Catch Up With CNNs?
Authors:
Meng-Hao Guo,
Zheng-Ning Liu,
Tai-Jiang Mu,
Dun Liang,
Ralph R. Martin,
Shi-Min Hu
Abstract:
In the first week of May, 2021, researchers from four different institutions: Google, Tsinghua University, Oxford University and Facebook, shared their latest work [16, 7, 12, 17] on arXiv.org almost at the same time, each proposing new learning architectures, consisting mainly of linear layers, claiming them to be comparable, or even superior to convolutional-based models. This sparked immediate discussion and debate in both academic and industrial communities as to whether MLPs are sufficient, many thinking that learning architectures are returning to MLPs. Is this true? In this perspective, we give a brief history of learning architectures, including multilayer perceptrons (MLPs), convolutional neural networks (CNNs) and transformers. We then examine what the four newly proposed architectures have in common. Finally, we give our views on challenges and directions for new learning architectures, hoping to inspire future research.
Submitted 31 May, 2021;
originally announced May 2021.
-
PCT: Point cloud transformer
Authors:
Meng-Hao Guo,
Jun-Xiong Cai,
Zheng-Ning Liu,
Tai-Jiang Mu,
Ralph R. Martin,
Shi-Min Hu
Abstract:
The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant when processing a sequence of points, making it well-suited for point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation and normal estimation tasks.
Submitted 6 June, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
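The PCT abstract above mentions farthest point sampling as part of its input embedding. The sketch below shows the standard greedy form of that sampling in plain Python, as a way to make the idea concrete. It is not taken from the PCT code; the function name, the `start_index` parameter, and the list-of-tuples point format are assumptions.

```python
def farthest_point_sampling(points, num_samples, start_index=0):
    """Greedy farthest point sampling; returns indices into `points`.

    Hypothetical helper for illustration. At each step it picks the point
    whose squared distance to the already-selected set is largest.
    """
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be between 1 and len(points)")

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [start_index]
    # Squared distance from every point to its nearest selected point.
    nearest = [squared_distance(p, points[start_index]) for p in points]
    while len(selected) < num_samples:
        farthest = max(range(len(points)), key=nearest.__getitem__)
        selected.append(farthest)
        for i, p in enumerate(points):
            d = squared_distance(p, points[farthest])
            if d < nearest[i]:
                nearest[i] = d
    return selected


cloud = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (5, 0, 0)]
print(farthest_point_sampling(cloud, num_samples=3))
# → [0, 2, 3]
```

Each iteration costs time linear in the number of points, so the whole procedure is O(n · k) for k samples. Practical implementations typically vectorise the distance update.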
-
Shape retrieval of non-rigid 3d human models
Authors:
David Pickup,
Xianfang Sun,
Paul L Rosin,
Ralph R Martin,
Z Cheng,
Zhouhui Lian,
Masaki Aono,
A Ben Hamza,
A Bronstein,
M Bronstein,
S Bu,
Umberto Castellani,
S Cheng,
Valeria Garro,
Andrea Giachetti,
Afzal Godil,
Luca Isaia,
J Han,
Henry Johan,
L Lai,
Bo Li,
C Li,
Haisheng Li,
Roee Litman,
X Liu
, et al. (6 additional authors not shown)
Abstract:
3D models of humans are commonly used within computer graphics and vision, and so the ability to distinguish between body shapes is an important shape retrieval problem. We extend our recent paper which provided a benchmark for testing non-rigid 3D shape retrieval algorithms on 3D human models. This benchmark provided a far stricter challenge than previous shape benchmarks. We have added 145 new models for use as a separate training set, in order to standardise the training data used and provide a fairer comparison. We have also included experiments with the FAUST dataset of human scans. All participants of the previous benchmark study have taken part in the new tests reported here, many providing updated results using the new data. In addition, further participants have also taken part, and we provide extra analysis of the retrieval results. A total of 25 different shape retrieval methods are compared.
Submitted 1 March, 2020;
originally announced March 2020.
-
On difference graphs and the local dimension of posets
Authors:
Jinha Kim,
Ryan R. Martin,
Tomáš Masařík,
Warren Shull,
Heather C. Smith,
Andrew Uzzell,
Zhiyu Wang
Abstract:
The dimension of a partially-ordered set (poset), introduced by Dushnik and Miller (1941), has been studied extensively in the literature. Recently, Ueckerdt (2016) proposed a variation called local dimension which makes use of partial linear extensions. While local dimension is bounded above by dimension, they can be arbitrarily far apart as the dimension of the standard example is $n$ while its local dimension is only $3$.
Hiraguchi (1955) proved that the maximum dimension of a poset of order $n$ is $n/2$. However, we find a very different result for local dimension, proving a bound of $\Theta(n/\log n)$. This follows from connections with covering graphs using difference graphs, which are bipartite graphs whose vertices in a single class have nested neighborhoods.
We also prove that the local dimension of the $n$-dimensional Boolean lattice is $\Omega(n/\log n)$ and make progress toward resolving a version of the removable pair conjecture for local dimension.
Submitted 22 March, 2018;
originally announced March 2018.