HSM: Hierarchical Scene Motifs for Multi-Scale Indoor Scene Generation

Authors: Hou In Derek Pun, Hou In Ivan Tam, Austin T. Wang, Xiaoliang Huo, Angel X. Chang, Manolis Savva

Abstract: Despite advances in indoor 3D scene layout generation, synthesizing scenes with dense object arrangements remains challenging. Existing methods primarily focus on large furniture while neglecting smaller objects, resulting in unrealistically empty scenes. Those that place small objects typically do not honor arrangement specifications, resulting in largely random placement not following the text description. We present HSM, a hierarchical framework for indoor scene generation with dense object arrangements across spatial scales. Indoor scenes are inherently hierarchical, with surfaces supporting objects at different scales, from large furniture on floors to smaller objects on tables and shelves. HSM embraces this hierarchy and exploits recurring cross-scale spatial patterns to generate complex and realistic indoor scenes in a unified manner. Our experiments show that HSM outperforms existing methods by generating scenes that are more realistic and better conform to user input across room types and spatial configurations.

Submitted 21 March, 2025; originally announced March 2025.

Comments: 23 pages, 7 figures

arXiv:2503.14756 [pdf, ps, other]

SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis

Authors: Hou In Ivan Tam, Hou In Derek Pun, Austin T. Wang, Angel X. Chang, Manolis Savva

Abstract: Despite recent advances in text-conditioned 3D indoor scene generation, there remain gaps in the evaluation of these methods. Existing metrics primarily assess the realism of generated scenes by comparing them to a set of ground-truth scenes, often overlooking alignment with the input text - a critical factor in determining how effectively a method meets user requirements. We present SceneEval, an evaluation framework designed to address this limitation. SceneEval includes metrics for both explicit user requirements, such as the presence of specific objects and their attributes described in the input text, and implicit expectations, like the absence of object collisions, providing a comprehensive assessment of scene quality. To facilitate evaluation, we introduce SceneEval-500, a dataset of scene descriptions with annotated ground-truth scene properties. We evaluate recent scene generation methods using SceneEval and demonstrate its ability to provide detailed assessments of the generated scenes, highlighting strengths and areas for improvement across multiple dimensions. Our results show that current methods struggle at generating scenes that meet user requirements, underscoring the need for further research in this direction.

Submitted 11 June, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

Comments: Expanded dataset to 500 annotated scene descriptions with new scene types; added validation via extended manual evaluation and a new user study; clarified distinctions from prior metrics; included results using an open-source VLM; stated intent to release code and data; corrected terminology and typos. 24 pages with 8 figures and 6 tables

arXiv:2408.02211 [pdf, ps, other]

SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements

Authors: Hou In Ivan Tam, Hou In Derek Pun, Austin T. Wang, Angel X. Chang, Manolis Savva

Abstract: Despite advances in text-to-3D generation methods, generation of multi-object arrangements remains challenging. Current methods exhibit failures in generating physically plausible arrangements that respect the provided text description. We present SceneMotifCoder (SMC), an example-driven framework for generating 3D object arrangements through visual program learning. SMC leverages large language models (LLMs) and program synthesis to overcome these challenges by learning visual programs from example arrangements. These programs are generalized into compact, editable meta-programs. When combined with 3D object retrieval and geometry-aware optimization, they can be used to create object arrangements varying in arrangement structure and contained objects. Our experiments show that SMC generates high-quality arrangements using meta-programs learned from few examples. Evaluation results demonstrate that object arrangements generated by SMC better conform to user-specified text descriptions and are more physically plausible when compared with state-of-the-art text-to-3D generation and layout methods.

Submitted 3 June, 2025; v1 submitted 4 August, 2024; originally announced August 2024.

Comments: Accepted at 3DV 2025 (Oral). Project page: https://3dlg-hcvc.github.io/smc/. Minor revisions for camera-ready version

arXiv:2406.10180 [pdf, other]

MeshPose: Unifying DensePose and 3D Body Mesh reconstruction

Authors: Eric-Tuan Lê, Antonis Kakolyris, Petros Koutras, Himmy Tam, Efstratios Skordos, George Papandreou, Rıza Alp Güler, Iasonas Kokkinos

Abstract: DensePose provides a pixel-accurate association of images with 3D mesh coordinates, but does not provide a 3D mesh, while Human Mesh Reconstruction (HMR) systems have high 2D reprojection error, as measured by DensePose localization metrics. In this work we introduce MeshPose to jointly tackle DensePose and HMR. For this we first introduce new losses that allow us to use weak DensePose supervision to accurately localize in 2D a subset of the mesh vertices ('VertexPose'). We then lift these vertices to 3D, yielding a low-poly body mesh ('MeshPose'). Our system is trained in an end-to-end manner and is the first HMR method to attain competitive DensePose accuracy, while also being lightweight and amenable to efficient inference, making it suitable for real-time AR applications.

Submitted 14 June, 2024; originally announced June 2024.

Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

MSC Class: 68 ACM Class: I.2.10

Journal ref: CVPR 2024

arXiv:2312.09570 [pdf, other]

CAGE: Controllable Articulation GEneration

Authors: Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi-Amiri, Manolis Savva

Abstract: We address the challenge of generating 3D articulated objects in a controllable fashion. Currently, modeling articulated 3D objects is either achieved through laborious manual authoring, or using methods from prior work that are hard to scale and control directly. We leverage the interplay between part shape, connectivity, and motion using a denoising diffusion-based method with attention modules designed to extract correlations between part attributes. Our method takes an object category label and a part connectivity graph as input and generates an object's geometry and motion parameters. The generated objects conform to user-specified constraints on the object category, part shape, and part articulation. Our experiments show that our method outperforms the state-of-the-art in articulated object generation, producing more realistic objects while conforming better to user constraints. Video Summary at: http://youtu.be/cH_rbKbyTpE

Submitted 20 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: CVPR 2024. Project page: https://3dlg-hcvc.github.io/cage/

arXiv:2009.05266 [pdf, other]

GTEA: Inductive Representation Learning on Temporal Interaction Graphs via Temporal Edge Aggregation

Authors: Siyue Xie, Yiming Li, Da Sun Handason Tam, Xiaxin Liu, Qiu Fang Ying, Wing Cheong Lau, Dah Ming Chiu, Shou Zhi Chen

Abstract: In this paper, we propose the Graph Temporal Edge Aggregation (GTEA) framework for inductive learning on Temporal Interaction Graphs (TIGs). Different from previous works, GTEA models the temporal dynamics of interaction sequences in the continuous-time space and simultaneously takes advantage of both rich node and edge/interaction attributes in the graph. Concretely, we integrate a sequence model with a time encoder to learn pairwise interactional dynamics between two adjacent nodes. This helps capture complex temporal interactional patterns of a node pair along the history, which generates edge embeddings that can be fed into a GNN backbone. By aggregating features of neighboring nodes and the corresponding edge embeddings, GTEA jointly learns both topological and temporal dependencies of a TIG. In addition, a sparsity-inducing self-attention scheme is incorporated for neighbor aggregation, which highlights more important neighbors and suppresses trivial noise for GTEA. By jointly optimizing the sequence model and the GNN backbone, GTEA learns more comprehensive node representations capturing both temporal and graph structural characteristics. Extensive experiments on five large-scale real-world datasets demonstrate the superiority of GTEA over other inductive models.

Submitted 3 May, 2023; v1 submitted 11 September, 2020; originally announced September 2020.

Comments: accepted by PAKDD2023

arXiv:1906.05546 [pdf, ps, other]

Identifying Illicit Accounts in Large Scale E-payment Networks -- A Graph Representation Learning Approach

Authors: Da Sun Handason Tam, Wing Cheong Lau, Bin Hu, Qiu Fang Ying, Dah Ming Chiu, Hong Liu

Abstract: Rapid and massive adoption of mobile/online payment services has brought new challenges to the service providers as well as regulators in safeguarding the proper use of such services/systems. In this paper, we leverage recent advances in deep-neural-network-based graph representation learning to detect abnormal/suspicious financial transactions in real-world e-payment networks. In particular, we propose an end-to-end Graph Convolution Network (GCN)-based algorithm to learn the embeddings of the nodes and edges of a large-scale time-evolving graph. In the context of e-payment transaction graphs, the resultant node and edge embeddings can effectively characterize the user background as well as the financial transaction patterns of individual account holders. As such, we can use the graph embedding results to drive downstream graph mining tasks such as node classification to identify illicit accounts within the payment networks. Our algorithm outperforms state-of-the-art schemes including GraphSAGE, Gradient Boosting Decision Tree and Random Forest to deliver considerably higher accuracy (94.62% and 86.98% respectively) in classifying user accounts within 2 practical e-payment transaction datasets. It also achieves outstanding accuracy (97.43%) for another biomedical entity identification task while using only edge-related information.

Submitted 13 June, 2019; originally announced June 2019.

arXiv:1905.12957 [pdf, other]

Neural Entropic Estimation: A faster path to mutual information estimation

Authors: Chung Chan, Ali Al-Bashabsheh, Hing Pang Huang, Michael Lim, Da Sun Handason Tam, Chao Zhao

Abstract: We point out a limitation of the mutual information neural estimation (MINE) where the network fails to learn at the initial training phase, leading to slow convergence in the number of training iterations. To solve this problem, we propose a faster method called the mutual information neural entropic estimation (MI-NEE). Our solution first generalizes MINE to estimate the entropy using a custom reference distribution. The entropy estimate can then be used to estimate the mutual information. We argue that the seemingly redundant intermediate step of entropy estimation allows one to improve the convergence by an appropriate reference distribution. In particular, we show that MI-NEE reduces to MINE in the special case when the reference distribution is the product of marginal distributions, but faster convergence is possible by choosing the uniform distribution as the reference distribution instead. Compared to the product of marginals, the uniform distribution introduces more samples in low-density regions and fewer samples in high-density regions, which appear to lead to an overall larger gradient for faster convergence.

Submitted 30 May, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

arXiv:1107.3194 [pdf]

Fingerprint recognition using standardized fingerprint model

Authors: Le Hoang Thai, Ha Nhat Tam

Abstract: Fingerprint recognition is one of the most popular and accurate biometric technologies. Nowadays, it is used in many real applications. However, recognizing fingerprints in poor-quality images is still a very complex problem. In recent years, many algorithms and models have been proposed to improve the accuracy of recognition systems. This paper discusses the standardized fingerprint model, which is used to synthesize the template of fingerprints. In this model, after the pre-processing step, we find the transformation between templates, adjust parameters, synthesize the fingerprint, and reduce noise. Then, we use the final fingerprint to match against others in the FVC2004 fingerprint database (DB4) to show the capability of the model.

Submitted 15 July, 2011; originally announced July 2011.

Comments: 7 pages, 16 figures, 3 tables, IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 3, No 7, May 2010

Journal ref: IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 3, No 7, May 2010, ISSN (Online): 1694-0784, ISSN (Print): 1694-0814

Showing 1–9 of 9 results for author: Tam, H