Search | arXiv e-print repository

Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval

Authors: Li-Cheng Shen, Jih-Kang Hsieh, Wei-Hua Li, Chu-Song Chen

Abstract: Text-to-image retrieval (TIR) aims to find relevant images based on a textual query, but existing approaches are primarily based on whole-image captions and lack interpretability. Meanwhile, referring expression segmentation (RES) enables precise object localization based on natural language descriptions but is computationally expensive when applied across large image collections. To bridge this g… ▽ More Text-to-image retrieval (TIR) aims to find relevant images based on a textual query, but existing approaches are primarily based on whole-image captions and lack interpretability. Meanwhile, referring expression segmentation (RES) enables precise object localization based on natural language descriptions but is computationally expensive when applied across large image collections. To bridge this gap, we introduce Mask-aware TIR (MaTIR), a new task that unifies TIR and RES, requiring both efficient image search and accurate object segmentation. To address this task, we propose a two-stage framework, comprising a first stage for segmentation-aware image retrieval and a second stage for reranking and object grounding with a multimodal large language model (MLLM). We leverage SAM 2 to generate object masks and Alpha-CLIP to extract region-level embeddings offline at first, enabling effective and scalable online retrieval. Secondly, MLLM is used to refine retrieval rankings and generate bounding boxes, which are matched to segmentation masks. We evaluate our approach on COCO and D$^3$ datasets, demonstrating significant improvements in both retrieval accuracy and segmentation quality over previous methods. △ Less

Submitted 28 June, 2025; originally announced June 2025.

Comments: ICMR 2025

arXiv:2506.10361 [pdf, ps, other]

FaceLiVT: Face Recognition using Linear Vision Transformer with Structural Reparameterization For Mobile Device

Authors: Novendra Setyawan, Chi-Chia Sun, Mao-Hsiu Hsu, Wen-Kai Kuo, Jun-Wei Hsieh

Abstract: This paper introduces FaceLiVT, a lightweight yet powerful face recognition model that integrates a hybrid Convolution Neural Network (CNN)-Transformer architecture with an innovative and lightweight Multi-Head Linear Attention (MHLA) mechanism. By combining MHLA alongside a reparameterized token mixer, FaceLiVT effectively reduces computational complexity while preserving competitive accuracy. Ex… ▽ More This paper introduces FaceLiVT, a lightweight yet powerful face recognition model that integrates a hybrid Convolution Neural Network (CNN)-Transformer architecture with an innovative and lightweight Multi-Head Linear Attention (MHLA) mechanism. By combining MHLA alongside a reparameterized token mixer, FaceLiVT effectively reduces computational complexity while preserving competitive accuracy. Extensive evaluations on challenging benchmarks; including LFW, CFP-FP, AgeDB-30, IJB-B, and IJB-C; highlight its superior performance compared to state-of-the-art lightweight models. MHLA notably improves inference speed, allowing FaceLiVT to deliver high accuracy with lower latency on mobile devices. Specifically, FaceLiVT is 8.6 faster than EdgeFace, a recent hybrid CNN-Transformer model optimized for edge devices, and 21.2 faster than a pure ViT-Based model. With its balanced design, FaceLiVT offers an efficient and practical solution for real-time face recognition on resource-constrained platforms. △ Less

Submitted 12 June, 2025; originally announced June 2025.

Comments: 2025 ICIP

arXiv:2505.17360 [pdf, ps, other]

The Quasi-Polynomial Low-Degree Conjecture is False

Authors: Rares-Darius Buhai, Jun-Ting Hsieh, Aayush Jain, Pravesh K. Kothari

Abstract: There is a growing body of work on proving hardness results for average-case estimation problems by bounding the low-degree advantage (LDA) - a quantitative estimate of the closeness of low-degree moments - between a null distribution and a related planted distribution. Such hardness results are now ubiquitous not only for foundational average-case problems but also central questions in statistics… ▽ More There is a growing body of work on proving hardness results for average-case estimation problems by bounding the low-degree advantage (LDA) - a quantitative estimate of the closeness of low-degree moments - between a null distribution and a related planted distribution. Such hardness results are now ubiquitous not only for foundational average-case problems but also central questions in statistics and cryptography. This line of work is supported by the low-degree conjecture of Hopkins, which postulates that a vanishing degree-$D$ LDA implies the absence of any noise-tolerant distinguishing algorithm with runtime $n^{\widetilde{O}(D)}$ whenever 1) the null distribution is product on $\{0,1\}^{\binom{n}{k}}$, and 2) the planted distribution is permutation invariant, that is, invariant under any relabeling $[n] \rightarrow [n]$. In this paper, we disprove this conjecture. Specifically, we show that for any fixed $\varepsilon>0$ and $k\geq 2$, there is a permutation-invariant planted distribution on $\{0,1\}^{\binom{n}{k}}$ that has a vanishing degree-$n^{1-O(\varepsilon)}$ LDA with respect to the uniform distribution on $\{0,1\}^{\binom{n}{k}}$, yet the corresponding $\varepsilon$-noisy distinguishing problem can be solved in $n^{O(\log^{1/(k-1)}(n))}$ time. Our construction relies on algorithms for list-decoding for noisy polynomial interpolation in the high-error regime. We also give another construction of a pair of planted and (non-product) null distributions on $\mathbb{R}^{n \times n}$ with a vanishing $n^{Ω(1)}$-degree LDA while the largest eigenvalue serves as an efficient noise-tolerant distinguisher. Our results suggest that while a vanishing LDA may still be interpreted as evidence of hardness, developing a theory of average-case complexity based on such heuristics requires a more careful approach. △ Less

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2504.15087 [pdf, other]

Explicit Lossless Vertex Expanders

Authors: Jun-Ting Hsieh, Alexander Lubotzky, Sidhanth Mohanty, Assaf Reiner, Rachel Yun Zhang

Abstract: We give the first construction of explicit constant-degree lossless vertex expanders. Specifically, for any $\varepsilon > 0$ and sufficiently large $d$, we give an explicit construction of an infinite family of $d$-regular graphs where every small set $S$ of vertices has $(1-\varepsilon)d|S|$ neighbors (which implies $(1-2\varepsilon)d|S|$ unique-neighbors). Our results also extend naturally to c… ▽ More We give the first construction of explicit constant-degree lossless vertex expanders. Specifically, for any $\varepsilon > 0$ and sufficiently large $d$, we give an explicit construction of an infinite family of $d$-regular graphs where every small set $S$ of vertices has $(1-\varepsilon)d|S|$ neighbors (which implies $(1-2\varepsilon)d|S|$ unique-neighbors). Our results also extend naturally to construct biregular bipartite graphs of any constant imbalance, where small sets on each side have strong expansion guarantees. The graphs we construct admit a free group action, and hence realize new families of quantum LDPC codes of Lin and M. Hsieh with a linear time decoding algorithm. Our construction is based on taking an appropriate product of a constant-sized lossless expander with a base graph constructed from Ramanujan Cayley cubical complexes. △ Less

Submitted 21 April, 2025; originally announced April 2025.

Comments: 33 pages, 3 figures

arXiv:2504.13392 [pdf, ps, other]

POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation

Authors: Evans Xu Han, Alice Qian Zhang, Hong Shen, Haiyi Zhu, Paul Pu Liang, Jane Hsieh

Abstract: State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks -- offering the ability to generate (rather than search for) novel and unprecedented (instead of existing) images of considerable quality that also adhere to boundless combinations of user specifications. However, many large-scale text-to-image systems are designed for… ▽ More State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks -- offering the ability to generate (rather than search for) novel and unprecedented (instead of existing) images of considerable quality that also adhere to boundless combinations of user specifications. However, many large-scale text-to-image systems are designed for broad applicability, yielding conventional output that may limit creative exploration. They also employ interaction methods that may be difficult for beginners. Given that creative end users often operate in diverse, context-specific ways that are often unpredictable, more variation and personalization are necessary. We introduce POET, a real-time interactive tool that (1) automatically discovers dimensions of homogeneity in text-to-image generative models, (2) expands these dimensions to diversify the output space of generated images, and (3) learns from user feedback to personalize expansions. An evaluation with 28 users spanning four creative task domains demonstrated POET's ability to generate results with higher perceived diversity and help users reach satisfaction in fewer prompts during creative tasks, thereby prompting them to deliberate and reflect more on a wider range of possible produced results during the co-creative process. Focusing on visual creativity, POET offers a first glimpse of how interaction techniques of future text-to-image generation tools may support and align with more pluralistic values and the needs of end users during the ideation stages of their work. △ Less

Submitted 17 April, 2025; originally announced April 2025.

arXiv:2503.15512 [pdf, other]

Beyond Accuracy, SHAP, and Anchors -- On the difficulty of designing effective end-user explanations

Authors: Zahra Abba Omar, Nadia Nahar, Jacob Tjaden, Inès M. Gilles, Fikir Mekonnen, Jane Hsieh, Christian Kästner, Alka Menon

Abstract: Modern machine learning produces models that are impossible for users or developers to fully understand -- raising concerns about trust, oversight and human dignity. Transparency and explainability methods aim to provide some help in understanding models, but it remains challenging for developers to design explanations that are understandable to target users and effective for their purpose. Emergi… ▽ More Modern machine learning produces models that are impossible for users or developers to fully understand -- raising concerns about trust, oversight and human dignity. Transparency and explainability methods aim to provide some help in understanding models, but it remains challenging for developers to design explanations that are understandable to target users and effective for their purpose. Emerging guidelines and regulations set goals but may not provide effective actionable guidance to developers. In a controlled experiment with 124 participants, we investigate whether and how specific forms of policy guidance help developers design explanations for an ML-powered screening tool for diabetic retinopathy. Contrary to our expectations, we found that participants across the board struggled to produce quality explanations, comply with the provided policy requirements for explainability, and provide evidence of compliance. We posit that participant noncompliance is in part due to a failure to imagine and anticipate the needs of their audience, particularly non-technical stakeholders. Drawing on cognitive process theory and the sociological imagination to contextualize participants' failure, we recommend educational interventions. △ Less

Submitted 28 January, 2025; originally announced March 2025.

arXiv:2502.13221 [pdf, other]

Two Tickets are Better than One: Fair and Accurate Hiring Under Strategic LLM Manipulations

Authors: Lee Cohen, Jack Hsieh, Connie Hong, Judy Hanwen Shen

Abstract: In an era of increasingly capable foundation models, job seekers are turning to generative AI tools to enhance their application materials. However, unequal access to and knowledge about generative AI tools can harm both employers and candidates by reducing the accuracy of hiring decisions and giving some candidates an unfair advantage. To address these challenges, we introduce a new variant of th… ▽ More In an era of increasingly capable foundation models, job seekers are turning to generative AI tools to enhance their application materials. However, unequal access to and knowledge about generative AI tools can harm both employers and candidates by reducing the accuracy of hiring decisions and giving some candidates an unfair advantage. To address these challenges, we introduce a new variant of the strategic classification framework tailored to manipulations performed using large language models, accommodating varying levels of manipulations and stochastic outcomes. We propose a ``two-ticket'' scheme, where the hiring algorithm applies an additional manipulation to each submitted resume and considers this manipulated version together with the original submitted resume. We establish theoretical guarantees for this scheme, showing improvements for both the fairness and accuracy of hiring decisions when the true positive rate is maximized subject to a no false positives constraint. We further generalize this approach to an $n$-ticket scheme and prove that hiring outcomes converge to a fixed, group-independent decision, eliminating disparities arising from differential LLM access. Finally, we empirically validate our framework and the performance of our two-ticket scheme on real resumes using an open-source resume screening tool. △ Less

Submitted 18 February, 2025; originally announced February 2025.

arXiv:2502.07417 [pdf, other]

Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving

Authors: Novendra Setyawan, Ghufron Wahyu Kurniawan, Chi-Chia Sun, Wen-Kai Kuo, Jun-Wei Hsieh

Abstract: The perception system is a a critical role of an autonomous driving system for ensuring safety. The driving scene perception system fundamentally represents an object detection task that requires achieving a balance between accuracy and processing speed. Many contemporary methods focus on improving detection accuracy but often overlook the importance of real-time detection capabilities when comput… ▽ More The perception system is a a critical role of an autonomous driving system for ensuring safety. The driving scene perception system fundamentally represents an object detection task that requires achieving a balance between accuracy and processing speed. Many contemporary methods focus on improving detection accuracy but often overlook the importance of real-time detection capabilities when computational resources are limited. Thus, it is vital to investigate efficient object detection strategies for driving scenes. This paper introduces Fast-COS, a novel single-stage object detection framework crafted specifically for driving scene applications. The research initiates with an analysis of the backbone, considering both macro and micro architectural designs, yielding the Reparameterized Attention Vision Transformer (RAViT). RAViT utilizes Reparameterized Multi-Scale Depth-Wise Convolution (RepMSDW) and Reparameterized Self-Attention (RepSA) to enhance computational efficiency and feature extraction. In extensive tests across GPU, edge, and mobile platforms, RAViT achieves 81.4% Top-1 accuracy on the ImageNet-1K dataset, demonstrating significant throughput improvements over comparable backbone models such as ResNet, FastViT, RepViT, and EfficientFormer. Additionally, integrating RepMSDW into a feature pyramid network forms RepFPN, enabling fast and multi-scale feature fusion. Fast-COS enhances object detection in driving scenes, attaining an AP50 score of 57.2% on the BDD100K dataset and 80.0% on the TJU-DHD Traffic dataset. It surpasses leading models in efficiency, delivering up to 75.9% faster GPU inference and 1.38 higher throughput on edge devices compared to FCOS, YOLOF, and RetinaNet. These findings establish Fast-COS as a highly scalable and reliable solution suitable for real-time applications, especially in resource-limited environments like autonomous driving systems △ Less

Submitted 11 February, 2025; originally announced February 2025.

Comments: Under Review on IEEE Transactions on Intelligent Transportation Systems

arXiv:2502.05800 [pdf, other]

doi 10.1109/ISCAS56072.2025.11043206

MicroViT: A Vision Transformer with Low Complexity Self Attention for Edge Device

Authors: Novendra Setyawan, Chi-Chia Sun, Mao-Hsiu Hsu, Wen-Kai Kuo, Jun-Wei Hsieh

Abstract: The Vision Transformer (ViT) has demonstrated state-of-the-art performance in various computer vision tasks, but its high computational demands make it impractical for edge devices with limited resources. This paper presents MicroViT, a lightweight Vision Transformer architecture optimized for edge devices by significantly reducing computational complexity while maintaining high accuracy. The core… ▽ More The Vision Transformer (ViT) has demonstrated state-of-the-art performance in various computer vision tasks, but its high computational demands make it impractical for edge devices with limited resources. This paper presents MicroViT, a lightweight Vision Transformer architecture optimized for edge devices by significantly reducing computational complexity while maintaining high accuracy. The core of MicroViT is the Efficient Single Head Attention (ESHA) mechanism, which utilizes group convolution to reduce feature redundancy and processes only a fraction of the channels, thus lowering the burden of the self-attention mechanism. MicroViT is designed using a multi-stage MetaFormer architecture, stacking multiple MicroViT encoders to enhance efficiency and performance. Comprehensive experiments on the ImageNet-1K and COCO datasets demonstrate that MicroViT achieves competitive accuracy while significantly improving 3.6 faster inference speed and reducing energy consumption with 40% higher efficiency than the MobileViT series, making it suitable for deployment in resource-constrained environments such as mobile and edge devices. △ Less

Submitted 9 February, 2025; originally announced February 2025.

arXiv:2502.04482 [pdf, other]

doi 10.1145/3706598.3714398

Gig2Gether: Data-sharing to Empower, Unify and Demystify Gig Work

Authors: Jane Hsieh, Angie Zhang, Sajel Surati, Sijia Xie, Yeshua Ayala, Nithila Sathiya, Tzu-Sheng Kuo, Min Kyung Lee, Haiyi Zhu

Abstract: The wide adoption of platformized work has generated remarkable advancements in the labor patterns and mobility of modern society. Underpinning such progress, gig workers are exposed to unprecedented challenges and accountabilities: lack of data transparency, social and physical isolation, as well as insufficient infrastructural safeguards. Gig2Gether presents a space designed for workers to engag… ▽ More The wide adoption of platformized work has generated remarkable advancements in the labor patterns and mobility of modern society. Underpinning such progress, gig workers are exposed to unprecedented challenges and accountabilities: lack of data transparency, social and physical isolation, as well as insufficient infrastructural safeguards. Gig2Gether presents a space designed for workers to engage in an initial experience of voluntarily contributing anecdotal and statistical data to affect policy and build solidarity across platforms by exchanging unifying and diverse experiences. Our 7-day field study with 16 active workers from three distinct platforms and work domains showed existing affordances of data-sharing: facilitating mutual support across platforms, as well as enabling financial reflection and planning. Additionally, workers envisioned future use cases of data-sharing for collectivism (e.g., collaborative examinations of algorithmic speculations) and informing policy (e.g., around safety and pay), which motivated (latent) worker desiderata of additional capabilities and data metrics. Based on these findings, we discuss remaining challenges to address and how data-sharing tools can complement existing structures to maximize worker empowerment and policy impact. △ Less

Submitted 6 February, 2025; originally announced February 2025.

arXiv:2501.07991 [pdf]

Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins

Authors: Ilker Oguz, Louis J. E. Suter, Jih-Liang Hsieh, Mustafa Yildirim, Niyazi Ulas Dinc, Christophe Moser, Demetri Psaltis

Abstract: The ability to train ever-larger neural networks brings artificial intelligence to the forefront of scientific and technical discoveries. However, their exponentially increasing size creates a proportionally greater demand for energy and computational hardware. Incorporating complex physical events in networks as fixed, efficient computation modules can address this demand by decreasing the comple… ▽ More The ability to train ever-larger neural networks brings artificial intelligence to the forefront of scientific and technical discoveries. However, their exponentially increasing size creates a proportionally greater demand for energy and computational hardware. Incorporating complex physical events in networks as fixed, efficient computation modules can address this demand by decreasing the complexity of trainable layers. Here, we utilize ultrashort pulse propagation in multimode fibers, which perform large-scale nonlinear transformations, for this purpose. Training the hybrid architecture is achieved through a neural model that differentiably approximates the optical system. The training algorithm updates the neural simulator and backpropagates the error signal over this proxy to optimize layers preceding the optical one. Our experimental results achieve state-of-the-art image classification accuracies and simulation fidelity. Moreover, the framework demonstrates exceptional resilience to experimental drifts. By integrating low-energy physical systems into neural networks, this approach enables scalable, energy-efficient AI models with significantly reduced computational demands. △ Less

Submitted 14 January, 2025; originally announced January 2025.

Comments: 17 pages, 6 figures

arXiv:2501.07917 [pdf]

Roadmap on Neuromorphic Photonics

Authors: Daniel Brunner, Bhavin J. Shastri, Mohammed A. Al Qadasi, H. Ballani, Sylvain Barbay, Stefano Biasi, Peter Bienstman, Simon Bilodeau, Wim Bogaerts, Fabian Böhm, G. Brennan, Sonia Buckley, Xinlun Cai, Marcello Calvanese Strinati, B. Canakci, Benoit Charbonnier, Mario Chemnitz, Yitong Chen, Stanley Cheung, Jeff Chiles, Suyeon Choi, Demetrios N. Christodoulides, Lukas Chrostowski, J. Chu, J. H. Clegg , et al. (125 additional authors not shown)

Abstract: This roadmap consolidates recent advances while exploring emerging applications, reflecting the remarkable diversity of hardware platforms, neuromorphic concepts, and implementation philosophies reported in the field. It emphasizes the critical role of cross-disciplinary collaboration in this rapidly evolving field. This roadmap consolidates recent advances while exploring emerging applications, reflecting the remarkable diversity of hardware platforms, neuromorphic concepts, and implementation philosophies reported in the field. It emphasizes the critical role of cross-disciplinary collaboration in this rapidly evolving field. △ Less

Submitted 16 January, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

arXiv:2412.02973 [pdf, other]

Supporting Gig Worker Needs and Advancing Policy Through Worker-Centered Data-Sharing

Authors: Jane Hsieh, Angie Zhang, Mialy Rasetarinera, Erik Chou, Daniel Ngo, Karen Lightman, Min Kyung Lee, Haiyi Zhu

Abstract: The proliferating adoption of platform-based gig work increasingly raises concerns for worker conditions. Past studies documented how platforms leveraged design to exploit labor, withheld information to generate power asymmetries, and left workers alone to manage logistical overheads as well as social isolation. However, researchers also called attention to the potential of helping workers overcom… ▽ More The proliferating adoption of platform-based gig work increasingly raises concerns for worker conditions. Past studies documented how platforms leveraged design to exploit labor, withheld information to generate power asymmetries, and left workers alone to manage logistical overheads as well as social isolation. However, researchers also called attention to the potential of helping workers overcome such costs via worker-led datasharing, which can enable collective actions and mutual aid among workers, while offering advocates, lawmakers and regulatory bodies insights for improving work conditions. To understand stakeholders' desiderata for a data-sharing system (i.e. functionality and policy initiatives that it can serve), we interviewed 11 policy domain experts in the U.S. and conducted co-design workshops with 14 active gig workers across four domains. Our results outline policymakers' prioritized initiatives, information needs, and (mis)alignments with workers' concerns and desires around data collectives. We offer design recommendations for data-sharing systems that support worker needs while bringing us closer to legislation that promote more thriving and equitable gig work futures. △ Less

Submitted 11 December, 2024; v1 submitted 3 December, 2024; originally announced December 2024.

arXiv:2411.14361 [pdf, other]

Improved Lower Bounds for all Odd-Query Locally Decodable Codes

Authors: Arpon Basu, Jun-Ting Hsieh, Pravesh K. Kothari, Andrew D. Lin

Abstract: We prove that for every odd $q\geq 3$, any $q$-query binary, possibly non-linear locally decodable code ($q$-LDC) $E:\{\pm1\}^k \rightarrow \{\pm1\}^n$ must satisfy $k \leq \tilde{O}(n^{1-2/q})$. For even $q$, this bound was established in a sequence of prior works. For $q=3$, the above bound was achieved in a recent work of Alrabiah, Guruswami, Kothari and Manohar using an argument that crucially… ▽ More We prove that for every odd $q\geq 3$, any $q$-query binary, possibly non-linear locally decodable code ($q$-LDC) $E:\{\pm1\}^k \rightarrow \{\pm1\}^n$ must satisfy $k \leq \tilde{O}(n^{1-2/q})$. For even $q$, this bound was established in a sequence of prior works. For $q=3$, the above bound was achieved in a recent work of Alrabiah, Guruswami, Kothari and Manohar using an argument that crucially exploits known exponential lower bounds for $2$-LDCs. Their strategy hits an inherent bottleneck for $q \geq 5$. Our key insight is identifying a general sufficient condition on the hypergraph of local decoding sets called $t$-approximate strong regularity. This condition demands that 1) the number of hyperedges containing any given subset of vertices of size $t$ (i.e., its co-degree) be equal to the same but arbitrary value $d_t$ up to a multiplicative constant slack, and 2) all other co-degrees be upper-bounded relative to $d_t$. This condition significantly generalizes related proposals in prior works that demand absolute upper bounds on all co-degrees. We give an argument based on spectral bounds on Kikuchi Matrices that lower bounds the blocklength of any LDC whose local decoding sets satisfy $t$-approximate strong regularity for any $t \leq q$. Crucially, unlike prior works, our argument works despite having no non-trivial absolute upper bound on the co-degrees of any set of vertices. To apply our argument to arbitrary $q$-LDCs, we give a new, greedy, approximate strong regularity decomposition that shows that arbitrary, dense enough hypergraphs can be partitioned (up to a small error) into approximately strongly regular pieces satisfying the required relative bounds on the co-degrees. △ Less

Submitted 21 November, 2024; originally announced November 2024.

arXiv:2411.11627 [pdf, ps, other]

Explicit Two-Sided Vertex Expanders Beyond the Spectral Barrier

Authors: Jun-Ting Hsieh, Ting-Chun Lin, Sidhanth Mohanty, Ryan O'Donnell, Rachel Yun Zhang

Abstract: We construct the first explicit two-sided vertex expanders that bypass the spectral barrier. Previously, the strongest known explicit vertex expanders were given by $d$-regular Ramanujan graphs, whose spectral properties imply that every small subset of vertices $S$ has at least $0.5d|S|$ distinct neighbors. However, it is possible to construct Ramanujan graphs containing a small set $S$ with no… ▽ More We construct the first explicit two-sided vertex expanders that bypass the spectral barrier. Previously, the strongest known explicit vertex expanders were given by $d$-regular Ramanujan graphs, whose spectral properties imply that every small subset of vertices $S$ has at least $0.5d|S|$ distinct neighbors. However, it is possible to construct Ramanujan graphs containing a small set $S$ with no more than $0.5d|S|$ neighbors. In fact, no explicit construction was known to break the $0.5 d$-barrier. In this work, we give an explicit construction of an infinite family of $d$-regular graphs (for large enough $d$) where every small set expands by a factor of $\approx 0.6d$. More generally, for large enough $d_1,d_2$, we give an infinite family of $(d_1,d_2)$-biregular graphs where small sets on the left expand by a factor of $\approx 0.6d_1$, and small sets on the right expand by a factor of $\approx 0.6d_2$. In fact, our construction satisfies an even stronger property: small sets on the left and right have unique-neighbor expansion $0.6d_1$ and $0.6d_2$ respectively. Our construction follows the tripartite line product framework of Hsieh, McKenzie, Mohanty & Paredes, and instantiates it using the face-vertex incidence of the $4$-dimensional Ramanujan clique complex as its base component. As a key part of our analysis, we derive new bounds on the triangle density of small sets in the Ramanujan clique complex. △ Less

Submitted 18 November, 2024; originally announced November 2024.

Comments: 28 pages

arXiv:2409.15644 [pdf, other]

doi 10.1145/3706598.3713865

PolicyCraft: Supporting Collaborative and Participatory Policy Design through Case-Grounded Deliberation

Authors: Tzu-Sheng Kuo, Quan Ze Chen, Amy X. Zhang, Jane Hsieh, Haiyi Zhu, Kenneth Holstein

Abstract: Community and organizational policies are typically designed in a top-down, centralized fashion, with limited input from impacted stakeholders. This can result in policies that are misaligned with community needs or perceived as illegitimate. How can we support more collaborative, participatory approaches to policy design? In this paper, we present PolicyCraft, a system that structures collaborati… ▽ More Community and organizational policies are typically designed in a top-down, centralized fashion, with limited input from impacted stakeholders. This can result in policies that are misaligned with community needs or perceived as illegitimate. How can we support more collaborative, participatory approaches to policy design? In this paper, we present PolicyCraft, a system that structures collaborative policy design through case-grounded deliberation. Building on past research that highlights the value of concrete cases in establishing common ground, PolicyCraft supports users in collaboratively proposing, critiquing, and revising policies through discussion and voting on cases. A field study across two university courses showed that students using PolicyCraft reached greater consensus and developed better-supported course policies, compared with those using a baseline system that did not scaffold their use of concrete cases. Reflecting on our findings, we discuss opportunities for future HCI systems to help groups more effectively bridge between abstract policies and concrete cases. △ Less

Submitted 5 February, 2025; v1 submitted 23 September, 2024; originally announced September 2024.

Journal ref: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25)

arXiv:2409.03684 [pdf, ps, other]

Predicting quantum channels over general product distributions

Authors: Sitan Chen, Jaume de Dios Pont, Jun-Ting Hsieh, Hsin-Yuan Huang, Jane Lange, Jerry Li

Abstract: We investigate the problem of predicting the output behavior of unknown quantum channels. Given query access to an $n$-qubit channel $E$ and an observable $O$, we aim to learn the mapping \begin{equation*} ρ\mapsto \mathrm{Tr}(O E[ρ]) \end{equation*} to within a small error for most $ρ$ sampled from a distribution $D$. Previously, Huang, Chen, and Preskill proved a surprising result that even if… ▽ More We investigate the problem of predicting the output behavior of unknown quantum channels. Given query access to an $n$-qubit channel $E$ and an observable $O$, we aim to learn the mapping \begin{equation*} ρ\mapsto \mathrm{Tr}(O E[ρ]) \end{equation*} to within a small error for most $ρ$ sampled from a distribution $D$. Previously, Huang, Chen, and Preskill proved a surprising result that even if $E$ is arbitrary, this task can be solved in time roughly $n^{O(\log(1/ε))}$, where $ε$ is the target prediction error. However, their guarantee applied only to input distributions $D$ invariant under all single-qubit Clifford gates, and their algorithm fails for important cases such as general product distributions over product states $ρ$. In this work, we propose a new approach that achieves accurate prediction over essentially any product distribution $D$, provided it is not "classical" in which case there is a trivial exponential lower bound. Our method employs a "biased Pauli analysis," analogous to classical biased Fourier analysis. Implementing this approach requires overcoming several challenges unique to the quantum setting, including the lack of a basis with appropriate orthogonality properties. The techniques we develop to address these issues may have broader applications in quantum information. △ Less

Submitted 5 September, 2024; originally announced September 2024.

Comments: 20 pages, comments welcome

arXiv:2409.00737 [pdf, other]

doi 10.1145/3678884.3681829

Data Collectives as a means to Improve Accountability, Combat Surveillance and Reduce Inequalities

Authors: Jane Hsieh, Angie Zhang, Seyun Kim, Varun Nagaraj Rao, Samantha Dalal, Alexandra Mateescu, Rafael Do Nascimento Grohmann, Motahhare Eslami, Min Kyung Lee, Haiyi Zhu

Abstract: Platform-based laborers face unprecedented challenges and working conditions that result from algorithmic opacity, insufficient data transparency, and unclear policies and regulations. The CSCW and HCI communities increasingly turn to worker data collectives as a means to advance related policy and regulation, hold platforms accountable for data transparency and disclosure, and empower the collect… ▽ More Platform-based laborers face unprecedented challenges and working conditions that result from algorithmic opacity, insufficient data transparency, and unclear policies and regulations. The CSCW and HCI communities increasingly turn to worker data collectives as a means to advance related policy and regulation, hold platforms accountable for data transparency and disclosure, and empower the collective worker voice. However, fundamental questions remain for designing, governing and sustaining such data infrastructures. In this workshop, we leverage frameworks such as data feminism to design sustainable and power-aware data collectives that tackle challenges present in various types of online labor platforms (e.g., ridesharing, freelancing, crowdwork, carework). While data collectives aim to support worker collectives and complement relevant policy initiatives, the goal of this workshop is to encourage their designers to consider topics of governance, privacy, trust, and transparency. In this one-day session, we convene research and advocacy community members to reflect on critical platform work issues (e.g., worker surveillance, discrimination, wage theft, insufficient platform accountability) as well as to collaborate on codesigning data collectives that ethically and equitably address these concerns by supporting working collectivism and informing policy development. △ Less

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2405.16296 [pdf, other]

Neural Network-Based Tracking and 3D Reconstruction of Baseball Pitch Trajectories from Single-View 2D Video

Authors: Jhen Hsieh

Abstract: In this paper, we present a neural network-based approach for tracking and reconstructing the trajectories of baseball pitches from 2D video footage to 3D coordinates. We utilize OpenCV's CSRT algorithm to accurately track the baseball and fixed reference points in 2D video frames. These tracked pixel coordinates are then used as input features for our neural network model, which comprises multipl… ▽ More In this paper, we present a neural network-based approach for tracking and reconstructing the trajectories of baseball pitches from 2D video footage to 3D coordinates. We utilize OpenCV's CSRT algorithm to accurately track the baseball and fixed reference points in 2D video frames. These tracked pixel coordinates are then used as input features for our neural network model, which comprises multiple fully connected layers to map the 2D coordinates to 3D space. The model is trained on a dataset of labeled trajectories using a mean squared error loss function and the Adam optimizer, optimizing the network to minimize prediction errors. Our experimental results demonstrate that this approach achieves high accuracy in reconstructing 3D trajectories from 2D inputs. This method shows great potential for applications in sports analysis, coaching, and enhancing the accuracy of trajectory predictions in various sports. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.10238 [pdf, other]

Rounding Large Independent Sets on Expanders

Authors: Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari

Abstract: We develop a new approach for approximating large independent sets when the input graph is a one-sided spectral expander - that is, the uniform random walk matrix of the graph has its second eigenvalue bounded away from 1. Consequently, we obtain a polynomial time algorithm to find linear-sized independent sets in one-sided expanders that are almost $3$-colorable or are promised to contain an inde… ▽ More We develop a new approach for approximating large independent sets when the input graph is a one-sided spectral expander - that is, the uniform random walk matrix of the graph has its second eigenvalue bounded away from 1. Consequently, we obtain a polynomial time algorithm to find linear-sized independent sets in one-sided expanders that are almost $3$-colorable or are promised to contain an independent set of size $(1/2-ε)n$. Our second result above can be refined to require only a weaker vertex expansion property with an efficient certificate. In a surprising contrast to our algorithmic result, we observe that the analogous task of finding a linear-sized independent set in almost $4$-colorable one-sided expanders (even when the second eigenvalue is $o_n(1)$) is NP-hard, assuming the Unique Games Conjecture. All prior algorithms that beat the worst-case guarantees for this problem rely on bottom eigenspace enumeration techniques (following the classical spectral methods of Alon and Kahale) and require two-sided expansion, meaning a bounded number of negative eigenvalues of magnitude $Ω(1)$. Such techniques naturally extend to almost $k$-colorable graphs for any constant $k$, in contrast to analogous guarantees on one-sided expanders, which are Unique Games-hard to achieve for $k \geq 4$. Our rounding builds on the method of simulating multiple samples from a pseudo-distribution introduced by Bafna et. al. for rounding Unique Games instances. The key to our analysis is a new clustering property of large independent sets in expanding graphs - every large independent set has a larger-than-expected intersection with some member of a small list - and its formalization in the low-degree sum-of-squares proof system. △ Less

Submitted 5 November, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: 57 pages, 3 figures

arXiv:2405.05373 [pdf, other]

Certifying Euclidean Sections and Finding Planted Sparse Vectors Beyond the $\sqrt{n}$ Dimension Threshold

Authors: Venkatesan Guruswami, Jun-Ting Hsieh, Prasad Raghavendra

Abstract: We consider the task of certifying that a random $d$-dimensional subspace $X$ in $\mathbb{R}^n$ is well-spread - every vector $x \in X$ satisfies $c\sqrt{n} \|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\|x\|_2$. In a seminal work, Barak et. al. showed a polynomial-time certification algorithm when $d \leq O(\sqrt{n})$. On the other hand, when $d \gg \sqrt{n}$, the certification task is information-theoretica… ▽ More We consider the task of certifying that a random $d$-dimensional subspace $X$ in $\mathbb{R}^n$ is well-spread - every vector $x \in X$ satisfies $c\sqrt{n} \|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\|x\|_2$. In a seminal work, Barak et. al. showed a polynomial-time certification algorithm when $d \leq O(\sqrt{n})$. On the other hand, when $d \gg \sqrt{n}$, the certification task is information-theoretically possible but there is evidence that it is computationally hard [MW21,Cd22], a phenomenon known as the information-computation gap. In this paper, we give subexponential-time certification algorithms in the $d \gg \sqrt{n}$ regime. Our algorithm runs in time $\exp(\widetilde{O}(n^{\varepsilon}))$ when $d \leq \widetilde{O}(n^{(1+\varepsilon)/2})$, establishing a smooth trade-off between runtime and the dimension. Our techniques naturally extend to the related planted problem, where the task is to recover a sparse vector planted in a random subspace. Our algorithm achieves the same runtime and dimension trade-off for this task. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 32 pages, 2 Figures

arXiv:2404.09432 [pdf, other]

The 8th AI City Challenge

Authors: Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, Ping-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

Abstract: The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC)… ▽ More The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, highlighting significant enhancements in camera count, character number, 3D annotation, and camera matrices, alongside new rules for 3D tracking and online tracking algorithm encouragement. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements. △ Less

Submitted 14 April, 2024; originally announced April 2024.

Comments: Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

arXiv:2403.15004 [pdf, other]

ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding

Authors: Novendra Setyawan, Ghufron Wahyu Kurniawan, Chi-Chia Sun, Jun-Wei Hsieh, Jing-Ming Guo, Wen-Kai Kuo

Abstract: Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable success in computer vision tasks. However, their deep architectures often lead to high computational redundancy, making them less suitable for resource-constrained environments, such as edge devices. This paper introduces ParFormer, a novel vision transformer that addresses this challenge by incorporating a Parallel Mix… ▽ More Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable success in computer vision tasks. However, their deep architectures often lead to high computational redundancy, making them less suitable for resource-constrained environments, such as edge devices. This paper introduces ParFormer, a novel vision transformer that addresses this challenge by incorporating a Parallel Mixer and a Sparse Channel Attention Patch Embedding (SCAPE). By combining convolutional and attention mechanisms, ParFormer improves feature extraction. This makes spatial feature extraction more efficient and cuts down on unnecessary computation. The SCAPE module further reduces computational redundancy while preserving essential feature information during down-sampling. Experimental results on the ImageNet-1K dataset show that ParFormer-T achieves 78.9\% Top-1 accuracy with a high throughput on a GPU that outperforms other small models with 2.56$\times$ higher throughput than MobileViT-S, 0.24\% faster than FasterNet-T2, and 1.79$\times$ higher than EdgeNeXt-S. For edge device deployment, ParFormer-T excels with a throughput of 278.1 images/sec, which is 1.38 $\times$ higher than EdgeNeXt-S and 2.36$\times$ higher than MobileViT-S, making it highly suitable for real-time applications in resource-constrained settings. The larger variant, ParFormer-L, reaches 83.5\% Top-1 accuracy, offering a balanced trade-off between accuracy and efficiency, surpassing many state-of-the-art models. In COCO object detection, ParFormer-M achieves 40.7 AP for object detection and 37.6 AP for instance segmentation, surpassing models like ResNet-50, PVT-S and PoolFormer-S24 with significantly higher efficiency. These results validate ParFormer as a highly efficient and scalable model for both high-performance and resource-constrained scenarios, making it an ideal solution for edge-based AI applications. △ Less

Submitted 1 October, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: Under Review in IEEE Transactions on Cognitive and Developmental System

arXiv:2403.02363 [pdf, other]

Addressing Long-Tail Noisy Label Learning Problems: a Two-Stage Solution with Label Refurbishment Considering Label Rarity

Authors: Ying-Hsuan Wu, Jun-Wei Hsieh, Li Xin, Shin-You Teng, Yi-Kuan Hsieh, Ming-Ching Chang

Abstract: Real-world datasets commonly exhibit noisy labels and class imbalance, such as long-tailed distributions. While previous research addresses this issue by differentiating noisy and clean samples, reliance on information from predictions based on noisy long-tailed data introduces potential errors. To overcome the limitations of prior works, we introduce an effective two-stage approach by combining s… ▽ More Real-world datasets commonly exhibit noisy labels and class imbalance, such as long-tailed distributions. While previous research addresses this issue by differentiating noisy and clean samples, reliance on information from predictions based on noisy long-tailed data introduces potential errors. To overcome the limitations of prior works, we introduce an effective two-stage approach by combining soft-label refurbishing with multi-expert ensemble learning. In the first stage of robust soft label refurbishing, we acquire unbiased features through contrastive learning, making preliminary predictions using a classifier trained with a carefully designed BAlanced Noise-tolerant Cross-entropy (BANC) loss. In the second stage, our label refurbishment method is applied to obtain soft labels for multi-expert ensemble learning, providing a principled solution to the long-tail noisy label problem. Experiments conducted across multiple benchmarks validate the superiority of our approach, Label Refurbishment considering Label Rarity (LR^2), achieving remarkable accuracies of 94.19% and 77.05% on simulated noisy CIFAR-10 and CIFAR-100 long-tail datasets, as well as 77.74% and 81.40% on real-noise long-tail datasets, Food-101N and Animal-10N, surpassing existing state-of-the-art methods. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.11590 [pdf, ps, other]

Small Even Covers, Locally Decodable Codes and Restricted Subgraphs of Edge-Colored Kikuchi Graphs

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Sidhanth Mohanty, David Munhá Correia, Benny Sudakov

Abstract: Given a $k$-uniform hypergraph $H$ on $n$ vertices, an even cover in $H$ is a collection of hyperedges that touch each vertex an even number of times. Even covers are a generalization of cycles in graphs and are equivalent to linearly dependent subsets of a system of linear equations modulo $2$. As a result, they arise naturally in the context of well-studied questions in coding theory and refutin… ▽ More Given a $k$-uniform hypergraph $H$ on $n$ vertices, an even cover in $H$ is a collection of hyperedges that touch each vertex an even number of times. Even covers are a generalization of cycles in graphs and are equivalent to linearly dependent subsets of a system of linear equations modulo $2$. As a result, they arise naturally in the context of well-studied questions in coding theory and refuting unsatisfiable $k$-SAT formulas. Analogous to the irregular Moore bound of Alon, Hoory, and Linial (2002), in 2008, Feige conjectured an extremal trade-off between the number of hyperedges and the length of the smallest even cover in a $k$-uniform hypergraph. This conjecture was recently settled up to a multiplicative logarithmic factor in the number of hyperedges (Guruswami, Kothari, and Manohar 2022 and Hsieh, Kothari, and Mohanty 2023). These works introduce the new technique that relates hypergraph even covers to cycles in the associated Kikuchi graphs. Their analysis of these Kikuchi graphs, especially for odd $k$, is rather involved and relies on matrix concentration inequalities. In this work, we give a simple and purely combinatorial argument that recovers the best-known bound for Feige's conjecture for even $k$. We also introduce a novel variant of a Kikuchi graph which together with this argument improves the logarithmic factor in the best-known bounds for odd $k$. As an application of our ideas, we also give a purely combinatorial proof of the improved lower bounds (Alrabiah, Guruswami, Kothari and Manohar, 2023) on 3-query binary linear locally decodable codes. △ Less

Submitted 25 November, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: 19 pages

arXiv:2312.16771 [pdf, other]

Scale-Aware Crowd Count Network with Annotation Error Correction

Authors: Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Li Xin

Abstract: Traditional crowd counting networks suffer from information loss when feature maps are downsized through pooling layers, leading to inaccuracies in counting crowds at a distance. Existing methods often assume correct annotations during training, disregarding the impact of noisy annotations, especially in crowded scenes. Furthermore, the use of a fixed Gaussian kernel fails to account for the varyi… ▽ More Traditional crowd counting networks suffer from information loss when feature maps are downsized through pooling layers, leading to inaccuracies in counting crowds at a distance. Existing methods often assume correct annotations during training, disregarding the impact of noisy annotations, especially in crowded scenes. Furthermore, the use of a fixed Gaussian kernel fails to account for the varying pixel distribution with respect to the camera distance. To overcome these challenges, we propose a Scale-Aware Crowd Counting Network (SACC-Net) that introduces a ``scale-aware'' architecture with error-correcting capabilities of noisy annotations. For the first time, we {\bf simultaneously} model labeling errors (mean) and scale variations (variance) by spatially-varying Gaussian distributions to produce fine-grained heat maps for crowd counting. Furthermore, the proposed adaptive Gaussian kernel variance enables the model to learn dynamically with a low-rank approximation, leading to improved convergence efficiency with comparable accuracy. The performance of SACC-Net is extensively evaluated on four public datasets: UCF-QNRF, UCF CC 50, NWPU, and ShanghaiTech A-B. Experimental results demonstrate that SACC-Net outperforms all state-of-the-art methods, validating its effectiveness in achieving superior crowd counting accuracy. △ Less

Submitted 27 December, 2023; originally announced December 2023.

Comments: 7 pages, 6 figues. arXiv admin note: text overlap with arXiv:2211.06835

arXiv:2311.09354 [pdf]

doi 10.1063/5.0189222

Nondestructive, quantitative viability analysis of 3D tissue cultures using machine learning image segmentation

Authors: Kylie J. Trettner, Jeremy Hsieh, Weikun Xiao, Jerry S. H. Lee, Andrea M. Armani

Abstract: Ascertaining the collective viability of cells in different cell culture conditions has typically relied on averaging colorimetric indicators and is often reported out in simple binary readouts. Recent research has combined viability assessment techniques with image-based deep-learning models to automate the characterization of cellular properties. However, further development of viability measure… ▽ More Ascertaining the collective viability of cells in different cell culture conditions has typically relied on averaging colorimetric indicators and is often reported out in simple binary readouts. Recent research has combined viability assessment techniques with image-based deep-learning models to automate the characterization of cellular properties. However, further development of viability measurements to assess the continuity of possible cellular states and responses to perturbation across cell culture conditions is needed. In this work, we demonstrate an image processing algorithm for quantifying cellular viability in 3D cultures without the need for assay-based indicators. We show that our algorithm performs similarly to a pair of human experts in whole-well images over a range of days and culture matrix compositions. To demonstrate potential utility, we perform a longitudinal study investigating the impact of a known therapeutic on pancreatic cancer spheroids. Using images taken with a high content imaging system, the algorithm successfully tracks viability at the individual spheroid and whole-well level. The method we propose reduces analysis time by 97% in comparison to the experts. Because the method is independent of the microscope or imaging system used, this approach lays the foundation for accelerating progress in and for improving the robustness and reproducibility of 3D culture analysis across biological and clinical research. △ Less

Submitted 11 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 52 total pages, Main text and SI included, 35 figures (5 main text, 30 supplemental), 9 tables, 6 datasets (provided on linked GitHub), linked image files on Zenodo

arXiv:2310.00393 [pdf, ps, other]

New SDP Roundings and Certifiable Approximation for Cubic Optimization

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Lucas Pesenti, Luca Trevisan

Abstract: We give new rounding schemes for SDP relaxations for the problems of maximizing cubic polynomials over the unit sphere and the $n$-dimensional hypercube. In both cases, the resulting algorithms yield a $O(\sqrt{n/k})$ multiplicative approximation in $2^{O(k)} \text{poly}(n)$ time. In particular, we obtain a $O(\sqrt{n/\log n})$ approximation in polynomial time. For the unit sphere, this improves o… ▽ More We give new rounding schemes for SDP relaxations for the problems of maximizing cubic polynomials over the unit sphere and the $n$-dimensional hypercube. In both cases, the resulting algorithms yield a $O(\sqrt{n/k})$ multiplicative approximation in $2^{O(k)} \text{poly}(n)$ time. In particular, we obtain a $O(\sqrt{n/\log n})$ approximation in polynomial time. For the unit sphere, this improves on the rounding algorithms of Bhattiprolu et. al. [BGG+17] that need quasi-polynomial time to obtain a similar approximation guarantee. Over the $n$-dimensional hypercube, our results match the guarantee of a search algorithm of Khot and Naor [KN08] that obtains a similar approximation ratio via techniques from convex geometry. Unlike their method, our algorithm obtains an upper bound on the integrality gap of SDP relaxations for the problem and as a result, also yields a certificate on the optimum value of the input instance. Our results naturally generalize to homogeneous polynomials of higher degree and imply improved algorithms for approximating satisfiable instances of Max-3SAT. Our main motivation is the stark lack of rounding techniques for SDP relaxations of higher degree polynomial optimization in sharp contrast to a rich theory of SDP roundings for the quadratic case. Our rounding algorithms introduce two new ideas: 1) a new polynomial reweighting based method to round sum-of-squares relaxations of higher degree polynomial maximization problems, and 2) a general technique to compress such relaxations down to substantially smaller SDPs by relying on an explicit construction of certain hitting sets. We hope that our work will inspire improved rounding algorithms for polynomial optimization and related problems. △ Less

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.16897 [pdf, other]

Efficient Algorithms for Semirandom Planted CSPs at the Refutation Threshold

Authors: Venkatesan Guruswami, Jun-Ting Hsieh, Pravesh K. Kothari, Peter Manohar

Abstract: We present an efficient algorithm to solve semirandom planted instances of any Boolean constraint satisfaction problem (CSP). The semirandom model is a hybrid between worst-case and average-case input models, where the input is generated by (1) choosing an arbitrary planted assignment $x^*$, (2) choosing an arbitrary clause structure, and (3) choosing literal negations for each clause from an arbi… ▽ More We present an efficient algorithm to solve semirandom planted instances of any Boolean constraint satisfaction problem (CSP). The semirandom model is a hybrid between worst-case and average-case input models, where the input is generated by (1) choosing an arbitrary planted assignment $x^*$, (2) choosing an arbitrary clause structure, and (3) choosing literal negations for each clause from an arbitrary distribution "shifted by $x^*$" so that $x^*$ satisfies each constraint. For an $n$ variable semirandom planted instance of a $k$-arity CSP, our algorithm runs in polynomial time and outputs an assignment that satisfies all but a $o(1)$-fraction of constraints, provided that the instance has at least $\tilde{O}(n^{k/2})$ constraints. This matches, up to $polylog(n)$ factors, the clause threshold for algorithms that solve fully random planted CSPs [FPV15], as well as algorithms that refute random and semirandom CSPs [AOW15, AGK21]. Our result shows that despite having worst-case clause structure, the randomness in the literal patterns makes semirandom planted CSPs significantly easier than worst-case, where analogous results require $O(n^k)$ constraints [AKK95, FLP16]. Perhaps surprisingly, our algorithm follows a significantly different conceptual framework when compared to the recent resolution of semirandom CSP refutation. This turns out to be inherent and, at a technical level, can be attributed to the need for relative spectral approximation of certain random matrices - reminiscent of the classical spectral sparsification - which ensures that an SDP can certify the uniqueness of the planted assignment. In contrast, in the refutation setting, it suffices to obtain a weaker guarantee of absolute upper bounds on the spectral norm of related matrices. △ Less

Submitted 28 September, 2023; originally announced September 2023.

Comments: FOCS 2023

arXiv:2308.12817 [pdf, other]

MixNet: Toward Accurate Detection of Challenging Scene Text in the Wild

Authors: Yu-Xiang Zeng, Jun-Wei Hsieh, Xin Li, Ming-Ching Chang

Abstract: Detecting small scene text instances in the wild is particularly challenging, where the influence of irregular positions and nonideal lighting often leads to detection errors. We present MixNet, a hybrid architecture that combines the strengths of CNNs and Transformers, capable of accurately detecting small text from challenging natural scenes, regardless of the orientations, styles, and lighting… ▽ More Detecting small scene text instances in the wild is particularly challenging, where the influence of irregular positions and nonideal lighting often leads to detection errors. We present MixNet, a hybrid architecture that combines the strengths of CNNs and Transformers, capable of accurately detecting small text from challenging natural scenes, regardless of the orientations, styles, and lighting conditions. MixNet incorporates two key modules: (1) the Feature Shuffle Network (FSNet) to serve as the backbone and (2) the Central Transformer Block (CTBlock) to exploit the 1D manifold constraint of the scene text. We first introduce a novel feature shuffling strategy in FSNet to facilitate the exchange of features across multiple scales, generating high-resolution features superior to popular ResNet and HRNet. The FSNet backbone has achieved significant improvements over many existing text detection methods, including PAN, DB, and FAST. Then we design a complementary CTBlock to leverage center line based features similar to the medial axis of text regions and show that it can outperform contour-based approaches in challenging cases when small scene texts appear closely. Extensive experimental results show that MixNet, which mixes FSNet with CTBlock, achieves state-of-the-art results on multiple scene text detection datasets. △ Less

Submitted 27 August, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.07427 [pdf, other]

doi 10.1145/3610092

Nip it in the Bud: Moderation Strategies in Open Source Software Projects and the Role of Bots

Authors: Jane Hsieh, Joselyn Kim, Laura Dabbish, Haiyi Zhu

Abstract: Much of our modern digital infrastructure relies critically upon open sourced software. The communities responsible for building this cyberinfrastructure require maintenance and moderation, which is often supported by volunteer efforts. Moderation, as a non-technical form of labor, is a necessary but often overlooked task that maintainers undertake to sustain the community around an OSS project. T… ▽ More Much of our modern digital infrastructure relies critically upon open sourced software. The communities responsible for building this cyberinfrastructure require maintenance and moderation, which is often supported by volunteer efforts. Moderation, as a non-technical form of labor, is a necessary but often overlooked task that maintainers undertake to sustain the community around an OSS project. This study examines the various structures and norms that support community moderation, describes the strategies moderators use to mitigate conflicts, and assesses how bots can play a role in assisting these processes. We interviewed 14 practitioners to uncover existing moderation practices and ways that automation can provide assistance. Our main contributions include a characterization of moderated content in OSS projects, moderation techniques, as well as perceptions of and recommendations for improving the automation of moderation tasks. We hope that these findings will inform the implementation of more effective moderation practices in open source communities. △ Less

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.05954 [pdf, other]

Ellipsoid Fitting Up to a Constant

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Aaron Potechin, Jeff Xu

Abstract: In [Sau11,SPW13], Saunderson, Parrilo and Willsky asked the following elegant geometric question: what is the largest $m= m(d)$ such that there is an ellipsoid in $\mathbb{R}^d$ that passes through $v_1, v_2, \ldots, v_m$ with high probability when the $v_i$s are chosen independently from the standard Gaussian distribution $N(0,I_{d})$. The existence of such an ellipsoid is equivalent to the exist… ▽ More In [Sau11,SPW13], Saunderson, Parrilo and Willsky asked the following elegant geometric question: what is the largest $m= m(d)$ such that there is an ellipsoid in $\mathbb{R}^d$ that passes through $v_1, v_2, \ldots, v_m$ with high probability when the $v_i$s are chosen independently from the standard Gaussian distribution $N(0,I_{d})$. The existence of such an ellipsoid is equivalent to the existence of a positive semidefinite matrix $X$ such that $v_i^{\top}X v_i =1$ for every $1 \leq i \leq m$ - a natural example of a random semidefinite program. SPW conjectured that $m= (1-o(1)) d^2/4$ with high probability. Very recently, Potechin, Turner, Venkat and Wein and Kane and Diakonikolas proved that $m \geq d^2/\log^{O(1)}(d)$ via certain explicit constructions. In this work, we give a substantially tighter analysis of their construction to prove that $m \geq d^2/C$ for an absolute constant $C>0$. This resolves one direction of the SPW conjecture up to a constant. Our analysis proceeds via the method of Graphical Matrix Decomposition that has recently been used to analyze correlated random matrices arising in various areas [BHK+19]. Our key new technical tool is a refined method to prove singular value upper bounds on certain correlated random matrices that are tight up to absolute dimension-independent constants. In contrast, all previous methods that analyze such matrices lose logarithmic factors in the dimension. △ Less

Submitted 12 July, 2023; originally announced July 2023.

Comments: ICALP 2023

arXiv:2306.12972 [pdf, ps, other]

doi 10.1145/3596671.3598576

Designing Individualized Policy and Technology Interventions to Improve Gig Work Conditions

Authors: Jane Hsieh, Oluwatobi Adisa, Sachi Bafna, Haiyi Zhu

Abstract: The gig economy is characterized by short-term contract work completed by independent workers who are paid to perform "gigs", and who have control over when, whether and how they conduct work. Gig economy platforms (e.g., Uber, Lyft, Instacart) offer workers increased job opportunities, lower barriers to entry, and improved flexibility. However, growing evidence suggests that worker well-being and… ▽ More The gig economy is characterized by short-term contract work completed by independent workers who are paid to perform "gigs", and who have control over when, whether and how they conduct work. Gig economy platforms (e.g., Uber, Lyft, Instacart) offer workers increased job opportunities, lower barriers to entry, and improved flexibility. However, growing evidence suggests that worker well-being and gig work conditions have become significant societal issues. In designing public-facing policies and technologies for improving gig work conditions, inherent tradeoffs exist between offering individual flexibility and when attempting to meet all community needs. In platform-based gig work, contractors pursue the flexibility of short-term tasks, but policymakers resist segmenting the population when designing policies to support their work. As platforms offer an ever-increasing variety of services, we argue that policymakers and platform designers must provide more targeted and personalized policies, benefits, and protections for platform-based workers, so that they can lead more successful and sustainable gig work careers. We present in this paper relevant legal and scholarly evidence from the United States to support this position, and make recommendations for future innovations in policy and technology. △ Less

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.09662 [pdf, other]

Cooperative Multi-Objective Reinforcement Learning for Traffic Signal Control and Carbon Emission Reduction

Authors: Cheng Ruei Tang, Jun Wei Hsieh, Shin You Teng

Abstract: Existing traffic signal control systems rely on oversimplified rule-based methods, and even RL-based methods are often suboptimal and unstable. To address this, we propose a cooperative multi-objective architecture called Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (MOMA-DDPG), which estimates multiple reward terms for traffic signal control optimization using age-decaying weigh… ▽ More Existing traffic signal control systems rely on oversimplified rule-based methods, and even RL-based methods are often suboptimal and unstable. To address this, we propose a cooperative multi-objective architecture called Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (MOMA-DDPG), which estimates multiple reward terms for traffic signal control optimization using age-decaying weights. Our approach involves two types of agents: one focuses on optimizing local traffic at each intersection, while the other aims to optimize global traffic throughput. We evaluate our method using real-world traffic data collected from an Asian country's traffic cameras. Despite the inclusion of a global agent, our solution remains decentralized as this agent is no longer necessary during the inference stage. Our results demonstrate the effectiveness of MOMA-DDPG, outperforming state-of-the-art methods across all performance metrics. Additionally, our proposed system minimizes both waiting time and carbon emissions. Notably, this paper is the first to link carbon emissions and global agents in traffic signal control. △ Less

Submitted 16 July, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2205.11291

arXiv:2305.19170 [pdf]

Forward-Forward Training of an Optical Neural Network

Authors: Ilker Oguz, Junjie Ke, Qifei Wang, Feng Yang, Mustafa Yildirim, Niyazi Ulas Dinc, Jih-Liang Hsieh, Christophe Moser, Demetri Psaltis

Abstract: Neural networks (NN) have demonstrated remarkable capabilities in various tasks, but their computation-intensive nature demands faster and more energy-efficient hardware implementations. Optics-based platforms, using technologies such as silicon photonics and spatial light modulators, offer promising avenues for achieving this goal. However, training multiple trainable layers in tandem with these… ▽ More Neural networks (NN) have demonstrated remarkable capabilities in various tasks, but their computation-intensive nature demands faster and more energy-efficient hardware implementations. Optics-based platforms, using technologies such as silicon photonics and spatial light modulators, offer promising avenues for achieving this goal. However, training multiple trainable layers in tandem with these physical systems poses challenges, as they are difficult to fully characterize and describe with differentiable functions, hindering the use of error backpropagation algorithm. The recently introduced Forward-Forward Algorithm (FFA) eliminates the need for perfect characterization of the learning system and shows promise for efficient training with large numbers of programmable parameters. The FFA does not require backpropagating an error signal to update the weights, rather the weights are updated by only sending information in one direction. The local loss function for each set of trainable weights enables low-power analog hardware implementations without resorting to metaheuristic algorithms or reinforcement learning. In this paper, we present an experiment utilizing multimode nonlinear wave propagation in an optical fiber demonstrating the feasibility of the FFA approach using an optical system. The results show that incorporating optical transforms in multilayer NN architectures trained with the FFA, can lead to performance improvements, even with a relatively small number of trainable weights. The proposed method offers a new path to the challenge of training optical NNs and provides insights into leveraging physical transformations for enhancing NN performance. △ Less

Submitted 10 August, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.17449 [pdf, ps, other]

FishEye8K: A Benchmark and Dataset for Fisheye Camera Object Detection

Authors: Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Erkhembayar Ganbold, Jun-Wei Hsieh, Ming-Ching Chang, Ping-Yang Chen, Byambaa Dorj, Hamad Al Jassmi, Ganzorig Batnasan, Fady Alnajjar, Mohammed Abduljabbar, Fang-Pang Lin

Abstract: With the advance of AI, road object detection has been a prominent topic in computer vision, mostly using perspective cameras. Fisheye lens provides omnidirectional wide coverage for using fewer cameras to monitor road intersections, however with view distortions. To our knowledge, there is no existing open dataset prepared for traffic surveillance on fisheye cameras. This paper introduces an open… ▽ More With the advance of AI, road object detection has been a prominent topic in computer vision, mostly using perspective cameras. Fisheye lens provides omnidirectional wide coverage for using fewer cameras to monitor road intersections, however with view distortions. To our knowledge, there is no existing open dataset prepared for traffic surveillance on fisheye cameras. This paper introduces an open FishEye8K benchmark dataset for road object detection tasks, which comprises 157K bounding boxes across five classes (Pedestrian, Bike, Car, Bus, and Truck). In addition, we present benchmark results of State-of-The-Art (SoTA) models, including variations of YOLOv5, YOLOR, YOLO7, and YOLOv8. The dataset comprises 8,000 images recorded in 22 videos using 18 fisheye cameras for traffic monitoring in Hsinchu, Taiwan, at resolutions of 1080$\times$1080 and 1280$\times$1280. The data annotation and validation process were arduous and time-consuming, due to the ultra-wide panoramic and hemispherical fisheye camera images with large distortion and numerous road participants, particularly people riding scooters. To avoid bias, frames from a particular camera were assigned to either the training or test sets, maintaining a ratio of about 70:30 for both the number of images and bounding boxes in each class. Experimental results show that YOLOv8 and YOLOR outperform on input sizes 640$\times$640 and 1280$\times$1280, respectively. The dataset will be available on GitHub with PASCAL VOC, MS COCO, and YOLO annotation formats. The FishEye8K benchmark will provide significant contributions to the fisheye video analytics and smart city applications. △ Less

Submitted 6 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: CVPR Workshops 2023

arXiv:2305.04206 [pdf, other]

RATs-NAS: Redirection of Adjacent Trails on GCN for Neural Architecture Search

Authors: Yu-Ming Zhang, Jun-Wei Hsieh, Chun-Chieh Lee, Kuo-Chin Fan

Abstract: Various hand-designed CNN architectures have been developed, such as VGG, ResNet, DenseNet, etc., and achieve State-of-the-Art (SoTA) levels on different tasks. Neural Architecture Search (NAS) now focuses on automatically finding the best CNN architecture to handle the above tasks. However, the verification of a searched architecture is very time-consuming and makes predictor-based methods become… ▽ More Various hand-designed CNN architectures have been developed, such as VGG, ResNet, DenseNet, etc., and achieve State-of-the-Art (SoTA) levels on different tasks. Neural Architecture Search (NAS) now focuses on automatically finding the best CNN architecture to handle the above tasks. However, the verification of a searched architecture is very time-consuming and makes predictor-based methods become an essential and important branch of NAS. Two commonly used techniques to build predictors are graph-convolution networks (GCN) and multilayer perceptron (MLP). In this paper, we consider the difference between GCN and MLP on adjacent operation trails and then propose the Redirected Adjacent Trails NAS (RATs-NAS) to quickly search for the desired neural network architecture. The RATs-NAS consists of two components: the Redirected Adjacent Trails GCN (RATs-GCN) and the Predictor-based Search Space Sampling (P3S) module. RATs-GCN can change trails and their strengths to search for a better neural network architecture. P3S can rapidly focus on tighter intervals of FLOPs in the search space. Based on our observations on cell-based NAS, we believe that architectures with similar FLOPs will perform similarly. Finally, the RATs-NAS consisting of RATs-GCN and P3S beats WeakNAS, Arch-Graph, and others by a significant margin on three sub-datasets of NASBench-201. △ Less

Submitted 8 May, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

arXiv:2302.13436 [pdf, other]

doi 10.1145/3563657.3595982

Navigating Multi-Stakeholder Incentives and Preferences: Co-Designing Alternatives for the Future of Gig Worker Well-Being

Authors: Jane Hsieh, Miranda Karger, Lucas Zagal, Haiyi Zhu

Abstract: Gig workers, and the products and services they provide, play an increasingly ubiquitous role in our daily lives. But despite growing evidence suggesting that worker well-being in gig economy platforms have become significant societal problems, few studies have investigated possible solutions. We take a stride in this direction by engaging workers, platform employees, and local regulators in a ser… ▽ More Gig workers, and the products and services they provide, play an increasingly ubiquitous role in our daily lives. But despite growing evidence suggesting that worker well-being in gig economy platforms have become significant societal problems, few studies have investigated possible solutions. We take a stride in this direction by engaging workers, platform employees, and local regulators in a series of speed dating workshops using storyboards based on real-life situations to rapidly elicit stakeholder preferences for addressing financial, physical, and social issues related to worker well-being. Our results reveal that existing public and platformic infrastructures fall short in providing workers with resources needed to perform gigs, surfacing a need for multi-platform collaborations, technological innovations, as well as changes in regulations, labor laws, and the public's perception of gig workers, among others. Drawing from multi-stakeholder findings, we discuss these implications for technology, policy, and service as well as avenues for collaboration. △ Less

Submitted 5 June, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

arXiv:2302.01212 [pdf, other]

Explicit two-sided unique-neighbor expanders

Authors: Jun-Ting Hsieh, Theo McKenzie, Sidhanth Mohanty, Pedro Paredes

Abstract: We study the problem of constructing explicit sparse graphs that exhibit strong vertex expansion. Our main result is the first two-sided construction of imbalanced unique-neighbor expanders, meaning bipartite graphs where small sets contained in both the left and right bipartitions exhibit unique-neighbor expansion, along with algebraic properties relevant to constructing quantum codes. Our cons… ▽ More We study the problem of constructing explicit sparse graphs that exhibit strong vertex expansion. Our main result is the first two-sided construction of imbalanced unique-neighbor expanders, meaning bipartite graphs where small sets contained in both the left and right bipartitions exhibit unique-neighbor expansion, along with algebraic properties relevant to constructing quantum codes. Our constructions are obtained from instantiations of the tripartite line product of a large tripartite spectral expander and a sufficiently good constant-sized unique-neighbor expander, a new graph product we defined that generalizes the line product in the work of Alon and Capalbo and the routed product in the work of Asherov and Dinur. To analyze the vertex expansion of graphs arising from the tripartite line product, we develop a sharp characterization of subgraphs that can arise in bipartite spectral expanders, generalizing results of Kahale, which may be of independent interest. By picking appropriate graphs to apply our product to, we give a strongly explicit construction of an infinite family of $(d_1,d_2)$-biregular graphs $(G_n)_{n\ge 1}$ (for large enough $d_1$ and $d_2$) where all sets $S$ with fewer than a small constant fraction of vertices have $Ω(d_1\cdot |S|)$ unique-neighbors (assuming $d_1 \leq d_2$). Additionally, we can also guarantee that subsets of vertices of size up to $\exp(Ω(\sqrt{\log |V(G_n)|}))$ expand losslessly. △ Less

Submitted 15 January, 2024; v1 submitted 2 February, 2023; originally announced February 2023.

Comments: New version contains stronger result, and many new technical ingredients. 45 pages, 2 figures

MSC Class: 05C48 ACM Class: G.2.1; G.2.2

arXiv:2212.01287 [pdf, other]

SARAS-Net: Scale and Relation Aware Siamese Network for Change Detection

Authors: Chao-Peng Chen, Jun-Wei Hsieh, Ping-Yang Chen, Yi-Kuan Hsieh, Bor-Shiun Wang

Abstract: Change detection (CD) aims to find the difference between two images at different times and outputs a change map to represent whether the region has changed or not. To achieve a better result in generating the change map, many State-of-The-Art (SoTA) methods design a deep learning model that has a powerful discriminative ability. However, these methods still get lower performance because they igno… ▽ More Change detection (CD) aims to find the difference between two images at different times and outputs a change map to represent whether the region has changed or not. To achieve a better result in generating the change map, many State-of-The-Art (SoTA) methods design a deep learning model that has a powerful discriminative ability. However, these methods still get lower performance because they ignore spatial information and scaling changes between objects, giving rise to blurry or wrong boundaries. In addition to these, they also neglect the interactive information of two different images. To alleviate these problems, we propose our network, the Scale and Relation-Aware Siamese Network (SARAS-Net) to deal with this issue. In this paper, three modules are proposed that include relation-aware, scale-aware, and cross-transformer to tackle the problem of scene change detection more effectively. To verify our model, we tested three public datasets, including LEVIR-CD, WHU-CD, and DSFIN, and obtained SoTA accuracy. Our code is available at https://github.com/f64051041/SARAS-Net. △ Less

Submitted 2 December, 2022; originally announced December 2022.

arXiv:2211.08824 [pdf, other]

SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking

Authors: Yu-Hsiang Wang, Jun-Wei Hsieh, Ping-Yang Chen, Ming-Ching Chang, Hung Hin So, Xin Li

Abstract: Despite recent progress in Multiple Object Tracking (MOT), several obstacles such as occlusions, similar objects, and complex scenes remain an open challenge. Meanwhile, a systematic study of the cost-performance tradeoff for the popular tracking-by-detection paradigm is still lacking. This paper introduces SMILEtrack, an innovative object tracker that effectively addresses these challenges by int… ▽ More Despite recent progress in Multiple Object Tracking (MOT), several obstacles such as occlusions, similar objects, and complex scenes remain an open challenge. Meanwhile, a systematic study of the cost-performance tradeoff for the popular tracking-by-detection paradigm is still lacking. This paper introduces SMILEtrack, an innovative object tracker that effectively addresses these challenges by integrating an efficient object detector with a Siamese network-based Similarity Learning Module (SLM). The technical contributions of SMILETrack are twofold. First, we propose an SLM that calculates the appearance similarity between two objects, overcoming the limitations of feature descriptors in Separate Detection and Embedding (SDE) models. The SLM incorporates a Patch Self-Attention (PSA) block inspired by the vision Transformer, which generates reliable features for accurate similarity matching. Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames, further enhancing MOT performance. Together, these innovations help SMILETrack achieve an improved trade-off between the cost ({\em e.g.}, running speed) and performance (e.g., tracking accuracy) over several existing state-of-the-art benchmarks, including the popular BYTETrack method. SMILETrack outperforms BYTETrack by 0.4-0.8 MOTA and 2.1-2.2 HOTA points on MOT17 and MOT20 datasets. Code is available at https://github.com/pingyang1117/SMILEtrack_Official △ Less

Submitted 22 January, 2024; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Our paper was accepted by AAAI2024

arXiv:2211.06835 [pdf, other]

Scale-Aware Crowd Counting Using a Joint Likelihood Density Map and Synthetic Fusion Pyramid Network

Authors: Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Bor-Shiun Wang

Abstract: We develop a Synthetic Fusion Pyramid Network (SPF-Net) with a scale-aware loss function design for accurate crowd counting. Existing crowd-counting methods assume that the training annotation points were accurate and thus ignore the fact that noisy annotations can lead to large model-learning bias and counting error, especially for counting highly dense crowds that appear far away. To the best of… ▽ More We develop a Synthetic Fusion Pyramid Network (SPF-Net) with a scale-aware loss function design for accurate crowd counting. Existing crowd-counting methods assume that the training annotation points were accurate and thus ignore the fact that noisy annotations can lead to large model-learning bias and counting error, especially for counting highly dense crowds that appear far away. To the best of our knowledge, this work is the first to properly handle such noise at multiple scales in end-to-end loss design and thus push the crowd counting state-of-the-art. We model the noise of crowd annotation points as a Gaussian and derive the crowd probability density map from the input image. We then approximate the joint distribution of crowd density maps with the full covariance of multiple scales and derive a low-rank approximation for tractability and efficient implementation. The derived scale-aware loss function is used to train the SPF-Net. We show that it outperforms various loss functions on four public datasets: UCF-QNRF, UCF CC 50, NWPU and ShanghaiTech A-B datasets. The proposed SPF-Net can accurately predict the locations of people in the crowd, despite training on noisy training annotations. △ Less

Submitted 2 January, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: 8 pages, 8 figures, 4 tables

arXiv:2210.11173 [pdf, other]

Mathematical Justification of Hard Negative Mining via Isometric Approximation Theorem

Authors: Albert Xu, Jhih-Yi Hsieh, Bhaskar Vundurthy, Eliana Cohen, Howie Choset, Lu Li

Abstract: In deep metric learning, the Triplet Loss has emerged as a popular method to learn many computer vision and natural language processing tasks such as facial recognition, object detection, and visual-semantic embeddings. One issue that plagues the Triplet Loss is network collapse, an undesirable phenomenon where the network projects the embeddings of all data onto a single point. Researchers predom… ▽ More In deep metric learning, the Triplet Loss has emerged as a popular method to learn many computer vision and natural language processing tasks such as facial recognition, object detection, and visual-semantic embeddings. One issue that plagues the Triplet Loss is network collapse, an undesirable phenomenon where the network projects the embeddings of all data onto a single point. Researchers predominately solve this problem by using triplet mining strategies. While hard negative mining is the most effective of these strategies, existing formulations lack strong theoretical justification for their empirical success. In this paper, we utilize the mathematical theory of isometric approximation to show an equivalence between the Triplet Loss sampled by hard negative mining and an optimization problem that minimizes a Hausdorff-like distance between the neural network and its ideal counterpart function. This provides the theoretical justifications for hard negative mining's empirical efficacy. In addition, our novel application of the isometric approximation theorem provides the groundwork for future forms of hard negative mining that avoid network collapse. Our theory can also be extended to analyze other Euclidean space-based metric learning methods like Ladder Loss or Contrastive Learning. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: 9 pages, 6 figures, submitted to AAAI 2023

arXiv:2210.00698 [pdf, other]

NAS-based Recursive Stage Partial Network (RSPNet) for Light-Weight Semantic Segmentation

Authors: Yi-Chun Wang, Jun-Wei Hsieh, Ming-Ching Chang

Abstract: Current NAS-based semantic segmentation methods focus on accuracy improvements rather than light-weight design. In this paper, we proposed a two-stage framework to design our NAS-based RSPNet model for light-weight semantic segmentation. The first architecture search determines the inner cell structure, and the second architecture search considers exponentially growing paths to finalize the outer… ▽ More Current NAS-based semantic segmentation methods focus on accuracy improvements rather than light-weight design. In this paper, we proposed a two-stage framework to design our NAS-based RSPNet model for light-weight semantic segmentation. The first architecture search determines the inner cell structure, and the second architecture search considers exponentially growing paths to finalize the outer structure of the network. It was shown in the literature that the fusion of high- and low-resolution feature maps produces stronger representations. To find the expected macro structure without manual design, we adopt a new path-attention mechanism to efficiently search for suitable paths to fuse useful information for better segmentation. Our search for repeatable micro-structures from cells leads to a superior network architecture in semantic segmentation. In addition, we propose an RSP (recursive Stage Partial) architecture to search a light-weight design for NAS-based semantic segmentation. The proposed architecture is very efficient, simple, and effective that both the macro- and micro- structure searches can be completed in five days of computation on two V100 GPUs. The light-weight NAS architecture with only 1/4 parameter size of SoTA architectures can achieve SoTA performance on semantic segmentation on the Cityscapes dataset without using any backbones. △ Less

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2210.00546 [pdf, other]

Siamese-NAS: Using Trained Samples Efficiently to Find Lightweight Neural Architecture by Prior Knowledge

Authors: Yu-Ming Zhang, Jun-Wei Hsieh, Chun-Chieh Lee, Kuo-Chin Fan

Abstract: In the past decade, many architectures of convolution neural networks were designed by handcraft, such as Vgg16, ResNet, DenseNet, etc. They all achieve state-of-the-art level on different tasks in their time. However, it still relies on human intuition and experience, and it also takes so much time consumption for trial and error. Neural Architecture Search (NAS) focused on this issue. In recent… ▽ More In the past decade, many architectures of convolution neural networks were designed by handcraft, such as Vgg16, ResNet, DenseNet, etc. They all achieve state-of-the-art level on different tasks in their time. However, it still relies on human intuition and experience, and it also takes so much time consumption for trial and error. Neural Architecture Search (NAS) focused on this issue. In recent works, the Neural Predictor has significantly improved with few training architectures as training samples. However, the sampling efficiency is already considerable. In this paper, our proposed Siamese-Predictor is inspired by past works of predictor-based NAS. It is constructed with the proposed Estimation Code, which is the prior knowledge about the training procedure. The proposed Siamese-Predictor gets significant benefits from this idea. This idea causes it to surpass the current SOTA predictor on NASBench-201. In order to explore the impact of the Estimation Code, we analyze the relationship between it and accuracy. We also propose the search space Tiny-NanoBench for lightweight CNN architecture. This well-designed search space is easier to find better architecture with few FLOPs than NASBench-201. In summary, the proposed Siamese-Predictor is a predictor-based NAS. It achieves the SOTA level, especially with limited computation budgets. It applied to the proposed Tiny-NanoBench can just use a few trained samples to find extremely lightweight CNN architecture. △ Less

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2209.01332

Class-Specific Channel Attention for Few-Shot Learning

Authors: Ying-Yu Chen, Jun-Wei Hsieh, Ming-Ching Chang

Abstract: Few-Shot Learning (FSL) has attracted growing attention in computer vision due to its capability in model training without the need for excessive data. FSL is challenging because the training and testing categories (the base vs. novel sets) can be largely diversified. Conventional transfer-based solutions that aim to transfer knowledge learned from large labeled training sets to target testing set… ▽ More Few-Shot Learning (FSL) has attracted growing attention in computer vision due to its capability in model training without the need for excessive data. FSL is challenging because the training and testing categories (the base vs. novel sets) can be largely diversified. Conventional transfer-based solutions that aim to transfer knowledge learned from large labeled training sets to target testing sets are limited, as critical adverse impacts of the shift in task distribution are not adequately addressed. In this paper, we extend the solution of transfer-based methods by incorporating the concept of metric-learning and channel attention. To better exploit the feature representations extracted by the feature backbone, we propose Class-Specific Channel Attention (CSCA) module, which learns to highlight the discriminative channels in each class by assigning each class one CSCA weight vector. Unlike general attention modules designed to learn global-class features, the CSCA module aims to learn local and class-specific features with very effective computation. We evaluated the performance of the CSCA module on standard benchmarks including miniImagenet, Tiered-ImageNet, CIFAR-FS, and CUB-200-2011. Experiments are performed in inductive and in/cross-domain settings. We achieve new state-of-the-art results. △ Less

Submitted 13 December, 2022; v1 submitted 3 September, 2022; originally announced September 2022.

Comments: There are errors in the phase of testing, leading to the wrong results listed in the paper

arXiv:2208.04951 [pdf]

Programming Nonlinear Propagation for Efficient Optical Learning Machines

Authors: Ilker Oguz, Jih-Liang Hsieh, Niyazi Ulas Dinc, Uğur Teğin, Mustafa Yildirim, Carlo Gigli, Christophe Moser, Demetri Psaltis

Abstract: The ever-increasing demand for processing data with larger machine learning models requires more efficient hardware solutions due to limitations such as power dissipation and scalability. Optics is a promising contender for providing lower power computation since light propagation through a non-absorbing medium is a lossless operation. However, to carry out useful and efficient computations with l… ▽ More The ever-increasing demand for processing data with larger machine learning models requires more efficient hardware solutions due to limitations such as power dissipation and scalability. Optics is a promising contender for providing lower power computation since light propagation through a non-absorbing medium is a lossless operation. However, to carry out useful and efficient computations with light, generating and controlling nonlinearity optically is a necessity that is still elusive. Multimode fibers (MMF) have been shown that they can provide nonlinear effects with microwatts of average power while maintaining parallelism and low loss. In this work, we propose an optical neural network architecture, which performs nonlinear optical computation by controlling the propagation of ultrashort pulses in MMF by wavefront shaping. With a surrogate model, optimal sets of parameters are found to program this optical computer for different tasks with minimal utilization of an electronic computer. We show a remarkable decrease of 97% in the number of model parameters, which leads to an overall 99% digital operation reduction compared to an equivalently performing digital neural network. We further demonstrate that a fully optical implementation can also be performed with competitive accuracies. △ Less

Submitted 9 August, 2022; originally announced August 2022.

Comments: 32 pages, 11 figures

arXiv:2208.00122 [pdf, ps, other]

Polynomial-Time Power-Sum Decomposition of Polynomials

Authors: Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari, Jeff Xu

Abstract: We give efficient algorithms for finding power-sum decomposition of an input polynomial $P(x)= \sum_{i\leq m} p_i(x)^d$ with component $p_i$s. The case of linear $p_i$s is equivalent to the well-studied tensor decomposition problem while the quadratic case occurs naturally in studying identifiability of non-spherical Gaussian mixtures from low-order moments. Unlike tensor decomposition, both the… ▽ More We give efficient algorithms for finding power-sum decomposition of an input polynomial $P(x)= \sum_{i\leq m} p_i(x)^d$ with component $p_i$s. The case of linear $p_i$s is equivalent to the well-studied tensor decomposition problem while the quadratic case occurs naturally in studying identifiability of non-spherical Gaussian mixtures from low-order moments. Unlike tensor decomposition, both the unique identifiability and algorithms for this problem are not well-understood. For the simplest setting of quadratic $p_i$s and $d=3$, prior work of Ge, Huang and Kakade yields an algorithm only when $m \leq \tilde{O}(\sqrt{n})$. On the other hand, the more general recent result of Garg, Kayal and Saha builds an algebraic approach to handle any $m=n^{O(1)}$ components but only when $d$ is large enough (while yielding no bounds for $d=3$ or even $d=100$) and only handles an inverse exponential noise. Our results obtain a substantial quantitative improvement on both the prior works above even in the base case of $d=3$ and quadratic $p_i$s. Specifically, our algorithm succeeds in decomposing a sum of $m \sim \tilde{O}(n)$ generic quadratic $p_i$s for $d=3$ and more generally the $d$th power-sum of $m \sim n^{2d/15}$ generic degree-$K$ polynomials for any $K \geq 2$. Our algorithm relies only on basic numerical linear algebraic primitives, is exact (i.e., obtain arbitrarily tiny error up to numerical precision), and handles an inverse polynomial noise when the $p_i$s have random Gaussian coefficients. Our main tool is a new method for extracting the linear span of $p_i$s by studying the linear subspace of low-order partial derivatives of the input $P$. For establishing polynomial stability of our algorithm in average-case, we prove inverse polynomial bounds on the smallest singular value of certain correlated random matrices with low-degree polynomial entries that arise in our analyses. △ Less

Submitted 29 July, 2022; originally announced August 2022.

Comments: To appear in FOCS 2022

arXiv:2207.10850 [pdf, other]

A simple and sharper proof of the hypergraph Moore bound

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Sidhanth Mohanty

Abstract: The hypergraph Moore bound is an elegant statement that characterizes the extremal trade-off between the girth - the number of hyperedges in the smallest cycle or even cover (a subhypergraph with all degrees even) and size - the number of hyperedges in a hypergraph. For graphs (i.e., $2$-uniform hypergraphs), a bound tight up to the leading constant was proven in a classical work of Alon, Hoory an… ▽ More The hypergraph Moore bound is an elegant statement that characterizes the extremal trade-off between the girth - the number of hyperedges in the smallest cycle or even cover (a subhypergraph with all degrees even) and size - the number of hyperedges in a hypergraph. For graphs (i.e., $2$-uniform hypergraphs), a bound tight up to the leading constant was proven in a classical work of Alon, Hoory and Linial [AHL02]. For hypergraphs of uniformity $k>2$, an appropriate generalization was conjectured by Feige [Fei08]. The conjecture was settled up to an additional $\log^{4k+1} n$ factor in the size in a recent work of Guruswami, Kothari and Manohar [GKM21]. Their argument relies on a connection between the existence of short even covers and the spectrum of a certain randomly signed Kikuchi matrix. Their analysis, especially for the case of odd $k$, is significantly complicated. In this work, we present a substantially simpler and shorter proof of the hypergraph Moore bound. Our key idea is the use of a new reweighted Kikuchi matrix and an edge deletion step that allows us to drop several involved steps in [GKM21]'s analysis such as combinatorial bucketing of rows of the Kikuchi matrix and the use of the Schudy-Sviridenko polynomial concentration. Our simpler proof also obtains tighter parameters: in particular, the argument gives a new proof of the classical Moore bound of [AHL02] with no loss (the proof in [GKM21] loses a $\log^3 n$ factor), and loses only a single logarithmic factor for all $k>2$-uniform hypergraphs. As in [GKM21], our ideas naturally extend to yield a simpler proof of the full trade-off for strongly refuting smoothed instances of constraint satisfaction problems with similarly improved parameters. △ Less

Submitted 21 July, 2022; originally announced July 2022.

arXiv:2206.09204 [pdf, ps, other]

Approximating Max-Cut on Bounded Degree Graphs: Tighter Analysis of the FKL Algorithm

Authors: Jun-Ting Hsieh, Pravesh K. Kothari

Abstract: In this note, we describe a $α_{GW} + \tildeΩ(1/d^2)$-factor approximation algorithm for Max-Cut on weighted graphs of degree $\leq d$. Here, $α_{GW}\approx 0.878$ is the worst-case approximation ratio of the Goemans-Williamson rounding for Max-Cut. This improves on previous results for unweighted graphs by Feige, Karpinski, and Langberg and Florén. Our guarantee is obtained by a tighter analysis… ▽ More In this note, we describe a $α_{GW} + \tildeΩ(1/d^2)$-factor approximation algorithm for Max-Cut on weighted graphs of degree $\leq d$. Here, $α_{GW}\approx 0.878$ is the worst-case approximation ratio of the Goemans-Williamson rounding for Max-Cut. This improves on previous results for unweighted graphs by Feige, Karpinski, and Langberg and Florén. Our guarantee is obtained by a tighter analysis of the solution obtained by applying a natural local improvement procedure to the Goemans-Williamson rounding of the basic SDP strengthened with triangle inequalities. △ Less

Submitted 18 June, 2022; originally announced June 2022.

Showing 1–50 of 66 results for author: Hsieh, J