-
Symbiotic AI: Augmenting Human Cognition from PCs to Cars
Authors:
Riccardo Bovo,
Karan Ahuja,
Ryo Suzuki,
Mustafa Doga Dogan,
Mar Gonzalez-Franco
Abstract:
As AI takes on increasingly complex roles in human-computer interaction, fundamental questions arise: how can HCI help maintain the user as the primary agent while augmenting human cognition and intelligence? This paper suggests questions to guide researchers in considering the implications for agency, autonomy, the augmentation of human intellect, and the future of human-AI synergies. We observe a key paradigm shift behind the transformation of HCI: a move from explicit command-and-control models to systems where users define high-level goals directly. This shift will be facilitated by XR technologies, whose multi-modal inputs and outputs offer a more seamless way to convey these goals. This paper considers this transformation through the lens of two cultural milestones: the personal computer and the automobile, moving beyond traditional interfaces like keyboards or steering wheels and thinking of them as vessels for everyday XR.
Submitted 3 April, 2025;
originally announced April 2025.
-
Imprinto: Enhancing Infrared Inkjet Watermarking for Human and Machine Perception
Authors:
Martin Feick,
Xuxin Tang,
Raul Garcia-Martin,
Alexandru Luchianov,
Roderick Wei Xiao Huang,
Chang Xiao,
Alexa Siu,
Mustafa Doga Dogan
Abstract:
Hybrid paper interfaces leverage augmented reality to combine the desired tangibility of paper documents with the affordances of interactive digital media. Typically, virtual content can be embedded through direct links (e.g., QR codes); however, this impacts the aesthetics of the paper print and limits the available visual content space. To address this problem, we present Imprinto, an infrared inkjet watermarking technique that allows for invisible content embeddings using only off-the-shelf IR inks and a camera. Imprinto was established through a psychophysical experiment, studying how much IR ink can be used while remaining invisible to users regardless of background color. We demonstrate that we can detect invisible IR content through our machine learning pipeline, and we developed an authoring tool that optimizes the amount of IR ink on the color regions of an input document for machine and human detectability. Finally, we demonstrate several applications, including augmenting paper documents and objects.
Submitted 24 February, 2025;
originally announced February 2025.
-
FontCraft: Multimodal Font Design Using Interactive Bayesian Optimization
Authors:
Yuki Tatsukawa,
I-Chao Shen,
Mustafa Doga Dogan,
Anran Qi,
Yuki Koyama,
Ariel Shamir,
Takeo Igarashi
Abstract:
Creating new fonts requires a lot of human effort and professional typographic knowledge. Despite the rapid advancements of automatic font generation models, existing methods require users to prepare pre-designed characters with target styles using font-editing software, which poses a problem for non-expert users. To address this limitation, we propose FontCraft, a system that enables font generation without relying on pre-designed characters. Our approach integrates the exploration of a font-style latent space with human-in-the-loop preferential Bayesian optimization and multimodal references, facilitating efficient exploration and enhancing user control. Moreover, FontCraft allows users to revisit previous designs, retracting their earlier choices in the preferential Bayesian optimization process. Once users finish editing the style of a selected character, they can propagate it to the remaining characters and further refine them as needed. The system then generates a complete outline font in OpenType format. We evaluated the effectiveness of FontCraft through a user study comparing it to a baseline interface. Results from both quantitative and qualitative evaluations demonstrate that FontCraft enables non-expert users to design fonts efficiently.
Submitted 16 February, 2025;
originally announced February 2025.
-
Draw2Cut: Direct On-Material Annotations for CNC Milling
Authors:
Xinyue Gui,
Ding Xia,
Wang Gao,
Mustafa Doga Dogan,
Maria Larsson,
Takeo Igarashi
Abstract:
Creating custom artifacts with computer numerical control (CNC) milling machines typically requires mastery of complex computer-aided design (CAD) software. To eliminate this user barrier, we introduce Draw2Cut, a novel system that allows users to design and fabricate artifacts by sketching directly on physical materials. Draw2Cut employs a custom drawing language to convert user-drawn lines, symbols, and colors into toolpaths, thereby enabling users to express their creative intent intuitively. The key features include real-time alignment between material and virtual toolpaths, a preview interface for validation, and an open-source platform for customization. Through technical evaluations and user studies, we demonstrate that Draw2Cut lowers the entry barrier for personal fabrication, enabling novices to create customized artifacts with precision and ease. Our findings highlight the potential of the system to enhance creativity, engagement, and accessibility in CNC-based woodworking.
Submitted 25 February, 2025; v1 submitted 31 January, 2025;
originally announced January 2025.
-
XR-penter: Material-Aware and In Situ Design of Scrap Wood Assemblies
Authors:
Ramya Iyer,
Mustafa Doga Dogan,
Maria Larsson,
Takeo Igarashi
Abstract:
Woodworkers have to navigate multiple considerations when planning a project, including available resources, skill level, and intended effort. Do-it-yourself (DIY) woodworkers face these challenges most acutely because of tight material constraints and a desire for custom designs tailored to specific spaces. To address these needs, we present XR-penter, an extended reality (XR) application that supports in situ, material-aware woodworking for casual makers. Our system enables users to design virtual scrap wood assemblies directly in their workspace, encouraging sustainable practices through the use of discarded materials. Users register physical material as virtual twins, manipulate these twins into an assembly in XR, and preview cuts needed for fabrication. We conducted a case study and feedback sessions to demonstrate how XR-penter supports improvisational workflows in practice, the type of woodworker who would benefit most from our system, and insights on integrating similar spatial and material considerations into future work.
Submitted 26 January, 2025;
originally announced January 2025.
-
AvatarPerfect: User-Assisted 3D Gaussian Splatting Avatar Refinement with Automatic Pose Suggestion
Authors:
Jotaro Sakamiya,
I-Chao Shen,
Jinsong Zhang,
Mustafa Doga Dogan,
Takeo Igarashi
Abstract:
Creating high-quality 3D avatars using 3D Gaussian Splatting (3DGS) from a monocular video benefits virtual reality and telecommunication applications. However, existing automatic methods exhibit artifacts under novel poses due to limited information in the input video. We propose AvatarPerfect, a novel system that allows users to iteratively refine 3DGS avatars by manually editing the rendered avatar images. In each iteration, our system suggests a new body and camera pose to help users identify and correct artifacts. The edited images are then used to update the current avatar, and our system suggests the next body and camera pose for further refinement. To investigate the effectiveness of AvatarPerfect, we conducted a user study comparing our method to an existing 3DGS editor, SuperSplat, which allows direct manipulation of Gaussians without automatic pose suggestions. The results indicate that our system enables users to obtain higher-quality refined 3DGS avatars than the existing 3DGS editor.
Submitted 20 December, 2024;
originally announced December 2024.
-
RAMPA: Robotic Augmented Reality for Machine Programming by DemonstrAtion
Authors:
Fatih Dogangun,
Serdar Bahar,
Yigit Yildirim,
Bora Toprak Temir,
Emre Ugur,
Mustafa Doga Dogan
Abstract:
This paper introduces Robotic Augmented Reality for Machine Programming by Demonstration (RAMPA), the first ML-integrated, XR-driven end-to-end robotic system, allowing training and deployment of ML models such as ProMPs on the fly, and utilizing the capabilities of state-of-the-art and commercially available AR headsets, e.g., Meta Quest 3, to facilitate the application of Programming by Demonstration (PbD) approaches on industrial robotic arms, e.g., Universal Robots UR10. Our approach enables in-situ data recording, visualization, and fine-tuning of skill demonstrations directly within the user's physical environment. RAMPA addresses critical challenges of PbD, such as safety concerns, programming barriers, and the inefficiency of collecting demonstrations on the actual hardware. The performance of our system is evaluated against the traditional method of kinesthetic control in teaching three different robotic manipulation tasks and analyzed with quantitative metrics, measuring task performance and completion time, trajectory smoothness, system usability, user experience, and task load using standardized surveys. Our findings indicate a substantial advancement in how robotic tasks are taught and refined, promising improvements in operational safety, efficiency, and user engagement in robotic programming.
Submitted 18 February, 2025; v1 submitted 17 October, 2024;
originally announced October 2024.
-
TinkerXR: In-Situ, Reality-Aware CAD and 3D Printing Interface for Novices
Authors:
Oğuz Arslan,
Artun Akdoğan,
Mustafa Doga Dogan
Abstract:
Despite the growing accessibility of augmented reality (AR) for visualization, existing computer-aided design (CAD) systems remain confined to traditional screens or require complex setups or predefined parameters, limiting immersion and accessibility for novices. We present TinkerXR, an open-sourced interface enabling in-situ design and fabrication through Constructive Solid Geometry (CSG) modeling. TinkerXR operates solely with a headset and 3D printer, allowing users to design directly in and for their physical environments. By leveraging spatial awareness, depth occlusion, recognition of physical constraints, reference objects, and intuitive hand movement controls, TinkerXR enhances realism, precision, and ease of use. Its AR-based workflow integrates design and 3D printing with a drag-and-drop interface for a 3D printer's virtual twin. A user study comparing TinkerXR with Tinkercad demonstrates higher accessibility, engagement, and ease of use for novices. By bridging the gap between digital creation and physical output, TinkerXR transforms everyday spaces into accessible and expressive creative studios.
Submitted 29 January, 2025; v1 submitted 8 October, 2024;
originally announced October 2024.
-
SpecTrack: Learned Multi-Rotation Tracking via Speckle Imaging
Authors:
Ziyang Chen,
Mustafa Doğa Doğan,
Josef Spjut,
Kaan Akşit
Abstract:
Precision pose detection is increasingly demanded in fields such as personal fabrication, Virtual Reality (VR), and robotics due to its critical role in ensuring accurate positioning information. However, conventional vision-based systems used in these fields often struggle to achieve high precision and accuracy, particularly when dealing with complex environments or fast-moving objects. To address these limitations, we investigate Laser Speckle Imaging (LSI), an emerging optical tracking method that offers promising potential for improving pose estimation accuracy. Specifically, our proposed LSI-Based Tracking (SpecTrack) leverages the captures from a lensless camera and a retro-reflector marker with a coded aperture to achieve multi-axis rotational pose estimation with high precision. Our extensive trials using our in-house built testbed have shown that SpecTrack achieves an accuracy of 0.31° (std=0.43°), significantly outperforming state-of-the-art approaches and improving accuracy by up to 200%.
Submitted 8 October, 2024;
originally announced October 2024.
-
Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning
Authors:
Mustafa Dogan,
Ilker Kesen,
Iacer Calixto,
Aykut Erdem,
Erkut Erdem
Abstract:
The linguistic capabilities of Multimodal Large Language Models (MLLMs) are critical for their effective application across diverse tasks. This study aims to evaluate the performance of MLLMs on the VALSE benchmark, focusing on the efficacy of few-shot In-Context Learning (ICL) and Chain-of-Thought (CoT) prompting. We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in model size and pretraining datasets. The experimental results reveal that ICL and CoT prompting significantly boost model performance, particularly in tasks requiring complex reasoning and contextual understanding. Models pretrained on captioning datasets show superior zero-shot performance, while those trained on interleaved image-text data benefit from few-shot learning. Our findings provide valuable insights into optimizing MLLMs for better grounding of language in visual contexts, highlighting the importance of the composition of pretraining data and the potential of few-shot learning strategies to improve the reasoning abilities of MLLMs.
Submitted 17 July, 2024;
originally announced July 2024.
-
Ubiquitous Metadata: Design and Fabrication of Embedded Markers for Real-World Object Identification and Interaction
Authors:
Mustafa Doga Dogan
Abstract:
The convergence of the physical and digital realms has ushered in a new era of immersive experiences and seamless interactions. As the boundaries between the real world and virtual environments blur and result in a "mixed reality," there arises a need for robust and efficient methods to connect physical objects with their virtual counterparts. In this thesis, we present a novel approach to bridging this gap through the design, fabrication, and detection of embedded machine-readable markers.
We categorize the proposed marking approaches into three distinct categories: natural markers, structural markers, and internal markers. Natural markers, such as those used in SensiCut, are inherent fingerprints of objects repurposed as machine-readable identifiers, while structural markers, such as StructCode and G-ID, leverage the structural artifacts in objects that emerge during the fabrication process itself. Internal markers, such as InfraredTag and BrightMarker, are embedded inside fabricated objects using specialized materials. Leveraging a combination of methods from computer vision, machine learning, computational imaging, and material science, the presented approaches offer robust and versatile solutions for object identification, tracking, and interaction.
These markers, seamlessly integrated into real-world objects, effectively communicate an object's identity, origin, function, and interaction, functioning as gateways to "ubiquitous metadata" - a concept where metadata is embedded into physical objects, similar to metadata in digital files. Across the different chapters, we demonstrate the applications of the presented methods in diverse domains, including product design, manufacturing, retail, logistics, education, entertainment, security, and sustainability.
Submitted 16 July, 2024;
originally announced July 2024.
-
Complexity of Robust Orbit Problems for Torus Actions and the abc-conjecture
Authors:
Peter Bürgisser,
Mahmut Levent Doğan,
Visu Makam,
Michael Walter,
Avi Wigderson
Abstract:
When a group acts on a set, it naturally partitions it into orbits, giving rise to orbit problems. These are natural algorithmic problems, as symmetries are central in numerous questions and structures in physics, mathematics, computer science, optimization, and more. Accordingly, it is of high interest to understand their computational complexity. Recently, Bürgisser et al. gave the first polynomial-time algorithms for orbit problems of torus actions, that is, actions of commutative continuous groups on Euclidean space. In this work, motivated by theoretical and practical applications, we study the computational complexity of robust generalizations of these orbit problems, which amount to approximating the distance of orbits in $\mathbb{C}^n$ up to a factor $\gamma>1$. In particular, this allows deciding whether two inputs are approximately in the same orbit or far from being so. On the one hand, we prove the NP-hardness of this problem for $\gamma = n^{\Omega(1/\log\log n)}$ by reducing the closest vector problem for lattices to it. On the other hand, we describe algorithms for solving this problem for an approximation factor $\gamma = \exp(\mathrm{poly}(n))$. Our algorithms combine tools from invariant theory and algorithmic lattice theory, and they also provide group elements witnessing the proximity of the given orbits (in contrast to the algebraic algorithms of prior work). We prove that they run in polynomial time if and only if a version of the famous number-theoretic $abc$-conjecture holds -- establishing a new and surprising connection between computational complexity and number theory.
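One way to read the robust orbit problem described in this abstract is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $T$ stands for the acting torus, $\mathcal{O}_v$ for the orbit of $v$, and the norm is taken to be the Euclidean norm on $\mathbb{C}^n$.

$$
\mathcal{O}_v = \{\, t \cdot v : t \in T \,\}, \qquad
\operatorname{dist}(\mathcal{O}_v, \mathcal{O}_w) = \inf_{s,\,t \in T} \lVert s \cdot v - t \cdot w \rVert,
$$

$$
\text{given } v, w \in \mathbb{C}^n, \text{ output } D \text{ with } \quad
\operatorname{dist}(\mathcal{O}_v, \mathcal{O}_w) \;\le\; D \;\le\; \gamma \cdot \operatorname{dist}(\mathcal{O}_v, \mathcal{O}_w).
$$

Under this reading, the two results quoted above bracket the problem: it is NP-hard for $\gamma = n^{\Omega(1/\log\log n)}$ and admits algorithms for $\gamma = \exp(\mathrm{poly}(n))$. The paper's precise definition may differ in details such as the choice of norm or the use of orbit closures.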
Submitted 24 May, 2024;
originally announced May 2024.
-
Augmented Object Intelligence with XR-Objects
Authors:
Mustafa Doga Dogan,
Eric J. Gonzalez,
Karan Ahuja,
Ruofei Du,
Andrea Colaço,
Johnny Lee,
Mar Gonzalez-Franco,
David Kim
Abstract:
Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper explores Augmented Object Intelligence (AOI) in the context of XR, an interaction paradigm that aims to blur the lines between digital and physical by equipping real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a portal to digital functionalities. Our approach utilizes real-time object segmentation and classification, combined with the power of Multimodal Large Language Models (MLLMs), to facilitate these interactions without the need for object pre-registration. We implement the AOI concept in the form of XR-Objects, an open-source prototype system that provides a platform for users to engage with their physical environment in contextually relevant ways using object-based context menus. This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks. Our contributions are threefold: (1) we define the AOI concept and detail its advantages over traditional AI assistants, (2) detail the XR-Objects system's open-source design and implementation, and (3) show its versatility through various use cases and a user study.
Submitted 16 May, 2025; v1 submitted 20 April, 2024;
originally announced April 2024.
-
Achieving Low Latency at Low Outage: Multilevel Coding for mmWave Channels
Authors:
Mine Gokce Dogan,
Jaimin Shah,
Martina Cardone,
Christina Fragouli,
Wei Mao,
Hosein Nikopour,
Rath Vannithamby
Abstract:
Millimeter-wave (mmWave) spectrum is expected to support data-intensive applications that require ultra-reliable low-latency communications (URLLC). However, mmWave links are highly sensitive to blockage, which may lead to disruptions in the communication. Traditional techniques that build resilience against such blockages (among which are interleaving and feedback mechanisms) incur delays that are too large to effectively support URLLC. This calls for novel techniques that ensure resilient URLLC. In this paper, we propose to deploy multilevel codes over space and over time. These codes offer several benefits: they allow control over what information is received, and they provide different reliability guarantees for different information streams based on their priority. We also show that deploying these codes leads to attractive trade-offs between rate, delay, and outage probability. A practically relevant aspect of the proposed technique is that it offers resilience while incurring a low operational complexity.
Submitted 10 February, 2024;
originally announced February 2024.
-
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
Authors:
Ilker Kesen,
Andrea Pedrotti,
Mustafa Dogan,
Michele Cafagna,
Emre Can Acikgoz,
Letitia Parcalabescu,
Iacer Calixto,
Anette Frank,
Albert Gatt,
Aykut Erdem,
Erkut Erdem
Abstract:
With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm footing. Task-based evaluations, while valuable, fail to capture the complexities and specific temporal aspects of moving images that VidLMs need to process. Through carefully curated counterfactuals, ViLMA offers a controlled evaluation suite that sheds light on the true potential of these models, as well as their performance gaps compared to human-level understanding. ViLMA also includes proficiency tests, which assess basic capabilities deemed essential to solving the main counterfactual tests. We show that current VidLMs' grounding abilities are no better than those of vision-language models which use static images. This is especially striking once the performance on proficiency tests is factored in. Our benchmark serves as a catalyst for future research on VidLMs, helping to highlight areas that still need to be explored.
Submitted 12 November, 2023;
originally announced November 2023.
-
Supporting Passive Users in mmWave Networks
Authors:
Mine Gokce Dogan,
Martina Cardone,
Christina Fragouli
Abstract:
The interference from active to passive users is a well-recognized challenge in millimeter-wave (mmWave) communications. We propose a method that limits the interference on passive users (whose presence may not be detected since they do not transmit) with a small penalty to the throughput of active users. Our approach abstracts away (in a simple, yet informative way) the physical layer component, and it leverages the directivity of mmWave links and the available network path diversity. We provide linear programming formulations, lower bounds on active users' rates, and numerical evaluations, and we establish a connection with the problem of (information theoretically) secure communication over mmWave networks.
Submitted 27 August, 2023;
originally announced August 2023.
-
Deterministic Approximation Algorithms for Volumes of Spectrahedra
Authors:
Mahmut Levent Doğan,
Jonathan Leake,
Mohan Ravichandran
Abstract:
We give a method for computing asymptotic formulas and approximations for the volumes of spectrahedra, based on the maximum-entropy principle from statistical physics. The method gives an approximate volume formula based on a single convex optimization problem of minimizing $-\log \det P$ over the spectrahedron. Spectrahedra can be described as affine slices of the convex cone of positive semi-definite (PSD) matrices, and the method yields efficient deterministic approximation algorithms and asymptotic formulas whenever the number of affine constraints is sufficiently dominated by the dimension of the PSD cone.
Our approach is inspired by the work of Barvinok and Hartigan who used an analogous framework for approximately computing volumes of polytopes. Spectrahedra, however, possess a remarkable feature not shared by polytopes, a new fact that we also prove: central sections of the set of density matrices (the quantum version of the simplex) all have asymptotically the same volume. This allows for very general approximation algorithms, which apply to large classes of naturally occurring spectrahedra.
We give two main applications of this method. First, we apply this method to what we call the "multi-way Birkhoff spectrahedron" and obtain an explicit asymptotic formula for its volume. This spectrahedron is the set of quantum states with maximal entanglement (i.e., the quantum states having univariant quantum marginals equal to the identity matrix) and is the quantum analog of the multi-way Birkhoff polytope. Second, we apply this method to explicitly compute the asymptotic volume of central sections of the set of density matrices.
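The single convex optimization problem mentioned in this abstract can be written out as follows. This is a sketch under assumed notation, not the paper's own statement: the symmetric (or Hermitian) matrices $A_i$, the scalars $b_i$, the number of affine constraints $m$, and the matrix size $N$ are labels introduced here for illustration.

$$
S = \{\, P \succeq 0 \;:\; \langle A_i, P \rangle = b_i, \; i = 1, \dots, m \,\} \subseteq \mathbb{C}^{N \times N},
\qquad
P^{\star} = \operatorname*{arg\,min}_{P \in S} \; -\log \det P .
$$

Here $S$ is the spectrahedron, an affine slice of the PSD cone. Since $-\log\det$ is strictly convex on positive definite matrices, $P^{\star}$ is unique whenever $S$ contains a positive definite matrix. According to the abstract, the approximate volume formula is built from this minimizer and is accurate when $m$ is sufficiently dominated by the dimension of the PSD cone. The formula itself is not given in the abstract and is not reproduced here.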
Submitted 22 November, 2022;
originally announced November 2022.
-
Proactive Resilient Transmission and Scheduling Mechanisms for mmWave Networks
Authors:
Mine Gokce Dogan,
Martina Cardone,
Christina Fragouli
Abstract:
This paper aims to develop resilient transmission mechanisms to suitably distribute traffic across multiple paths in an arbitrary millimeter-wave (mmWave) network. The main contributions include: (a) the development of proactive transmission mechanisms that build resilience against network disruptions in advance, while achieving a high end-to-end packet rate; (b) the design of a heuristic path selection algorithm that efficiently selects (in time polynomial in the network size) multiple proactively resilient paths with high packet rates; and (c) the development of a hybrid scheduling algorithm that combines the proposed path selection algorithm with a deep reinforcement learning (DRL) based online approach for decentralized adaptation to blocked links and failed paths. To achieve resilience to link failures, a state-of-the-art Soft Actor-Critic DRL algorithm, which adapts the information flow through the network, is investigated. The proposed scheduling algorithm robustly adapts to link failures over different topologies, channel realizations, and blockage realizations, while offering superior performance to alternative algorithms.
Submitted 16 November, 2022;
originally announced November 2022.
-
On the complexity of Chow and Hurwitz forms
Authors:
Mahmut Levent Doğan,
Alperen Ali Ergür,
Elias Tsigaridas
Abstract:
We consider the bit complexity of computing Chow forms and their generalization to multiprojective spaces. We develop a deterministic algorithm using resultants and obtain a single exponential complexity upper bound. Earlier computational results for Chow forms were in the arithmetic complexity model, and our result represents the first bit complexity bound. We also extend our algorithm to Hurwitz forms in projective space, and explore connections between multiprojective Hurwitz forms and matroid theory. The motivation for our work comes from incidence geometry where intriguing computational algebra problems remain open.
Submitted 12 April, 2024; v1 submitted 23 February, 2022;
originally announced February 2022.
-
InfraredTags: Embedding Invisible AR Markers and Barcodes Using Low-Cost, Infrared-Based 3D Printing and Imaging Tools
Authors:
Mustafa Doga Dogan,
Ahmad Taka,
Michael Lu,
Yunyi Zhu,
Akshat Kumar,
Aakar Gupta,
Stefanie Mueller
Abstract:
Existing approaches for embedding unobtrusive tags inside 3D objects require either complex fabrication or high-cost imaging equipment. We present InfraredTags, which are 2D markers and barcodes imperceptible to the naked eye that can be 3D printed as part of objects, and detected rapidly by low-cost near-infrared cameras. We achieve this by printing objects from an infrared-transmitting filament, which infrared cameras can see through, and by having air gaps inside for the tag's bits, which appear at a different intensity in the infrared image.
We built a user interface that facilitates the integration of common tags (QR codes, ArUco markers) with the object geometry to make them 3D printable as InfraredTags. We also developed a low-cost infrared imaging module that augments existing mobile devices and decodes tags using our image processing pipeline. Our evaluation shows that the tags can be detected with little near-infrared illumination (0.2 lux) and from distances as far as 250 cm. We demonstrate how our method enables various applications, such as object tracking and embedding metadata for augmented reality and tangible interactions.
Submitted 12 February, 2022;
originally announced February 2022.
-
A Reinforcement Learning Approach for Scheduling in mmWave Networks
Authors:
Mine Gokce Dogan,
Yahya H. Ezzeldin,
Christina Fragouli,
Addison W. Bohannon
Abstract:
We consider a source that wishes to communicate with a destination at a desired rate, over a mmWave network where links are subject to blockage and nodes to failure (e.g., in a hostile military environment). To achieve resilience to link and node failures, we explore a state-of-the-art Soft Actor-Critic (SAC) deep reinforcement learning algorithm that adapts the information flow through the network without using knowledge of the link capacities or network topology. Numerical evaluations show that our algorithm can achieve the desired rate even in dynamic environments and that it is robust against blockage.
Submitted 1 August, 2021;
originally announced August 2021.
-
Hazelcast Jet: Low-latency Stream Processing at the 99.99th Percentile
Authors:
Can Gencer,
Marko Topolnik,
Viliam Ďurina,
Emin Demirci,
Ensar B. Kahveci,
Ali Gürbüz,
Ondřej Lukáš,
József Bartók,
Grzegorz Gierlach,
František Hartman,
Ufuk Yılmaz,
Mehmet Doğan,
Mohamed Mandouh,
Marios Fragkoulis,
Asterios Katsifodimos
Abstract:
Jet is an open-source, high-performance, distributed stream processor built at Hazelcast during the last five years. Jet was engineered with millisecond latency at the 99.99th percentile as its primary design goal. Originally, Jet's purpose was to be an execution engine that performs complex business logic on top of streams generated by Hazelcast's In-memory Data Grid (IMDG): a set of high-performance, in-memory, partitioned and replicated data structures. With time, Jet evolved into a full-fledged, scale-out stream processor that can handle out-of-order streams and provide exactly-once processing guarantees. Jet's end-to-end latency is on the order of milliseconds, and its throughput on the order of millions of events per CPU core. This paper presents the main design decisions we made in order to maximize the performance per CPU core, alongside lessons learned and an empirical performance evaluation.
Submitted 18 March, 2021;
originally announced March 2021.
-
Polynomial time algorithms in invariant theory for torus actions
Authors:
Peter Bürgisser,
M. Levent Doğan,
Visu Makam,
Michael Walter,
Avi Wigderson
Abstract:
An action of a group on a vector space partitions the latter into a set of orbits. We consider three natural and useful algorithmic "isomorphism" or "classification" problems, namely, orbit equality, orbit closure intersection, and orbit closure containment. These capture and relate to a variety of problems within mathematics, physics, computer science, optimization, and statistics. These orbit problems extend the more basic null cone problem, whose algorithmic complexity has seen significant progress in recent years.
In this paper, we initiate a study of these problems by focusing on the actions of commutative groups (namely, tori). We explain how this setting is motivated by questions in algebraic complexity, and is still rich enough to capture interesting combinatorial algorithmic problems. While the structural theory of commutative actions is well understood, no general efficient algorithms were known for the aforementioned problems. Our main results are polynomial time algorithms for all three problems. We also show how to efficiently find separating invariants for orbits, and how to compute systems of generating rational invariants for these actions (in contrast, for polynomial invariants the latter is known to be hard). Our techniques are based on a combination of fundamental results in invariant theory, linear programming, and algorithmic lattice theory.
Submitted 15 February, 2021;
originally announced February 2021.
-
On optimal relay placement in directional networks
Authors:
Mine Gokce Dogan,
Yahya H. Ezzeldin,
Christina Fragouli
Abstract:
In this paper, we study the problem of optimal topology design in wireless networks equipped with highly-directional transmission antennas. We use the 1-2-1 network model to characterize the optimal placement of two relays that assist the communication between a source-destination pair. We analytically show that under some conditions on the distance between the source-destination pair, the optimal topology in terms of maximizing the network throughput is to place the relays as close as possible to the source and the destination.
Submitted 6 February, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
The Multivariate Schwartz-Zippel Lemma
Authors:
M. Levent Doğan,
Alperen A. Ergür,
Jake D. Mundo,
Elias Tsigaridas
Abstract:
Motivated by applications in combinatorial geometry, we consider the following question: Let $λ=(λ_1,λ_2,\ldots,λ_m)$ be an $m$-partition of a positive integer $n$, $S_i \subseteq \mathbb{C}^{λ_i}$ be finite sets, and let $S:=S_1 \times S_2 \times \ldots \times S_m \subset \mathbb{C}^n$ be the multi-grid defined by $S_i$. Suppose $p$ is an $n$-variate degree $d$ polynomial. How many zeros does $p$ have on $S$?
We first develop a multivariate generalization of the Combinatorial Nullstellensatz that certifies the existence of a point $t \in S$ such that $p(t) \neq 0$. Then we show that a natural multivariate generalization of the DeMillo-Lipton-Schwartz-Zippel lemma holds, except for a special family of polynomials that we call $λ$-reducible. This yields a simultaneous generalization of the Szemerédi-Trotter theorem and the Schwartz-Zippel lemma to higher dimensions, and has applications in incidence geometry. Finally, we develop a symbolic algorithm that identifies certain $λ$-reducible polynomials. More precisely, our symbolic algorithm detects polynomials that include a Cartesian product of hypersurfaces in their zero set. It is likely that, using Chow forms, the algorithm can be generalized to handle arbitrary $λ$-reducible polynomials, which we leave as an open problem.
Submitted 21 November, 2021; v1 submitted 2 October, 2019;
originally announced October 2019.
-
IDMoB: IoT Data Marketplace on Blockchain
Authors:
Kazım Rıfat Özyılmaz,
Mehmet Doğan,
Arda Yurdakul
Abstract:
Today, Internet of Things (IoT) devices are the powerhouse of data generation with their ever-increasing numbers and widespread penetration. Similarly, artificial intelligence (AI) and machine learning (ML) solutions are being integrated into all kinds of services, making products significantly "smarter". The centerpiece of these technologies is "data". IoT device vendors should be able to keep up with the increased throughput and come up with new business models. On the other hand, AI/ML solutions will produce better results if training data is diverse and plentiful.
In this paper, we propose a blockchain-based, decentralized and trustless data marketplace where IoT device vendors and AI/ML solution providers may interact and collaborate. By providing a transparent data exchange platform, access to consented data will be democratized and the variety of services targeting end-users will increase. The proposed data marketplace is implemented as a smart contract on the Ethereum blockchain, and Swarm is used as the distributed storage platform.
Submitted 30 September, 2018;
originally announced October 2018.