C2PO: Coherent Co-packaged Optics using offset-QAM-16 for Beyond PAM-4 Optical I/O

Authors: Dan Sturm, Marziyeh Rezaei, Alana Dee, Sajjad Moazeni

Abstract: Co-packaged optics (CPO) has emerged as a promising solution for achieving the ultra-high bandwidths, shoreline densities, and energy efficiencies required by future GPUs and network switches for AI. Microring modulators (MRMs) are well suited for transmitters due to their compact size, high energy efficiency, and natural compatibility with dense wavelength-division multiplexing (DWDM). However, extending beyond the recently demonstrated 200 Gb/s will require more advanced modulation formats, such as higher-order coherent modulation (e.g., QAM-16). In this work, we show how MRMs can be efficiently used to implement phase-constant amplitude modulators and form the building blocks of a transmitter for offset QAM-16, which has been shown to simplify carrier-phase recovery relative to conventional QAM. We simulate and evaluate the performance of our proposed MRM-based coherent CPO (C2PO) transmitters using a foundry-provided commercial silicon photonics process, demonstrating an input-normalized electric field amplitude contrast of 0.64 per dimension. Through full link-level bit error rate modeling, we show that our design achieves 400 Gb/s using offset QAM-16 at a total optical laser power of 9.65 dBm, comparable to that required by conventional QAM-16 MZI-based links, despite using 10-100x less area. We further conduct a thermal simulation to assess the transmitter's thermal stability at the MRM input optical power required to meet a target BER at the desired data rates. Finally, as a proof of concept, we demonstrate 25 Gb/s MRM-based offset QAM-4 modulation with a chip fabricated in the GlobalFoundries 45 nm monolithic silicon photonics process.

Submitted 13 June, 2025; originally announced June 2025.
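
As a quick unit check on the figures quoted in the abstract above, the sketch below converts the 9.65 dBm laser power to milliwatts and works out the symbol rate implied by 400 Gb/s with a 16-point constellation. The function names are illustrative only, and the symbol-rate step assumes a single carrier and polarization with no coding overhead, which the abstract does not state.

```python
import math

def dbm_to_mw(p_dbm: float) -> float:
    """Convert optical power from dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (p_dbm / 10)

def symbol_rate_gbaud(bit_rate_gbps: float, constellation_points: int) -> float:
    """Symbol rate needed for a given bit rate, assuming no coding overhead."""
    bits_per_symbol = math.log2(constellation_points)
    return bit_rate_gbps / bits_per_symbol

laser_mw = dbm_to_mw(9.65)          # total laser power from the abstract
baud = symbol_rate_gbaud(400, 16)   # 16 points carry 4 bits per symbol

print(f"9.65 dBm is about {laser_mw:.2f} mW")
print(f"400 Gb/s with 16 points needs {baud:.0f} GBaud (assumed single carrier)")
```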

arXiv:2505.18534 [pdf, ps, other]

A DSP-Free Carrier Phase Recovery System using 16-Offset-QAM Laser Forwarded Links for 400Gb/s and Beyond

Authors: Marziyeh Rezaei, Dan Sturm, Pengyu Zeng, Sajjad Moazeni

Abstract: Optical interconnects are becoming a major bottleneck in scaling up future GPU racks and network switches within data centers. Although 200 Gb/s optical transceivers using PAM-4 modulation have been demonstrated, achieving higher data rates and energy efficiencies requires high-order coherent modulations like 16-QAM. Current coherent links rely on energy-intensive digital signal processing (DSP) for channel impairment compensation and carrier phase recovery (CPR), which consumes approximately 50 pJ/b, 10x higher than future intra-data center requirements. For shorter links, simpler or DSP-free CPR methods can significantly reduce power and complexity. While Costas loops enable CPR for QPSK, they face challenges in scaling to higher-order modulations (e.g., 16/64-QAM) due to varying symbol amplitudes. In this work, we propose an optical coherent link architecture using laser forwarding and a novel DSP-free CPR system using offset-QAM modulation. The proposed analog CPR feedback loop is highly scalable, capable of supporting arbitrary offset-QAM modulations without requiring architectural modifications. This scalability is achieved through its phase error detection mechanism, which operates independently of the data rate and modulation type. We validated this method using GlobalFoundries' monolithic 45 nm silicon photonics PDK models, with circuit- and system-level implementation at 100 GBaud in the O-band. We will investigate the feedback loop dynamics, circuit-level implementations, and phase-noise performance of the proposed CPR loop. Our method can be adopted to realize low-power QAM optical interconnects for future coherent-lite pluggable transceivers as well as co-packaged optics (CPO) applications.

Submitted 24 May, 2025; originally announced May 2025.
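
To put the abstract's energy figure in context, the sketch below multiplies energy per bit by bit rate to get power. The 400 Gb/s rate comes from the title and the 50 pJ/b figure from the abstract. The 5 pJ/b target is only inferred from the "10x higher" statement, not a number given by the authors, and the function name is hypothetical.

```python
def link_power_watts(energy_pj_per_bit: float, bit_rate_gbps: float) -> float:
    """Power = energy per bit x bits per second."""
    return (energy_pj_per_bit * 1e-12) * (bit_rate_gbps * 1e9)

bit_rate_gbps = 400.0                 # from the title
dsp_pj_per_bit = 50.0                 # DSP energy quoted in the abstract
target_pj_per_bit = dsp_pj_per_bit / 10  # inferred from "10x higher" (assumption)

dsp_power_w = link_power_watts(dsp_pj_per_bit, bit_rate_gbps)
target_power_w = link_power_watts(target_pj_per_bit, bit_rate_gbps)

print(f"DSP-based CPR at 400 Gb/s: about {dsp_power_w:.0f} W")
print(f"Inferred target budget:    about {target_power_w:.0f} W")
```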

arXiv:2504.10384 [pdf, other]

A 10.8mW Mixed-Signal Simulated Bifurcation Ising Solver using SRAM Compute-In-Memory with 0.6us Time-to-Solution

Authors: Alana Marie Dee, Sajjad Moazeni

Abstract: Combinatorial optimization problems are fundamental for various fields ranging from finance to wireless networks. This work presents a simulated bifurcation (SB) Ising solver in CMOS for NP-hard optimization problems. Analog domain computing led to a superior implementation of this algorithm as inherent and injected noise is required in SB Ising solvers. The architectural novelties include the use of SRAM compute-in-memory (CIM) to accelerate bifurcation as well as the generation and injection of optimal decaying noise in the analog domain. We propose a novel 10-T SRAM cell capable of performing ternary multiplication. When measured with 60-node, 50% density, random, binary MAXCUT graphs, this all-to-all connected Ising solver reliably achieves above 93% of the ground state solution in 0.6us with 10.8mW average power in TSMC 180nm CMOS. Our chip achieves an order of magnitude improvement in time-to-solution and power compared to previously proposed Ising solvers in CMOS and other platforms.

Submitted 14 April, 2025; originally announced April 2025.
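
The benchmark described in the abstract above can be illustrated with a minimal sketch that builds a random 60-node, 50% density graph, scores a spin assignment by its cut value, and relates that score to an Ising energy. This is not the authors' solver or graph generator. The unit edge weights, the sign convention H = sum of s_i * s_j over edges, and all function names are assumptions made for illustration. The last line multiplies the quoted 10.8 mW and 0.6 us to get an energy per solution.

```python
import random

def random_maxcut_graph(n: int, density: float, seed: int = 0):
    """Unweighted random graph: each node pair is an edge with probability `density`."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < density]

def cut_value(edges, spins) -> int:
    """Number of edges whose endpoints carry opposite spins."""
    return sum(1 for i, j in edges if spins[i] != spins[j])

def ising_energy(edges, spins) -> int:
    """H = sum of s_i * s_j over edges (assumed convention); lower H means larger cut."""
    return sum(spins[i] * spins[j] for i, j in edges)

edges = random_maxcut_graph(60, 0.5)
rng = random.Random(1)
spins = [rng.choice((-1, 1)) for _ in range(60)]  # arbitrary, not optimized

cut = cut_value(edges, spins)
energy = ising_energy(edges, spins)
# Each cut edge contributes -1 and each uncut edge +1, so cut = (|E| - H) / 2.
print(cut, (len(edges) - energy) // 2)

# Energy per solution from the quoted power and time-to-solution, in nanojoules.
energy_to_solution_nj = 10.8e-3 * 0.6e-6 * 1e9
print(f"about {energy_to_solution_nj:.2f} nJ per solution")
```

A solver's quality on such a graph is usually reported as its cut value divided by the best known cut, which is how a figure like "above 93% of the ground state" would be read under this convention.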

arXiv:2303.01744 [pdf]

Next-generation Co-Packaged Optics for Future Disaggregated AI Systems

Authors: Sajjad Moazeni

Abstract: Co-packaged optics is poised to solve the interconnect bandwidth bottleneck for GPUs and AI accelerators in the near future. This technology can immediately boost today's AI/ML compute power to train larger neural networks that can perform more complex tasks. More importantly, co-packaged optics unlocks new system-level opportunities to rethink our conventional supercomputing and datacenter architectures. Disaggregation of memory and compute units is one such new paradigm that can greatly speed up AI/ML workloads by providing low-latency and high-throughput performance, while maintaining flexibility to support conventional cloud computing applications as well. This paper gives a brief overview of the state of the art in co-packaged optical I/O and the requirements of its next generations. We also discuss ideas to exploit co-packaged optics in disaggregated AI systems and possible future directions.

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2210.15756 [pdf, other]

Scaling up Superconducting Quantum Computers with Cryogenic RF-photonics

Authors: Sanskriti Joshi, Sajjad Moazeni

Abstract: Today's hundred-qubit quantum computers require a dramatic scale up to millions of qubits to become practical for solving real-world problems. Although a variety of qubit technologies have been demonstrated, scalability remains a major hurdle. Superconducting (SC) qubits are one of the most mature and promising technologies to overcome this challenge. However, these qubits reside in a millikelvin cryogenic dilution fridge, isolating them from thermal and electrical noise. They are controlled by a rack-full of external electronics through extremely complex wiring and cables. Although thousands of qubits can be fabricated on a single chip and cooled down to millikelvin temperatures, scaling up the control and readout electronics remains an elusive goal. This is mainly due to the limited available cooling power in cryogenic systems constraining the wiring capacity and cabling heat load management. In this paper, we focus on scaling up the number of XY-control lines by using cryogenic RF-photonic links. This is one of the major roadblocks to building a thousand-qubit superconducting QC. We will first review and study the challenges of state-of-the-art proposed approaches, including cryogenic CMOS and deep-cryogenic photonic methods, to scale up the control interface for SC qubit systems. We will discuss their limitations due to the active power dissipation and passive heat leakage in detail. By analytically modeling the noise sources and thermal budget limits, we will show that our solution can achieve a scale up to a thousand qubits. Our proposed method can be seamlessly implemented using advanced silicon photonic processes, and the number of required optical fibers can be further reduced by using wavelength division multiplexing (WDM).

Submitted 27 October, 2022; originally announced October 2022.

Comments: 10 pages, 8 figures

Showing 1–5 of 5 results for author: Moazeni, S