Search | arXiv e-print repository

Equity Impacts of Public Transit Network Redesign with Shared Autonomous Mobility Services

Authors: Max T. M. Ng, Meredith Raymer, Hani S. Mahmassani, Omer Verbas, Taner Cokyasar

Abstract: This study examines the equity impacts of integrating shared autonomous mobility services (SAMS) into transit system redesign. Using the Greater Chicago area as a case study, we compare two optimization objectives in multimodal transit network redesign: minimizing total generalized costs (equity-agnostic) versus prioritizing service in low-income areas (equity-focused). We evaluate the achieved accessibility of clustered zones with redesigned transit networks under two objectives, compared to driving and the existing transit network. The transit access gaps across zones and between transit and driving are found to be generally reduced with the introduction of SAMS, but less so with the subsequent improved infrastructure under budget. Differential improvement in equity is seen across suburbs and areas of the city, reflecting the disparity in current transit access and improvement potential. In particular, SAMS bridges the transit access gaps in suburban and city areas currently underserved by transit. The City of Chicago, which is also disproportionately home to vulnerable populations, offers an avenue to improve vertical equity. These findings demonstrate that SAMS can enhance both horizontal and vertical equity in transit systems, particularly when equity is explicitly incorporated into the design objective.

Submitted 8 January, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

Comments: Restructuring the paper for more precise research direction

arXiv:2501.01614 [pdf]

doi 10.1177/03611981231170182

Evaluation of Rail Decarbonization Alternatives: Framework and Application

Authors: Adrian Hernandez, Max TM Ng, Nazib Siddique, Pablo L. Durango-Cohen, Amgad Elgowainy, Hani S. Mahmassani, Michael Wang, Yan Zhou

Abstract: The Northwestern University Freight Rail Infrastructure and Energy Network Decarbonization (NUFRIEND) framework is a comprehensive industry-oriented tool for simulating the deployment of new energy technologies including biofuels, e-fuels, battery-electric, and hydrogen locomotives. By classifying fuel types into two categories based on deployment requirements, the associated optimal charging/fueling facility location and sizing problem are solved with a five-step framework. Life cycle analyses (LCA) and techno-economic analyses (TEA) are used to estimate carbon reduction, capital investments, cost of carbon reduction, and operational impacts, enabling sensitivity analysis with operational and technological parameters. The framework is illustrated on lower-carbon drop-in fuels as well as battery-electric technology deployments for US Eastern and Western Class I railroad networks. Drop-in fuel deployments are modeled as admixtures with diesel in existing locomotives, while battery-electric deployments are shown for varying technology penetration levels and locomotive ranges. When mixed in a 50 percent ratio with diesel, results show biodiesel's capacity to reduce emissions at 36 percent with a cost of 0.13 USD per kilogram of CO2 reduced, while e-fuels offer a 50 percent emissions reduction potential at a cost of 0.22 USD per kilogram of CO2 reduced. Battery-electric results for 50 percent deployment over all ton-miles highlight the value of future innovations in battery energy densities as scenarios assuming 800-mile range locomotives show an estimated emissions reduction of 46 percent with a cost of 0.06 USD per kilogram of CO2 reduced, compared to 16 percent emissions reduction at a cost of 0.11 USD per kilogram of CO2 reduced for 400-mile range locomotives.

Submitted 2 January, 2025; originally announced January 2025.

Comments: 29 pages, 17 figures. This is the accepted version of a work that was published in Transportation Research Record

Journal ref: Transportation Research Record 2678.1 (2024): 102-121

arXiv:2501.00219 [pdf]

doi 10.1177/03611981221098660

Autonomous Minibus Service with Semi-on-demand Routes in Grid Networks

Authors: Max T. M. Ng, Hani S. Mahmassani

Abstract: This paper investigates the potential of autonomous minibuses which take on-demand directional routes for pick-up and drop-off in a grid network of wider area with low density, followed by fixed routes in areas with demand. Mathematical formulation for generalized costs demonstrates its benefits, with indicators proposed to select existing bus routes for conversion with the options of zonal express and parallel routes. Simulations on modeled scenarios and case studies with bus routes in Chicago show reductions in both passenger costs and generalized costs over existing fixed-route bus service between suburban areas and CBD.

Submitted 30 December, 2024; originally announced January 2025.

Comments: 38 pages, 35 figures. This is the accepted version of a work that was published in Transportation Research Record

Journal ref: Transportation Research Record 2677.1 (2023): 178-200

arXiv:2412.20667 [pdf]

doi 10.1177/03611981231185145

Highway Managed Lane Usage and Tolling for Mixed Traffic Flows with Connected Automated Vehicles (CAVs) and High-Occupancy Vehicles (HOVs)

Authors: Max T. M. Ng, Hani S. Mahmassani

Abstract: This paper investigates managed lane (ML) toll setting and its effect under mixed traffic of connected automated vehicles (CAVs), high-occupancy vehicles (HOVs), and human-driven vehicles (HDVs), with a goal to avoid flow breakdown and minimize total social cost. A mesoscopic finite-difference traffic simulation model considers the flow-density relationship at different CAV market penetration rates, lane-changing behavior, and multiple entries/exits, interacting with a reactive toll setting mechanism. The results of the Monte Carlo simulation suggest an optimal policy of untolled HOV/CAV use with HDV tolls in particular scenarios of limited CAV market penetration. Small and targeted tolling avoids flow breakdown in ML while prioritizing HOVs and other vehicles with high values of time. Extensions of the formulation and sensitivity analysis quantify the benefits of converting high-occupancy HDVs to CAVs. The optimal tolling regime combines traffic science notions of flow stability and the economics of resource allocation.

Submitted 29 December, 2024; originally announced December 2024.

Comments: 38 pages, 23 figures. This is the accepted version of a work that was published in Transportation Research Record

Journal ref: Transportation Research Record 2678.4 (2024): 505-526

arXiv:2412.19719 [pdf]

doi 10.1016/j.tre.2024.103601

Trading Off Energy Storage and Payload -- An Analytical Model for Freight Train Configuration

Authors: Max T. M. Ng, Adrian Hernandez, Pablo L. Durango-Cohen, Hani S. Mahmassani

Abstract: To support planning of alternative fuel technology (e.g., battery-electric locomotives) deployment for decarbonizing non-electrified freight rail, we develop a convex optimization formulation with a closed-form solution to determine the optimal number of energy storage tender cars in a train. The formulation shares a similar structure to an Economic Order Quantity (EOQ) model. For given market characteristics, cost forecasts, and technology parameters, our model captures the trade-offs between inventory carrying costs associated with trip times (including delays due to charging/refueling) and ordering costs associated with train dispatch and operation (energy, amortized equipment, and labor costs). To illustrate the framework, we find the optimal number of battery-electric energy tender cars in 22,501 freight markets (origin-destination pairs and commodities) for U.S. Class I railroads. The results display heterogeneity in optimal configurations with lighter, yet more time-sensitive shipments (e.g., intermodal) utilizing more battery tender cars. For heavier commodities (e.g., coal) with lower holding costs, single battery tender car configurations are generally optimal. The results also show that the optimal train configurations are sensitive to delays associated with recharging or swapping tender cars.

Submitted 27 December, 2024; originally announced December 2024.

Comments: 42 pages, 19 figures. This is the accepted version of a work that was published in Transportation Research Part E: Logistics and Transportation Review

Journal ref: Transportation Research Part E: Logistics and Transportation Review Volume 187, July 2024, 103601

arXiv:2412.19401 [pdf]

Joint Optimization of Multimodal Transit Frequency and Shared Autonomous Vehicle Fleet Size with Hybrid Metaheuristic and Nonlinear Programming

Authors: Max T. M. Ng, Hani S. Mahmassani, Draco Tong, Omer Verbas, Taner Cokyasar

Abstract: Shared autonomous vehicles (SAVs) bring competition to traditional transit services but redesigning multimodal transit network can utilize SAVs as feeders to enhance service efficiency and coverage. This paper presents an optimization framework for the joint multimodal transit frequency and SAV fleet size problem, a variant of the transit network frequency setting problem. The objective is to maximize total transit ridership (including SAV-fed trips and subtracting boarding rejections) across multiple time periods under budget constraints, considering endogenous mode choice (transit, point-to-point SAVs, driving) and route selection, while allowing for strategic route removal by setting frequencies to zero. Due to the problem's non-linear, non-convex nature and the computational challenges of large-scale networks, we develop a hybrid solution approach that combines a metaheuristic approach (particle swarm optimization) with nonlinear programming for local solution refinement. To ensure computational tractability, the framework integrates analytical approximation models for SAV waiting times based on fleet utilization, multimodal network assignment for route choice, and multinomial logit mode choice behavior, bypassing the need for computationally intensive simulations within the main optimization loop. Applied to the Chicago metropolitan area's multimodal network, our method illustrates a 33.3% increase in transit ridership through optimized transit route frequencies and SAV integration, particularly enhancing off-peak service accessibility and strategically reallocating resources.

Submitted 22 April, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

Comments: 23 pages, 5 figures, a previous version is accepted for presentation at the Conference on Advanced Systems in Public Transport and TransitData 2025 in Kyoto, Japan on 1 - 4 July 2025

arXiv:2409.19068 [pdf]

Joint Optimization of Pattern, Headway, and Fleet Size of Multiple Urban Transit Lines with Perceived Headway Consideration and Passenger Flow Allocation

Authors: Max T. M. Ng, Draco Tong, Hani S. Mahmassani, Omer Verbas, Taner Cokyasar

Abstract: This study addresses the urban transit pattern design problem, optimizing stop sequences, headways, and fleet sizes across multiple routes and periods simultaneously to minimize user costs (composed of riding, waiting, and transfer times) under operational constraints (e.g., vehicle capacity and fleet size). A destination-labeled multi-commodity network flow (MCNF) formulation is developed to solve the problem at a large scale more efficiently compared to the previous literature. The model allows for flexible pattern options without relying on pre-defined candidate sets and simultaneously considers multiple operational strategies such as express/local services, short-turning, and deadheading. It evaluates perceived headways of joint patterns for passengers, assigns passenger flows to each pattern accordingly, and allows transfers across patterns in different directions. The mixed-integer linear programming (MILP) model is demonstrated with a city-sized network of metro lines in Chicago, USA, achieving near-optimal solutions in hours. The total weighted journey times are reduced by 0.61% and 5.76% under single-route and multi-period multi-route scenarios respectively. The model provides transit agencies with an efficient tool for comprehensive service design and resource allocation, improving service quality and resource utilization without additional operational costs.

Submitted 26 December, 2024; v1 submitted 27 September, 2024; originally announced September 2024.

Comments: 25 pages, 4 figures, a previous version accepted for presentation in the 104th Transportation Research Board Annual Meeting in Washington, D.C. in January 2025

arXiv:2408.10547 [pdf, other]

Semi-on-Demand Off-Peak Transit Services with Shared Autonomous Vehicles -- Service Planning, Simulation, and Analysis in Munich, Germany

Authors: Max T. M. Ng, Roman Engelhardt, Florian Dandl, Vasileios Volakakis, Hani S. Mahmassani, Klaus Bogenberger

Abstract: This study investigates the implementation of semi-on-demand (SoD) hybrid-route services using Shared Autonomous Vehicles (SAVs) on existing transit lines. SoD services combine the cost efficiency of fixed-route buses with the flexibility of on-demand services. SAVs first serve all scheduled fixed-route stops, then drop off and pick up passengers in the pre-determined flexible-route portion, and return to the fixed route. This study addresses four key questions: optimal fleet and vehicle sizes for peak-hour fixed-route services with SAVs and during transition (from drivers to autonomous vehicles), optimal off-peak SoD service planning, and suitable use cases. The methodology combines analytical modeling for service planning with agent-based simulation for operational analysis. We examine ten bus routes in Munich, Germany, considering full SAV and transition scenarios with varying proportions of drivers. Our findings demonstrate that the lower operating costs of SAVs improve service quality through increased frequency and smaller vehicles, even in transition scenarios. The reduced headway lowers waiting time and also favors more flexible-route operation in SoD services. The optimal SoD settings range from fully flexible to hybrid routes, where higher occupancy from the terminus favors shorter flexible routes. During the transition phase, limited fleet size and higher headways constrain the benefits of flexible-route operations. The simulation results corroborate the SoD benefits of door-to-door convenience, attracting more passengers without excessive detours and operator costs at moderate flexible-route lengths, and validate the analytical model.

Submitted 18 December, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: 38 pages, 10 figures, previous version accepted for presentation at the 104th Transportation Research Board Annual Meeting, Washington, D.C.

arXiv:2408.03508 [pdf]

SemiEpi: Self-driving, Closed-loop Multi-Step Growth of Semiconductor Heterostructures Guided by Machine Learning

Authors: Chao Shen, Wenkang Zhan, Kaiyao Xin, Shujie Pan, Xiaotian Cheng, Ruixiang Liu, Zhe Feng, Chaoyuan Jin, Hui Cong, Chi Xu, Bo Xu, Tien Khee Ng, Siming Chen, Chunlai Xue, Zhanguo Wang, Chao Zhao

Abstract: The semiconductor industry has prioritized automating repetitive tasks through closed-loop, self-driving experimentation, accelerating the optimization of complex multi-step processes. The emergence of machine learning (ML) has ushered in self-driving processes with minimal human intervention. This work introduces SemiEpi, a self-driving platform designed to execute molecular beam epitaxy (MBE) growth of semiconductor heterostructures through multi-step processes, in-situ monitoring, and on-the-fly feedback control. By integrating standard reactor, parameter initialization, and multiple ML models, SemiEpi identifies optimal initial conditions and proposes experiments for multi-step heterostructure growth, eliminating the need for extensive expertise in MBE processes. SemiEpi initializes material growth parameters tailored to specific material characteristics, and fine-tuned control over the growth process is then achieved through ML optimization. We optimize the growth for InAs quantum dots (QDs) heterostructures to showcase the power of SemiEpi, achieving a QD density of 5E10/cm2, 1.6-fold increased photoluminescence (PL) intensity and reduced full width at half maximum (FWHM) of 29.13 meV. This work highlights the potential of closed-loop, ML-guided systems to address challenges in multi-step growth. Our method is critical to achieve repeatable materials growth using commercially scalable tools. Furthermore, our strategy facilitates developing a hardware-independent process and enhancing process repeatability and stability, even without exhaustive knowledge of growth parameters.

Submitted 5 January, 2025; v1 submitted 6 August, 2024; originally announced August 2024.

Comments: 5 figures

arXiv:2403.15804 [pdf, other]

Semi-on-Demand Hybrid Transit Route Design with Shared Autonomous Mobility Services

Authors: Max T. M. Ng, Florian Dandl, Hani S. Mahmassani, Klaus Bogenberger

Abstract: This study examines the route design of a semi-on-demand hybrid route directional service in the public transit network, offering on-demand flexible route service in low-density areas and fixed route service in higher-density areas with Shared Autonomous Mobility Service (SAMS). The study develops analytically tractable cost expressions that capture access, waiting, and riding costs for users, and distance-based operating and time-based vehicle costs for operators. Two formulations are presented for strategic and tactical decisions in flexible route portion, fleet size, headway, and vehicle size optimization, enabling the determination of route types between fixed, hybrid, and flexible routes based on demand, cost, and operational parameters. The practical applications and benefits of semi-on-demand feeders are demonstrated with numerical examples and a large-scale case study in the Chicago metropolitan area. Findings reveal scenarios in which flexible route portions serving passengers located further away reduce total costs, particularly user costs. Lower operating costs in lower-demand areas favor more flexible routes, whereas higher demand densities favor more traditional line-based operations. On two studied lines, a current cost forecast favors smaller vehicles with flexible routes, but operating constraints and higher operating costs would favor bigger vehicles with hybrid routes. The study provides an analytical tool to design SAMS as directional services and transit feeders, and tractable continuous approximation formulations for future research in transit network design.

Submitted 7 August, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

Comments: 24 pages, 12 figures, previous version presented at the 103rd Transportation Research Board Annual Meeting, Washington, D.C.

arXiv:2402.16908 [pdf]

doi 10.1038/s41467-025-59872-2

Lightweight, error-tolerant edge detection using memristor-enabled stochastic logics

Authors: Lekai Song, Pengyu Liu, Jingfang Pei, Yang Liu, Songwei Liu, Shengbo Wang, Leonard W. T. Ng, Tawfique Hasan, Kong-Pang Pun, Shuo Gao, Guohua Hu

Abstract: The demand for efficient edge vision has spurred the interest in developing stochastic computing approaches for performing image processing tasks. Memristors with inherent stochasticity readily introduce probability into the computations and thus enable stochastic image processing computations. Here, we present a stochastic computing approach for edge detection, a fundamental image processing technique, facilitated with memristor-enabled stochastic logics. Specifically, we integrate the memristors with logic circuits and harness the stochasticity from the memristors to realize compact stochastic logics for stochastic number encoding and processing. The stochastic numbers, exhibiting well-regulated probabilities and correlations, can be processed to perform logic operations with statistical probabilities. This can facilitate lightweight stochastic edge detection for edge visual scenarios characterized with high-level noise errors. As a practical demonstration, we implement a hardware stochastic Roberts cross operator using the stochastic logics, and prove its exceptional edge detection performance, remarkably, with 95% less computational cost while withstanding 50% bit-flip errors. The results underscore the great potential of our stochastic edge detection approach in developing lightweight, error-tolerant edge vision hardware and systems for autonomous driving, virtual/augmented reality, medical imaging diagnosis, industrial automation, and beyond.

Submitted 20 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.08788 [pdf]

Syllable based DNN-HMM Cantonese Speech to Text System

Authors: Timothy Wong, Claire Li, Sam Lam, Billy Chiu, Qin Lu, Minglei Li, Dan Xiong, Roy Shing Yu, Vincent T. Y. Ng

Abstract: This paper reports our work on building up a Cantonese Speech-to-Text (STT) system with a syllable based acoustic model. This is a part of an effort in building a STT system to aid dyslexic students who have cognitive deficiency in writing skills but have no problem expressing their ideas through speech. For Cantonese speech recognition, the basic unit of acoustic models can either be the conventional Initial-Final (IF) syllables, or the Onset-Nucleus-Coda (ONC) syllables where finals are further split into nucleus and coda to reflect the intra-syllable variations in Cantonese. By using the Kaldi toolkit, our system is trained using the stochastic gradient descent optimization model with the aid of GPUs for the hybrid Deep Neural Network and Hidden Markov Model (DNN-HMM) with and without I-vector based speaker adaptive training technique. The input features of the same Gaussian Mixture Model with speaker adaptive training (GMM-SAT) to DNN are used in all cases. Experiments show that the ONC-based syllable acoustic modeling with I-vector based DNN-HMM achieves the best performance with the word error rate (WER) of 9.66% and the real time factor (RTF) of 1.38812.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 7 pages, 3 figures, LREC 2016

MSC Class: 94-06 ACM Class: I.2.7

arXiv:2311.16047 [pdf]

doi 10.1117/12.2652661

Observer study-based evaluation of TGAN architecture used to generate oncological PET images

Authors: Roberto Fedrigo, Fereshteh Yousefirizi, Ziping Liu, Abhinav K. Jha, Robert V. Bergen, Jean-Francois Rajotte, Raymond T. Ng, Ingrid Bloise, Sara Harsini, Dan J. Kadrmas, Carlos Uribe, Arman Rahmim

Abstract: The application of computer-vision algorithms in medical imaging has increased rapidly in recent years. However, algorithm training is challenging due to limited sample sizes, lack of labeled samples, as well as privacy concerns regarding data sharing. To address these issues, we previously developed (Bergen et al. 2022) a synthetic PET dataset for Head and Neck (H&N) cancer using the temporal generative adversarial network (TGAN) architecture and evaluated its performance segmenting lesions and identifying radiomics features in synthesized images. In this work, a two-alternative forced-choice (2AFC) observer study was performed to quantitatively evaluate the ability of human observers to distinguish between real and synthesized oncological PET images. In the study eight trained readers, including two board-certified nuclear medicine physicians, read 170 real/synthetic image pairs presented as 2D-transaxial using a dedicated web app. For each image pair, the observer was asked to identify the real image and input their confidence level with a 5-point Likert scale. P-values were computed using the binomial test and Wilcoxon signed-rank test. A heat map was used to compare the response accuracy distribution for the signed-rank test. Response accuracy for all observers ranged from 36.2% [27.9-44.4] to 63.1% [54.8-71.3]. Six out of eight observers did not identify the real image with statistical significance, indicating that the synthetic dataset was reasonably representative of oncological PET images. Overall, this study adds validity to the realism of our simulated H&N cancer dataset, which may be implemented in the future to train AI algorithms while favoring patient confidentiality and privacy protection.

Submitted 27 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.07062 [pdf, other]

Acoustic Model Fusion for End-to-end Speech Recognition

Authors: Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu

Abstract: Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain mismatch issue inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. By implementing this novel approach, we have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets. We also discovered that this AM fusion approach is particularly beneficial in enhancing named entity recognition.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2308.13163 [pdf, other]

doi 10.1109/LRA.2023.3308015

Snapp: An Agile Robotic Fish with 3-D Maneuverability for Open Water Swim

Authors: Timothy J. K. Ng, Nan Chen, Fu Zhang

Abstract: Fish exhibit impressive locomotive performance and agility in complex underwater environments, using their undulating tails and pectoral fins for propulsion and maneuverability. Replicating these abilities in robotic fish is challenging; existing designs focus on either fast swimming or directional control at limited speeds, mainly within a confined environment. To address these limitations, we designed Snapp, an integrated robotic fish capable of swimming in open water with high speeds and full 3-dimensional maneuverability. A novel cyclic-differential method is layered on the mechanism. It integrates propulsion and yaw-steering for fast course corrections. Two independent pectoral fins provide pitch and roll control. We evaluated Snapp in open water environments. We demonstrated significant improvements in speed and maneuverability, achieving swimming speeds of 1.5 m/s (1.7 Body Lengths per second) and performing complex maneuvers, such as a figure-8 and S-shape trajectory. Instantaneous yaw changes of 15$^{\circ}$ in 0.4 s, a minimum turn radius of 0.85 m, and maximum pitch and roll rates of 3.5 rad/s and 1 rad/s, respectively, were recorded. Our results suggest that Snapp's swimming capabilities have excellent practical prospects for open seas and contribute significantly to developing agile robotic fishes.

Submitted 24 August, 2023; originally announced August 2023.

Comments: 8 pages, 17 figures, to be published in IEEE Robotics and Automation Letters. The accompanying video can be found at this link: https://youtu.be/1bGmlN0Jriw

arXiv:2307.16075 [pdf]

doi 10.1016/j.trc.2024.104575

Redesigning Large-Scale Multimodal Transit Networks with Shared Autonomous Mobility Services

Authors: Max T. M. Ng, Hani S. Mahmassani, Ömer Verbas, Taner Cokyasar, Roman Engelhardt

Abstract: This study addresses a large-scale multimodal transit network design problem, with Shared Autonomous Mobility Services (SAMS) as both transit feeders and an origin-to-destination mode. The framework captures spatial demand and modal characteristics, considers intermodal transfers and express services, determines transit infrastructure investment and path flows, and generates transit routes. A system-optimal multimodal transit network is designed with minimum total door-to-door generalized costs of users and operators, satisfying transit origin-destination demand within a pre-set infrastructure budget. Firstly, the geography, demand, and modes in each zone are characterized with continuous approximation. The decisions of network link investment and multimodal path flows in zonal connection optimization are formulated as a minimum-cost multi-commodity network flow (MCNF) problem and solved efficiently with a mixed-integer linear programming (MILP) solver. Subsequently, the route generation problem is solved by expanding the MCNF formulation to minimize intramodal transfers. The model is illustrated through a set of experiments with the Chicago network comprised of 50 zones and seven modes, under three scenarios. The computational results present savings in traveler journey time and operator cost demonstrating the potential benefits of collaboration between multimodal transit systems and SAMS.

Submitted 27 March, 2024; v1 submitted 29 July, 2023; originally announced July 2023.

Comments: 48 pages, 18 figures, accepted for publication in Transportation Research Part C: Emerging Technologies, and presentation in the 25th International Symposium on Transportation and Traffic Theory (ISTTT25)

arXiv:2206.06448 [pdf]

Assessing Privacy Leakage in Synthetic 3-D PET Imaging using Transversal GAN

Authors: Robert V. Bergen, Jean-Francois Rajotte, Fereshteh Yousefirizi, Arman Rahmim, Raymond T. Ng

Abstract: Training computer-vision related algorithms on medical images for disease diagnosis or image segmentation is difficult in large part due to privacy concerns. For this reason, generative image models are highly sought after to facilitate data sharing. However, 3-D generative models are understudied, and investigation of their privacy leakage is needed. We introduce our 3-D generative model, Transversal GAN (TrGAN), using head & neck PET images which are conditioned on tumour masks as a case study. We define quantitative measures of image fidelity, utility and privacy for our model. These metrics are evaluated in the course of training to identify ideal fidelity, utility and privacy trade-offs and establish the relationships between these parameters. We show that the discriminator of the TrGAN is vulnerable to attack, and that an attacker can identify which samples were used in training with almost perfect accuracy (AUC = 0.99). We also show that an attacker with access to only the generator cannot reliably classify whether a sample had been used for training (AUC = 0.51). This suggests that TrGAN generators, but not discriminators, may be used for sharing synthetic 3-D PET data with minimal privacy risk while maintaining good utility and fidelity.

Submitted 31 October, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2111.01866

arXiv:2111.01866 [pdf]

3-D PET Image Generation with tumour masks using TGAN

Authors: Robert V Bergen, Jean-Francois Rajotte, Fereshteh Yousefirizi, Ivan S Klyuzhin, Arman Rahmim, Raymond T. Ng

Abstract: Training computer-vision related algorithms on medical images for disease diagnosis or image segmentation is difficult due to the lack of training data, labeled samples, and privacy concerns. For this reason, a robust generative method to create synthetic data is highly sought after. However, most three-dimensional image generators require additional image input or are extremely memory intensive. To address these issues we propose adapting video generation techniques for 3-D image generation. Using the temporal GAN (TGAN) architecture, we show we are able to generate realistic head and neck PET images. We also show that by conditioning the generator on tumour masks, we are able to control the geometry and location of the tumour in the generated images. To test the utility of the synthetic images, we train a segmentation model using the synthetic images. Synthetic images conditioned on real tumour masks are automatically segmented, and the corresponding real images are also segmented. We evaluate the segmentations using the Dice score and find the segmentation algorithm performs similarly on both datasets (0.65 synthetic data, 0.70 real data). Various radionomic features are then calculated over the segmented tumour volumes for each data set. A comparison of the real and synthetic feature distributions shows that seven of eight feature distributions had statistically insignificant differences (p>0.05). Correlation coefficients were also calculated between all radionomic features and it is shown that all of the strong statistical correlations in the real data set are preserved in the synthetic data set.

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2008.05514 [pdf, other]

doi 10.1109/LSP.2020.3031480

Online Automatic Speech Recognition with Listen, Attend and Spell Model

Authors: Roger Hsiao, Dogan Can, Tim Ng, Ruchir Travadi, Arnab Ghoshal

Abstract: The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this paper, we analyze the online operation of LAS models to demonstrate that these limitations stem from the handling of silence regions and the reliability of online attention mechanism at the edge of input buffers. We propose a novel and simple technique that can achieve fully online recognition while meeting accuracy and latency targets. For the Mandarin dictation task, our proposed approach can achieve a character error rate in online operation that is within 4% relative to an offline LAS model. The proposed online LAS model operates at 12% lower latency relative to a conventional neural network hidden Markov model hybrid of comparable accuracy. We have validated the proposed method through a production scale deployment, which, to the best of our knowledge, is the first such deployment of a fully online LAS model.

Submitted 13 October, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

Comments: 5 pages, 4 figures, this version is submitted to IEEE Signal Processing Letters

arXiv:2005.13605 [pdf, other]

D2D: Keypoint Extraction with Describe to Detect Approach

Authors: Yurun Tian, Vassileios Balntas, Tony Ng, Axel Barroso-Laguna, Yiannis Demiris, Krystian Mikolajczyk

Abstract: In this paper, we present a novel approach that exploits the information within the descriptor space to propose keypoint locations. Detect then describe, or detect and describe jointly are two typical strategies for extracting local descriptors. In contrast, we propose an approach that inverts this process by first describing and then detecting the keypoint locations. Describe-to-Detect (D2D) leverages successful descriptor models without the need for any additional training. Our method selects keypoints as salient locations with high information content which is defined by the descriptors rather than some independent operators. We perform experiments on multiple benchmarks including image matching, camera localisation, and 3D reconstruction. The results indicate that our method improves the matching performance of various descriptors and that it generalises across methods and tasks.

Submitted 27 May, 2020; originally announced May 2020.

arXiv:2005.09336 [pdf, ps, other]

A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models

Authors: Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney

Abstract: Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE). The mapping from pronunciation to spelling is learned completely from data. In contrast to this, classical approaches to ASR employ secondary knowledge sources in the form of phoneme lists to define phonetic output labels and pronunciation lexica. In this work, we do a systematic comparison between grapheme- and phoneme-based output labels for an encoder-decoder-attention ASR model. We investigate the use of single phonemes as well as BPE-based phoneme groups as output labels of our model. To preserve a simplified and efficient decoder design, we also extend the phoneme set by auxiliary units to be able to distinguish homophones. Experiments performed on the Switchboard 300h and LibriSpeech benchmarks show that phoneme-based modeling is competitive to grapheme-based encoder-decoder-attention modeling.

Submitted 15 April, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: 5 pages, 6 tables

arXiv:1910.01992 [pdf, other]

SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition

Authors: Zhen Huang, Tim Ng, Leo Liu, Henry Mason, Xiaodan Zhuang, Daben Liu

Abstract: Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self-Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, by removing the SC/BN and replacing the typical RELU activations with scaled exponential linear unit (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet-50, we can achieve the same or lower (up to 4.5% relative) word error rate (WER) while boosting both training and inference speed by 60%-80%. We also explore other model inference optimization schemes to further reduce latency for production use.

Submitted 23 March, 2020; v1 submitted 4 October, 2019; originally announced October 2019.

Showing 1–22 of 22 results for author: Ng, T