Search | arXiv e-print repository

Evaluation of Traffic Signals for Daily Traffic Pattern

Authors: Mohammad Shokrolah Shirazi, Hung-Fu Chang

Abstract: The turning movement count data is crucial for traffic signal design, intersection geometry planning, traffic flow, and congestion analysis. This work proposes three methods called dynamic, static, and hybrid configuration for TMC-based traffic signals. A vision-based tracking system is developed to estimate the TMC of six intersections in Las Vegas using traffic cameras. The intersection design,… ▽ More The turning movement count data is crucial for traffic signal design, intersection geometry planning, traffic flow, and congestion analysis. This work proposes three methods called dynamic, static, and hybrid configuration for TMC-based traffic signals. A vision-based tracking system is developed to estimate the TMC of six intersections in Las Vegas using traffic cameras. The intersection design, route (e.g. vehicle movement directions), and signal configuration files with compatible formats are synthesized and imported into Simulation of Urban MObility for signal evaluation with realistic data. The initial experimental results based on estimated waiting times indicate that the cycle time of 90 and 120 seconds works best for all intersections. In addition, four intersections show better performance for dynamic signal timing configuration, and the other two with lower performance have a lower ratio of total vehicle count to total lanes of the intersection leg. Since daily traffic flow often exhibits a bimodal pattern, we propose a hybrid signal method that switches between dynamic and static methods, adapting to peak and off-peak traffic conditions for improved flow management. So, a built-in traffic generator module creates vehicle routes for 4 hours, including peak hours, and a signal design module produces signal schedule cycles according to static, dynamic, and hybrid methods. Vehicle count distributions are weighted differently for each zone (i.e., West, North, East, South) to generate diverse traffic patterns. The extended experimental results for 6 intersections with 4 hours of simulation time imply that zone-based traffic pattern distributions affect signal design selection. Although the static method works great for evenly zone-based traffic distribution, the hybrid method works well for highly weighted traffic at intersection pairs of the West-East and North-South zones. △ Less

Submitted 26 June, 2025; originally announced June 2025.

arXiv:2506.19141 [pdf, ps, other]

EEG Foundation Challenge: From Cross-Task to Cross-Subject EEG Decoding

Authors: Bruno Aristimunha, Dung Truong, Pierre Guetschel, Seyed Yahya Shirazi, Isabelle Guyon, Alexandre R. Franco, Michael P. Milham, Aviv Dotan, Scott Makeig, Alexandre Gramfort, Jean-Remi King, Marie-Constance Corsi, Pedro A. Valdés-Sosa, Amit Majumdar, Alan Evans, Terrence J Sejnowski, Oren Shriki, Sylvain Chevallier, Arnaud Delorme

Abstract: Current electroencephalogram (EEG) decoding models are typically trained on small numbers of subjects performing a single task. Here, we introduce a large-scale, code-submission-based competition comprising two challenges. First, the Transfer Challenge asks participants to build and test a model that can zero-shot decode new tasks and new subjects from their EEG data. Second, the Psychopathology f… ▽ More Current electroencephalogram (EEG) decoding models are typically trained on small numbers of subjects performing a single task. Here, we introduce a large-scale, code-submission-based competition comprising two challenges. First, the Transfer Challenge asks participants to build and test a model that can zero-shot decode new tasks and new subjects from their EEG data. Second, the Psychopathology factor prediction Challenge asks participants to infer subject measures of mental health from EEG data. For this, we use an unprecedented, multi-terabyte dataset of high-density EEG signals (128 channels) recorded from over 3,000 child to young adult subjects engaged in multiple active and passive tasks. We provide several tunable neural network baselines for each of these two challenges, including a simple network and demographic-based regression models. Developing models that generalise across tasks and individuals will pave the way for ML network architectures capable of adapting to EEG data collected from diverse tasks and individuals. Similarly, predicting mental health-relevant personality trait values from EEG might identify objective biomarkers useful for clinical diagnosis and design of personalised treatment for psychological conditions. Ultimately, the advances spurred by this challenge could contribute to the development of computational psychiatry and useful neurotechnology, and contribute to breakthroughs in both fundamental neuroscience and applied clinical research. △ Less

Submitted 23 June, 2025; originally announced June 2025.

Comments: Approved at Neurips Competition track. webpage: https://eeg2025.github.io/

Journal ref: The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

arXiv:2502.02866 [pdf, other]

A Systematic Approach for Assessing Large Language Models' Test Case Generation Capability

Authors: Hung-Fu Chang, Mohammad Shokrolah Shirazi

Abstract: Software testing ensures the quality and reliability of software products, but manual test case creation is labor-intensive. With the rise of large language models (LLMs), there is growing interest in unit test creation with LLMs. However, effective assessment of LLM-generated test cases is limited by the lack of standardized benchmarks that comprehensively cover diverse programming scenarios. To… ▽ More Software testing ensures the quality and reliability of software products, but manual test case creation is labor-intensive. With the rise of large language models (LLMs), there is growing interest in unit test creation with LLMs. However, effective assessment of LLM-generated test cases is limited by the lack of standardized benchmarks that comprehensively cover diverse programming scenarios. To address the assessment of LLM's test case generation ability and lacking dataset for evaluation, we propose the Generated Benchmark from Control-Flow Structure and Variable Usage Composition (GBCV) approach, which systematically generates programs used for evaluating LLMs' test generation capabilities. By leveraging basic control-flow structures and variable usage, GBCV provides a flexible framework to create a spectrum of programs ranging from simple to complex. Because GPT-4o and GPT-3-Turbo are publicly accessible models, to present real-world regular user's use case, we use GBCV to assess LLM performance on them. Our findings indicate that GPT-4o performs better on complex program structures, while all models effectively detect boundary values in simple conditions but face challenges with arithmetic computations. This study highlights the strengths and limitations of LLMs in test generation, provides a benchmark framework, and suggests directions for future improvement. △ Less

Submitted 4 February, 2025; originally announced February 2025.

Comments: 17 pages, 9 figures

ACM Class: D.2.5; I.2.7

arXiv:2412.14519 [pdf, other]

Optimizing Big Active Data Management Systems

Authors: Shahrzad Haji Amin Shirazi, Xikui Wang, Michael J. Carey, Vassilis J. Tsotras

Abstract: Within the dynamic world of Big Data, traditional systems typically operate in a passive mode, processing and responding to user queries by returning the requested data. However, this methodology falls short of meeting the evolving demands of users who not only wish to analyze data but also to receive proactive updates on topics of interest. To bridge this gap, Big Active Data (BAD) frameworks hav… ▽ More Within the dynamic world of Big Data, traditional systems typically operate in a passive mode, processing and responding to user queries by returning the requested data. However, this methodology falls short of meeting the evolving demands of users who not only wish to analyze data but also to receive proactive updates on topics of interest. To bridge this gap, Big Active Data (BAD) frameworks have been proposed to support extensive data subscriptions and analytics for millions of subscribers. As data volumes and the number of interested users continue to increase, the imperative to optimize BAD systems for enhanced scalability, performance, and efficiency becomes paramount. To this end, this paper introduces three main optimizations, namely: strategic aggregation, intelligent modifications to the query plan, and early result filtering, all aimed at reinforcing a BAD platform's capability to actively manage and efficiently process soaring rates of incoming data and distribute notifications to larger numbers of subscribers. △ Less

Submitted 20 December, 2024; v1 submitted 18 December, 2024; originally announced December 2024.

arXiv:2302.10057 [pdf, ps, other]

An Accurate Model to Estimate 5G Propagation Path Loss for the Indoor Environment

Authors: Hassan Zakeri, Reza Sarraf Shirazi, Gholamreza Moradi

Abstract: This paper presents a new large-scale propagation path loss model to design a fifth-generation (5G) wireless communication system for indoor environments. Simulations for the indoor environment, for all polarization at non-line-of-sight (NLOS) and line-of-sight (LOS), which are performed per meter over a distance of 47 m between each of the separated transmitter antenna (TX) and the receiver anten… ▽ More This paper presents a new large-scale propagation path loss model to design a fifth-generation (5G) wireless communication system for indoor environments. Simulations for the indoor environment, for all polarization at non-line-of-sight (NLOS) and line-of-sight (LOS), which are performed per meter over a distance of 47 m between each of the separated transmitter antenna (TX) and the receiver antenna (RX) positions to compare better the proposed extensive flexible path loss model with previous models. All the simulations are conducted at the Abu-rayhan buildings at the Amirkabir University of Technology. The results demonstrated that the simple presented model with a single parameter denoted ZMS can predict the expansive path loss over distance more accurately. The values of the path loss exponent (PLE) for the LOS scenario are simulated and achieved at 3.63, 1.81, and 3.42 for the V-H, V-V, and V-Omni antenna polarizations, and for NLOS is 6.11, 4.21, and 5.23 at the 28 GHz frequency for all the polarization antenna type V-H, V-V, and V-Omni, appropriately. △ Less

Submitted 26 February, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

arXiv:2202.04650 [pdf]

doi 10.1109/ACCESS.2021.3131768

Semantic Segmentation of Anaemic RBCs Using Multilevel Deep Convolutional Encoder-Decoder Network

Authors: Muhammad Shahzad, Arif Iqbal Umar, Syed Hamad Shirazi, Israr Ahmed Shaikh

Abstract: Pixel-level analysis of blood images plays a pivotal role in diagnosing blood-related diseases, especially Anaemia. These analyses mainly rely on an accurate diagnosis of morphological deformities like shape, size, and precise pixel counting. In traditional segmentation approaches, instance or object-based approaches have been adopted that are not feasible for pixel-level analysis. The convolution… ▽ More Pixel-level analysis of blood images plays a pivotal role in diagnosing blood-related diseases, especially Anaemia. These analyses mainly rely on an accurate diagnosis of morphological deformities like shape, size, and precise pixel counting. In traditional segmentation approaches, instance or object-based approaches have been adopted that are not feasible for pixel-level analysis. The convolutional neural network (CNN) model required a large dataset with detailed pixel-level information for the semantic segmentation of red blood cells in the deep learning domain. In current research work, we address these problems by proposing a multi-level deep convolutional encoder-decoder network along with two state-of-the-art healthy and Anaemic-RBC datasets. The proposed multi-level CNN model preserved pixel-level semantic information extracted in one layer and then passed to the next layer to choose relevant features. This phenomenon helps to precise pixel-level counting of healthy and anaemic-RBC elements along with morphological analysis. For experimental purposes, we proposed two state-of-the-art RBC datasets, i.e., Healthy-RBCs and Anaemic-RBCs dataset. Each dataset contains 1000 images, ground truth masks, relevant, complete blood count (CBC), and morphology reports for performance evaluation. The proposed model results were evaluated using crossmatch analysis with ground truth mask by finding IoU, individual training, validation, testing accuracies, and global accuracies using a 05-fold training procedure. This model got training, validation, and testing accuracies as 0.9856, 0.9760, and 0.9720 on the Healthy-RBC dataset and 0.9736, 0.9696, and 0.9591 on an Anaemic-RBC dataset. The IoU and BFScore of the proposed model were 0.9311, 0.9138, and 0.9032, 0.8978 on healthy and anaemic datasets, respectively. △ Less

Submitted 9 February, 2022; originally announced February 2022.

arXiv:2001.10188 [pdf]

doi 10.1155/2020/4015323

Robust Method for Semantic Segmentation of Whole-Slide Blood Cell Microscopic Image

Authors: Muhammad Shahzad, Arif Iqbal Umar, Muazzam A. Khan, Syed Hamad Shirazi, Zakir Khan, Waqas Yousaf

Abstract: Previous works on segmentation of SEM (scanning electron microscope) blood cell image ignore the semantic segmentation approach of whole-slide blood cell segmentation. In the proposed work, we address the problem of whole-slide blood cell segmentation using the semantic segmentation approach. We design a novel convolutional encoder-decoder framework along with VGG-16 as the pixel-level feature ext… ▽ More Previous works on segmentation of SEM (scanning electron microscope) blood cell image ignore the semantic segmentation approach of whole-slide blood cell segmentation. In the proposed work, we address the problem of whole-slide blood cell segmentation using the semantic segmentation approach. We design a novel convolutional encoder-decoder framework along with VGG-16 as the pixel-level feature extraction model. -e proposed framework comprises 3 main steps: First, all the original images along with manually generated ground truth masks of each blood cell type are passed through the preprocessing stage. In the preprocessing stage, pixel-level labeling, RGB to grayscale conversion of masked image and pixel fusing, and unity mask generation are performed. After that, VGG16 is loaded into the system, which acts as a pretrained pixel-level feature extraction model. In the third step, the training process is initiated on the proposed model. We have evaluated our network performance on three evaluation metrics. We obtained outstanding results with respect to classwise, as well as global and mean accuracies. Our system achieved classwise accuracies of 97.45%, 93.34%, and 85.11% for RBCs, WBCs, and platelets, respectively, while global and mean accuracies remain 97.18% and 91.96%, respectively. △ Less

Submitted 28 January, 2020; originally announced January 2020.

Comments: 13 pages, 13 figures

Journal ref: Volume 2020, Article ID 4015323, 13 pages

arXiv:1906.01345 [pdf, other]

SPECCFI: Mitigating Spectre Attacks using CFI Informed Speculation

Authors: Esmaeil Mohammadian Koruyeh, Shirin Haji Amin Shirazi, Khaled N. Khasawneh, Chengyu Song, Nael Abu-Ghazaleh

Abstract: Spectre attacks and their many subsequent variants are a new vulnerability class affecting modern CPUs. The attacks rely on the ability to misguide speculative execution, generally by exploiting the branch prediction structures, to execute a vulnerable code sequence speculatively. In this paper, we propose to use Control-Flow Integrity (CFI), a security technique used to stop control-flow hijackin… ▽ More Spectre attacks and their many subsequent variants are a new vulnerability class affecting modern CPUs. The attacks rely on the ability to misguide speculative execution, generally by exploiting the branch prediction structures, to execute a vulnerable code sequence speculatively. In this paper, we propose to use Control-Flow Integrity (CFI), a security technique used to stop control-flow hijacking attacks, on the committed path, to prevent speculative control-flow from being hijacked to launch the most dangerous variants of the Spectre attacks (Spectre-BTB and Spectre-RSB). Specifically, CFI attempts to constrain the possible targets of an indirect branch to a set of legal targets defined by a pre-calculated control-flow graph (CFG). As CFI is being adopted by commodity software (e.g., Windows and Android) and commodity hardware (e.g., Intel's CET and ARM's BTI), the CFI information becomes readily available through the hardware CFI extensions. With the CFI information, we apply CFI principles to also constrain illegal control-flow during speculative execution. Specifically, our proposed defense, SPECCFI, ensures that control flow instructions target legal destinations to constrain dangerous speculation on forward control-flow paths (indirect calls and branches). We augment this protection with a precise speculation-aware hardware stack to constrain speculation on backward control-flow edges (returns). We combine this solution with existing solutions against branch target predictor attacks (Spectre-PHT) to close all known non-vendor-specific Spectre vulnerabilities. We show that SPECCFI results in small overheads both in terms of performance and additional hardware complexity. △ Less

Submitted 4 December, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: To appear in IEEE S&P 2020

arXiv:1901.05322 [pdf, other]

Learning and Reasoning for Robot Sequential Decision Making under Uncertainty

Authors: Saeid Amiri, Mohammad Shokrolah Shirazi, Shiqi Zhang

Abstract: Robots frequently face complex tasks that require more than one action, where sequential decision-making (SDM) capabilities become necessary. The key contribution of this work is a robot SDM framework, called LCORPP, that supports the simultaneous capabilities of supervised learning for passive state estimation, automated reasoning with declarative human knowledge, and planning under uncertainty t… ▽ More Robots frequently face complex tasks that require more than one action, where sequential decision-making (SDM) capabilities become necessary. The key contribution of this work is a robot SDM framework, called LCORPP, that supports the simultaneous capabilities of supervised learning for passive state estimation, automated reasoning with declarative human knowledge, and planning under uncertainty toward achieving long-term goals. In particular, we use a hybrid reasoning paradigm to refine the state estimator, and provide informative priors for the probabilistic planner. In experiments, a mobile robot is tasked with estimating human intentions using their motion trajectories, declarative contextual knowledge, and human-robot interaction (dialog-based and motion-based). Results suggest that, in efficiency and accuracy, our framework performs better than its no-learning and no-reasoning counterparts in office environment. △ Less

Submitted 10 December, 2019; v1 submitted 16 January, 2019; originally announced January 2019.

Comments: In proceedings of 34th AAAI conference on Artificial Intelligence, 2020

arXiv:1803.10932 [pdf, other]

Learning Free-Form Deformations for 3D Object Reconstruction

Authors: Dominic Jack, Jhony K. Pontes, Sridha Sridharan, Clinton Fookes, Sareh Shirazi, Frederic Maire, Anders Eriksson

Abstract: Representing 3D shape in deep learning frameworks in an accurate, efficient and compact manner still remains an open challenge. Most existing work addresses this issue by employing voxel-based representations. While these approaches benefit greatly from advances in computer vision by generalizing 2D convolutions to the 3D setting, they also have several considerable drawbacks. The computational co… ▽ More Representing 3D shape in deep learning frameworks in an accurate, efficient and compact manner still remains an open challenge. Most existing work addresses this issue by employing voxel-based representations. While these approaches benefit greatly from advances in computer vision by generalizing 2D convolutions to the 3D setting, they also have several considerable drawbacks. The computational complexity of voxel-encodings grows cubically with the resolution thus limiting such representations to low-resolution 3D reconstruction. In an attempt to solve this problem, point cloud representations have been proposed. Although point clouds are more efficient than voxel representations as they only cover surfaces rather than volumes, they do not encode detailed geometric information about relationships between points. In this paper we propose a method to learn free-form deformations (FFD) for the task of 3D reconstruction from a single image. By learning to deform points sampled from a high-quality mesh, our trained model can be used to produce arbitrarily dense point clouds or meshes with fine-grained geometry. We evaluate our proposed framework on both synthetic and real-world data and achieve state-of-the-art results on point-cloud and volumetric metrics. Additionally, we qualitatively demonstrate its applicability to label transferring for 3D semantic segmentation. △ Less

Submitted 29 March, 2018; originally announced March 2018.

Comments: 16 pages, 7 figures, 3 tables

Journal ref: Asian Conference on Computer Vision (ACCV) 2018

arXiv:1709.07894 [pdf]

On Encoding Temporal Evolution for Real-time Action Prediction

Authors: Fahimeh Rezazadegan, Sareh Shirazi, Mahsa Baktashmotlagh, Larry S. Davis

Abstract: Anticipating future actions is a key component of intelligence, specifically when it applies to real-time systems, such as robots or autonomous cars. While recent works have addressed prediction of raw RGB pixel values, we focus on anticipating the motion evolution in future video frames. To this end, we construct dynamic images (DIs) by summarising moving pixels through a sequence of future frame… ▽ More Anticipating future actions is a key component of intelligence, specifically when it applies to real-time systems, such as robots or autonomous cars. While recent works have addressed prediction of raw RGB pixel values, we focus on anticipating the motion evolution in future video frames. To this end, we construct dynamic images (DIs) by summarising moving pixels through a sequence of future frames. We train a convolutional LSTMs to predict the next DIs based on an unsupervised learning process, and then recognise the activity associated with the predicted DI. We demonstrate the effectiveness of our approach on 3 benchmark action datasets showing that despite running on videos with complex activities, our approach is able to anticipate the next human action with high accuracy and obtain better results than the state-of-the-art methods. △ Less

Submitted 7 February, 2018; v1 submitted 22 September, 2017; originally announced September 2017.

Comments: Submitted Version

arXiv:1703.02658 [pdf, other]

What Would You Do? Acting by Learning to Predict

Authors: Adam Tow, Niko Sünderhauf, Sareh Shirazi, Michael Milford, Jürgen Leitner

Abstract: We propose to learn tasks directly from visual demonstrations by learning to predict the outcome of human and robot actions on an environment. We enable a robot to physically perform a human demonstrated task without knowledge of the thought processes or actions of the human, only their visually observable state transitions. We evaluate our approach on two table-top, object manipulation tasks and… ▽ More We propose to learn tasks directly from visual demonstrations by learning to predict the outcome of human and robot actions on an environment. We enable a robot to physically perform a human demonstrated task without knowledge of the thought processes or actions of the human, only their visually observable state transitions. We evaluate our approach on two table-top, object manipulation tasks and demonstrate generalisation to previously unseen states. Our approach reduces the priors required to implement a robot task learning system compared with the existing approaches of Learning from Demonstration, Reinforcement Learning and Inverse Reinforcement Learning. △ Less

Submitted 7 March, 2017; originally announced March 2017.

Comments: Submitted to International Conference on Intelligent Robots and Systems (IROS 2017)

arXiv:1701.04925 [pdf]

doi 10.1109/ICRA.2017.7989361

Action Recognition: From Static Datasets to Moving Robots

Authors: Fahimeh Rezazadegan, Sareh Shirazi, Ben Upcroft, Michael Milford

Abstract: Deep learning models have achieved state-of-the- art performance in recognizing human activities, but often rely on utilizing background cues present in typical computer vision datasets that predominantly have a stationary camera. If these models are to be employed by autonomous robots in real world environments, they must be adapted to perform independently of background cues and camera motion ef… ▽ More Deep learning models have achieved state-of-the- art performance in recognizing human activities, but often rely on utilizing background cues present in typical computer vision datasets that predominantly have a stationary camera. If these models are to be employed by autonomous robots in real world environments, they must be adapted to perform independently of background cues and camera motion effects. To address these challenges, we propose a new method that firstly generates generic action region proposals with good potential to locate one human action in unconstrained videos regardless of camera motion and then uses action proposals to extract and classify effective shape and motion features by a ConvNet framework. In a range of experiments, we demonstrate that by actively proposing action regions during both training and testing, state-of-the-art or better performance is achieved on benchmarks. We show the outperformance of our approach compared to the state-of-the-art in two new datasets; one emphasizes on irrelevant background, the other highlights the camera motion. We also validate our action recognition method in an abnormal behavior detection scenario to improve workplace safety. The results verify a higher success rate for our method due to the ability of our system to recognize human actions regardless of environment and camera motion. △ Less

Submitted 17 January, 2017; originally announced January 2017.

Journal ref: Robotics and Automation (ICRA), 2017 IEEE International Conference on

arXiv:1701.02694 [pdf, other]

Limited individual attention and online virality of low-quality information

Authors: Xiaoyan Qiu, Diego F. M. Oliveira, Alireza Sahami Shirazi, Alessandro Flammini, Filippo Menczer

Abstract: Social media are massive marketplaces where ideas and news compete for our attention. Previous studies have shown that quality is not a necessary condition for online virality and that knowledge about peer choices can distort the relationship between quality and popularity. However, these results do not explain the viral spread of low-quality information, such as the digital misinformation that th… ▽ More Social media are massive marketplaces where ideas and news compete for our attention. Previous studies have shown that quality is not a necessary condition for online virality and that knowledge about peer choices can distort the relationship between quality and popularity. However, these results do not explain the viral spread of low-quality information, such as the digital misinformation that threatens our democracy. We investigate quality discrimination in a stylized model of online social network, where individual agents prefer quality information, but have behavioral limitations in managing a heavy flow of information. We measure the relationship between the quality of an idea and its likelihood to become prevalent at the system level. We find that both information overload and limited attention contribute to a degradation in the market's discriminative power. A good tradeoff between discriminative power and diversity of information is possible according to the model. However, calibration with empirical data characterizing information load and finite attention in real social media reveals a weak correlation between quality and popularity of information. In these realistic conditions, the model predicts that high-quality information has little advantage over low-quality information. △ Less

Submitted 10 January, 2019; v1 submitted 10 January, 2017; originally announced January 2017.

Comments: The original paper was retracted (see http://doi.org/10.1038/s41562-017-0132). This is a corrected version of the preprint

arXiv:1612.00558 [pdf, other]

Unsupervised Human Action Detection by Action Matching

Authors: Basura Fernando, Sareh Shirazi, Stephen Gould

Abstract: We propose a new task of unsupervised action detection by action matching. Given two long videos, the objective is to temporally detect all pairs of matching video segments. A pair of video segments are matched if they share the same human action. The task is category independent---it does not matter what action is being performed---and no supervision is used to discover such video segments. Unsup… ▽ More We propose a new task of unsupervised action detection by action matching. Given two long videos, the objective is to temporally detect all pairs of matching video segments. A pair of video segments are matched if they share the same human action. The task is category independent---it does not matter what action is being performed---and no supervision is used to discover such video segments. Unsupervised action detection by action matching allows us to align videos in a meaningful manner. As such, it can be used to discover new action categories or as an action proposal technique within, say, an action detection pipeline. Moreover, it is a useful pre-processing step for generating video highlights, e.g., from sports videos. We present an effective and efficient method for unsupervised action detection. We use an unsupervised temporal encoding method and exploit the temporal consistency in human actions to obtain candidate action segments. We evaluate our method on this challenging task using three activity recognition benchmarks, namely, the MPII Cooking activities dataset, the THUMOS15 action detection benchmark and a new dataset called the IKEA dataset. On the MPII Cooking dataset we detect action segments with a precision of 21.6% and recall of 11.7% over 946 long video pairs and over 5000 ground truth action segments. Similarly, on THUMOS dataset we obtain 18.4% precision and 25.1% recall over 5094 ground truth action segment pairs. △ Less

Submitted 15 May, 2017; v1 submitted 1 December, 2016; originally announced December 2016.

Comments: IEEE International Conference on Computer Vision and Pattern Recognition CVPR 2017 Workshops

arXiv:1610.05432 [pdf, other]

ARTiS: Appearance-based Action Recognition in Task Space for Real-Time Human-Robot Collaboration

Authors: Markus Eich, Sareh Shirazi, Gordon Wyeth

Abstract: To have a robot actively supporting a human during a collaborative task, it is crucial that robots are able to identify the current action in order to predict the next one. Common approaches make use of high-level knowledge, such as object affordances, semantics or understanding of actions in terms of pre- and post-conditions. These approaches often require hand-coded a priori knowledge, time- and… ▽ More To have a robot actively supporting a human during a collaborative task, it is crucial that robots are able to identify the current action in order to predict the next one. Common approaches make use of high-level knowledge, such as object affordances, semantics or understanding of actions in terms of pre- and post-conditions. These approaches often require hand-coded a priori knowledge, time- and resource-intensive or supervised learning techniques. We propose to reframe this problem as an appearance-based place recognition problem. In our framework, we regard sequences of visual images of human actions as a map in analogy to the visual place recognition problem. Observing the task for the second time, our approach is able to recognize pre-observed actions in a one-shot learning approach and is thereby able to recognize the current observation in the task space. We propose two new methods for creating and aligning action observations within a task map. We compare and verify our approaches with real data of humans assembling several types of IKEA flat packs. △ Less

Submitted 6 March, 2017; v1 submitted 18 October, 2016; originally announced October 2016.

arXiv:1512.03424 [pdf]

Evaluation of Object Detection Proposals Under Condition Variations

Authors: Fahimeh Rezazadegan, Sareh Shirazi, Michael Milford, Ben Upcroft

Abstract: Object detection is a fundamental task in many computer vision applications, therefore the importance of evaluating the quality of object detection is well acknowledged in this domain. This process gives insight into the capabilities of methods in handling environmental changes. In this paper, a new method for object detection is introduced that combines the Selective Search and EdgeBoxes. We test… ▽ More Object detection is a fundamental task in many computer vision applications, therefore the importance of evaluating the quality of object detection is well acknowledged in this domain. This process gives insight into the capabilities of methods in handling environmental changes. In this paper, a new method for object detection is introduced that combines the Selective Search and EdgeBoxes. We tested these three methods under environmental variations. Our experiments demonstrate the outperformance of the combination method under illumination and view point variations. △ Less

Submitted 10 December, 2015; originally announced December 2015.

Comments: 2 pages, 6 figures, CVPR Workshop, 2015

arXiv:1501.04158 [pdf, other]

On the Performance of ConvNet Features for Place Recognition

Authors: Niko Sünderhauf, Feras Dayoub, Sareh Shirazi, Ben Upcroft, Michael Milford

Abstract: After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essenti… ▽ More After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essential, and performance priorities can be different. This paper comprehensively evaluates and compares the utility of three state-of-the-art ConvNets on the problems of particular relevance to navigation for robots; viewpoint-invariance and condition-invariance, and for the first time enables real-time place recognition performance using ConvNets with large maps by integrating a variety of existing (locality-sensitive hashing) and novel (semantic search space partitioning) optimization techniques. We present extensive experiments on four real world datasets cultivated to evaluate each of the specific challenges in place recognition. The results demonstrate that speed-ups of two orders of magnitude can be achieved with minimal accuracy degradation, enabling real-time performance. We confirm that networks trained for semantic place categorization also perform better at (specific) place recognition when faced with severe appearance changes and provide a reference for which networks and layers are optimal for different aspects of the place recognition problem. △ Less

Submitted 28 July, 2015; v1 submitted 17 January, 2015; originally announced January 2015.

arXiv:1408.2313 [pdf, other]

doi 10.1109/DICTA.2015.7371239

Bags of Affine Subspaces for Robust Object Tracking

Authors: Sareh Shirazi, Conrad Sanderson, Chris McCool, Mehrtash T. Harandi

Abstract: We propose an adaptive tracking algorithm where the object is modelled as a continuously updated bag of affine subspaces, with each subspace constructed from the object's appearance over several consecutive frames. In contrast to linear subspaces, affine subspaces explicitly model the origin of subspaces. Furthermore, instead of using a brittle point-to-subspace distance during the search for the… ▽ More We propose an adaptive tracking algorithm where the object is modelled as a continuously updated bag of affine subspaces, with each subspace constructed from the object's appearance over several consecutive frames. In contrast to linear subspaces, affine subspaces explicitly model the origin of subspaces. Furthermore, instead of using a brittle point-to-subspace distance during the search for the object in a new frame, we propose to use a subspace-to-subspace distance by representing candidate image areas also as affine subspaces. Distances between subspaces are then obtained by exploiting the non-Euclidean geometry of Grassmann manifolds. Experiments on challenging videos (containing object occlusions, deformations, as well as variations in pose and illumination) indicate that the proposed method achieves higher tracking accuracy than several recent discriminative trackers. △ Less

Submitted 5 February, 2016; v1 submitted 11 August, 2014; originally announced August 2014.

Comments: in International Conference on Digital Image Computing: Techniques and Applications, 2015

MSC Class: 14M15; 54B05 ACM Class: I.2.10; I.4.8; I.5.4

arXiv:1403.0309 [pdf, other]

doi 10.1109/WACV.2014.6836008

Object Tracking via Non-Euclidean Geometry: A Grassmann Approach

Authors: Sareh Shirazi, Mehrtash T. Harandi, Brian C. Lovell, Conrad Sanderson

Abstract: A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the abovemen… ▽ More A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the abovementioned variations. We use affine subspaces not only to represent the object, but also the candidate areas that the object may occupy. We furthermore propose a novel approach to measure affine subspace-to-subspace distance via the use of non-Euclidean geometry of Grassmann manifolds. The tracking problem is then considered as an inference task in a Markov Chain Monte Carlo framework via particle filtering. Quantitative evaluation on challenging video sequences indicates that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and MILtrack. △ Less

Submitted 2 March, 2014; originally announced March 2014.

Comments: IEEE Winter Conference on Applications of Computer Vision (WACV), 2014

ACM Class: I.2.10; I.4.6; I.4.7; I.4.8; I.5.1; I.5.4; G.3

Showing 1–20 of 20 results for author: Shirazi, S