DeepDive: An Integrative Algorithm/Architecture Co-Design for Deep Separable Convolutional Neural Networks
Authors:
Mohammadreza Baharani,
Ushma Sunil,
Kaustubh Manohar,
Steven Furgurson,
Hamed Tabkhi
Abstract:
Deep Separable Convolutional Neural Networks (DSCNNs) have become an emerging paradigm, offering modular networks with structural sparsity that achieve higher accuracy with relatively fewer operations and parameters. However, there is a lack of customized architectures that provide flexible solutions fitted to the sparsity of DSCNNs. This paper introduces DeepDive, a fully functional, vertical co-design framework for power-efficient implementation of DSCNNs on edge FPGAs. DeepDive's architecture provides the heterogeneous Compute Units (CUs) needed to fully support DSCNNs, whose various convolutional operators are interconnected with structural sparsity. It offers FPGA-aware training and online quantization combined with modular, synthesizable C++ CUs customized for DSCNNs. Execution results on Xilinx's ZCU102 FPGA board demonstrate 47.4 and 233.3 FPS/Watt for MobileNetV2 and a compact version of EfficientNet, respectively, two state-of-the-art depthwise separable CNNs. Comparisons show that DeepDive improves FPS/Watt by 2.2$\times$ and 1.51$\times$ over the Jetson Nano high and low power modes, respectively. It also improves FPS/Watt by about 2.27$\times$ and 37.25$\times$ over two other FPGA implementations. The DeepDive output for MobileNetV2 is available at https://github.com/TeCSAR-UNCC/DeepDive.
Submitted 18 July, 2020;
originally announced July 2020.
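The DeepDive abstract's claim that depthwise separable networks need "relatively lower operations and parameters" can be illustrated with the standard cost comparison between a regular convolution and a depthwise-plus-pointwise pair. The sketch below is not taken from the paper: the function names and the example layer shape are hypothetical, and it assumes stride 1, "same" padding (so the output stays h x w), and no bias terms.

```python
# Illustrative cost model only; names and layer sizes are assumptions,
# not part of DeepDive. Assumes stride 1, "same" padding, and no bias.

def standard_conv_cost(k, c_in, c_out, h, w):
    """Return (parameters, multiply-accumulates) for a k x k standard convolution."""
    params = k * k * c_in * c_out
    macs = params * h * w  # every weight is used once per output position
    return params, macs


def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Return (parameters, multiply-accumulates) for a depthwise k x k
    convolution followed by a 1 x 1 pointwise convolution."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise_params + pointwise_params
    macs = params * h * w
    return params, macs


def cost_ratio(k, c_in, c_out, h, w):
    """Separable MACs divided by standard MACs; equals 1/c_out + 1/k**2."""
    _, standard_macs = standard_conv_cost(k, c_in, c_out, h, w)
    _, separable_macs = depthwise_separable_cost(k, c_in, c_out, h, w)
    return separable_macs / standard_macs


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 32 -> 64 channels, 56 x 56 feature map.
    print(standard_conv_cost(3, 32, 64, 56, 56))
    print(depthwise_separable_cost(3, 32, 64, 56, 56))
    print(cost_ratio(3, 32, 64, 56, 56))
```

For a 3 x 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is the source of the operation savings that hardware such as dedicated depthwise and pointwise compute units can then exploit.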
EfficientHRNet: Efficient Scaling for Lightweight High-Resolution Multi-Person Pose Estimation
Authors:
Christopher Neff,
Aneri Sheth,
Steven Furgurson,
Hamed Tabkhi
Abstract:
There is an increasing demand for lightweight multi-person pose estimation in many emerging smart IoT applications. However, existing algorithms tend to have large model sizes and intense computational requirements, making them ill-suited for real-time applications and deployment on resource-constrained hardware. Lightweight, real-time approaches are exceedingly rare and come at the cost of inferior accuracy. In this paper, we present EfficientHRNet, a family of lightweight multi-person human pose estimators that run in real time on resource-constrained devices. By unifying recent advances in model scaling with high-resolution feature representations, EfficientHRNet creates highly accurate models while reducing computation enough to achieve real-time performance. The largest model comes within 4.4% accuracy of the current state-of-the-art while having 1/3 the model size and 1/6 the computation, achieving 23 FPS on Nvidia Jetson Xavier. Compared to the top real-time approach, EfficientHRNet increases accuracy by 22% while achieving similar FPS with 1/3 the power. At every level, EfficientHRNet proves to be more computationally efficient than other bottom-up 2D human pose estimation approaches while achieving highly competitive accuracy.
Submitted 30 December, 2020; v1 submitted 15 July, 2020;
originally announced July 2020.