Optimizing Deep Learning Inference via Global Analysis and Tensor Expression

Dec 30, 2018

Jiacheng Zhao

Yisong Chang

Denghui Li

Chunwei Xia

Huimin Cui

Ke Zhang

Xiaobing Feng

Abstract

Nowadays, a large number of accelerators have been proposed to increase the performance of AI applications, which makes it a major challenge to extend existing AI programming frameworks to support them. In this paper, we use TensorFlow to demonstrate how to port an AI programming framework to new hardware, namely FPGA and Sunway TaihuLight. These two platforms represent two distinct and significant hardware architectures for studying the retargeting process. We describe our retargeting process and experience on both platforms, from source code to compilation. We compare the two retargeting approaches and present preliminary experimental results.

Type

Conference paper

Publication

In Network and Parallel Computing

Last updated on Dec 30, 2018

Deep Learning Framework

Authors

Chunwei Xia

Lecturer (Assistant Professor)
