- M.Sc., School of Electronics and Information, NPU, 2015.09 – 2018.03.
- B.Sc., School of Electronics and Information, NPU, GPA: 3.66/4, 2011.09 – 2015.07.
- Ph.D., Department of Electrical Engineering, USC, 2018.08 – Now.
FPGA; Parallel Computing; Deep Learning Acceleration
 Wei Zhou, Yue Niu, Xiaocong Lian, Xin Zhou, Jiamin Yang, "A Stepped-RAM Reading and Multiplierless VLSI Architecture for Intra Prediction in HEVC", The Pacific-Rim Conference on Multimedia (PCM 2016), Part I, LNCS 9916, pp. 469-478, Xi'an, China, September 2016. (Prof. Wei Zhou is my supervisor; I did the main work.) [PDF]
 Yue Niu, Chunsheng Mei, Zhenyu Liu, Xiangyang Ji, Wei Zhou, Dongsheng Wang, "Sensitivity-Based Acceleration and Compression Algorithm for Convolution Neural Network". Presented at the 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (Oral). [PDF]
 Chunsheng Mei*, Zhenyu Liu, Yue Niu*, Xiangyang Ji, Wei Zhou, Dongsheng Wang, "A 200MHz 202.4GFLOPS@10.8W VGG16 Accelerator in Xilinx VX690T". Presented at the 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (Oral) (*Equal contribution). [PDF]
 Yue Niu, Guanwen Zhang, Wei Zhou, et al., "Sensitivity-oriented Layer-wise Acceleration and Compression for Convolution Neural Network". IEEE Transactions on Multimedia (under review).
 Zeheng Li, Yue Niu, "Hyperspectral Unmixing Based on Nonnegative Matrix Factorization with Minimum Distance Constraint". Submitted to IET Signal Processing (under review).
Demo of the accelerator embedded in a host computer.
- The accelerator is plugged into a PCIe slot. The computer on the right is the host: it first transfers the input data to the FPGA accelerator, then waits until the accelerator finishes the forward computation, and then reads the output back from the FPGA over PCIe. Finally, the host program looks up the classification label in the predefined syntax words and calculates the computation error and the classification accuracy.
- The computer on the left is used for debugging and displays some key signals, such as the signals that mark the start and the finish of the computation. In addition, its debug and waveform windows can be used to measure the time of the forward computation on the FPGA.
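The host-side flow in the demo (transfer, wait, read back, label lookup, error and accuracy) can be sketched roughly as follows. This is only a minimal illustration, not the actual demo code: `MockAccelerator` is a hypothetical software stand-in for the FPGA board (a real setup would go through a PCIe driver), the two-decimal rounding that imitates limited hardware precision is an assumption, and `run_demo` and `reference_forward` are names invented for this example.

```python
import threading
import time


class MockAccelerator:
    """Hypothetical stand-in for the FPGA board (no real PCIe access)."""

    def __init__(self, weights):
        self.weights = weights         # one weight row per output class
        self.done = threading.Event()  # models the "computation finished" signal
        self._input = None
        self._output = None

    def write(self, data):
        """Host -> accelerator transfer."""
        self._input = list(data)
        self.done.clear()

    def start(self):
        """Start the forward computation in the background."""
        threading.Thread(target=self._forward).start()

    def _forward(self):
        time.sleep(0.01)  # pretend the hardware is busy
        # Assumption: rounding imitates reduced-precision hardware arithmetic.
        self._output = [
            round(sum(w * x for w, x in zip(row, self._input)), 2)
            for row in self.weights
        ]
        self.done.set()

    def read(self):
        """Accelerator -> host transfer."""
        return list(self._output)


def reference_forward(weights, data):
    """Full-precision software reference used to measure the computing error."""
    return [sum(w * x for w, x in zip(row, data)) for row in weights]


def run_demo(accel, samples, labels, words):
    """Return (accuracy, max absolute error, predicted words)."""
    correct = 0
    max_error = 0.0
    predictions = []
    for data, label in zip(samples, labels):
        accel.write(data)    # 1. transfer the input to the accelerator
        accel.start()
        accel.done.wait()    # 2. wait until the forward pass has finished
        scores = accel.read()  # 3. read the output back
        best = max(range(len(scores)), key=scores.__getitem__)
        predictions.append(words[best])  # 4. look up the class label
        ref = reference_forward(accel.weights, data)
        max_error = max(max_error, max(abs(a - b) for a, b in zip(scores, ref)))
        correct += int(best == label)
    return correct / len(samples), max_error, predictions
```

Calling `run_demo(MockAccelerator(weights), samples, labels, words)` with a small weight matrix and a few labelled samples returns the accuracy, the largest deviation from the software reference, and the predicted words.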
C/C++, Python, MATLAB, Lua; CUDA, Verilog
Caffe, TensorFlow, Torch; Vivado, Synopsys; Git