SNPE Tutorial (tensorflow)

An example of running a deep-learning-based Super Resolution network on a Qualcomm Snapdragon AP

(Tensorflow version)

https://developer.qualcomm.com/docs/snpe/setup.html

Let's follow this guide.

First, let's satisfy the prerequisites.

Currently the SNPE SDK development environment is limited to Ubuntu, specifically version 14.04.
The SDK requires either Caffe, Caffe2, ONNX or TensorFlow.
1. Instructions for Caffe: Caffe and Caffe2 Setup
2. Instructions for TensorFlow: TensorFlow Setup
3. Instructions for ONNX: ONNX Setup
Python 2.7
Android NDK (android-ndk-r11-linux-x86) is optional and only required to build the native CPP example that ships with the SDK
- SDK Android binaries built with gcc require libgnustl_shared.so which can be found in the Android NDK. (See Platform Runtime Libraries below).
- SDK Android binaries built with clang require libc++_shared.so which is shipped with the SDK.
Android SDK (SDK version 23 and build tools version 23.0.2) is optional and only required to build the Android APK that ships with the SDK.

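Install a Linux OS
Install the GPU graphics driver (check with: nvidia-smi)
Install CUDA 9.0
- Download the CUDA 9.0 run file for Ubuntu 16.04 from the NVIDIA website
- sudo sh cuda_9.2.148_396.37_linux.run
  - Note that this file name belongs to a CUDA 9.2 installer; run the file you actually downloaded so that it matches the cuda-9.0 path used in PATH
- export PATH
  - Edit ~/.bashrc (it is a hidden file in the home folder, shown with Ctrl+H in the file manager; or edit it with vi ~/.bashrc)
  - Add the four export lines, close the terminal, then open a new one to check (note: the ":~/anaconda3/bin" part is added later, after installing anaconda3)
- Check the installation: nvcc -V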
Install cuDNN
- Download the cuDNN 7.0 library file from the NVIDIA website
  - TensorFlow 1.12 and later require cuDNN 7.2 or later, so pick the cuDNN version that matches your TensorFlow version
- Extract the archive and copy the files into the CUDA folder
  - sudo tar -xzvf cudnn-9.0-linux-x64-v7.0.tgz
  - cd cuda
  - sudo cp include/cudnn.h /usr/local/cuda/include
  - sudo cp lib64/libcudnn* /usr/local/cuda/lib64
  - sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
- Check the installation: cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
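The grep check above prints the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros from cudnn.h. As a rough sketch, the same check can be done in Python by parsing the header text. The function name `parse_cudnn_version` and the sample header excerpt are illustrative assumptions, not part of cuDNN.

```python
import re

def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cudnn.h-style #define lines."""
    version = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % macro, header_text)
        if match is None:
            raise ValueError("macro %s not found" % macro)
        version.append(int(match.group(1)))
    return tuple(version)

# Hypothetical header excerpt; on a real machine, read
# /usr/local/cuda/include/cudnn.h instead.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
"""
cudnn_version = parse_cudnn_version(sample)
print(cudnn_version)
```

Comparing the resulting tuple against the version your TensorFlow build expects is one way to catch a mismatch before training fails at import time.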
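Install anaconda3
- Download Anaconda3-5.2.0-Linux-x86_64.sh from the Anaconda website
- bash Anaconda3-5.2.0-Linux-x86_64.sh
  - Press Enter gently through the prompts; mashing the key all at once may skip a prompt and leave things in a mess
  - Do not install the graphics card driver, since it was already installed above
- export PATH
  - Open ~/.bashrc and add:
  - export PATH=${PATH}:/usr/local/cuda-9.0/bin:~/anaconda3/bin
    - This is the part mentioned in the note in the CUDA section above
- Check the installation: conda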
Set up an anaconda virtual environment
- conda update conda
- conda create -n name python=x.x anaconda
  - e.g. conda create -n tensorflow_27 python=2.7 anaconda
- Activate the environment
  - source activate name
- Deactivate
  - source deactivate
- Remove the environment
  - conda remove -n name --all
- From here on, activate the virtual environment first and install everything inside it
Install TensorFlow
- source activate tensorflow_27
- pip install --upgrade tensorflow-gpu
- Check the installation
  - python
  - import tensorflow as tf
  - sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    - If the GPU information is printed, the GPU build of TensorFlow is installed correctly.
Install onnx
- pip install onnx
  - onnx conflicts with caffe; when only TensorFlow is installed, installing onnx as-is is probably fine, but this has not been verified
Install SNPE
- Download SNPE
- https://developer.qualcomm.com/software/snapdragon-neural-processing-engine
- Sign up for an account, then download.
- Download snpe-1.17.0 and extract it.
- Before installing, install the required packages
  - sudo apt-get install python-dev python-matplotlib python-numpy python-protobuf python-scipy python-skimage python-sphinx wget zip
- Then run
  - source snpe-1.17.0/bin/dependencies.sh
- This appears to be a bash script that installs the required packages
- If running the .sh file fails with a permission error, change its permissions with
  - chmod a+x dependencies.sh
- Finally, run check_python_depends.sh to verify the required Python packages.
- In practice, installing SNPE amounts to adding the download location to PATH.
- bin/envsetup.sh makes the path setup simple (the path after -t is the folder where tensorflow is installed)
  - cd snpe-1.17.0
  - e.g. source bin/envsetup.sh -t /home/kaist/anaconda3/envs/tensorflow_27/lib/python2.7/site-packages/tensorflow
  - Check that it works
    - Type snpe-tensorflow-to-dlc in the shell. If it asks for its arguments instead of printing "command not found", SNPE is set up correctly.
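The "command not found" check above can also be scripted. The sketch below only looks the tool names up on PATH. It assumes Python 3.3 or later for `shutil.which` (the tutorial's conda environment is Python 2.7, so run it with a separate interpreter), and the helper name `missing_tools` is made up for illustration.

```python
import shutil

def missing_tools(tool_names):
    """Return the names from tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]

# Tool names taken from this tutorial; run after sourcing bin/envsetup.sh.
snpe_tools = ["snpe-tensorflow-to-dlc", "snpe-net-run"]
not_found = missing_tools(snpe_tools)
if not_found:
    print("Not on PATH:", ", ".join(not_found))
else:
    print("All SNPE tools found")
```

Because envsetup.sh only changes the current shell, the check has to be run from the same terminal in which the script was sourced.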
Install adb (Android SDK/NDK)
- adb is the program that acts as the transfer channel between the computer and the Android phone; it is installed together with the Android SDK and NDK.
- Install from the Android Studio website
  - https://developer.android.com/studio/?hl=ko
- Launch it and install the latest SDK and NDK
  - cd android-studio
  - ./bin/studio.sh
  - File --> Settings --> Appearance & Behavior --> System Settings --> Android SDK --> SDK Tools: install or update from this tab
- Set the paths
  - Open ~/.bashrc, append the lines below at the very end, and save.
  - export ANDROID_SDK=/home/kaist/Android/Sdk
  - export ANDROID_NDK=/home/kaist/Android/Sdk/ndk-bundle
  - export PATH=$PATH:$ANDROID_SDK/platform-tools
  - export PATH=$PATH:$ANDROID_SDK/tools
  - export PATH=$PATH:$ANDROID_NDK
  - alias adb='/home/kaist/Android/Sdk/platform-tools/adb'
- Check the installation: adb

Train the required network with TensorFlow.
- Example training code
  - EDSR_Tensorflow
  - Most recent SR networks enlarge the image with depth-to-space (the pixel-shuffle layer), but SNPE (1.17.0, the version used here) does not support this layer yet.
  - A different upsampling method is therefore needed:
    - Nearest Neighbor
    - Bilinear Interpolation
    - Transposed Convolution
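To make the difference concrete, here is a small NumPy sketch of what depth-to-space does next to nearest-neighbor upsampling, one of the alternatives listed above. It is an illustration only: the function names are made up, and the channel ordering is intended to follow `tf.depth_to_space` for NHWC data, which should be treated as an assumption to verify against TensorFlow.

```python
import numpy as np

def depth_to_space(x, r):
    """Rearrange an NHWC array of shape (N, H, W, C*r*r) to (N, H*r, W*r, C)."""
    n, h, w, c = x.shape
    out_c = c // (r * r)
    x = x.reshape(n, h, w, r, r, out_c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (N, H, r, W, r, C)
    return x.reshape(n, h * r, w * r, out_c)

def nearest_neighbor_upsample(x, r):
    """Repeat every pixel of an NHWC array r times along height and width."""
    return np.repeat(np.repeat(x, r, axis=1), r, axis=2)

# One pixel with 4 channels becomes a 2x2 single-channel patch.
packed = np.arange(4, dtype=np.float32).reshape(1, 1, 1, 4)
shuffled = depth_to_space(packed, 2)

# A 2x2 image becomes 4x4, each pixel repeated in a 2x2 block.
small = np.array([[1, 2], [3, 4]], dtype=np.float32).reshape(1, 2, 2, 1)
enlarged = nearest_neighbor_upsample(small, 2)

print(shuffled[0, :, :, 0])
print(enlarged[0, :, :, 0])
```

Depth-to-space moves learned channel values into spatial positions, while nearest neighbor only copies pixels, so a network using the latter needs a convolution after the upsampling step to learn the detail.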

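A model is usually saved with a saver as a checkpoint file plus .ckpt files, and these need to be converted into .pb form.
The checkpoint and .ckpt files must be frozen into frozen_model.pb before the model can be converted into a .dlc file, the format SNPE can load.
The conversion uses the freeze_graph.py code.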

import os, argparse
import tensorflow as tf

# The original freeze_graph function
# from tensorflow.python.tools.freeze_graph import freeze_graph

dir = os.path.dirname(os.path.realpath(__file__))

def freeze_graph(model_dir, output_node_names):
    """Extract the sub graph defined by the output nodes and convert
    all its variables into constants.

    Args:
        model_dir: the root folder containing the checkpoint state file
        output_node_names: a string containing all the output node names,
                           comma separated
    """
    if not tf.gfile.Exists(model_dir):
        raise AssertionError(
            "Export directory doesn't exist. Please specify an export "
            "directory: %s" % model_dir)

    if not output_node_names:
        print("You need to supply the name of a node to --output_node_names.")
        return -1

    # Retrieve the full path of the latest checkpoint
    checkpoint = tf.train.get_checkpoint_state(model_dir)
    input_checkpoint = checkpoint.model_checkpoint_path

    # Build the full file name of the frozen graph
    absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_dir + "/frozen_model.pb"

    # Clear devices so that TensorFlow can choose where to load operations
    clear_devices = True

    # Start a session using a temporary fresh Graph
    with tf.Session(graph=tf.Graph()) as sess:
        # Import the meta graph into the current default Graph
        saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

        # Restore the weights
        saver.restore(sess, input_checkpoint)

        # Use a built-in TF helper to export variables to constants
        output_graph_def = tf.graph_util.convert_variables_to_constants(
            sess,  # The session is used to retrieve the weights
            tf.get_default_graph().as_graph_def(),  # The graph_def is used to retrieve the nodes
            output_node_names.split(",")  # The output node names select the useful nodes
        )

        # Finally, serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

    return output_graph_def

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str, default="", help="Model folder to export")
    parser.add_argument("--output_node_names", type=str, default="",
                        help="The name of the output nodes, comma separated.")
    args = parser.parse_args()

    freeze_graph(args.model_dir, args.output_node_names)

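The command that uses the code above to convert the trained model to .pb:

python freeze_graph.py --model_dir=./Y_log_transpose --output_node_names='edsr/Conv2d_transpose/BiasAdd'

If the folder containing the saved checkpoint and .ckpt files is, for example, ./Y_log_transpose, and the output node of the trained network is named 'edsr/Conv2d_transpose/BiasAdd', the command above saves frozen_model.pb into the folder where the model is stored.

The key point is to find out the name of the network's output node. The names can be checked by printing the network's input and output, as below.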

self.output = self.generator(self.LR, self.filters)
print(self.LR) # input node name
print(self.output) # output node name

With the output node name found this way, freeze_graph.py produces frozen_model.pb.
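Printing a TensorFlow 1.x tensor shows its tensor name, which carries an output index suffix such as ":0", while freeze_graph.py and snpe-tensorflow-to-dlc are given the node name without that suffix. The sketch below strips the suffix from the printed text. The helper name and the example string (including its shape values) are illustrative assumptions.

```python
import re

def node_name_from_tensor_repr(tensor_repr):
    """Extract the node name from text like Tensor("scope/op:0", ...)."""
    match = re.search(r'Tensor\("([^"]+):\d+"', tensor_repr)
    if match is None:
        raise ValueError("not a tensor repr: %r" % tensor_repr)
    return match.group(1)

# Example of what print(self.output) might show (shape values are illustrative).
printed = 'Tensor("edsr/Conv2d_transpose/BiasAdd:0", shape=(?, 512, 512, 1), dtype=float32)'
output_node = node_name_from_tensor_repr(printed)
print(output_node)
```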

Now run a test to check that the .pb file was saved correctly.

from __future__ import print_function
from utils import *
from imresize import imresize
from scipy.misc import imread
import numpy as np
import tensorflow as tf  # tf is used below, so import it explicitly

def CalcuPSNR(target, ref):
    target = np.clip(target, 0, 255.0)
    squared_error = np.square(target.astype('uint8').astype('float32') - ref.astype('uint8').astype('float32'))
    psnr = 10 * np.log10(255.0 * 255.0 / np.mean(squared_error))
    return psnr

def rgb2gray(img):
    # Luma (Y) channel computed from RGB, offset by 16
    y = np.sum(img * np.reshape([65.481, 128.553, 24.966], [1, 1, 3]) / 255.0, axis=2) + 16
    return y.astype('uint8')

graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)

# Load the frozen graph
with tf.gfile.GFile('./Y_log/frozen_model.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

inputImage = tf.placeholder(tf.float32, [1,256,256,1], name = "inputimage")
groundtruth = tf.placeholder(tf.float32, [1,512,512,1], name = "groundtruth")  # not used below

# Map the training input queue to the placeholder and fetch the output node.
# These node names belong to the ./Y_log model; change them to match your own graph.
model = tf.import_graph_def(graph_def, input_map={'random_shuffle_queue_DequeueMany':inputImage}, return_elements=['edsr/DepthToSpace'])
output = model[0].outputs[0]
print('------------------')

lr_img = imread('datasets/val/LR/Set5_LR_bicubic/X2/babyx2.png')
ori_img = imread('datasets/val/HR/Set5_HR/baby.png')
lr_gray = rgb2gray(lr_img)
ori_gray = rgb2gray(ori_img)

# Bicubic baseline
bic_gray = imresize(lr_gray,output_shape=(512, 512))
bic_psnr = CalcuPSNR(bic_gray, ori_gray)

# Normalize to -1 ~ 1 and reshape to NHWC
lr_gray = (lr_gray / 255.0) * 2 -1
lr_gray = np.expand_dims(lr_gray, axis=0)
lr_gray = np.expand_dims(lr_gray, axis=3)

recon = sess.run(output, feed_dict={inputImage:lr_gray})
recon_img = (recon + 1) * 255 * 0.5
sr_psnr= CalcuPSNR(recon_img[0,:,:,0], ori_gray)

print(bic_psnr)
print(sr_psnr)
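As a sanity check on the PSNR numbers, the sketch below recomputes PSNR on a tiny synthetic image where the answer is known in advance: if every pixel is off by exactly 1, the mean squared error is 1 and the PSNR is 10*log10(255^2), about 48.13 dB. The function `psnr_8bit` is a hypothetical variant of CalcuPSNR that also returns infinity for identical images instead of dividing by zero.

```python
import numpy as np

def psnr_8bit(target, ref):
    """PSNR in dB between two images on a 0-255 scale; inf if identical."""
    target = np.clip(target, 0, 255).astype(np.uint8).astype(np.float64)
    ref = np.clip(ref, 0, 255).astype(np.uint8).astype(np.float64)
    mse = np.mean((target - ref) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)

ref = np.full((4, 4), 100, dtype=np.uint8)
off_by_one = ref + 1  # every pixel differs by exactly 1, so MSE = 1

print(psnr_8bit(off_by_one, ref))  # about 48.13 dB
print(psnr_8bit(ref, ref))
```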

Now convert the trained model, frozen_model.pb, into a .dlc file, the SNPE model format.

Back in SNPE: the SNPE paths must be set up in the terminal before the SNPE tools can be used.

For TensorFlow:

cd snpe-1.17.0
e.g. source bin/envsetup.sh -t /home/kaist/anaconda3/envs/tensorflow_27/lib/python2.7/site-packages/tensorflow

The conversion is done with a tool called snpe-tensorflow-to-dlc.

https://developer.qualcomm.com/docs/snpe/tools.html#tools_snpe-tensorflow-to-dlc

usage: snpe-tensorflow-to-dlc [-h] --graph GRAPH -i INPUT_NAME INPUT_DIM
                              --out_node OUT_NODE [--dlc DLC]
                              [--model_version MODEL_VERSION]
                              [--in_type {default,image}]
                              [--allow_unconsumed_nodes] [--verbose]

(Newer SNPE releases apparently can also convert from the .ckpt files alone, without a frozen .pb file.)

snpe-tensorflow-to-dlc --graph ./Y_log_transpose/frozen_model.pb -i Placeholder_2 1,256,256,1 --out_node edsr/Conv2d_transpose/BiasAdd --allow_unconsumed_nodes

Pass the input node and output node names found earlier, followed by the dimensions of the input data. The dimension order is Batch, Height, Width, Channel.

This creates frozen_model.dlc in the folder that contains the .pb file.
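When converting several models, it can help to assemble the command from its parts. The sketch below builds the argument list using only the flags shown in the usage text above. The helper name `build_to_dlc_command` is an assumption, and nothing is executed here.

```python
def build_to_dlc_command(graph_path, input_name, input_dims, out_node,
                         allow_unconsumed_nodes=False):
    """Build the snpe-tensorflow-to-dlc argument list for one input."""
    dims = ",".join(str(d) for d in input_dims)  # Batch,Height,Width,Channel
    command = [
        "snpe-tensorflow-to-dlc",
        "--graph", graph_path,
        "-i", input_name, dims,
        "--out_node", out_node,
    ]
    if allow_unconsumed_nodes:
        command.append("--allow_unconsumed_nodes")
    return command

command = build_to_dlc_command(
    "./Y_log_transpose/frozen_model.pb",
    "Placeholder_2",
    (1, 256, 256, 1),
    "edsr/Conv2d_transpose/BiasAdd",
    allow_unconsumed_nodes=True,
)
print(" ".join(command))
# To run it, pass the list to subprocess.check_call(command)
# from a shell where bin/envsetup.sh has been sourced.
```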

The generated .dlc file can be tested with snpe-net-run. (Tested in a Linux environment.)

DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using the SNPE C++ API.

REQUIRED ARGUMENTS:
-------------------
  --container  <FILE>   Path to the DL container containing the network.
  --input_list <FILE>   Path to a file listing the inputs for the network.

OPTIONAL ARGUMENTS:
-------------------
  --use_fxp_cpu         Use the CPU fixed point runtime for SNPE.
  --use_gpu             Use the GPU runtime for SNPE.
  --use_dsp             Use the DSP fixed point runtime for SNPE.
  --use_aip             Use the AIP fixed point runtime for SNPE.
  --debug               Specifies that output from all layers of the network
                        will be saved.
  --output_dir <DIR>    The directory to save output to. Defaults to ./output
  --storage_dir <DIR>   The directory to store SNPE metadata files
  --encoding_type <VAL> Specifies the encoding type of input file. Valid settings are "nv21".
                        Cannot be combined with --userbuffer*.
  --userbuffer_float    [EXPERIMENTAL] Specifies to use userbuffer for inference, and the input type is float.
                        Cannot be combined with --encoding_type.
  --userbuffer_tf8      [EXPERIMENTAL] Specifies to use userbuffer for inference, and the input type is tf8exact0.
                        Cannot be combined with --encoding_type.
  --perf_profile <VAL>  Specifies perf profile to set. Valid settings are "system_settings" , "power_saver" , "balanced" ,
                        "default" , "high_performance" , "sustained_high_performance" , and "burst".
                        NOTE: "balanced" and "default" are the same.  "default" is being deprecated in the future.
  --profiling_level <VAL> Specifies the profiling level.  Valid settings are "off", "basic" and "detailed".
                          Default is detailed.
                          Basic profiling only applies to DSP runtime.
  --enable_cpu_fallback Enables cpu fallback functionality. Defaults to disable mode.
  --input_name <INPUT_NAME> Specifies the name of input for which dimensions are specified.
  --input_dimensions <INPUT_DIM>  Specifies new dimensions for input whose name is specified in input_name. e.g. "1,224,224,3".
                        For multiple inputs, specify --input_name and --input_dimensions multiple times.
  --gpu_mode <VAL>      Specifies gpu operation mode. Valid settings are "default", "float16".
                        default = float32 math and float16 storage (equiv. use_gpu arg).
                        float16 = float16 math and float16 storage.
  --help                 Show this help message.
  --version              Show SNPE Version Number.

snpe-net-run takes two inputs: the .dlc file and a file listing the network's input data.

img_list.txt is a text file that lists the .raw input files, one path per line.

test_data/input_baby.raw
test_data/input_car.raw
test_data/input_balloon.raw
test_data/input_bus.raw
...
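A list file like the one above can be generated instead of typed by hand. The sketch below collects every .raw file in a folder and writes one path per line. The helper name `write_input_list` and the temporary demo folder are assumptions for illustration.

```python
import os
import tempfile

def write_input_list(raw_dir, list_path):
    """Write the path of every .raw file in raw_dir to list_path, one per line."""
    names = sorted(n for n in os.listdir(raw_dir) if n.endswith(".raw"))
    lines = [os.path.join(raw_dir, n) for n in names]
    with open(list_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines

# Demo in a temporary folder with empty placeholder files.
demo_dir = tempfile.mkdtemp()
for name in ("input_baby.raw", "input_car.raw", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

listed = write_input_list(demo_dir, os.path.join(demo_dir, "img_list.txt"))
print(listed)
```

The paths in the list are resolved relative to the directory where snpe-net-run is started, so the same folder layout has to exist wherever the tool is run.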

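One point to watch: the .raw data must hold the values after the same normalization that was used when training the network.

For example, if 8-bit images were normalized to the range -1 to 1, normalize in the same way before saving, as below. The data is saved as a 4-dimensional tensor.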

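import numpy as np
from scipy.misc import imread

lr_gray = imread('test_data/babyx2_gray.png')
lr_gray = (lr_gray / 255.0) * 2 -1          # normalize 0~255 to -1~1
lr_gray = np.expand_dims(lr_gray, axis=0)   # add the batch dimension
lr_gray = np.expand_dims(lr_gray, axis=3)   # add the channel dimension
lr_gray = lr_gray.astype(np.float32)        # SNPE reads float32 values
lr_gray.tofile('test_data/input_baby.raw')  # NHWC order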
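A quick way to catch a wrongly saved file (for example one written as float64 because the astype(np.float32) step was skipped) is to compare its size with the size implied by the input shape. The sketch below uses only the standard library. The helper names and the demo file are illustrative assumptions, and it assumes 4 bytes per value.

```python
import os
import tempfile
from array import array

def expected_raw_bytes(shape, bytes_per_value=4):
    """Bytes a float32 .raw file should hold for a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value

def raw_matches_shape(path, shape):
    """True if the file size equals the float32 size of a tensor with that shape."""
    return os.path.getsize(path) == expected_raw_bytes(shape)

# Demo: write 1*4*4*1 float32 zeros and check the size.
demo_path = os.path.join(tempfile.mkdtemp(), "input_demo.raw")
with open(demo_path, "wb") as f:
    array("f", [0.0] * 16).tofile(f)

print(raw_matches_shape(demo_path, (1, 4, 4, 1)))
print(expected_raw_bytes((1, 256, 256, 1)))
```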

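This example uses a single-channel grayscale image, so plain normalization was enough. Saving a 3-channel RGB image in .raw format takes a little more work, because data stored as RRRR...GGGG...BBBB has to be rearranged into RGBRGBRGB... order. The details are explained on the SNPE website.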

https://developer.qualcomm.com/docs/snpe/image_input.html
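The rearrangement described above is a transpose from channel-first to channel-last layout. The sketch below shows it on a tiny array; the helper name is made up, and whether your image loader returns planar or already interleaved data is something to check rather than assume.

```python
import numpy as np

def planar_to_interleaved(chw):
    """Convert a (C, H, W) array to (H, W, C) so channels alternate per pixel."""
    return np.ascontiguousarray(np.transpose(chw, (1, 2, 0)))

# Tiny 3-channel, 1x2 image: R plane [1, 2], G plane [10, 20], B plane [100, 200].
planar = np.array([[[1, 2]], [[10, 20]], [[100, 200]]], dtype=np.float32)
interleaved = planar_to_interleaved(planar)

print(planar.ravel())       # R R G G B B order
print(interleaved.ravel())  # R G B R G B order
# interleaved.tofile(path) would write the values in this flattened order.
```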

Now run the test with the .dlc file and the input list file obtained above, as follows. (Tested in a Linux environment.)

snpe-net-run --container ./Y_log_transpose/frozen_model.dlc --input_list img_list.txt

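The output is saved in raw format. Converting it back to the original range lets you inspect the result. Comparing it with the result from the .pb file and with the result of the TensorFlow test shows whether there is any difference in quality.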

# Assumes snpe-net-run was run with --output_dir output_transpose (the default is ./output)
out_raw = np.fromfile('output_transpose/Result_0/edsr/Conv2d_transpose/BiasAdd:0.raw', dtype=np.float32)
out_raw = out_raw.reshape((1, 512,512,1))      # NHWC
out_raw = np.transpose(out_raw, [0, 3, 1, 2])  # NCHW
out_raw = (out_raw + 1) * 255.0 * 0.5          # back to the 0~255 range
out_raw = (out_raw[0, 0, :, :]).clip(0, 255)
sr_snpe_psnr= CalcuPSNR(out_raw, ori_gray)
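Besides PSNR, comparing the raw values directly is a simple way to see how far the SNPE output drifts from the TensorFlow output. The sketch below checks that the normalize and denormalize steps are inverses, then measures the largest difference between two arrays. The arrays stand in for real results and their values are made up.

```python
import numpy as np

def normalize(img_8bit):
    """Map 0..255 pixel values to the -1..1 range used for training."""
    return (img_8bit / 255.0) * 2 - 1

def denormalize(net_output):
    """Map -1..1 network output back to 0..255 and clip out-of-range values."""
    return np.clip((net_output + 1) * 255.0 * 0.5, 0, 255)

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two arrays."""
    return float(np.max(np.abs(a.astype(np.float64) - b.astype(np.float64))))

pixels = np.array([[0, 64], [128, 255]], dtype=np.float32)
restored = denormalize(normalize(pixels))
print(max_abs_diff(pixels, restored))

# Stand-ins for a TensorFlow result and an SNPE result (made-up values).
tf_result = np.array([0.10, -0.20, 0.30], dtype=np.float32)
snpe_result = np.array([0.10, -0.21, 0.30], dtype=np.float32)
print(max_abs_diff(tf_result, snpe_result))
```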

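The result above comes from running the .dlc on PC hardware in a Linux environment. The same model can also be run on a Snapdragon AP, that is, in a mobile environment.

Use adb to transfer the SNPE library files.

Go to the snpe folder, open a terminal, and enter the following.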

export SNPE_TARGET_ARCH=arm-android-gcc4.9
export SNPE_TARGET_STL=libgnustl_shared.so

adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin"
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib"
adb shell "mkdir -p /data/local/tmp/snpeexample/dsp/lib"

# The commands below assume the current directory is snpe-1.17.0
adb push lib/$SNPE_TARGET_ARCH/$SNPE_TARGET_STL /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push lib/$SNPE_TARGET_ARCH/libsymphony-cpu.so /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push lib/$SNPE_TARGET_ARCH/libsymphonypower.so /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push lib/dsp/libsnpe_dsp_skel.so /data/local/tmp/snpeexample/dsp/lib
adb push lib/dsp/libsnpe_dsp_domains_skel.so /data/local/tmp/snpeexample/dsp/lib
adb push lib/dsp/libsnpe_dsp_v65_domains_v2_skel.so /data/local/tmp/snpeexample/dsp/lib
adb push lib/$SNPE_TARGET_ARCH/libsnpe_adsp.so /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push lib/$SNPE_TARGET_ARCH/libsnpe_dsp_domains.so /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push lib/$SNPE_TARGET_ARCH/libSNPE.so /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push bin/$SNPE_TARGET_ARCH/snpe-net-run /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
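The push commands above follow a fixed pattern, so they can be generated from a list of file names. The sketch below only builds the command lists, using the library names from this tutorial, and does not call adb. The helper name `snpe_push_commands` is an assumption.

```python
def snpe_push_commands(arch, stl, device_root="/data/local/tmp/snpeexample"):
    """Return the adb push commands for the SNPE files listed in this tutorial."""
    arch_libs = [stl, "libsymphony-cpu.so", "libsymphonypower.so",
                 "libsnpe_adsp.so", "libsnpe_dsp_domains.so", "libSNPE.so"]
    dsp_libs = ["libsnpe_dsp_skel.so", "libsnpe_dsp_domains_skel.so",
                "libsnpe_dsp_v65_domains_v2_skel.so"]
    commands = []
    for lib in arch_libs:
        commands.append(["adb", "push", "lib/%s/%s" % (arch, lib),
                         "%s/%s/lib" % (device_root, arch)])
    for lib in dsp_libs:
        commands.append(["adb", "push", "lib/dsp/%s" % lib,
                         "%s/dsp/lib" % device_root])
    commands.append(["adb", "push", "bin/%s/snpe-net-run" % arch,
                     "%s/%s/bin" % (device_root, arch)])
    return commands

commands = snpe_push_commands("arm-android-gcc4.9", "libgnustl_shared.so")
for command in commands:
    print(" ".join(command))
```

Each list can be passed to subprocess.check_call from the snpe-1.17.0 folder, which stops at the first failed push instead of silently continuing.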

After copying the SNPE files to the mobile device as above,

transfer the dlc file, the input data, and the input list file to the device and run the test. (Tested in a mobile environment.)

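adb devices

# All files go under /data/local/tmp/edsr, the folder that snpe-net-run is started from on the device.
# img_list.txt refers to test_data/input_baby.raw, so the raw file goes into a test_data subfolder.
adb shell "mkdir -p /data/local/tmp/edsr/test_data"
adb push test_data/input_baby.raw /data/local/tmp/edsr/test_data
adb push img_list.txt /data/local/tmp/edsr
adb push frozen_model.dlc /data/local/tmp/edsr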
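Before pushing files it is worth confirming that adb devices actually lists the phone in the "device" state rather than "unauthorized" or "offline". The sketch below parses the text that adb devices prints; the sample output and serial numbers are made up, and the helper name is an assumption.

```python
def parse_adb_devices(output):
    """Return {serial: state} parsed from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip the header, blank lines and daemon start-up messages.
        if len(parts) != 2 or line.startswith(("List of devices", "*")):
            continue
        serial, state = parts
        devices[serial] = state
    return devices

# Made-up sample output; the serial numbers are placeholders.
sample = "List of devices attached\nabc12345\tdevice\nxyz67890\tunauthorized\n\n"
devices = parse_adb_devices(sample)
ready = [serial for serial, state in devices.items() if state == "device"]
print(ready)
```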

Once the data is on the device, run snpe-net-run.

adb shell
export SNPE_TARGET_ARCH=arm-android-gcc4.9
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
cd /data/local/tmp/edsr
snpe-net-run --container frozen_model.dlc --input_list img_list.txt --use_gpu
exit

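This runs the model on the GPU of the mobile device.

To bring the results back to the PC, pull the output folder from the device into the output_android_gpu folder on the PC as below. Then denormalize the data, in the same way as for the PC run above, to recover the image and compare the quality.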

adb pull /data/local/tmp/edsr/output output_android_gpu

For detailed timing results, use the benchmark program that ships with SNPE.

export SNPE_ROOT=/home/kaist/Desktop/snpe-1.17.0
python snpe_bench.py -c config.json -a

usage: snpe_bench.py [-h] -c CONFIG_FILE [-o OUTPUT_BASE_DIR_OVERRIDE]
                     [-v DEVICE_ID_OVERRIDE] [-a] [-t DEVICE_OS_TYPE_OVERRIDE]
                     [-r HOST_NAME] [-d] [-s SLEEP] [-b USERBUFFER_MODE] [-p PERFPROFILE]

Running snpe_bench.py, located in the snpe-1.17.0/benchmarks folder, instead of snpe-net-run measures the time spent in each layer as well as the run time on the GPU, CPU, and DSP.
