Reference Link: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html (NVIDIA DALI Manual)
Reference Link: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/video/video_reader_simple_example.html#Goal (Video Load Example-I)
Reference Link: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/video_superres/README.html (Video Load Example-II)
Operating environment
Ubuntu 18.04
CUDA 11.0
cuDNN 8.0.5
PyTorch 1.7
NVIDIA graphic driver 460
Since NVVL (the NVIDIA Video Loader library) was merged into DALI, only DALI can be used with recent CUDA versions.
sudo apt install ffmpeg x264 x265
sudo apt-get install libavcodec-dev
sudo apt-get install libavfilter-dev
sudo apt-get install libavformat-dev
sudo apt-get install libavutil-dev
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
DALI can be installed with pip for CUDA 10.0 and 11.0.
Installation does not seem to work well with other CUDA versions, so installing on CUDA 11.0 is recommended.
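The AVC (H.264) and HEVC (H.265) codecs are supported.
Only the YUV420p pixel format is supported.
Only videos encoded at a constant frame rate can be used.
If the file information shows different values for Minimum FPS and Maximum FPS, the video will not load properly and has to be re-encoded.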
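The constraints above can be checked before a file is handed to DALI. The following is a minimal sketch, assuming `ffprobe` (installed together with ffmpeg) is on the PATH. The helper names `probe_video_stream` and `find_problems` are hypothetical. Note that ffprobe does not report a minimum and maximum FPS, so this sketch swaps in a heuristic: it compares the nominal rate (`r_frame_rate`) with the average rate (`avg_frame_rate`) and flags the file when they differ by more than an assumed 1% tolerance.

```python
import json
import subprocess
from fractions import Fraction

SUPPORTED_CODECS = {"h264", "hevc"}  # ffprobe names for AVC and HEVC


def probe_video_stream(path):
    """Return ffprobe metadata of the first video stream as a dict."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,pix_fmt,r_frame_rate,avg_frame_rate",
        "-of", "json",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)["streams"][0]


def _rate(value):
    """Parse a rate such as '30000/1001'; return None if it is unusable."""
    try:
        return Fraction(value)
    except (TypeError, ValueError, ZeroDivisionError):
        return None


def find_problems(stream, tolerance=0.01):
    """List which constraints (codec, pix_fmt, frame_rate) a stream violates."""
    problems = []
    if stream.get("codec_name") not in SUPPORTED_CODECS:
        problems.append("codec")
    if stream.get("pix_fmt") != "yuv420p":
        problems.append("pix_fmt")
    nominal = _rate(stream.get("r_frame_rate"))
    average = _rate(stream.get("avg_frame_rate"))
    # Heuristic (assumption): a clear gap between the nominal and average
    # rates suggests a variable frame rate.
    if not nominal or not average or abs(nominal - average) / nominal > tolerance:
        problems.append("frame_rate")
    return problems
```

Calling `find_problems(probe_video_stream("INPUT.mp4"))` returns an empty list when none of the three checks fails. Any other result suggests the file should be re-encoded first.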
Transcoding
ffmpeg -i INPUT.mp4 -map v:0 -c:v libx264 -crf 18 -pix_fmt yuv420p -profile:v high INPUT_trans.mp4
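When a whole directory has to be re-encoded, the same command can be driven from Python. This is a rough sketch that reuses exactly the ffmpeg options shown above. The function names, the `*.mp4` filter, and the `_trans` suffix for output files are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_transcode_cmd(src, dst):
    """Build the ffmpeg argument list for the transcoding command above."""
    return [
        "ffmpeg", "-i", str(src),
        "-map", "v:0",           # keep only the first video stream
        "-c:v", "libx264",       # encode as H.264
        "-crf", "18",
        "-pix_fmt", "yuv420p",   # the pixel format DALI requires
        "-profile:v", "high",
        str(dst),
    ]


def transcode_directory(src_dir, dst_dir):
    """Transcode every .mp4 file in src_dir into dst_dir; return output paths."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(Path(src_dir).glob("*.mp4")):
        dst = dst_dir / f"{src.stem}_trans.mp4"
        # check=True raises CalledProcessError if ffmpeg fails
        subprocess.run(build_transcode_cmd(src, dst), check=True)
        outputs.append(dst)
    return outputs
```

For example, `transcode_directory("raw_videos", "data_file/avc")` would write the converted files into the directory that the loading script reads from (`raw_videos` is a placeholder name).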
import os
import torch
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import time
from PIL import Image
from nvidia.dali.plugin import pytorch
### INPUT ARGUMENTS
batch_size = 1 # input batch size
sequence_length = 12 # number of frames loaded per sequence
video_directory = 'data_file/avc' # input video directory
video_files = [os.path.join(video_directory, f) for f in os.listdir(video_directory)]
frame_length = 120 # total number of frames in the video
n_iter = frame_length // sequence_length # number of loading iterations
class VideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data, shuffle):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=16)
        # GPU video reader: each sample is sequence_length consecutive RGB frames
        self.input = ops.VideoReader(device="gpu", filenames=data, sequence_length=sequence_length,
                                     shard_id=0, num_shards=1, image_type=types.RGB, dtype=types.UINT8,
                                     random_shuffle=shuffle, initial_fill=16)

    def define_graph(self):
        output = self.input(name="Reader")
        return output
pipe = VideoPipe(batch_size=batch_size, num_threads=2, device_id=0, data=video_files, shuffle=False)
pipe.build()
dali_iter = pytorch.DALIGenericIterator(pipe, ["data"], reader_name="Reader")
os.makedirs('output', exist_ok=True)  # make sure the output directory exists
start_total = time.time()
start_load = time.time()
for i, inputs in enumerate(dali_iter):
    if i >= n_iter:
        break
    inputs = inputs[0]["data"]  # decoded batch on the GPU: (batch, frame, height, width, channel)
    torch.cuda.synchronize()  # wait for pending GPU work so the timing is meaningful
    load_time = time.time() - start_load
    for j in range(sequence_length):
        frame = inputs[0, j]  # j-th frame of the only sample (batch_size is 1)
        ## move GPU to CPU
        start_gputocpu = time.time()
        frame = frame.cpu().numpy()
        gputocpu_time = time.time() - start_gputocpu
        im = Image.fromarray(frame)
        ## Save the frame as a JPG file
        start_save = time.time()
        # quality default = 75 (0-100)
        im.save(os.path.join('output/', '{:010d}_.jpg'.format(i*sequence_length+j+1)), quality=75)
        save_time = time.time() - start_save
    # report the batch load time and the timings of the last frame in the batch
    print('(CPU-to-GPU): {0:.4f}\t (GPU-to-CPU): {1:.4f}\t Save Time: {2:.4f}'.format(load_time, gputocpu_time,
                                                                                     save_time))
    start_load = time.time()
Avg_total_time = (time.time() - start_total) / frame_length
print('Avg Total Time: {0:.4f}'.format(Avg_total_time))
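The script hard-codes `n_iter` as `frame_length / sequence_length`, which only holds when the frame count is an exact multiple of the sequence length. As a small sketch of the arithmetic, the function below counts how many full sequences a sliding window yields. The function name is hypothetical, and it assumes the reader drops a trailing incomplete sequence and that, when no step is given, consecutive sequences start `sequence_length` frames apart. This is the usual sliding-window count, not something confirmed against the DALI source here.

```python
def count_sequences(frame_count, sequence_length, step=None, stride=1):
    """Count the full sequences a sliding window extracts from one video.

    step   : distance between the first frames of consecutive sequences
             (assumed to default to sequence_length, i.e. no overlap)
    stride : distance between consecutive frames inside one sequence
    """
    if step is None:
        step = sequence_length
    if sequence_length < 1 or step < 1 or stride < 1:
        raise ValueError("sequence_length, step and stride must be positive")
    # Number of source frames covered by a single sequence
    span = 1 + stride * (sequence_length - 1)
    if frame_count < span:
        return 0
    # Start positions 0, step, 2*step, ... that still leave room for a full span
    return (frame_count - span) // step + 1


print(count_sequences(120, 12))  # the setting used in the script above
```

With 120 frames and a sequence length of 12 this gives 10, matching `n_iter`. With 125 frames it still gives 10, because the last 5 frames cannot fill a sequence.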