The PoseLandmark class defines a total of 33 human body landmarks.
The official documentation explains the parameters and their usage quite clearly.
Naming style and availability may differ slightly across platforms/languages.
STATIC_IMAGE_MODE
If set to false, the solution treats the input images as a video stream. It will try to detect the most prominent person in the very first images, and upon a successful detection further localizes the pose landmarks. In subsequent images, it then simply tracks those landmarks without invoking another detection until it loses track, thereby reducing computation and latency. If set to true, person detection runs on every input image, ideal for processing a batch of static, possibly unrelated, images. Default to false.
MODEL_COMPLEXITY
Complexity of the pose landmark model: 0, 1 or 2. Landmark accuracy as well as inference latency generally go up with the model complexity. Default to 1.
SMOOTH_LANDMARKS
If set to true, the solution filters pose landmarks across different input images to reduce jitter, but ignored if static_image_mode is also set to true. Default to true.
MIN_DETECTION_CONFIDENCE
Minimum confidence value ([0.0, 1.0]) from the person-detection model for the detection to be considered successful. Default to 0.5.
MIN_TRACKING_CONFIDENCE
Minimum confidence value ([0.0, 1.0]) from the landmark-tracking model for the pose landmarks to be considered tracked successfully, or otherwise person detection will be invoked automatically on the next input image. Setting it to a higher value can increase robustness of the solution, at the expense of a higher latency. Ignored if static_image_mode is true, where person detection simply runs on every image. Default to 0.5.
Naming style may differ slightly across platforms/languages.
POSE_LANDMARKS
A list of pose landmarks. Each landmark consists of the following:
x and y: Landmark coordinates normalized to [0.0, 1.0] by the image width and height respectively.
z: Represents the landmark depth with the depth at the midpoint of hips being the origin, and the smaller the value the closer the landmark is to the camera. The magnitude of z uses roughly the same scale as x.
visibility: A value in [0.0, 1.0] indicating the likelihood of the landmark being visible (present and not occluded) in the image.
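To make the normalized output concrete, here is a small helper (hypothetical, not part of MediaPipe) that converts a landmark's normalized x/y into pixel coordinates:

```python
def landmark_to_pixel(x, y, image_width, image_height):
    """Convert normalized landmark coordinates in [0.0, 1.0] to integer pixel coordinates."""
    return int(x * image_width), int(y * image_height)

# A landmark at (x=0.5, y=0.25) on a 640x480 frame maps to the pixel (320, 120).
print(landmark_to_pixel(0.5, 0.25, 640, 480))
```

Before trusting a landmark's position, it is common to also check that its visibility score exceeds some threshold (e.g. 0.5).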
Note: if you copy the code below directly, an extra blank line may appear after every line.
import cv2
import mediapipe as mp
# Import cv2 and mediapipe, aliasing the mediapipe module as mp
mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose
# Shorten the drawing utilities and pose modules to mp_drawing and mp_pose
# so we do not have to type the full paths every time
# Since the webcam is the video source, we do not use the static-image settings here.
# mp_pose.Pose() takes five parameters:
# static_image_mode, model_complexity, smooth_landmarks, min_detection_confidence, min_tracking_confidence
# static_image_mode defaults to False; when it is True, smooth_landmarks is ignored
cap = cv2.VideoCapture(0)
# Start capturing from the webcam; the argument 0 selects the default camera
# min_detection_confidence=0.5 and min_tracking_confidence=0.5 are two parameters of mp_pose.Pose
# min_detection_confidence is a value in [0.0, 1.0] used to decide whether a person was detected;
# 0.5 is the usual choice, and it can be raised for stricter detection
# min_tracking_confidence decides whether tracking succeeded; 0.5 is also the usual choice
# Wrap everything in one big with statement
with mp_pose.Pose(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as pose:
    # Loop for as long as the webcam capture stays open
    while cap.isOpened():
        success, image = cap.read()
        # If the read succeeds, we have grabbed one frame from the camera
        if not success:
            print("Ignoring empty camera frame")
            continue
        # continue skips to the next iteration; note we do not break out of the loop here
        image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
        # cv2 reads frames in BGR, so convert to RGB with cv2.cvtColor(src, conversion code);
        # we convert back to BGR later for display
        # cv2.flip(src, flipCode[, dst]) flips the image; dst is omitted because the output type is the same
        # flipCode 0 flips around the x-axis, a positive value (e.g. 1) around the y-axis,
        # and a negative value (e.g. -1) around both axes
        # Here the frame is flipped horizontally for a mirror (selfie) view, then converted BGR to RGB
        image.flags.writeable = False
        # Mark the image as not writeable to improve performance
        # Then run the pose-detection function on the frame; the result is stored in results
        results = pose.process(image)
        # The code below draws the pose annotations onto the image, so set writeable back to True
        image.flags.writeable = True
        # Convert back to BGR so the annotated image can be displayed
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        # The PoseLandmark class defines 33 body landmarks; mp_drawing.draw_landmarks takes five arguments:
        # image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS, DrawingSpec_point, DrawingSpec_line
        # DrawingSpec parameters: 1. color, 2. line thickness, 3. circle radius
        # DrawingSpec_point = mp_drawing.DrawingSpec((0, 255, 0), 1, 1)
        # DrawingSpec_line = mp_drawing.DrawingSpec((0, 0, 255), 1, 1)
        mp_drawing.draw_landmarks(
            image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
        # Show the frame in a window titled "MediaPipe Pose"
        cv2.imshow('MediaPipe Pose', image)
        # cv2.waitKey is a built-in OpenCV function that waits the given number of milliseconds
        # for a key press; otherwise the loop keeps running
        # Pressing the Esc key closes the window
        if cv2.waitKey(5) & 0xFF == 27:
            break
# The with block ends above
cap.release()
# Release the webcam