Students will learn how to install software on a Unix computer system. To test the installation, they will hook up a camera and run an example face-tracking Python program and a target recognition program using computer vision.
Learning outcome:
Prepare a portable computing platform and practice Unix shell skills. Observe how Unix open-source software is distributed through online repositories, both as binaries and as source code.
Urs Utzinger, Updated 1/7/2025
Your Raspberry Pi
Network Connection
When 30 students attempt to download software packages at the same time, the network will slow down. I upgraded the lab wireless access point, but it is still a lot of data to download. One way to speed this up is to connect to the wired network at your desk, but we only have one port per desk. You can also try using your phone as a hotspot. You can measure network speed in a web browser by searching for "speedtest".
sudo apt-get update
sudo apt-get upgrade
This is the standard approach in the Debian branch of the Linux family (Debian, Ubuntu, Raspberry Pi OS) to update system packages. macOS is part of the BSD Unix family and uses a different package manager. Besides apt there are other software distribution channels for installing applications, such as Flatpak and Snap. If you install OpenCV (see below) with apt, you should not also install it with Snap or Flatpak. On Windows, or on your Android or iOS phone, this process is hidden, but a similar process runs behind the scenes.
We want to set up OpenCV (computer vision) and install the libraries it uses. You will need to answer questions with "Y". All commands below need to complete without errors. The first command will take substantial time.
apt first queries an online database for the components it needs, then determines which dependencies are not yet available on your computer, downloads the packages, unpacks the compressed archives (they are compressed to make them smaller for distribution), installs them, and updates the documentation and the local database of installed programs.
sudo apt install \
libgl1 libxcb-xinerama0 libxkbcommon-x11-0 libxcb1 \
libxcb-render0 libxcb-shape0 libxcb-shm0 libxcb-xfixes0 \
libfontconfig1 libfreetype6 fonts-dejavu-core
The following command should list fonts:
fc-list | head
Update the Python setup. The first five commands are commented out because those packages are already part of Raspberry Pi OS when you install the image.
When you install Python modules with apt, they are installed system-wide. Python also has its own installer, pip, which installs packages for an individual user without modifying system packages. Raspberry Pi OS, Ubuntu, and macOS all need Python to work properly, so you should be careful about modifying the system Python installation. Here we only add a small number of packages with apt and use pip later.
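If you are curious where Python loads a package from, you can print its file location: system packages installed with apt live under /usr/lib/python3/dist-packages, while pip-installed packages end up in your home directory or virtual environment. A quick check, using numpy which is already installed system-wide:
import numpy
print(numpy.__file__)   # for an apt install this points into /usr/lib/python3/dist-packages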
smbus provides bindings to the I2C bus, which we need to read sensors.
picamera2 provides bindings to the Raspberry Pi camera subsystem.
# sudo apt-get install python3-dev
# sudo apt-get install python3-pip
# sudo apt-get install python3-numpy
# sudo apt-get install python3-setuptools
# sudo apt-get install python3-wheel
sudo apt-get install python3-smbus
sudo apt-get install python3-picamera2
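Once python3-smbus is installed, reading a sensor register takes only a few lines. The sketch below is just an illustration; the device address 0x48 and register 0 are hypothetical and depend on the sensor you attach later:
from smbus import SMBus

bus = SMBus(1)                       # I2C bus 1 on the Raspberry Pi header
value = bus.read_byte_data(0x48, 0)  # read register 0 of a device at address 0x48
print(value)
bus.close()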
It is common to create a separate, user-defined Python environment that allows different versions of Python and Python packages to co-exist. Raspberry Pi OS Bookworm/Trixie expects you to work in a virtual environment, and you can create several of them. In BME225 you used Jupyter; we do not use Jupyter on the Raspberry Pi. Often one application needs one version of a Python package while another application needs a different version; a virtual environment lets you switch easily between such settings. We will make one for BME210:
# Make the folder
mkdir ~/pythonBME210
# Go to the folder
cd ~/pythonBME210
# Create the environment
python3 -m venv --system-site-packages env
To activate the Python virtual environment (run this each time after you start a shell; Thonny and VS Code remember the setting, so you do not need to run activate there each time):
cd ~/pythonBME210
source env/bin/activate
To deactivate the Python virtual environment:
deactivate
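To verify which interpreter is currently active, you can print its path from Python; with the environment activated it should point into ~/pythonBME210/env, and after deactivate it points to the system Python again:
import sys
print(sys.executable)  # with the venv active this points into ~/pythonBME210/env/bin
print(sys.prefix)      # root folder of the active environment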
Now is a good time to update pip, the Python package installer:
pip install --upgrade pip
You can run and debug your Python code in Thonny. We will not use Conda, as that works better on desktop computers. Thonny is lightweight and works for most of your code. However, in order to use Thonny with the virtual environment, you need to change the Python interpreter in Thonny to the one in the virtual environment.
Switch Thonny to regular mode: Top right.
Restart Thonny
In Tools->Options select interpreter.
Browse for python3 at ~/pythonBME210/env/bin/python3 (3 dots on the right > Home > pythonBME210 > env > bin > python3 > "OK")
Visual Studio Code is a better programming editor than Thonny, but it needs more resources and takes longer to start. Usually the binaries are available from Microsoft at https://code.visualstudio.com/#alt-downloads, but for the Raspberry Pi you will need to use a different approach:
sudo apt install code
In the Raspberry Pi menu under Programming, you should be able to find Visual Studio Code after you have installed it. You cannot easily use a web browser and Code at the same time on a computer with 2 GB of memory. Perhaps you want to stick with Thonny and use VS Code on your laptop.
In this course we will need several Python packages. Python installs packages with its own program called pip, which contacts pypi.org to find user-contributed Python packages.
There is a Python interpreter that runs on microcontrollers called CircuitPython. You learned how to program a microcontroller (ESP8266) in C in BME225. CircuitPython adds the ability to run Python programs on a microcontroller. The packages developed for CircuitPython will support our need to read sensors on the Raspberry Pi. Therefore, we will install packages that provide compatibility and access to sensors, similar to what you would use in the Arduino IDE (see the short example after the install commands below):
adafruit-pureio (to access the I2C and SPI sensor interfaces)
adafruit-blinka (interface to CircuitPython packages)
adafruit-circuitpython-motorkit (for our motor HAT, which you will receive later)
For object detection we will use the OpenCV DNN module.
Activate the virtual Python environment:
cd ~/pythonBME210
source env/bin/activate
Install the OpenCV Python package:
pip3 install opencv-contrib-python
Then install these packages; they allow us to use the CircuitPython extensions to access sensors and controllers:
pip3 install adafruit-pureio
pip3 install adafruit-blinka
pip3 install adafruit-circuitpython-motorkit
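With adafruit-blinka installed and I2C enabled on the Pi (via raspi-config or the desktop Raspberry Pi Configuration tool), you can already talk to the I2C bus from Python. This short sketch simply lists the addresses of attached I2C devices; it works even before you connect a specific sensor, in which case the list is empty:
import board
import busio

i2c = busio.I2C(board.SCL, board.SDA)
while not i2c.try_lock():                  # wait until we have exclusive access to the bus
    pass
print([hex(addr) for addr in i2c.scan()])  # addresses of detected I2C devices
i2c.unlock()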
We will also program a robotic arm called meArm. I maintain the code for that project on my GitHub page. If you do not have an account on GitHub, you should consider getting one at some point; it is not needed for this class.
You can access all my code repositories on https://github.com/uutzinger and also on https://github.com/MediBrick. It shows what I have been working on.
GitHub is a code hosting service built on git, a version control system developed by the creator of Linux. You can learn the commands to pull and push code or use a desktop program to do it. git manages your code and its history; it is notoriously difficult to remember how to use it.
Execute the commands:
git clone https://github.com/uutzinger/meArmPi.git
git clone https://github.com/uutzinger/camera.git
This will create folders called meArmPi and camera. When there is a new version of the meArmPi library, you can update it with git pull while you are in the meArmPi folder.
To install the camera package in the pythonBME210 environment, execute pip install -e . in the camera folder (with the virtual environment activated). This makes the files in the camera folder available as a package.
If you are not going to use VS Code on the Raspberry Pi you can skip this.
Open VS Code and click on the four squares on the left side. This opens the Extensions view, where you can install extensions.
Install Python, Pylance, Python Debugger by Microsoft.
When installing the extensions, you will be given the option to select or create a Python environment. Select Interpreter, select custom path, browse to ~/pythonBME210/env/bin, and select python3.
You can open a folder, for example meArmPi, and it will display all the files in the folder on the left side. If you open meArm.py, the bottom right shows the Python interpreter it will use to run the program. You can click on it and verify that it is pointing to pythonBME210.
You can test the OpenCV installation:
In a shell/terminal start python3.
Type import cv2. If this does not complete successfully, you did not complete the installation steps above.
Type cv2.__version__
It should display the version number.
Type exit() to leave the Python interpreter.
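The same check as a small script you can run in Thonny (inside the pythonBME210 environment):
import cv2
print(cv2.__version__)  # should print the OpenCV version, e.g. 4.x.y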
Obtain one of the CSI cameras from the course staff. You will also need the flat ribbon cable attached to it. These are typical cameras used for machine vision or autonomous systems.
Make sure the Raspberry Pi is turned off.
Insert the cable into the camera slot of the Raspberry Pi: Gently pull the release hooks towards you. Orient the cable with the pads towards the visible metal connector pins and slide the cable into the connector. Push the cable hook back in. Have the connection inspected by course staff.
Make sure no metal parts touch your camera and the exposed connections are covered up. Otherwise you fry the camera or the Raspberry Pi.
Power on the Raspberry Pi.
To check whether the camera works, execute rpicam-hello -t 0 in a terminal. You can find documentation on the Raspberry Pi website. Raspberry Pi OS now uses libcamera to control the camera, together with its own Python wrapper and camera tools.
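If you would like to test the camera from Python directly (separate from the class camera program used below), a minimal picamera2 sketch looks like this; it captures a single frame and prints its dimensions:
from picamera2 import Picamera2

picam2 = Picamera2()
picam2.configure(picam2.create_preview_configuration())
picam2.start()
frame = picam2.capture_array()  # numpy array; shape is roughly (height, width, channels)
print(frame.shape)
picam2.stop()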
As the next step, we want to test the camera with the camera package I made for Python.
Open Thonny, open the example program picamera2_capture_display.py in the camera folder, and attempt to run it. Save the program under another filename because we will edit it below; the best place to save it is ~/pythonBME210.
We want to detect human posture in the video images. For that we will use the OpenCV DNN module, similar to your prompt engineering homework. There are many frameworks for running convolutional neural networks (CNNs), such as TensorFlow (Google), PyTorch (Meta), MNN (Alibaba; if you do not know Alibaba, look it up), etc. A neural network model is the basis of modern AI: it takes an input and produces an output by multiplying it with weights through several layers. This is, in my opinion, the best explanation of how AI training works: https://youtu.be/D8GOeCFFby4?si=Qiq8AY2DsSZlkoUV
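As a toy illustration of that idea (not one of the models we download below; the weights here are random rather than trained), a single network layer is just a matrix multiplication followed by a nonlinearity:
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)             # 4 input values (for example pixel intensities)
W = rng.random((3, 4))        # 3x4 weight matrix; training adjusts these numbers
b = rng.random(3)             # bias for each of the 3 outputs
y = np.maximum(0, W @ x + b)  # ReLU activation: output = max(0, weights @ input + bias)
print(y)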
We will need to obtain two models. One that detects people and one that extracts the pose.
You will need to go to the folder where you saved the camera program: cd ~/pythonBME210. Then make a folder that will contain the model data: mkdir Data. This folder needs to be in the same directory where you save your program.
Then cd Data and use wget to download the files. wget does the same thing as clicking a link in your browser and saving the linked content to your computer.
Models:
There has been an effort to promote OpenCV for neural network computation, and I looked up the examples and where the model data is stored. We need:
for human detection:
wget -O person_detection_mediapipe_2023mar_int8bq.onnx \
https://huggingface.co/ytfeng/opencv_zoo/resolve/main/models/person_detection_mediapipe/person_detection_mediapipe_2023mar_int8bq.onnx
for pose detection:
wget -O pose_estimation_mediapipe_2023mar_int8bq.onnx \
https://huggingface.co/ytfeng/opencv_zoo/resolve/main/models/pose_estimation_mediapipe/pose_estimation_mediapipe_2023mar_int8bq.onnx
These models were developed for Google MediaPipe and have been adapted for OpenCV.
Helper Functions:
wget -O mp_pose.py \
https://huggingface.co/ytfeng/opencv_zoo/resolve/main/models/pose_estimation_mediapipe/mp_pose.py
wget -O mp_persondet.py \
https://huggingface.co/ytfeng/opencv_zoo/resolve/main/models/person_detection_mediapipe/mp_persondet.py
You will need to add the image analysis to the camera program we tested above. The sections below will help you do this.
Load the necessary Python packages at the beginning of your program. The helper modules are for loading the models and displaying their results. If you did not put the model data in the Data directory, you need to change DATA_DIR (it expects a Data directory in the same folder where the program is stored).
import os
import sys
import time
import logging
from typing import List, Tuple

import numpy as np
import cv2

# Folder with the model files and helper modules (the Data directory next to this program)
DATA_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), "Data"))
if DATA_DIR not in sys.path:
    sys.path.append(DATA_DIR)
from mp_persondet import MPPersonDet
from mp_pose import MPPose
MODEL_FILES = {
"pose": "pose_estimation_mediapipe_2023mar_int8bq.onnx",
"person": "person_detection_mediapipe_2023mar_int8bq.onnx",
}
You will need additional helpers. These are usually placed after the imports. I used the examples from the opencv_zoo (part of your upcoming homework) and told my AI agent to clean them up for classroom use. The visualize function is still a little clunky. You cannot just copy this to the end of your program; these functions need to be placed after the import statements, where the other function definitions are located.
def model_path(model_key: str) -> str:
return os.path.join(DATA_DIR, MODEL_FILES[model_key])
def load_models(logger: logging.Logger) -> Tuple[MPPersonDet, MPPose]:
"""Loads person detection and pose estimation models."""
pose_path = model_path("pose")
person_path = model_path("person")
if not os.path.exists(pose_path):
logger.log(logging.CRITICAL, "Model file not found: %s", pose_path)
raise SystemExit(1)
if not os.path.exists(person_path):
logger.log(logging.CRITICAL, "Model file not found: %s", person_path)
raise SystemExit(1)
backend_id = cv2.dnn.DNN_BACKEND_OPENCV
target_id = cv2.dnn.DNN_TARGET_CPU
person_detector = MPPersonDet(
modelPath=person_path,
nmsThreshold=0.3,
scoreThreshold=0.5,
topK=5000,
backendId=backend_id,
targetId=target_id,
)
pose_estimator = MPPose(
modelPath=pose_path,
confThreshold=0.8,
backendId=backend_id,
targetId=target_id,
)
return person_detector, pose_estimator
def display_interval_from_config(configs: dict) -> float:
display_fps = float(configs.get("displayfps", 0) or 0)
capture_fps = float(configs.get("fps", 0) or 0)
if display_fps <= 0:
return 0.0 # no throttling
if capture_fps > 0 and display_fps >= 0.8 * capture_fps:
return 0.0 # close to capture fps so no throttling
return 1.0 / display_fps # throttled display
def visualize(
image: np.ndarray,
poses: List,
*,
draw_3d: bool = False,
draw_mask_edges: bool = False,
line_thickness: int = 1,
point_radius: int = 1,
) -> Tuple[np.ndarray, np.ndarray | None]:
"""
Draws 2D and 3D pose visualizations on images.
Args:
image (np.ndarray): The input image (BGR).
poses (List): List of pose results, each containing bounding box, landmarks, mask, etc.
Returns:
Tuple[np.ndarray, np.ndarray]: (2D visualization image, 3D visualization image)
"""
# Constants for display sizes and colors
DISPLAY_SIZE = 400
DISPLAY_CENTER = 200
SCALE = 100
COLOR_WHITE = (255, 255, 255)
COLOR_RED = (0, 0, 255)
COLOR_GREEN = (0, 255, 0)
def draw_skeleton_lines(canvas, landmarks, keep_landmarks, thickness: int):
"""Draws skeleton lines between keypoints if both are present."""
connections = [
(0, 1), (1, 2), (2, 3), (3, 7), (0, 4), (4, 5), (5, 6), (6, 8),
(9, 10), (12, 14), (14, 16), (16, 22), (16, 18), (16, 20), (18, 20),
(11, 13), (13, 15), (15, 21), (15, 19), (15, 17), (17, 19),
(11, 12), (11, 23), (23, 24), (24, 12), (24, 26), (26, 28),
(28, 30), (28, 32), (30, 32), (23, 25), (25, 27), (27, 31),
(27, 29), (29, 31)
]
for idx1, idx2 in connections:
if keep_landmarks[idx1] and keep_landmarks[idx2]:
cv2.line(canvas, landmarks[idx1], landmarks[idx2], COLOR_WHITE, thickness)
def draw_keypoints(canvas, landmarks, keep_landmarks, color=COLOR_RED, radius: int = 1):
"""Draws keypoints on the canvas."""
for i, point in enumerate(landmarks):
if keep_landmarks[i]:
cv2.circle(canvas, tuple(point), radius, color, -1)
def draw_edges_on_mask(mask, display_screen):
"""Draws green edges from mask onto the display image."""
edges = cv2.Canny(mask, 100, 200)
kernel = np.ones((2, 2), np.uint8)
edges = cv2.dilate(edges, kernel, iterations=1)
edges_bgr = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
edges_bgr[edges == 255] = COLOR_GREEN
return cv2.add(edges_bgr, display_screen)
def draw_pose_2d(display_screen, bbox, conf, landmarks_screen, keep_landmarks):
"""Draws 2D pose: bounding box, confidence, skeleton, and keypoints."""
bbox = bbox.astype(np.int32)
cv2.rectangle(display_screen, tuple(bbox[0]), tuple(bbox[1]), COLOR_GREEN, max(1, line_thickness))
cv2.putText(
display_screen,
"{:.4f}".format(conf),
(bbox[0][0], bbox[0][1] + 12),
cv2.FONT_HERSHEY_DUPLEX,
0.5,
COLOR_RED,
)
landmarks_xy = landmarks_screen[:, 0:2].astype(np.int32)
draw_skeleton_lines(display_screen, landmarks_xy, keep_landmarks, thickness=max(1, line_thickness))
draw_keypoints(display_screen, landmarks_xy, keep_landmarks, color=COLOR_RED, radius=max(1, point_radius))
def draw_pose_3d(display_3d, landmarks_world, keep_landmarks):
"""Draws 3D projections of the pose on the display_3d canvas."""
# Main View (XY)
landmarks_xy = (landmarks_world[:, [0, 1]] * SCALE + DISPLAY_CENTER).astype(np.int32)
draw_skeleton_lines(display_3d, landmarks_xy, keep_landmarks, thickness=2)
# Top View (XZ)
landmarks_xz = landmarks_world[:, [0, 2]]
landmarks_xz[:, 1] = -landmarks_xz[:, 1]
landmarks_xz = (landmarks_xz * SCALE + np.array([300, 100])).astype(np.int32)
draw_skeleton_lines(display_3d, landmarks_xz, keep_landmarks, thickness=2)
# Left View (YZ)
landmarks_yz = landmarks_world[:, [2, 1]]
landmarks_yz[:, 0] = -landmarks_yz[:, 0]
landmarks_yz = (landmarks_yz * SCALE + np.array([100, 300])).astype(np.int32)
draw_skeleton_lines(display_3d, landmarks_yz, keep_landmarks, thickness=2)
# Right View (ZY)
landmarks_zy = landmarks_world[:, [2, 1]]
landmarks_zy = (landmarks_zy * SCALE + np.array([300, 300])).astype(np.int32)
draw_skeleton_lines(display_3d, landmarks_zy, keep_landmarks, thickness=2)
# Copy input image for drawing
display_screen = image.copy()
# Create blank 3D display canvas
display_3d = None
if draw_3d:
display_3d = np.zeros((DISPLAY_SIZE, DISPLAY_SIZE, 3), np.uint8)
# Draw axes and labels for 3D visualization
cv2.line(display_3d, (DISPLAY_CENTER, 0), (DISPLAY_CENTER, DISPLAY_SIZE), COLOR_WHITE, 2)
cv2.line(display_3d, (0, DISPLAY_CENTER), (DISPLAY_SIZE, DISPLAY_CENTER), COLOR_WHITE, 2)
cv2.putText(display_3d, "Main View", (0, 12), cv2.FONT_HERSHEY_DUPLEX, 0.5, COLOR_RED)
cv2.putText(display_3d, "Top View", (DISPLAY_CENTER, 12), cv2.FONT_HERSHEY_DUPLEX, 0.5, COLOR_RED)
cv2.putText(display_3d, "Left View", (0, DISPLAY_CENTER + 12), cv2.FONT_HERSHEY_DUPLEX, 0.5, COLOR_RED)
cv2.putText(display_3d, "Right View", (DISPLAY_CENTER, DISPLAY_CENTER + 12), cv2.FONT_HERSHEY_DUPLEX, 0.5, COLOR_RED)
# Only draw 3D pose for the first detected pose (for clarity)
drew_3d = False
for pose_result in poses:
# Unpack pose result
bbox, landmarks_screen, landmarks_world, mask, _heatmap, conf = pose_result
# Optional: draw green edges from mask onto the display image
if draw_mask_edges:
display_screen = draw_edges_on_mask(mask, display_screen)
# Remove last 6 landmarks (as in original code)
landmarks_screen = landmarks_screen[:-6, :]
landmarks_world = landmarks_world[:-6, :]
keep_landmarks = landmarks_screen[:, 4] > 0.8
# Draw 2D pose (bounding box, skeleton, keypoints)
draw_pose_2d(display_screen, bbox, conf, landmarks_screen, keep_landmarks)
# Draw 3D pose projections for the first pose only
if draw_3d and not drew_3d and display_3d is not None:
drew_3d = True
draw_pose_3d(display_3d, landmarks_world, keep_landmarks)
return display_screen, display_3d
Code for the main program setup. This usually goes after the function definitions and before the main while loop; it runs once before we enter the main program loop (like in your Arduino programs from BME225). The main while loop is while not stop:. You cannot just copy this code to the end of your existing program; you need to find where the code already has a similar statement (# Display Window Setup) and complement it with the missing items.
# Load Models
person_detector, pose_estimator = load_models(logger)
# Display Window Setup
font = cv2.FONT_HERSHEY_SIMPLEX
textLocation0 = (10, 20)
textLocation1 = (10, 40)
textLocation2 = (10, 60)
textLocation3 = (10, 80)
fontScale = 0.5
fontColor = (255, 255, 255)
lineType = 1
window_name_image = "MediaPipe Pose Detection Demo"
window_name_3d = "3D Pose Demo"
cv2.namedWindow(window_name_image, cv2.WINDOW_AUTOSIZE)
cv2.namedWindow(window_name_3d, cv2.WINDOW_AUTOSIZE)
inference_fps = 0.0
Analysis code for the main loop. After we obtain the image from the camera, we want to analyze it. This runs the person detector first, extracts the identified people, and then runs the pose detection on each person. You need to find an appropriate location to place this code: either right after you obtain the image, or in the code section where images are displayed in the window, but before they are displayed. You might need to adjust the names frame_proc or frame_display.
# Analysis
if frame is not None:
frame_proc = frame.copy()
infer_start = time.perf_counter()
persons = person_detector.infer(frame_proc)
poses = []
for person in persons:
pose = pose_estimator.infer(frame_proc, person)
if pose is not None:
poses.append(pose)
infer_elapsed = time.perf_counter() - infer_start
frame_proc, frame_3d = visualize(
frame_proc,
poses,
draw_3d=False,
draw_mask_edges=False,
line_thickness=int(1),
point_radius=int(1),
)
else:
frame_proc = None
frame_3d = None
Then I also updated the display section so that I can see the pose:
# Display
...
cv2.putText(frame_proc, 'Inference:{:<.1f} [ms]'.format(1000.*infer_elapsed),
textLocation3, font, fontScale, fontColor, lineType,
)
...
# 3D pose visualization
if frame_3d is not None:
cv2.imshow(window_name_3d, frame_3d)
If you have simple errors in your code, you can copy and paste the program into your AI agent and ask it to review it for errors; it will then show you the errors. If you are lazy and don't want to fix the errors yourself, you can ask your AI agent to rewrite the program for you and then copy and paste it back into the editor.
In some installations I have seen a broadcast error mentioning "1d2Target Mat". This indicates that OpenCV is not able to load the ONNX model. If this happens to you, we need to update OpenCV. Start a shell on the Raspberry Pi and run:
sudo apt-get remove libopencv-dev
sudo apt-get remove opencv-data
sudo apt-get remove python3-opencv
sudo apt install \
libgl1 libxcb-xinerama0 libxkbcommon-x11-0 libxcb1 \
libxcb-render0 libxcb-shape0 libxcb-shm0 libxcb-xfixes0 \
libfontconfig1 libfreetype6 fonts-dejavu-core
fc-list | head # should list fonts
Then we need to activate the Python virtual environment so we can update the OpenCV package. In the same shell type:
cd ~/pythonBME210
source env/bin/activate
and install OpenCV with pip:
pip install opencv-contrib-python
This will give us OpenCV version 4.13, which should work with the ONNX models needed for this assignment.
If you had this issue, I recommend taking a screenshot as evidence and asking for an extension on the assignment.
In this assignment you used many Unix commands. You could enter them into your Unix command cheat sheet.