ML with CUDA

Installing OS

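This page briefly describes how to install the Fedora Linux operating system, how to upgrade an existing installation, and how to try out machine learning with CUDA.

Install the NVIDIA GPU driver.

Install cuDNN and nvidia-cuda-toolkit.

The full write-up is available at this link: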

foregoing-vessel-149.notion.site/Installing-Linux-OS-and-CUDA-for-ML-01eb45151a0e42f1a06d6266e17fc86e?pvs=4

Checking the Linux version from the terminal

~$ uname -r

6.5.0-41-generic

~$ cat /etc/os-release

PRETTY_NAME="Ubuntu 22.04.4 LTS"

NAME="Ubuntu"

VERSION_ID="22.04"

VERSION="22.04.4 LTS (Jammy Jellyfish)"

VERSION_CODENAME=jammy

ID=ubuntu

ID_LIKE=debian

HOME_URL="https://www.ubuntu.com/"

SUPPORT_URL="https://help.ubuntu.com/"

BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"

PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"

UBUNTU_CODENAME=jammy
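
The same information can also be read from a script. The following is a minimal sketch in Python, assuming the usual KEY=value layout of /etc/os-release shown above; parse_os_release is a hypothetical helper name used only for illustration, not part of any library.

```python
import platform


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes
        info[key] = value.strip().strip('"').strip("'")
    return info


if __name__ == "__main__":
    # Kernel release, the same string that `uname -r` prints
    print(platform.release())
    try:
        with open("/etc/os-release", encoding="utf-8") as fh:
            release = parse_os_release(fh.read())
        print(release.get("PRETTY_NAME", "unknown"))
    except FileNotFoundError:
        print("/etc/os-release not found; this is probably not a Linux system")
```

Knowing the distribution ID and version is useful later, because the CUDA installer packages are published per distribution and release.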

Set up ROOT, a data analysis framework that physicists use for big data, on an Ubuntu Linux workstation.

~$ sudo snap install root-framework

root-framework v6-30-04 from James Carroll✪ installed

~$ root

------------------------------------------------------------------

| Welcome to ROOT 6.30/04 https://root.cern |

| Built for linuxx8664gcc on Feb 03 2024, 23:12:12 |

| From tags/v6-30-04@v6-30-04 |

| With c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 |

| Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q' |

------------------------------------------------------------------

root [0] .help

For a further introduction, see the ROOT gallery of images produced with the framework:

https://root.cern/gallery/

Installing CUDA

On Fedora Linux, use pyenv to create a virtual environment with python=3.12.4, cudatoolkit=11.8, and pytorch+cu118. Once these are installed, models can be trained with CUDA.

Download the CUDA RPM installer package
bash

wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-fedora35-11-8-local-11.8.0_520.61.05-1.x86_64.rpm

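This command uses wget, a command-line tool for downloading files over the network, to fetch the CUDA 11.8 RPM package for Fedora 35 from the NVIDIA developer site. As the file name indicates, it is a local repository package.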

Install the CUDA RPM package
bash

sudo rpm -i cuda-repo-fedora35-11-8-local-11.8.0_520.61.05-1.x86_64.rpm

This command uses rpm to install the CUDA RPM package that was just downloaded. sudo runs the command as the superuser, and -i is rpm's install option.

Clear all DNF caches
bash

sudo dnf clean all

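This command uses dnf to clear all of the package manager's caches, which helps ensure that the following installation steps work from up-to-date package metadata.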

Install the latest NVIDIA driver with DKMS support
bash

sudo dnf -y module install nvidia-driver:latest-dkms

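This command uses dnf to install the latest NVIDIA driver with DKMS (Dynamic Kernel Module Support). The -y option answers "yes" to all prompts automatically. module install installs a module, and nvidia-driver:latest-dkms names the module and stream to install.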

Install the CUDA toolkit
bash

sudo dnf -y install cuda

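This command uses dnf to install the CUDA toolkit. The -y option answers "yes" to all prompts automatically; install is the dnf subcommand, and cuda is the name of the package to install.

Together, these commands download and install CUDA 11.8 for Fedora 35 along with the matching NVIDIA driver, with dnf resolving the dependencies along the way.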
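
After the installation, a quick way to see whether the command-line tools are reachable is to look them up on PATH. The sketch below is a minimal Python check, not part of the original procedure; find_cuda_tools is a hypothetical helper name. It assumes the driver provides nvidia-smi and the toolkit provides nvcc. A result of None for nvcc does not necessarily mean the install failed, since the toolkit's bin directory is often not added to PATH automatically.

```python
import shutil


def find_cuda_tools(names=("nvidia-smi", "nvcc")):
    """Map each tool name to its full path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_cuda_tools().items():
        if path is None:
            print(f"{name}: not found on PATH")
        else:
            print(f"{name}: {path}")
```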

Constructing virtualenvs

Check Existing Virtual Environments: List all virtual environments managed by pyenv to see if pytorch exists or if there is another virtual environment you should be using:
sh

pyenv virtualenvs

Create a Virtual Environment (if needed): If pytorch does not exist, you can create a new virtual environment:
sh

pyenv virtualenv <python-version> pytorch

Replace <python-version> with the specific Python version you want to use, such as 3.12.4.

Activate the Virtual Environment: Activate the newly created virtual environment:
sh

pyenv activate pytorch

Install Dependencies: Once the virtual environment is activated, install matplotlib and any other dependencies:
sh

pip install matplotlib

pip install torch torchvision torchaudio

To get a build for a specific CUDA version (here CUDA 11.8), pass the matching index URL:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
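
The index URL above encodes the CUDA version as the suffix cu118. The sketch below shows how that command could be assembled from a version string. Both function names are hypothetical, and only the cu118 URL is confirmed by this page; applying the same naming pattern to other CUDA versions is an assumption, and wheels are not published for every version.

```python
def pytorch_index_url(cuda_version):
    """Build the PyTorch wheel index URL for a CUDA version such as "11.8"."""
    major, minor = cuda_version.split(".")[:2]
    # "11.8" becomes the tag "cu118" (naming pattern assumed from the URL above)
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


def pip_install_command(cuda_version, packages=("torch", "torchvision", "torchaudio")):
    """Return the pip command as an argument list, suitable for subprocess.run."""
    return ["pip", "install", *packages, "--index-url", pytorch_index_url(cuda_version)]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is installed by accident
    print(" ".join(pip_install_command("11.8")))
```

Printing the command first makes it easy to confirm the URL before running the install inside the activated virtual environment.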

Verifying CUDA availability

Run the testGPU.py script to check whether CUDA is available.

pi@fedora:~$ pyenv activate pytorch

(pytorch) pi@fedora:~$ python --version

Python 3.12.4

(pytorch) pi@fedora:~$ python testGPU.py

CUDA is available

Device count: 1

Current device: 0

Device name: NVIDIA GeForce RTX 3060 Ti

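Working on Windows OS

On Windows, it is recommended to build the environment with Anaconda:

Install Anaconda.

Create an environment named pytorch with python==3.12.4:

$ conda create --name pytorch python==3.12.4

Install the NVIDIA GPU driver.

Install CUDA Toolkit 11.8.

Install cuDNN.

$ conda activate pytorch

Install torch in the pytorch environment.

So that PyTorch can run tensor operations on the GPU, install Visual Studio Community 2022 with the C++ development libraries selected, and set up the environment variables. After that, model training can run on the GPU under Windows, with cudatoolkit=11.8 packaged by Anaconda.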
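
This page does not list which environment variables were set. As a rough aid, the sketch below prints every variable whose name starts with CUDA and every PATH entry that mentions CUDA. It assumes, without confirmation from this page, that the toolkit installer defines a variable such as CUDA_PATH; both function names are hypothetical.

```python
import os


def cuda_env_vars(environ=None):
    """Return the environment variables whose names start with CUDA."""
    env = os.environ if environ is None else environ
    return {k: v for k, v in env.items() if k.upper().startswith("CUDA")}


def path_entries_mentioning_cuda(environ=None):
    """Return the PATH entries that contain the word "cuda" (case-insensitive)."""
    env = os.environ if environ is None else environ
    entries = env.get("PATH", "").split(os.pathsep)
    return [entry for entry in entries if "cuda" in entry.lower()]


if __name__ == "__main__":
    for name, value in cuda_env_vars().items():
        print(f"{name}={value}")
    for entry in path_entries_mentioning_cuda():
        print(f"PATH entry: {entry}")
```

If both lists come back empty, the toolkit's directories are probably not visible to programs started from that shell.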

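The whole setup was done in collaboration with ChatGPT, which cut the time required considerably:

Shared conversation link: https://chatgpt.com/share/c844cb46-623a-4956-9387-271adaf8e870

With the setup complete, try out the GPU. The result looks good.

Run the testGPU.py program.

Output: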

CUDA is available

Device count: 1

Current device: 0

Device name: NVIDIA GeForce RTX 3060

This shows that the GPU is available to PyTorch.

# testGPU.py
# Report whether PyTorch can see a CUDA-capable GPU.

import torch

if torch.cuda.is_available():
    print("CUDA is available")
    # Number of GPUs visible to PyTorch
    print(f"Device count: {torch.cuda.device_count()}")
    # Index of the currently selected GPU
    print(f"Current device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name(torch.cuda.current_device())}")
else:
    print("CUDA is not available")
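
Beyond checking availability, it can be reassuring to run an actual computation on the device. The sketch below is one possible follow-up, not part of the original script: it picks the GPU when PyTorch can see one, falls back to the CPU otherwise, and multiplies two random matrices there. The names pick_device and matmul_checksum are hypothetical, and the import is guarded so the file still runs where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # PyTorch is not installed in this interpreter
    torch = None


def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


def matmul_checksum(size=256):
    """Multiply two random matrices on the chosen device and sum the result.

    Returns None when PyTorch is not installed.
    """
    if torch is None:
        return None
    device = torch.device(pick_device())
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    # .item() copies the scalar back to the host as a Python float
    return (a @ b).sum().item()


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
    print(f"Checksum: {matmul_checksum()}")
```

The checksum itself is random; the point is only that the multiplication completes on the reported device without errors.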

The above is a reference workflow for installing PyTorch and CUDA on Fedora Linux and Windows. If your operating system is Ubuntu, you can refer to this ChatGPT conversation instead:
https://chatgpt.com/share/77b9c244-0ade-4a90-8bc4-966667cbc288
