Troubleshooting a frozen system

slow / frozen system

Check whether processes are in uninterruptible sleep (D state). These are usually waiting on I/O, and a pile-up of them causes slowness.

ps aux (check the STAT column: 'D' marks processes in uninterruptible sleep)
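A minimal sketch of the same check, assuming procps `ps` on Linux. It keeps only the header and the rows whose STAT starts with D, and adds the WCHAN column, which names the kernel function each process is sleeping in. The function names are hypothetical. Run it a few times: short D states during normal I/O are expected, while processes that stay in D across runs are the interesting ones.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting processes in uninterruptible sleep (D state).

# Keep the header line plus rows whose 2nd column (STAT) starts with "D".
filter_dstate() {
  awk 'NR == 1 || $2 ~ /^D/'
}

# Snapshot of D-state processes; WCHAN shows the kernel function each one waits in.
# Assumes procps ps, which accepts the ":32" column-width suffix.
list_dstate() {
  ps -eo pid,stat,wchan:32,args | filter_dstate
}
```

Usage: run `list_dstate` as any user; the filter also works on saved `ps` output piped into `filter_dstate`.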

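Check page faults

sar -B 2 5 prints a paging report every 2 seconds, 5 times. Check the majflt/s column (major faults per second): a consistently high number means pages keep being read back from disk, usually a sign that the system is out of RAM.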
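Where sysstat's `sar` is not installed, a rough substitute for the majflt/s column (an assumption on my part, not how `sar` is implemented) is to sample the kernel's cumulative `pgmajfault` counter in /proc/vmstat twice and divide the difference by the interval. Linux only; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: approximate sar's majflt/s from /proc/vmstat (Linux).

# Print the cumulative major-fault count from a vmstat-format file.
read_pgmajfault() {
  awk '$1 == "pgmajfault" { print $2 }' "${1:-/proc/vmstat}"
}

# Sample the counter twice, INTERVAL seconds apart (default 2),
# and print the average number of major faults per second (integer).
majflt_rate() {
  local interval=${1:-2} before after
  before=$(read_pgmajfault)
  sleep "$interval"
  after=$(read_pgmajfault)
  echo $(( (after - before) / interval ))
}
```

Calling `majflt_rate 2` five times in a loop gives roughly the same view as `sar -B 2 5`.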

LSOF

Find the processes using the most memory and I/O, then list the open files of each one:

lsof -p <PID>
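A sketch of that workflow for the memory side only, assuming procps `ps` (for `--sort`) and root privileges, so that `lsof` can see other users' processes. It ranks by %MEM and does not cover I/O. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list the open files of the top memory consumers.

# Print the PIDs of the N processes with the highest %MEM (default 5).
# "pid=" suppresses the header; assumes procps ps for --sort.
top_mem_pids() {
  ps -eo pid= --sort=-%mem | head -n "${1:-5}"
}

# Run lsof -p for each of the top N memory consumers.
# -n and -P skip DNS and port-name lookups, which can be slow on a struggling host.
open_files_of_top_mem() {
  local pid
  for pid in $(top_mem_pids "${1:-5}"); do
    echo "=== PID $pid ==="
    lsof -n -P -p "$pid"
  done
}
```

Usage: `open_files_of_top_mem 3` prints the open files of the three largest processes, one block per PID.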

Show all files opened by a user:

lsof -u <username>

Show all processes using a TCP port:

lsof -i TCP:<PORT>

or a port range: lsof -i TCP:20-150
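A small wrapper that narrows the port query to listening sockets, assuming the `lsof` shipped by most Linux distributions: `-n` and `-P` keep addresses and ports numeric (no DNS or /etc/services lookups), and `-sTCP:LISTEN` filters on TCP state. The function name is hypothetical; drop `-sTCP:LISTEN` to see established connections as well.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: show processes listening on a TCP port or port range.
# Usage: tcp_listeners 22   or   tcp_listeners 20-150
tcp_listeners() {
  if [ $# -ne 1 ]; then
    echo "usage: tcp_listeners PORT|FIRST-LAST" >&2
    return 2
  fi
  # -n/-P: numeric hosts and ports; -sTCP:LISTEN: listening sockets only.
  lsof -n -P -iTCP:"$1" -sTCP:LISTEN
}
```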

SYSRQ

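SysRq can dump information about processes stuck in uninterruptible sleep (D state), but it generally cannot kill them: a process in that state does not respond to signals, not even SIGKILL, until it leaves it. The keyboard combination (Alt+SysRq+<key>) often still responds when the system is too frozen to give you a shell, but only if SysRq is enabled; writes to /proc/sysrq-trigger by root work regardless of that setting.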

1. Enable SysRq

echo 1 > /proc/sys/kernel/sysrq

2. Dump stack traces of the D-state processes to the kernel log (read it with dmesg; on many distributions it also ends up in /var/log/messages)

echo 'w' > /proc/sysrq-trigger

3. Disable SysRq again

echo 0 > /proc/sys/kernel/sysrq
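A sketch of step 2 as a single helper, assuming Linux, root, and a shell that still responds. Because root's writes to /proc/sysrq-trigger are honoured whatever kernel.sysrq is set to, it skips the enable/disable steps, which only matter for the keyboard combination. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump stack traces of blocked (D-state) tasks via SysRq 'w'
# and print the most recent kernel log lines. Assumes Linux and root.
dump_blocked_tasks() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "dump_blocked_tasks: must run as root" >&2
    return 1
  fi
  # Root's writes to sysrq-trigger work whatever kernel.sysrq is set to.
  echo w > /proc/sysrq-trigger || return 1
  # The dump lands in the kernel ring buffer; show its last N lines (default 100).
  dmesg | tail -n "${1:-100}"
}
```

On systemd hosts, `journalctl -k` shows the same kernel messages.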

HTOP

htop settings tuned for sysadmin use: they show process state, I/O activity, OOM score, etc.

Add the following to ~/.config/htop/htoprc:

# Beware! This file is rewritten by htop when settings are changed in the interface.
# The parser is also very primitive, and not human-friendly.
fields=0 48 2 11 113 111 110 20 17 18 38 39 40 46 47 49 1
sort_key=111
sort_direction=1
hide_threads=1
hide_kernel_threads=1
hide_userland_threads=0
shadow_other_users=0
show_thread_names=0
show_program_path=1
highlight_base_name=0
highlight_megabytes=0
highlight_threads=1
tree_view=0
header_margin=1
detailed_cpu_time=1
cpu_count_from_zero=1
update_process_names=0
account_guest_in_cpu_meter=0
color_scheme=5
delay=15
left_meters=LeftCPUs Memory Swap
left_meter_modes=1 1 1
right_meters=RightCPUs Hostname Clock Uptime LoadAverage Tasks Swap Memory
right_meter_modes=1 2 2 2 2 2 2 2