Troubleshooting frozen system
slow / frozen system
check if procs are in uninterruptible sleep state (blocked waiting on IO, which causes slowness)
ps aux (check the STAT column; 'D' marks procs that are in uninterruptible sleep)
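To avoid scanning the full ps listing by eye, the STAT check can be scripted. A minimal sketch, assuming procps ps (the -eo format and the wchan column); d_state_filter is a hypothetical helper name, not part of any tool:

```shell
# Keep the header line plus any row whose STAT (2nd column) starts with 'D'.
d_state_filter() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# wchan shows the kernel function the proc is sleeping in (assumes procps ps).
ps -eo pid,stat,wchan:32,comm | d_state_filter
```

If only the header line comes back, no procs were in D state at the moment of the snapshot; D states can be brief, so repeating the command a few times is usually worthwhile.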
check page faults
sar -B 2 5 (paging report, 5 samples at 2 second intervals; check the majflt/s column, major faults per second; a sustained high value suggests the system is short on RAM)
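If sar (from sysstat) is not installed, roughly the same number can be derived from the kernel's own counters. A sketch, assuming a Linux /proc/vmstat that contains a pgmajfault line; majflt_count is a hypothetical helper name:

```shell
# Print the cumulative major page fault count from a vmstat-style file.
majflt_count() {
    awk '$1 == "pgmajfault" { print $2 }' "${1:-/proc/vmstat}"
}

# Sample twice, 2 seconds apart, and print the per-second rate.
if [ -r /proc/vmstat ]; then
    before=$(majflt_count)
    sleep 2
    after=$(majflt_count)
    echo "majflt/s: $(( (${after:-0} - ${before:-0}) / 2 ))"
fi
```

The counter is cumulative since boot, so only the difference between two samples is meaningful.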
LSOF
find the top memory and IO procs, then list the open files for each proc
lsof -p <PID>
show all files opened by a user
lsof -u <username>
show all procs using a port
lsof -i TCP:<PORT>
or use a port range: lsof -i TCP:20-150
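The "find the top procs, then list their open files" step can be sketched as a loop. This version ranks by resident memory (RSS) only, not IO, and assumes ps accepts the -eo pid=,rss=,comm= format; top_by_rss is a hypothetical helper name:

```shell
# Read "PID RSS COMMAND" lines on stdin, print the PIDs of the N largest by RSS.
top_by_rss() {
    sort -k2,2nr | head -n "${1:-5}" | awk '{ print $1 }'
}

# Show the first 20 open files for each of the 5 largest procs.
for pid in $(ps -eo pid=,rss=,comm= | top_by_rss 5); do
    echo "== open files for PID $pid =="
    lsof -p "$pid" 2>/dev/null | head -n 20
done
```

Run it as root to see files for procs owned by other users; without root, lsof output for those procs will be partial or empty.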
SYSRQ
enable SysRq to get info on procs that are stuck in uninterruptible sleep ('D' state). SysRq still responds on an otherwise frozen system (assuming the command line is responsive)
1. enable sysrq
echo 1 > /proc/sys/kernel/sysrq
2. dump info on D state procs to the kernel log (read it with dmesg or in /var/log/messages)
echo 'w' > /proc/sysrq-trigger
3. disable sysrq
echo 0 > /proc/sys/kernel/sysrq
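The three steps can be wrapped in one function. This sketch differs from the steps above in one respect: instead of writing 0 at the end, it restores whatever value the sysrq setting had before, on the assumption that the system may not have had SysRq fully disabled to begin with. sysrq_dump_blocked is a hypothetical name, and the two path arguments exist only so the function can be tried against scratch files:

```shell
# Enable SysRq, dump blocked (D state) tasks to the kernel log, then restore
# the previous sysrq setting. Must run as root against the real /proc files.
sysrq_dump_blocked() {
    ctl=${1:-/proc/sys/kernel/sysrq}
    trigger=${2:-/proc/sysrq-trigger}
    old=$(cat "$ctl") || return 1
    echo 1 > "$ctl" || return 1
    echo w > "$trigger"
    echo "$old" > "$ctl"
}

# Usage (as root): sysrq_dump_blocked && dmesg | tail -n 50
```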
HTOP
htop settings tweaked for sysadmin use, showing proc state, IO state, OOM score, etc
add to ~/.config/htop/htoprc
# Beware! This file is rewritten by htop when settings are changed in the interface.
# The parser is also very primitive, and not human-friendly.
fields=0 48 2 11 113 111 110 20 17 18 38 39 40 46 47 49 1
sort_key=111
sort_direction=1
hide_threads=1
hide_kernel_threads=1
hide_userland_threads=0
shadow_other_users=0
show_thread_names=0
show_program_path=1
highlight_base_name=0
highlight_megabytes=0
highlight_threads=1
tree_view=0
header_margin=1
detailed_cpu_time=1
cpu_count_from_zero=1
update_process_names=0
account_guest_in_cpu_meter=0
color_scheme=5
delay=15
left_meters=LeftCPUs Memory Swap
left_meter_modes=1 1 1
right_meters=RightCPUs Hostname Clock Uptime LoadAverage Tasks Swap Memory
right_meter_modes=1 2 2 2 2 2 2 2
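Since htop rewrites this file itself (see the header comment in the config), it is safer to keep a copy of any existing htoprc before replacing it. A minimal sketch, assuming the settings above were saved to a local file; install_htoprc is a hypothetical helper name:

```shell
# Copy a prepared htoprc into place, keeping a .bak copy of any existing file.
install_htoprc() {
    src=$1
    dest=${2:-$HOME/.config/htop/htoprc}
    mkdir -p "$(dirname "$dest")" || return 1
    if [ -f "$dest" ]; then
        cp "$dest" "$dest.bak" || return 1
    fi
    cp "$src" "$dest"
}

# Usage (with htop closed): install_htoprc ./htoprc
```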