Monitoring Linux server metrics in Home Assistant via MQTT

I needed to add another server at home, and I decided to monitor its performance from my smart home, which runs Home Assistant. A quick Google search did not turn up any universal solution, so I reinvented the wheel.





What we will monitor:
- CPU load and temperature
- RAM and swap usage
- free disk space
- uptime
- overall system load
- drive temperature and SMART status
- RAID status

The server runs Ubuntu Server 20 with a simple software RAID1. The hardware is WD Green drives and a GA-525 motherboard with an integrated atom525.



A Mosquitto broker was already configured on the smart home server, so MQTT was chosen to transfer the data.



The first sections explain how each kind of data is collected. The data-sending scripts and the HA settings come at the end.



All commands in the examples are run as root.





Table of contents

Collecting system sensor readings

Collecting system load data

Collecting hard drive health data

Collecting RAID status data

Sending the collected data

Configuring Home Assistant





Collecting system sensor readings

To read the built-in sensors we use the sensors utility.





If it is not installed, install it: apt-get install lm-sensors







First, find all available sensors by running sensors-detect





and answer y to every question. After that, you can see what was found with: sensors







Note that in my case, sensors only started showing all the detected sensors after a reboot. Maybe some kind of bug, I don't know.





sensors can also print its data as JSON, which is far easier to process in a script: sensors -A -u -j



The result is JSON with the readings grouped by chip.







To extract individual values from the JSON, we use jq. On Ubuntu it is installed with:



apt-get install jq







jq addresses values by path, similar to XPath. First find the paths of the sensors you need in the JSON output.





In my case I need three readings: the CPU core temperature, the fan speed and the temp3 sensor:



sensors -A -u -j | jq '.["coretemp-isa-0000"]["Core 0"].temp2_input'

sensors -A -u -j | jq '.["it8720-isa-0290"].fan1.fan1_input'

sensors -A -u -j | jq '.["it8720-isa-0290"].temp3.temp3_input'
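If jq is not installed, the same readings can also be pulled from the plain-text output of sensors -u with awk. This is a sketch, not the method used in this article. The function name sensor_value is hypothetical. The sketch assumes the usual sensors -u layout:
- a chip name line
- feature lines ending in a colon
- indented "subfeature: value" lines

```shell
#!/bin/bash
# Hypothetical helper: print one raw value from `sensors -u` text on stdin.
# Usage: sensors -u | sensor_value CHIP FEATURE SUBFEATURE
# Assumed layout: chip line, "Feature:" lines, indented "subfeature: value" lines.
sensor_value() {
  awk -v chip="$1" -v feat="$2" -v sf="$3" '
    # Chip header: starts in column 0 and contains no colon.
    /^[^ ]/ && !/:/ { in_chip = ($0 == chip); in_feat = 0; next }
    # Feature line: starts in column 0 and ends with a colon, e.g. "Core 0:".
    /^[^ ].*:$/ { in_feat = (substr($0, 1, length($0) - 1) == feat); next }
    # Indented subfeature line of the wanted chip and feature: print its value.
    /^ / && in_chip && in_feat && $1 == (sf ":") { print $2; exit }
  '
}

# Example (on the server):
#   sensors -u | sensor_value coretemp-isa-0000 "Core 0" temp2_input
#   sensors -u | sensor_value it8720-isa-0290 fan1 fan1_input
```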








Collecting system load data





RAM and swap usage come from free. With -m, the values are shown in megabytes.





We need the total and used amounts, from which the percentage is calculated. For example, the total RAM:



free -m | grep "Mem" | awk '{print $2}'







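grep keeps only the Mem line, and awk prints the required column: $2 is the total, $3 the used amount. For swap, grep for Swap instead.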
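Both percentages can also be computed in a single awk pass over the free -m output. Plain bash arithmetic such as $((used * 100 / total)) fails with "division by 0" when the total is 0, for example on a machine without swap. This sketch prints 0 in that case instead. The function name used_percent is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: integer percentage used/total for the "Mem" or "Swap"
# row of `free -m` output read from stdin ($2 = total, $3 = used).
# Prints 0 when the total is 0 (no swap) instead of dividing by zero.
used_percent() {
  awk -v row="$1:" '
    $1 == row {
      if ($2 > 0) printf "%d\n", $3 * 100 / $2   # %d truncates, like bash integer division
      else print 0
    }'
}

# Example (on the server):
#   free -m | used_percent Mem
#   free -m | used_percent Swap
```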





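Disk usage is read with df. Find the lines for your partitions in its output. The fifth column is the used space in percent (Use%), and sed strips the % sign: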









df | grep "/dev/md127p1" | awk '{print $5}' | sed 's/%$//'

df | grep "/dev/md126p1" | awk '{print $5}' | sed 's/%$//'
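grep "/dev/md127p1" matches a substring, so it would also pick up the line of a device such as /dev/md127p10. This sketch compares the first column exactly instead. It reads df -P output; -P selects the POSIX format, which keeps each filesystem on a single line. The function name disk_used_percent is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: used space in percent (the Use%/Capacity column)
# for one device, from `df -P` output on stdin, without the % sign.
disk_used_percent() {
  awk -v dev="$1" '$1 == dev { sub(/%$/, "", $5); print $5; exit }'
}

# Example (on the server):
#   df -P | disk_used_percent /dev/md127p1
```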








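The overall system load is read from /proc/loadavg. Its first three numbers are the load averages over the last 1, 5 and 15 minutes. That is, the average number of processes that are running or waiting to run, including those waiting for disk I/O. I take the 15-minute value: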



cat /proc/loadavg | awk '{print $3}'
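A load average of 1.0 means a fully busy single-CPU machine, but only a partly busy multi-core one. It can therefore be easier to read when divided by the number of logical CPUs reported by nproc. This normalization is an optional extra, not part of the setup described here. The function name load_per_cpu is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: load average as a percentage of the available CPUs.
# $1 = load average (e.g. the third field of /proc/loadavg), $2 = number of CPUs.
load_per_cpu() {
  awk -v load="$1" -v cpus="$2" 'BEGIN {
    if (cpus > 0) printf "%.0f\n", load / cpus * 100
    else print "n/a"
  }'
}

# Example (on the server):
#   load_per_cpu "$(awk '{print $3}' /proc/loadavg)" "$(nproc)"
```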







Uptime is taken from the uptime command. awk prints the third field, and sed removes the trailing comma:



uptime | awk '{print $3}' | sed 's/,$//'
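The third field of uptime is a number of days only once the server has been up for at least a day. Before that it is a time such as 3:45, or a number of minutes (from "up 45 min"), so the value sent to HA changes meaning. This sketch reads /proc/uptime instead, whose first field is the number of seconds since boot, and always reports days. The function name uptime_days and its optional file argument are hypothetical; the argument exists only to make testing easy.

```shell
#!/bin/bash
# Hypothetical helper: uptime in days (one decimal) from /proc/uptime.
# The first field of /proc/uptime is the number of seconds since boot.
uptime_days() {
  awk '{ printf "%.1f\n", $1 / 86400 }' "${1:-/proc/uptime}"
}

# Example (on the server):
#   uptimedata=$(uptime_days)
```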







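CPU load is measured with mpstat, which shows how CPU time is split between categories, including the idle share (%idle). mpstat is part of the sysstat package; if it is missing, install it with apt install sysstat. Then take the %idle column, which is the 13th field in this output: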



mpstat | grep all | awk '{print $13}'







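mpstat reports the idle percentage, so the load is 100 minus that value. Bash arithmetic only works with integers, so the subtraction is done with bc: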



cpuidle=$(mpstat | grep all | awk '{print $13}')

cpuload=$(echo "100-$cpuidle" | bc -l)

echo "CPU load: $cpuload"
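According to the mpstat manual, running mpstat without an interval reports statistics accumulated since boot. The value above is therefore a long-term average rather than the current load. This sketch is a swapped-in alternative, not what the scripts in this article use: it samples the aggregate cpu line of /proc/stat twice and computes the busy share over the interval. The function name cpu_busy_percent is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: CPU busy percentage between two samples of the
# aggregate "cpu" line of /proc/stat, passed as $1 (earlier) and $2 (later).
# Fields after "cpu": user nice system idle iowait irq softirq steal guest guest_nice;
# guest time is already included in user, so only the first eight are summed.
cpu_busy_percent() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      idle = $5 + $6                           # idle + iowait
      total = 0
      for (i = 2; i <= 9; i++) total += $i     # user .. steal
      if (NR == 1) { idle1 = idle; total1 = total; next }
      dt = total - total1
      if (dt > 0) printf "%.1f\n", 100 * (dt - (idle - idle1)) / dt
      else print "0.0"
    }'
}

# Example (on the server): two samples one second apart
#   s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat)
#   cpuload=$(cpu_busy_percent "$s1" "$s2")
```

Running mpstat with an interval, for example mpstat 1 1, is another way to get a short-term sample.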








Collecting hard drive health data

Drive temperatures are read with hddtemp. If it is not installed, install it:



apt-get install hddtemp







Usage: hddtemp /dev/sda. To get only the number, without the drive name, add -n: hddtemp /dev/sda -n





SMART data is read with smartctl from the smartmontools package:



apt-get install smartmontools







Called with -a, it prints all SMART information about the drive:



smartctl -a /dev/sda







The output is long. I chose the following attributes to monitor:





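  • Raw_Read_Error_Rate: the rate of hardware errors when reading data from the disk surface;
  • Reallocated_Sector_Ct: the number of sectors remapped to the spare area because of errors;
  • Seek_Error_Rate: the rate of head positioning (seek) errors;
  • Spin_Retry_Count: the number of repeated attempts to spin the platters up to operating speed;
  • Reallocated_Event_Count: the number of remapping operations;
  • Offline_Uncorrectable: the number of uncorrectable sector errors found during offline testing.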





smartctl can also produce JSON, with -j:



smartctl -a -j /dev/sda







The JSON is large. Look through it to find the attributes you need and their paths in the document.





With the paths known, the values are extracted with jq (the attribute name is in the comment):





smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[0].raw.value' #Raw_Read_Error_Rate

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[3].raw.value' #Reallocated_Sector_Ct

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[4].raw.value' #Seek_Error_Rate

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[6].raw.value' #Spin_Retry_Count

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[12].raw.value' #Reallocated_Event_Count

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[14].raw.value' #Offline_Uncorrectable
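The table[N] indices above depend on the drive model. Another drive may list its attributes in a different order, and then the same index returns a different attribute. Selecting by name avoids that; with jq it could be written as .ata_smart_attributes.table[] | select(.name == "Reallocated_Sector_Ct") | .raw.value. The sketch below does the same without jq, parsing the plain-text smartctl -A table with awk. The function name smart_raw is hypothetical, and the column layout is assumed from the usual ATA attribute table.

```shell
#!/bin/bash
# Hypothetical helper: raw value of one SMART attribute, selected by name,
# from `smartctl -A` text output on stdin.
# Assumed columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# Only the first token of RAW_VALUE is printed (e.g. "35" from "35 (Min/Max 20/45)").
smart_raw() {
  awk -v name="$1" '$2 == name { print $10; exit }'
}

# Example (on the server):
#   smartctl -A /dev/sda | smart_raw Reallocated_Sector_Ct
```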








smartctl -H prints the overall health verdict ("PASSED" or not). With -j, the same verdict is also included in the JSON.





Reading it from the JSON:



smartctl -a /dev/sda -j | jq '.smart_status.passed' #smart_status
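Scripts can also rely on smartctl's exit status. It is a bitmask documented in the RETURN VALUES section of the smartctl manual page. For example, bit 3 means the health check returned DISK FAILING, and bit 6 that the device error log contains errors. This sketch turns the status into short labels. The function name and the label strings are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: decode the smartctl exit status bitmask into labels
# (bit meanings as listed under RETURN VALUES in the smartctl man page).
smart_rc_flags() {
  local rc=$1 out="" bit label
  for bit in 0 1 2 3 4 5 6 7; do
    [ $(( ($rc >> $bit) & 1 )) -eq 1 ] || continue
    case $bit in
      0) label=cmdline_parse_failed ;;
      1) label=device_open_failed ;;
      2) label=smart_command_failed ;;
      3) label=disk_failing ;;
      4) label=prefail_attr_below_threshold ;;
      5) label=attr_below_threshold_in_past ;;
      6) label=error_log_has_entries ;;
      7) label=selftest_log_has_errors ;;
    esac
    out="${out:+$out }$label"
  done
  echo "${out:-ok}"
}

# Example (on the server):
#   smartctl -H /dev/sda > /dev/null; smart_rc_flags $?
```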







Drive self-tests (optional)

The drives can also be told to run self-tests, for example on a schedule from cron. The short test:



smartctl -t short /dev/sda







takes about 2 minutes. The long (extended) test:



smartctl -t long /dev/sda







takes much longer, an hour or more.



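Another option is smartd, the daemon from the same smartmontools package. It watches the drives in the background and can run these self-tests on a schedule.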





Collecting RAID status data

The software RAID is managed by mdadm. There are two arrays: one for the system and one for /var.



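The arrays can also be inspected through sysfs, which lets you start a consistency check and read its result [1] [2].

cat /proc/mdstat lists the arrays and their current state.

Start a consistency check on both arrays: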









echo 'check' >/sys/block/md126/md/sync_action

echo 'check' >/sys/block/md127/md/sync_action












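Wait for the check to finish: /proc/mdstat shows its progress, and sync_action reads idle again once it is done. Then read the number of mismatches:

cat /sys/block/md126/md/mismatch_cnt

cat /sys/block/md127/md/mismatch_cnt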








A value of 0 means no inconsistencies were found.
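mismatch_cnt only means something after a check has run. A missing disk, however, shows up immediately in /proc/mdstat. There, a RAID1 with both members present has the status [UU], and a degraded one shows something like [U_]. This sketch reports that state per array. The function name md_state is hypothetical. It assumes the usual /proc/mdstat layout: an array line, followed by a line carrying the [UU]-style status.

```shell
#!/bin/bash
# Hypothetical helper: print "ok" or "degraded" for one md array,
# reading /proc/mdstat-formatted text from stdin.
md_state() {
  awk -v md="$1" '
    $1 == md && $2 == ":" { found = 1; next }   # e.g. "md126 : active raid1 sda1[0] sdb1[1]"
    found && /^md/ { exit }                     # next array reached without a status line
    found && match($0, /\[[U_]+\]/) {           # member status such as [UU] or [U_]
      status = substr($0, RSTART + 1, RLENGTH - 2)
      print (index(status, "_") ? "degraded" : "ok")
      exit
    }'
}

# Example (on the server):
#   md_state md126 < /proc/mdstat
```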





Sending the collected data





The data is published with the Mosquitto command-line clients, so install them:



apt-get install mosquitto-clients







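The data is sent by three scripts:
- system.sh: system metrics
- drives.sh: disk space and RAID status
- smart.sh: SMART data

Create them and make them executable: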



touch system.sh && touch drives.sh && touch smart.sh

chmod u+x system.sh && chmod u+x drives.sh && chmod u+x smart.sh








The scripts:





system.sh
#!/bin/bash
# MQTT broker connection settings
ip=xx.xx.xx.xx
usr="xx"
pass="xx"

# Drive temperatures
tempdrive1=$(hddtemp "/dev/sda" -n)
echo "Drive 1 temperature: $tempdrive1"
tempdrive2=$(hddtemp "/dev/sdb" -n)
echo "Drive 2 temperature: $tempdrive2"

# Motherboard sensors
tempcpu=$(sensors -A -u -j | jq '.["coretemp-isa-0000"]["Core 0"].temp2_input')
echo "CPU temperature: $tempcpu"
fan=$(sensors -A -u -j | jq '.["it8720-isa-0290"].fan1.fan1_input')
echo "Fan speed: $fan"
temp3=$(sensors -A -u -j | jq '.["it8720-isa-0290"].temp3.temp3_input')
echo "temp3 temperature: $temp3"

# RAM, in MB and in percent
totalram=$(free -m | grep "Mem" | awk '{print $2}')
echo "Total RAM: $totalram"
usedram=$(free -m | grep "Mem" | awk '{print $3}')
echo "Used RAM: $usedram"
usedrampercent=$(($usedram * 100 / $totalram))
echo "Used RAM, percent: $usedrampercent"

# Swap, in MB and in percent (0 when there is no swap, to avoid "division by 0")
totalswap=$(free -m | grep "Swap" | awk '{print $2}')
echo "Total swap: $totalswap"
usedswap=$(free -m | grep "Swap" | awk '{print $3}')
echo "Used swap: $usedswap"
if [ "$totalswap" -gt 0 ]; then
  usedswappercent=$(($usedswap * 100 / $totalswap))
else
  usedswappercent=0
fi
echo "Used swap, percent: $usedswappercent"

# 15-minute load average
averageload=$(cat /proc/loadavg | awk '{print $3}')
echo "Average load (15 min): $averageload"

uptimedata=$(uptime | awk '{print $3}' | sed 's/,$//')
echo "Uptime: $uptimedata"

cpuidle=$(mpstat | grep all | awk '{print $13}')
cpuload=$(echo "100-$cpuidle" | bc -l) # bc is used because bash arithmetic is integer-only
echo "CPU load: $cpuload"


echo " "
echo " "

# Values are quoted so that an empty value cannot shift the arguments
mosquitto_pub -h "$ip" -t "srv/tempdrive1" -m "$tempdrive1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/tempdrive2" -m "$tempdrive2" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/tempcpu" -m "$tempcpu" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/fan" -m "$fan" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/temp3" -m "$temp3" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/usedrampercent" -m "$usedrampercent" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/usedswappercent" -m "$usedswappercent" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/averageload" -m "$averageload" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/uptimedata" -m "$uptimedata" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/cpuload" -m "$cpuload" -u "$usr" -P "$pass"



drives.sh
#!/bin/bash
# MQTT broker connection settings
ip=xx.xx.xx.xx
usr="xx"
pass="xx"

# Mismatch counters of the two arrays (0 = the last check found no inconsistencies).
# Check the mdX names against your own layout with cat /proc/mdstat.
raid_system_status=$(cat /sys/block/md126/md/mismatch_cnt)
echo "RAID system array mismatches: $raid_system_status"
raid_var_status=$(cat /sys/block/md127/md/mismatch_cnt)
echo "RAID /var array mismatches: $raid_var_status"

# Column 5 of df is Use%, i.e. the percentage of USED space; sed strips the % sign
freesystemdisk=$(df | grep "/dev/md127p1" | awk '{print $5}' | sed 's/%$//')
echo "System disk used, percent: $freesystemdisk"
freedatadisk=$(df | grep "/dev/md126p1" | awk '{print $5}' | sed 's/%$//')
echo "Data disk used, percent: $freedatadisk"

echo " "
echo " "

mosquitto_pub -h "$ip" -t "srv/raid_system_status" -m "$raid_system_status" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/raid_var_status" -m "$raid_var_status" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/freesystemdisk" -m "$freesystemdisk" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/freedatadisk" -m "$freedatadisk" -u "$usr" -P "$pass"



smart.sh
#!/bin/bash
# MQTT broker connection settings
ip=xx.xx.xx.xx
usr="xx"
pass="xx"

# Drive 1 (the table[N] indices follow the attribute order of these particular drives)
Raw_Read_Error_Rate1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[0].raw.value')
echo "SMART Raw_Read_Error_Rate, drive 1: $Raw_Read_Error_Rate1"
Reallocated_Sector_Ct1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[3].raw.value')
echo "SMART Reallocated_Sector_Ct, drive 1: $Reallocated_Sector_Ct1"
Seek_Error_Rate1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[4].raw.value')
echo "SMART Seek_Error_Rate, drive 1: $Seek_Error_Rate1"
Spin_Retry_Count1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[6].raw.value')
echo "SMART Spin_Retry_Count, drive 1: $Spin_Retry_Count1"
Reallocated_Event_Count1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[12].raw.value')
echo "SMART Reallocated_Event_Count, drive 1: $Reallocated_Event_Count1"
Offline_Uncorrectable1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[14].raw.value')
echo "SMART Offline_Uncorrectable, drive 1: $Offline_Uncorrectable1"

smart_status1=$(smartctl -a /dev/sda -j | jq '.smart_status.passed')
echo "SMART status, drive 1: $smart_status1"

# Drive 2
Raw_Read_Error_Rate2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[0].raw.value')
echo "SMART Raw_Read_Error_Rate, drive 2: $Raw_Read_Error_Rate2"
Reallocated_Sector_Ct2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[3].raw.value')
echo "SMART Reallocated_Sector_Ct, drive 2: $Reallocated_Sector_Ct2"
Seek_Error_Rate2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[4].raw.value')
echo "SMART Seek_Error_Rate, drive 2: $Seek_Error_Rate2"
Spin_Retry_Count2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[6].raw.value')
echo "SMART Spin_Retry_Count, drive 2: $Spin_Retry_Count2"
Reallocated_Event_Count2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[12].raw.value')
echo "SMART Reallocated_Event_Count, drive 2: $Reallocated_Event_Count2"
Offline_Uncorrectable2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[14].raw.value')
echo "SMART Offline_Uncorrectable, drive 2: $Offline_Uncorrectable2"

smart_status2=$(smartctl -a /dev/sdb -j | jq '.smart_status.passed')
echo "SMART status, drive 2: $smart_status2"

echo " "
echo " "

mosquitto_pub -h "$ip" -t "srv/Raw_Read_Error_Rate1" -m "$Raw_Read_Error_Rate1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Reallocated_Sector_Ct1" -m "$Reallocated_Sector_Ct1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Seek_Error_Rate1" -m "$Seek_Error_Rate1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Spin_Retry_Count1" -m "$Spin_Retry_Count1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Reallocated_Event_Count1" -m "$Reallocated_Event_Count1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Offline_Uncorrectable1" -m "$Offline_Uncorrectable1" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/Raw_Read_Error_Rate2" -m "$Raw_Read_Error_Rate2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Reallocated_Sector_Ct2" -m "$Reallocated_Sector_Ct2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Seek_Error_Rate2" -m "$Seek_Error_Rate2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Spin_Retry_Count2" -m "$Spin_Retry_Count2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Reallocated_Event_Count2" -m "$Reallocated_Event_Count2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Offline_Uncorrectable2" -m "$Offline_Uncorrectable2" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/smart_status1" -m "$smart_status1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/smart_status2" -m "$smart_status2" -u "$usr" -P "$pass"
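Passing an empty value to mosquitto_pub as an unquoted -m $var makes it take the next option as the message. Also, a path that jq cannot find comes out as the string null. This sketch is a small wrapper that quotes every argument and skips empty or null values. The function name publish and the MQTT_* variable names are hypothetical. The broker address and credentials stay placeholders, as in the scripts.

```shell
#!/bin/bash
# Placeholders, as in the scripts; MQTT_PUB can be overridden (e.g. with echo for a dry run).
MQTT_HOST="${MQTT_HOST:-xx.xx.xx.xx}"
MQTT_USER="${MQTT_USER:-xx}"
MQTT_PASS="${MQTT_PASS:-xx}"
MQTT_PUB="${MQTT_PUB:-mosquitto_pub}"

# Hypothetical wrapper: publish one value under srv/<name>, skipping empty or "null" values.
publish() {
  local topic="srv/$1" value="$2"
  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "skipping $topic: no value" >&2
    return 0
  fi
  "$MQTT_PUB" -h "$MQTT_HOST" -u "$MQTT_USER" -P "$MQTT_PASS" -t "$topic" -m "$value"
}

# Example:
#   publish cpuload "$cpuload"
#   publish smart_status1 "$smart_status1"
```

You can check that the messages arrive at the broker with mosquitto_sub -h xx.xx.xx.xx -u xx -P xx -t 'srv/#' -v. It subscribes to everything under srv/ and prints each topic with its value.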

      
      



Each script publishes its values to the Mosquitto broker used by Home Assistant.





Configuring Home Assistant

Declare the sensors in the Home Assistant configuration (configuration.yaml), then restart Home Assistant.





# Current MQTT format; older Home Assistant versions used "sensor: - platform: mqtt" entries instead
mqtt:
  sensor:
    - name: "nextcloud drive 1 temperature"
      state_topic: "srv/tempdrive1"
      unit_of_measurement: "°C"
    - name: "nextcloud drive 2 temperature"
      state_topic: "srv/tempdrive2"
      unit_of_measurement: "°C"
    - name: "nextcloud CPU temperature"
      state_topic: "srv/tempcpu"
      unit_of_measurement: "°C"
    - name: "nextcloud fan speed"
      state_topic: "srv/fan"
      unit_of_measurement: "rpm"
    - name: "nextcloud temp3 temperature"
      state_topic: "srv/temp3"
      unit_of_measurement: "°C"
    - name: "nextcloud RAM usage"
      state_topic: "srv/usedrampercent"
      unit_of_measurement: "%"
    - name: "nextcloud SWAP usage"
      state_topic: "srv/usedswappercent"
      unit_of_measurement: "%"
    - name: "nextcloud system disk usage"
      state_topic: "srv/freesystemdisk"
      unit_of_measurement: "%"
    - name: "nextcloud data disk usage"
      state_topic: "srv/freedatadisk"
      unit_of_measurement: "%"
    - name: "nextcloud average load (15 min)"
      state_topic: "srv/averageload"
    - name: "nextcloud uptime"
      state_topic: "srv/uptimedata"
    - name: "nextcloud CPU load"
      state_topic: "srv/cpuload"
      unit_of_measurement: "%"
    - name: "nextcloud drive 1 SMART Raw_Read_Error_Rate"
      state_topic: "srv/Raw_Read_Error_Rate1"
    - name: "nextcloud drive 1 SMART Reallocated_Sector_Ct"
      state_topic: "srv/Reallocated_Sector_Ct1"
    - name: "nextcloud drive 1 SMART Seek_Error_Rate"
      state_topic: "srv/Seek_Error_Rate1"
    - name: "nextcloud drive 1 SMART Spin_Retry_Count"
      state_topic: "srv/Spin_Retry_Count1"
    - name: "nextcloud drive 1 SMART Reallocated_Event_Count"
      state_topic: "srv/Reallocated_Event_Count1"
    - name: "nextcloud drive 1 SMART Offline_Uncorrectable"
      state_topic: "srv/Offline_Uncorrectable1"
    - name: "nextcloud drive 1 SMART status"
      state_topic: "srv/smart_status1"
    - name: "nextcloud drive 2 SMART Raw_Read_Error_Rate"
      state_topic: "srv/Raw_Read_Error_Rate2"
    - name: "nextcloud drive 2 SMART Reallocated_Sector_Ct"
      state_topic: "srv/Reallocated_Sector_Ct2"
    - name: "nextcloud drive 2 SMART Seek_Error_Rate"
      state_topic: "srv/Seek_Error_Rate2"
    - name: "nextcloud drive 2 SMART Spin_Retry_Count"
      state_topic: "srv/Spin_Retry_Count2"
    - name: "nextcloud drive 2 SMART Reallocated_Event_Count"
      state_topic: "srv/Reallocated_Event_Count2"
    - name: "nextcloud drive 2 SMART Offline_Uncorrectable"
      state_topic: "srv/Offline_Uncorrectable2"
    - name: "nextcloud drive 2 SMART status"
      state_topic: "srv/smart_status2"
    - name: "nextcloud RAID system array status"
      state_topic: "srv/raid_system_status"
    - name: "nextcloud RAID /var array status"
      state_topic: "srv/raid_var_status"



After the restart, the new sensors appear in Home Assistant and can be placed on a dashboard.










The screenshot shows that this server is meant for nextcloud. Its internal metrics can also be added to HA: nextcloud has a good API for that, and HA has a built-in integration.







