Fluentd: why tuning the output buffer matters



These days it is hard to imagine a Kubernetes-based project without an ELK stack, which stores the logs of both the applications and the cluster's system components. In our practice we use the EFK stack, with Fluentd in place of Logstash.



Fluentd is a log collector and a Cloud Native Computing Foundation project that is widely used with Kubernetes.



Fluentd Logstash , , Fluentd , .



, EFK , , Kibana . , .





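Fluentd runs as a DaemonSet (one pod on every Kubernetes node) and collects the containers' stdout logs from /var/log/containers. It forwards them as JSON documents to ElasticSearch, which may run in standalone mode. Kibana is used to view the logs.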



Fluentd , ElasticSearch . , Nginx. :



127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -


In ElasticSearch, however, the same entry is stored more than once:



{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "HgGl_nIBR8C-2_33RlQV",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}

{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "IgGm_nIBR8C-2_33e2ST",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}


, .
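To check whether an index contains such copies, documents can be compared by content rather than by _id. The following is only a sketch in Python, not part of our setup: find_duplicates is a hypothetical helper, the hits list is shortened sample data shaped like the documents above, and the third _id is invented for the example.

```python
import hashlib
import json
from collections import defaultdict


def find_duplicates(hits):
    """Group hits whose _source is identical and return the groups of _id values."""
    groups = defaultdict(list)
    for hit in hits:
        # Serialize _source with sorted keys so that key order does not matter.
        canonical = json.dumps(hit["_source"], sort_keys=True)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        groups[digest].append(hit["_id"])
    # Keep only the fingerprints that occur more than once.
    return [ids for ids in groups.values() if len(ids) > 1]


# Shortened sample data; the last _id is made up for illustration.
hits = [
    {"_id": "HgGl_nIBR8C-2_33RlQV", "_source": {"log": "GET / HTTP/1.1", "tag": "custom-log"}},
    {"_id": "IgGm_nIBR8C-2_33e2ST", "_source": {"tag": "custom-log", "log": "GET / HTTP/1.1"}},
    {"_id": "hypothetical-unique-id", "_source": {"log": "GET /health HTTP/1.1", "tag": "custom-log"}},
]
duplicates = find_duplicates(hits)
print(duplicates)
```

Comparing the whole _source, including @timestamp, keeps genuinely repeated requests from being reported as copies, since those would carry different timestamps.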



Meanwhile, the Fluentd log shows the following:



2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-01-16 01:46:53 +0000 chunk="59c37fc3fb320608692c352802b973ce" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"http\", :user=>\"elastic\", :password=>\"obfuscated\"}): read timeout reached"


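This happens when ElasticSearch fails to respond within the time set by request_timeout, so the chunk cannot be flushed. Fluentd then retries sending the chunk to ElasticSearch until it succeeds: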



2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce" 
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fad241ab300518b936e27200747" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fb5adb70c06e649d8c108318c9b" 
2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"


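ElasticSearch, however, treats each resent chunk as new data and assigns its documents new _id values. That is how the duplicates appear.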



Kibana :







. — fluent-plugin-elasticsearch . , ElasticSearch . , -, .



Fluentd, . - ElasticSearch , , . , , , , , Fluentd .



, , , , : , , . , , , , , Fluentd .



:



<buffer>
  @type file
  path /var/log/fluentd-buffers/kubernetes.test.buffer
  flush_mode interval
  retry_type exponential_backoff
  flush_thread_count 2
  flush_interval 5s
  retry_forever
  retry_max_interval 30
  chunk_limit_size 8M
  queue_limit_length 8
  overflow_action block
</buffer>
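With retry_type exponential_backoff, the pause between retries grows until it reaches retry_max_interval. The Python sketch below shows the nominal schedule only. It assumes an initial wait of 1 second and a growth factor of 2, which to our knowledge match Fluentd's defaults for retry_wait and retry_exponential_backoff_base, and it ignores the random jitter Fluentd adds to each interval. retry_intervals is a hypothetical helper, not a Fluentd API.

```python
def retry_intervals(attempts, retry_wait=1.0, base=2.0, retry_max_interval=30.0):
    """Return the nominal wait in seconds before each retry, capped at retry_max_interval."""
    intervals = []
    for n in range(attempts):
        # Assumed formula: the wait grows by a factor of `base` with every retry.
        wait = retry_wait * (base ** n)
        intervals.append(min(wait, retry_max_interval))
    return intervals


schedule = retry_intervals(7)
print(schedule)
```

Under these assumptions the cap of 30 seconds is reached on the sixth retry, after which Fluentd would keep retrying at that interval because retry_forever is set.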


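The key parameters:

  • chunk_limit_size: the maximum size of a single chunk in the buffer.
  • flush_interval: how often the buffer is flushed.
  • queue_limit_length: the maximum number of chunks in the queue.
  • request_timeout: how long Fluentd waits for ElasticSearch to respond to a request.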


, queue_limit_length chunk_limit_size, « , ». :



2020-01-21 10:22:57 +0000 [warn]: [test-prod] failed to write data into buffer by buffer overflow action=:block
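The overflow warning above appears once the buffer is full. As a rough guide, its capacity can be estimated by assuming that it holds at most queue_limit_length chunks of at most chunk_limit_size each. The Python sketch below does that arithmetic for the settings shown earlier; parse_size and buffer_capacity are hypothetical helpers, and the 1024-based suffixes are an assumption about how Fluentd reads sizes such as 8M.

```python
# Assumed multipliers for Fluentd-style size suffixes (1024-based).
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value):
    """Convert a size such as '8M' into bytes; plain numbers are taken as bytes."""
    text = value.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def buffer_capacity(chunk_limit_size, queue_limit_length):
    """Estimate the total buffer size as chunk size multiplied by queue length."""
    return parse_size(chunk_limit_size) * queue_limit_length


capacity = buffer_capacity("8M", 8)
print(capacity // 1024 ** 2, "MiB")
```

For chunk_limit_size 8M and queue_limit_length 8 the estimate is 64 MiB per buffer, which gives a starting point when deciding which of the two parameters to raise.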


, , , , .



: , , .



chunk_limit_size 32 , ElasticSearch , . , , queue_limit_length.



-, request_timeout. , 20 , Fluentd :



2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=20.85753920301795 slow_flush_log_threshold=20.0 plugin_id="postgresql-dev" 


, , slow_flush_log_threshold. request_timeout.



:



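  1. Set request_timeout to a value that is certainly larger than needed.
  2. Wait for slow_flush_log_threshold warnings and note the elapsed_time they report.
  3. Set request_timeout above the observed elapsed_time. We calculate request_timeout as elapsed_time + 50%.
  4. To stop the slow-flush warnings, raise slow_flush_log_threshold. We calculate it as elapsed_time + 25%.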
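The arithmetic in the last two steps (elapsed_time + 50% for request_timeout and elapsed_time + 25% for slow_flush_log_threshold) can be sketched in Python. This is an illustration only: suggest_timeouts is a hypothetical helper, and taking the largest elapsed_time seen in the log, as well as rounding to one decimal, are our assumptions rather than something Fluentd prescribes.

```python
import re

# Matches the elapsed_time field of Fluentd's slow-flush warning.
ELAPSED_RE = re.compile(r"elapsed_time=(\d+(?:\.\d+)?)")


def suggest_timeouts(log_lines):
    """Return (request_timeout, slow_flush_log_threshold) in seconds, or None."""
    elapsed = [float(v) for line in log_lines for v in ELAPSED_RE.findall(line)]
    if not elapsed:
        return None
    worst = max(elapsed)  # assumption: size the timeouts for the slowest flush seen
    return round(worst * 1.5, 1), round(worst * 1.25, 1)


log_lines = [
    '2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time '
    'than slow_flush_log_threshold: elapsed_time=20.85753920301795 '
    'slow_flush_log_threshold=20.0 plugin_id="postgresql-dev"',
]
suggestion = suggest_timeouts(log_lines)
print(suggestion)
```

For the warning quoted earlier (elapsed_time of about 20.86 seconds) this yields a request_timeout of 31.3 seconds and a slow_flush_log_threshold of 26.1 seconds.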


, , . , , .



, , , :



Message (before/after)        node-1    node-2    node-3    node-4
failed to flush the buffer    1749/2    694/2     47/0      1121/2
retry succeeded               410/2     205/1     24/0      241/2
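Counts like these can be gathered by scanning each node's Fluentd log for the two messages. The Python sketch below is one possible way to do it; count_buffer_events is a hypothetical helper, and the sample lines are taken from the log excerpts quoted earlier, with the first one shortened.

```python
from collections import Counter

MESSAGES = ("failed to flush the buffer", "retry succeeded")


def count_buffer_events(log_lines):
    """Count how many log lines contain each of the two buffer-related messages."""
    counts = Counter()
    for line in log_lines:
        for message in MESSAGES:
            if message in line:
                counts[message] += 1
    return counts


sample = [
    '2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4',
    '2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce"',
    '2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2"',
    '2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"',
]
counts = count_buffer_events(sample)
print(dict(counts))
```

Running it over the same time window before and after a configuration change, once per node, gives numbers that can be compared directly.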


, , , . - Fluentd , slow_flush_log_threshold. request_timeout, , .





Fluentd EFK , . , , ElasticSearch , .








