Fonctionnalités ClickHouse pour les développeurs avancés. Alexey Milovidov (2018)



Dans le rapport, il est prévu de prendre en compte les fonctionnalités peu connues ou insuffisamment couvertes de ClickHouse dans la documentation: agrégation incrémentielle et manipulations avec les états des fonctions d'agrégation, copie inter-cluster, exécution de requêtes sans utiliser de serveur, etc. Des exemples tirés de la pratique du développement de services Yandex seront donnés: comment tirer le maximum du système.







! ! ClickHouse. , , .



, . , , . . . .



ClickHouse, , . - clickstream. , , -. , , ..



, , . . - .





. , , pupkin.narod.ru; , yandex.ru. . ? .



. , . MergeTree, .





, . CREATE TABLE. ORDER BY – . , , . . , . . , , Hash . . . hash .





, . - . , . .



4,5 , . . 3,5 . . 4,5 .





1/10. from . 0,6 . , - , 2,5 . , , , . , – . , 10 . , .





, ?



  • -, . . , .
  • . , unix timestamp.
  • timestamp, ( , ). , , DevOps, . . , .
  • - . , .
  • , .
  • , url, hash . , , , . , url . - hash-.
  • - .
  • . , - .
  • , . , . - hash. , . ? . - , . , - , .
  • . – , . . .




, , :



  • -, , , hash-, . . , , .
  • . , . . . . . 1/10 hashes .
  • , .




:



  • – 1/10 .
  • – — - , , 1 000 000. . , , 1 000 000, . , , , , . . . 1 000 000 10 000 000 1 000 000 000? – _sample_factor. : x _sample_factor, , .
  • – SAMPLE OFFSET. . 1/10 , : « , , 1/10 ». : SAMPLE 1/10 OFFSET 1/10. .
  • . , , . , . , , . ? , , , , , overhead.




– . , , , count distinct, uniq. uniq 4 .



. . , If. , : sumIf. , . – , , – , - UInt8. , , . , , .





? – . ..



Google. .





– Array. , Array. , , . .



. . , .



groupArray, . , … . . . .



groupUniqArray. .



Array - , , .



, groupArrayArray? , , . .



groupUniqArrayArray – , .





, . , . , , , , , .





. . . sum, Array, If , , . . . sumArrayIf SumIfArray.



? . , . – , , – : . sumIfArray. . . Array sumIf .



– sumArrayIf. , – : sumArray .





. : sumForEachStateForEachIfArrayIfState. , , , . . , .





. , , . , . .



, , . . .



, - , .



, count distinct? Hash table. hash‑ .



, , . : « , , ». - .



, ClickHouse . –State, . , , AggregateFunction - .



, . . . AggtregateFunction. , , .



–Merge. , , .



, .







uniq . .





. –State. - , . - . , UTF-8. , , UTF-8.





? AggregateFunction . . – , – .





. , , . ?





– .



, ClickHouse – , – . clickstream, , , . , . ClickHouse , .



. - , , .



. AggregatingMergeTree. , . . , - . . , . . , , count distinct, .



, . ClickHouse, , .





. ?



  • – , . , . , . , ClickHouse-, . .
  • , , , , – . sum sumIf – , .
  • , *. , , , arrayReduce. , , , , . . –State, .


* 2020 , initializeAggregation.





ClickHouse – .



, ClickHouse , . . INSERT, , . . INSERT , . , .



conflict-free , update. INSERT. , .



mulit-master. . , . , , . . . exactly-once .



, . , INSERT, , , , .





. , INSERT.



. INSERT, SELECT, .



– INSERT. , , . insert_quorum = 2. .



, INSERT , , INSERT, .



SELECT , select_sequential_consistency. , . select_linearizability, .



, SELECT, ZooKeeper. SELECT , , . . INSERT, .



, , - , . , , .



, . , , , .





. .



, , GROUP BY, , , - . , .



. . . .





. progress-bar, . ClickHouse- – , . , , . , 9,31 , 10 .



, : « 10 ?». . .





, . , , . - 0,5 . , , .



- , 10 . , , . , - .





: « max_memory_usage »?





, . . , production .





– GROUP BY . ? , - . , . , . , . . . . . - . , .



, :



  • – , .
  • – merge, . . merge , .


. dataset - . . datasets - , buckets. .





. max_memory_usage 10 . 8 (max_bytes_before_external_group_by = 8 000 000 000, distributed_aggregation_memory_efficient = 1). progress-bar - . . ? , . , , .



. , , . , , . . . , .



, . . . , 10 GROUP BY, , 1 .



, , , .





. .



, ClickHouse – - geospatial-, .



– . . , , .



? , . , - , . , , .



? , .



pointInPolygon. – : lat, lon, . *. - .



* 2020 , .



, . , -, . , . . .



, pointInPolygon, . pointInElliplses, ellipses . , , , . , . . 0 1, ellipses.



– greatCircleDistance. *. . , .



* 2020 , geoDistance, WGS-84 .





https://events.yandex.ru/lib/talks/5330/



– ClickHouse . ClickHouse. , , - . ClickHouse. .



. . , . modelEvaluate.



, , .



, CTR . . , : https://events.yandex.ru/lib/talks/5330/. .



. CatBoost. CatBoost? .





, ?



  • - , . , - , . *. , . - , , . .
  • , , ClickHouse . . , – . , , . , . , ClickHouse. . .
  • , , -, . . .


* ClickHouse.





. ClickHouse ClickHouse-.



- . ClickHouse. - . .



? grep, sed, awk perl. , , - , ClickHouse , . ClickHouse - , . . clickhouse-local.



. – , . . . , ClickHouse - .



, . , , JSON, JSONEachRow. stdin . , , . , , - ClickHouse. , , . , awk, perl, sed. , grep, . . .



, , , , – ClickHouse-. . .





ClickHouse-, ClickHouse-local , . , ClickHouse- .





? , ?



Date DateTime. Date , , ISO 8601, DateTime , - , . , , , *.



, . parseDateTimeBestEffort. , . , . . , , , .



* , date_time_input_format.



, , Hadoop . ClickHouse-local MapReduce jobs Hadoop.



, Parquet. pull request. , *.



* , — !



, , trash SQL. , - regexp’ , clickhouse-local. , Awk, , ClickHouse*.



* — Regexp.





, , .



? ClickHouse . . . , , . , .



? . , INSERT SELECT . . , .



, . . , , , , , – , , . - . .



, ClickHouse-copier. , . Zookeeper. – , .



. . . , .



- , . - .





– production.



.. 538 . 240 . , . .



. , . . lz4, zstd . . , . ClickHouse-copier. , , . - .



, - . , , . , , , , .



, !







! . . real time , ?



. , .



. ., , , ? ? ?



. , , , . - . , . . . . , . . . . , . .



. . ClickHouse?



, ClickHouse. . . . , . , , . , , , .



! Avro, Parquet, MapReduce machine learning, - , ClickHouse - jobs YARN, Mesos? . . , Spark, , , date locality , ?



, , , . , Hadoop MapReduce YT. , ClickHouse YT, , . - , .



. . . YT - ? . . , ClickHouse, tool CSV job?



*. , - . ClickHouse-, - , ClickHouse, YT. .



* .



! ClickHouse-local - - - , ?



, standalone-. – ClickHouse. . . ClickHouse, , , local, , .



! . groupUniqArrayArray order , ?



, , , .



! ! ClickHouse. . Vertica. , . ? CSV. CSV ClickHouse. ClickHouse , . . , , , . , , . - ?



ClickHouse JOIN , ? *. ClickHouse hash JOIN, . . , hash- . . . , , - . , merge JOIN, , .



* , join_algorithm. , , .



! , , master-master , , , ? , , ?



– , pull request. . , . Master-master . -, ClickHouse ZooKeeper. . , , , - - .



! ! ClickHouse-copier , , , ?



ClickHouse-copier , operation-. . , , , . . . , . . , -, . . , ZooKeeper. – *.



* , .




All Articles