Dans le rapport, il est prévu de prendre en compte les fonctionnalités peu connues ou insuffisamment couvertes de ClickHouse dans la documentation: agrégation incrémentielle et manipulations avec les états des fonctions d'agrégation, copie inter-cluster, exécution de requêtes sans utiliser de serveur, etc. Des exemples tirés de la pratique du développement de services Yandex seront donnés: comment tirer le maximum du système.
! ! ClickHouse. , , .
, . , , . . . .
ClickHouse, , . - clickstream. , , -. , , ..
, , . . - .
. , , pupkin.narod.ru; , yandex.ru. . ? .
. , . MergeTree, .
, . CREATE TABLE. ORDER BY – . , , . . , . . , , Hash . . . hash .
, . - . , . .
4,5 , . . 3,5 . . 4,5 .
1/10. from . 0,6 . , - , 2,5 . , , , . , – . , 10 . , .
, ?
- -, . . , .
- . , unix timestamp.
- timestamp, ( , ). , , DevOps, . . , .
- - . , .
- , .
- , url, hash . , , , . , url . - hash-.
- - .
- . , - .
- , . , . - hash. , . ? . - , . , - , .
- . – , . . .
, , :
- -, , , hash-, . . , , .
- . , . . . . . 1/10 hashes .
- , .
:
- – 1/10 .
- – — - , , 1 000 000. . , , 1 000 000, . , , , , . . . 1 000 000 10 000 000 1 000 000 000? – _sample_factor. : x _sample_factor, , .
- – SAMPLE OFFSET. . 1/10 , : « , , 1/10 ». : SAMPLE 1/10 OFFSET 1/10. .
- . , , . , . , , . ? , , , , , overhead.
– . , , , count distinct, uniq. uniq 4 .
. . , If. , : sumIf. , . – , , – , - UInt8. , , . , , .
? – . ..
Google. .
– Array. , Array. , , . .
. . , .
groupArray, . , … . . . .
groupUniqArray. .
Array - , , .
, groupArrayArray? , , . .
groupUniqArrayArray – , .
, . , . , , , , , .
. . . sum, Array, If , , . . . sumArrayIf SumIfArray.
? . , . – , , – : . sumIfArray. . . Array sumIf .
– sumArrayIf. , – : sumArray .
. : sumForEachStateForEachIfArrayIfState. , , , . . , .
. , , . , . .
, , . . .
, - , .
, count distinct? Hash table. hash‑ .
, , . : « , , ». - .
, ClickHouse . –State, . , , AggregateFunction - .
, . . . AggtregateFunction. , , .
–Merge. , , .
, .
uniq . .
. –State. - , . - . , UTF-8. , , UTF-8.
? AggregateFunction . . – , – .
. , , . ?
– .
, ClickHouse – , – . clickstream, , , . , . ClickHouse , .
. - , , .
. AggregatingMergeTree. , . . , - . . , . . , , count distinct, .
, . ClickHouse, , .
. ?
- – , . , . , . , ClickHouse-, . .
- , , , , – . sum sumIf – , .
- , *. , , , arrayReduce. , , , , . . –State, .
* 2020 , initializeAggregation.
ClickHouse – .
, ClickHouse , . . INSERT, , . . INSERT , . , .
conflict-free , update. INSERT. , .
mulit-master. . , . , , . . . exactly-once .
, . , INSERT, , , , .
. , INSERT.
. INSERT, SELECT, .
– INSERT. , , . insert_quorum = 2. .
, INSERT , , INSERT, .
SELECT , select_sequential_consistency. , . select_linearizability, .
, SELECT, ZooKeeper. SELECT , , . . INSERT, .
, , - , . , , .
, . , , , .
. .
, , GROUP BY, , , - . , .
. . . .
. progress-bar, . ClickHouse- – , . , , . , 9,31 , 10 .
, : « 10 ?». . .
, . , , . - 0,5 . , , .
- , 10 . , , . , - .
: « max_memory_usage »?
, . . , production .
– GROUP BY . ? , - . , . , . , . . . . . - . , .
, :
- – , .
- – merge, . . merge , .
. dataset - . . datasets - , buckets. .
. max_memory_usage 10 . 8 (max_bytes_before_external_group_by = 8 000 000 000, distributed_aggregation_memory_efficient = 1). progress-bar - . . ? , . , , .
. , , . , , . . . , .
, . . . , 10 GROUP BY, , 1 .
, , , .
. .
, ClickHouse – - geospatial-, .
– . . , , .
? , . , - , . , , .
? , .
pointInPolygon. – : lat, lon, . *. - .
* 2020 , .
, . , -, . , . . .
, pointInPolygon, . pointInElliplses, ellipses . , , , . , . . 0 1, ellipses.
– greatCircleDistance. *. . , .
* 2020 , geoDistance, WGS-84 .
https://events.yandex.ru/lib/talks/5330/
– ClickHouse . ClickHouse. , , - . ClickHouse. .
. . , . modelEvaluate.
, , .
, CTR . . , : https://events.yandex.ru/lib/talks/5330/. .
. CatBoost. CatBoost? .
, ?
- - , . , - , . *. , . - , , . .
- , , ClickHouse . . , – . , , . , . , ClickHouse. . .
- , , -, . . .
* ClickHouse.
. ClickHouse ClickHouse-.
- . ClickHouse. - . .
? grep, sed, awk perl. , , - , ClickHouse , . ClickHouse - , . . clickhouse-local.
. – , . . . , ClickHouse - .
, . , , JSON, JSONEachRow. stdin . , , . , , - ClickHouse. , , . , awk, perl, sed. , grep, . . .
, , , , – ClickHouse-. . .
ClickHouse-, ClickHouse-local , . , ClickHouse- .
? , ?
Date DateTime. Date , , ISO 8601
, DateTime , - , . , , , *.
, . parseDateTimeBestEffort. , . , . . , , , .
* , date_time_input_format
.
, , Hadoop . ClickHouse-local MapReduce jobs Hadoop.
, Parquet. pull request. , *.
* , — !
, , trash SQL. , - regexp’ , clickhouse-local. , Awk, , ClickHouse*.
* — Regexp
.
, , .
? ClickHouse . . . , , . , .
? . , INSERT SELECT . . , .
, . . , , , , , – , , . - . .
, ClickHouse-copier. , . Zookeeper. – , .
. . . , .
- , . - .
– production.
.. 538 . 240 . , . .
. , . . lz4, zstd . . , . ClickHouse-copier. , , . - .
, - . , , . , , , , .
, !
! . . real time , ?
. , .
. ., , , ? ? ?
. , , , . - . , . . . . , . . . . , . .
. . ClickHouse?
, ClickHouse. . . . , . , , . , , , .
! Avro, Parquet, MapReduce machine learning, - , ClickHouse - jobs YARN, Mesos? . . , Spark, , , date locality , ?
, , , . , Hadoop MapReduce YT. , ClickHouse YT, , . - , .
. . . YT - ? . . , ClickHouse, tool CSV job?
*. , - . ClickHouse-, - , ClickHouse, YT. .
* .
! ClickHouse-local - - - , ?
, standalone-. – ClickHouse. . . ClickHouse, , , local, , .
! . groupUniqArrayArray order , ?
, , , .
! ! ClickHouse. . Vertica. , . ? CSV. CSV ClickHouse. ClickHouse , . . , , , . , , . - ?
ClickHouse JOIN , ? *. ClickHouse hash JOIN, . . , hash- . . . , , - . , merge JOIN, , .
* , join_algorithm
. , , .
! , , master-master , , , ? , , ?
– , pull request. . , . Master-master . -, ClickHouse ZooKeeper. . , , , - - .
! ! ClickHouse-copier , , , ?
ClickHouse-copier , operation-. . , , , . . . , . . , -, . . , ZooKeeper. – *.
* , .