三津石智巳

👦🏻👦🏻👧🏻 Father of 3 | 🗺️ Service Reliability Engineering Manager at Rakuten Travel | 📚 Avid Reader | 👍 Wagashi | 👍 Caffe Latte | 👍 Owarai

【Thoughts】Elasticsearch: The Definitive Guide, Season 2

Chapters 2, 4, and 9 are the nerdiest parts of this book, so I want to dig into them properly.

"If you set replication to async, it will return success to the client as soon as the request has been executed on the primary shard. It will still forward the request to the replicas, but you will not know whether the replicas succeeded."

via Elasticsearch: The Definitive Guide - https://learning.oreilly.com/library/view/elasticsearch-the-definitive/9781449358532/part01ch04.html

This seems to be something like MongoDB's write concern. I'm having a hard time finding this description in the official 7.3 documentation.
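To make the difference in the quote concrete, here is a toy Python model of the two modes. `Shard`, `index_document`, and the response dict are hypothetical names made up for illustration, not Elasticsearch's API; the only point is what the client gets to know in each mode.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shard:
    """Hypothetical in-memory shard copy; `healthy` decides whether a write succeeds."""
    name: str
    healthy: bool = True
    docs: List[str] = field(default_factory=list)

    def apply(self, doc: str) -> bool:
        if self.healthy:
            self.docs.append(doc)
        return self.healthy


def index_document(doc: str, primary: Shard, replicas: List[Shard],
                   replication: str = "sync") -> dict:
    """Toy model of the two replication modes described in the quote.

    "sync":  the client hears back only after every replica has answered,
             so replica failures show up in the response.
    "async": the client hears back as soon as the primary succeeds; the
             request is still forwarded, but replica results are not reported.
    """
    if replication not in ("sync", "async"):
        raise ValueError(f"unknown replication mode: {replication}")
    if not primary.apply(doc):
        return {"acknowledged": False, "failed_replicas": None}
    if replication == "async":
        # Forwarding still happens (inline here, for simplicity), but the
        # outcome is thrown away: None means "unknown to the client".
        for replica in replicas:
            replica.apply(doc)
        return {"acknowledged": True, "failed_replicas": None}
    failed = [r.name for r in replicas if not r.apply(doc)]
    return {"acknowledged": True, "failed_replicas": failed}
```

With one unhealthy replica `r2`, the sync call reports `["r2"]`, while the async call reports nothing even though `r2` still missed the write.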

"By default, the primary shard requires a quorum, or majority, of shard copies (where a shard copy can be a primary or a replica shard) to be available before even attempting a write operation."

via Elasticsearch: The Definitive Guide - https://learning.oreilly.com/library/view/elasticsearch-the-definitive/9781449358532/part01ch04.html
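The majority rule in this quote is simple arithmetic over all shard copies (one primary plus its replicas). A minimal sketch, assuming plain majority as stated in the quote; `quorum` and `can_attempt_write` are hypothetical helper names.

```python
def quorum(number_of_replicas: int) -> int:
    """Majority of all shard copies: 1 primary + number_of_replicas."""
    copies = 1 + number_of_replicas
    return copies // 2 + 1


def can_attempt_write(active_copies: int, number_of_replicas: int) -> bool:
    """Per the quote, the write is not even attempted without a majority."""
    return active_copies >= quorum(number_of_replicas)


for replicas in (1, 2, 3):
    print(f"{replicas} replica(s) -> quorum {quorum(replicas)} of {replicas + 1} copies")
# 1 replica(s) -> quorum 2 of 2 copies
# 2 replica(s) -> quorum 2 of 3 copies
# 3 replica(s) -> quorum 3 of 4 copies
```

So with two replicas (three copies), a write can still go ahead with one copy down, but not with two.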

I still can't make sense of how the official documentation is organized, but the relevant part is probably around here.


By default, write operations only wait for the primary shards to be active before proceeding (i.e. wait_for_active_shards=1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, the wait_for_active_shards request parameter can be used.

https://www.elastic.co/guide/en/elasticsearch/reference/7.3/docs-index_.html
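A sketch of how a `wait_for_active_shards` value maps to a number of shard copies, assuming the accepted values are `"all"` or an integer from 1 up to `number_of_replicas + 1`. The function names are hypothetical; only the parameter name, the index setting `index.write.wait_for_active_shards`, and the `/<index>/_doc/<id>` path come from the 7.x docs.

```python
from typing import Union
from urllib.parse import urlencode


def required_active_copies(wait_for_active_shards: Union[int, str],
                           number_of_replicas: int) -> int:
    """How many shard copies must be active before the write proceeds.

    The default of 1 means only the primary has to be active.
    "all" means every configured copy (primary + replicas).
    """
    total = number_of_replicas + 1
    if wait_for_active_shards == "all":
        return total
    n = int(wait_for_active_shards)
    if not 1 <= n <= total:
        raise ValueError(f"must be between 1 and {total}, got {n}")
    return n


def index_path(index: str, doc_id: str,
               wait_for_active_shards: Union[int, str]) -> str:
    """Per-request override of the index-level default, as a query parameter."""
    query = urlencode({"wait_for_active_shards": wait_for_active_shards})
    return f"/{index}/_doc/{doc_id}?{query}"
```

Note that this only decides whether the write starts ("before proceeding" in the quote); it says nothing about when the client is acknowledged.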

Does this mean the default quorum was changed at some point before 7.3?


Confirmed that it did change. With Elasticsearch, GitHub issues seem easier to understand than the official documentation.

By default, index.translog.durability is set to request meaning that Elasticsearch will only report success of an index, delete, update, or bulk request to the client after the translog has been successfully fsynced and committed on the primary and on every allocated replica. If index.translog.durability is set to async then Elasticsearch fsyncs and commits the translog every index.translog.sync_interval (defaults to 5 seconds).

https://www.elastic.co/guide/en/elasticsearch/reference/7.3/index-modules-translog.html
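The two durability modes differ in when the fsync happens relative to the acknowledgement. A toy single-copy model, assuming the 5-second default from the quote; `ToyTranslog` is hypothetical and not Elasticsearch code. In real Elasticsearch the `request` fsync happens on the primary and every allocated replica, and with `async` the writes acknowledged since the last sync are the ones at risk on a crash.

```python
class ToyTranslog:
    """Hypothetical model of when translog fsyncs happen (one shard copy only)."""

    def __init__(self, durability: str = "request", sync_interval: float = 5.0):
        if durability not in ("request", "async"):
            raise ValueError(f"unknown durability: {durability}")
        self.durability = durability
        self.sync_interval = sync_interval
        self.unsynced = 0       # acknowledged operations not yet fsynced
        self.fsyncs = 0
        self.last_sync = 0.0

    def _fsync(self, now: float) -> None:
        self.fsyncs += 1
        self.unsynced = 0
        self.last_sync = now

    def write(self, now: float) -> None:
        """Record one operation at time `now` (seconds).

        Returning from this method stands for the client being acknowledged.
        """
        self.unsynced += 1
        if self.durability == "request":
            # Sync before reporting success: nothing acknowledged stays unsynced.
            self._fsync(now)
        elif now - self.last_sync >= self.sync_interval:
            # Stand-in for the background sync every sync_interval
            # (checked lazily here instead of on a real timer).
            self._fsync(now)
```

Writing once per second for seven seconds, `request` fsyncs seven times and leaves nothing unsynced, while `async` fsyncs once (at t=5) and ends with one acknowledged but unsynced operation.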

  • index.translog.durability: request
  • wait_for_active_shards: 1

I now know these two are the defaults, but I still can't tell when, by default, the acknowledgement is actually returned to the client.

Since replicas can be offline, the primary is not required to replicate to all replicas. Instead, Elasticsearch maintains a list of shard copies that should receive the operation. This list is called the in-sync copies and is maintained by the master node. As the name implies, these are the set of "good" shard copies that are guaranteed to have processed all of the index and delete operations that have been acknowledged to the user.

3. Forward the operation to each replica in the current in-sync copies set. If there are multiple replicas, this is done in parallel.
4. Once all replicas have successfully performed the operation and responded to the primary, the primary acknowledges the successful completion of the request to the client.

https://www.elastic.co/guide/en/elasticsearch/reference/7.3/docs-replication.html#_basic_write_model
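Putting the pieces together under my reading of the steps above: `wait_for_active_shards` only gates whether the write starts, and the acknowledgement waits for every in-sync replica. A toy sketch of that reading; `replicate` and its response shape are hypothetical, and a thread pool stands in for "done in parallel".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def replicate(op: str,
              primary: Callable[[str], bool],
              in_sync_replicas: Dict[str, Callable[[str], bool]],
              active_copies: int,
              wait_for_active_shards: int = 1) -> dict:
    """Toy version of the basic write model (hypothetical API)."""
    # Pre-flight check modelled on wait_for_active_shards:
    # it only decides whether the write starts at all.
    if active_copies < wait_for_active_shards:
        return {"acknowledged": False, "reason": "not enough active shard copies"}
    # Execute on the primary first (simplified).
    if not primary(op):
        return {"acknowledged": False, "reason": "primary failed"}
    # Step 3: forward to every replica in the in-sync set, in parallel.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, op) for name, fn in in_sync_replicas.items()}
        results = {name: f.result() for name, f in futures.items()}
    # Step 4: respond to the client only after every replica has answered.
    failed = sorted(name for name, ok in results.items() if not ok)
    return {"acknowledged": True, "replicas_answered": len(results), "failed": failed}
```

Even with the default `wait_for_active_shards=1`, the response in this model is built only after all in-sync replicas have answered.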

It's vague, but it reads as though the client gets the acknowledgement only after indexing has finished on every in-sync copy. What would that correspond to in MongoDB?


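Stennie is someone I personally trust on the MongoDB topic on Stack Overflow. According to Stennie, MongoDB has no write concern like `w: all`. If you want strong consistency, you use the primary read preference (the default) together with the majority write concern. I see. It's obvious once someone says it, but I realized you can't talk about replication without thinking about writes and reads together.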
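Why both halves matter can be shown with a toy three-member replica set. This is my own simplified model, not from Stennie's answer and not the MongoDB driver API: `ToyReplicaSet`, `replicated_to`, and `fail_over` are hypothetical, and "the freshest survivor becomes primary" is a simplification of how elections work.

```python
from typing import List, Optional


class ToyReplicaSet:
    """Hypothetical replica-set model; members[0] is the primary.

    Each member holds the list of writes it has applied, in order.
    """

    def __init__(self, size: int = 3):
        self.members: List[List[str]] = [[] for _ in range(size)]

    def write(self, op: str, w: str, replicated_to: int) -> bool:
        """The first `replicated_to` members (primary included) apply `op`.

        Returns True if the write concern `w` ("majority" or a number) is met.
        """
        for log in self.members[:replicated_to]:
            log.append(op)
        needed = len(self.members) // 2 + 1 if w == "majority" else int(w)
        return replicated_to >= needed

    def read_latest(self, preference: str) -> Optional[str]:
        # "primary" reads members[0]; anything else reads the last secondary.
        log = self.members[0] if preference == "primary" else self.members[-1]
        return log[-1] if log else None

    def fail_over(self) -> None:
        """The primary dies and the most up-to-date survivor takes over."""
        survivors = sorted(self.members[1:], key=len, reverse=True)
        self.members = survivors
```

In this model, a `w: 1` write that only reached the primary is acknowledged, is invisible to a secondary read, and disappears after a failover; a majority write survives the failover and is visible to a primary read.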

Having read this far, I realized my understanding of replication is seriously lacking, so I'm moving on to a different book.