三津石智巳

👦🏻👦🏻👧🏻 Father of 3 | 🗺️ Service Reliability Engineering Manager at Rakuten Travel | 📚 Avid Reader | 👍 Wagashi | 👍 Caffe Latte | 👍 Owarai

[Review] Designing Data-Intensive Applications

Yet another set of notes on this book; I have lost count of how many times I have written one.

"In the lambda approach, the stream processor consumes the events and quickly produces an approximate update to the view; the batch processor later consumes the same set of events and produces a corrected version of the derived view."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

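My reaction: so that is what the lambda architecture was really about? I may have been confusing it with something else.

Either way, complementing stream processing with batch processing makes sense to me, but the book goes on to list a number of practical implementation problems right afterward.

On the other hand, proposals such as http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper25u.pdf have been made, so perhaps it is only a matter of time before those problems are solved.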
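To make the quoted idea concrete for myself, here is a minimal Python sketch of the lambda approach. Everything in it is my own assumption rather than anything from the book: the `EVENTS` log, the page-view counting, and the choice to make the stream path "approximate" by skipping deduplication of redelivered events.

```python
from collections import Counter

# Hypothetical event log: (event_id, page). The repeated "e2" mimics an
# at-least-once redelivery that the fast path does not deduplicate.
EVENTS = [("e1", "/home"), ("e2", "/docs"), ("e2", "/docs"), ("e3", "/home")]


def stream_view(events):
    """Stream path: count each event as it arrives, without deduplication."""
    view = Counter()
    for _event_id, page in events:
        view[page] += 1
    return view


def batch_view(events):
    """Batch path: reprocess the same events later, deduplicating by event id."""
    seen = set()
    view = Counter()
    for event_id, page in events:
        if event_id not in seen:
            seen.add(event_id)
            view[page] += 1
    return view


approximate = stream_view(EVENTS)  # quick, but may overcount
corrected = batch_view(EVENTS)     # later replaces the approximate view
```

Both functions read the same set of events; the only difference in this sketch is how much care each one takes, which is the part that makes the batch result the "corrected" one.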

"The tension between these philosophies has lasted for decades (both Unix and the relational model emerged in the early 1970s) and still isn’t resolved. For example, I would interpret the NoSQL movement as wanting to apply a Unix-esque approach of low-level abstractions to the domain of distributed OLTP data storage."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

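He compares the philosophies of Unix and databases and uses that to reinterpret the NoSQL movement. To put it mildly, that is just too cool.

It is a bit late to say this, but reading this book starting from Chapter 12 might be a fun way to do it.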

"This process is remarkably similar to setting up a new follower replica (see “Setting Up New Followers”), and also very similar to bootstrapping change data capture in a streaming system (see “Initial snapshot”)."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

Being able to point out which process in a completely different system this one resembles is really cool.

"Federation and unbundling are two sides of the same coin: composing a reliable, scalable, and maintainable system out of diverse components. Federated read-only querying requires mapping one data model into another, which takes some thought but is ultimately quite a manageable problem. I think that keeping the writes to several storage systems in sync is the harder engineering problem, and so I will focus on it."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

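This may be what was behind my sense of unease. Federation looks like a problem you could solve through sheer effort, whereas unbundling does not look like something effort alone can fix. And I think "architecture" exists to solve exactly that kind of hard problem. Of course, there are surely workplaces where unbundling never comes up as a system requirement.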

"The goal of unbundling is not to compete with individual databases on performance for particular workloads; the goal is to allow you to combine several different databases in order to achieve good performance for a much wider range of workloads than is possible with a single piece of software. It’s about breadth, not depth—in the same vein as the diversity of storage and processing models that we discussed in “Comparing Hadoop to Distributed Databases”."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

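The explanation is that unbundling is needed to achieve what a single piece of software cannot (in terms of performance). It is like the common explanation of why organizations exist: to do what individuals cannot do alone.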

"Just because an application uses a data system that provides comparatively strong safety properties, such as serializable transactions, that does not mean the application is guaranteed to be free from data loss or corruption. For example, if an application has a bug that causes it to write incorrect data, or delete data from a database, serializable transactions aren’t going to save you."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

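This is a really important perspective. It is not enough for the database alone to be perfect; safety has to be guaranteed by the system as a whole, beyond that by the business, and ultimately by society.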

"If two people concurrently register the same username or book the same seat, you can send one of them a message to apologize, and ask them to choose a different one. This kind of change to correct a mistake is called a compensating transaction [59, 60]."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

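Apparently this kind of compensation processing is called a compensating transaction. The name is straightforward, but it reminded me that there is a proper technical term for it.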
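A rough Python sketch of the username example from the quote, as I understand it: claims are accepted without coordination, and a later reconciliation step decides a winner and produces apologies for everyone else. The function name `reconcile_usernames`, the tuple layout, and the "earliest timestamp wins" rule are all assumptions of mine for illustration.

```python
def reconcile_usernames(claims):
    """Resolve conflicting username claims after the fact.

    claims: list of (timestamp, user_id, username) accepted optimistically.
    Returns (winners, apologies), where apologies are the compensating
    actions: messages asking the losing users to choose a different name.
    """
    winners = {}
    apologies = []
    for _timestamp, user_id, username in sorted(claims):
        if username not in winners:
            winners[username] = user_id  # earliest claim keeps the name
        else:
            apologies.append(
                (user_id, f"Sorry, '{username}' is taken. Please choose another.")
            )
    return winners, apologies


winners, apologies = reconcile_usernames(
    [(2, "u2", "tomomi"), (1, "u1", "tomomi"), (3, "u3", "kleppmann")]
)
```

Here `u1` keeps `tomomi` and `u2` receives the apology, while `u3` is unaffected.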

"the apology workflow already needs to be part of your business processes anyway in order to deal with forklift incidents, and so it might be unnecessary to require a linearizable constraint on the number of items in stock"

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

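I am interested in this area. An apology workflow (and compensating transactions) will be needed anyway, even for problems that are not caused by the system or software, so in this example linearizability may not be needed for the remaining stock count.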

"Similarly, many airlines overbook airplanes in the expectation that some passengers will miss their flight, and many hotels overbook rooms, expecting that some guests will cancel. In these cases, the constraint of “one person per seat” is deliberately violated for business reasons, and compensation processes (refunds, upgrades, providing a complimentary room at a neighboring hotel) are put in place to handle situations in which demand exceeds supply."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

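It is quite normal for a (conceptual) uniqueness constraint to be violated for business reasons.

I wonder what kind of person can design this big-picture view of the business up front (especially defensive requirements such as compensating transactions) rather than after the fact.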
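As a toy illustration of deliberate overbooking, here is a short Python sketch. The names `accept_booking` and `settle_flight`, the fixed overbooking limit, and the "first booked, first seated" rule are hypothetical choices of mine, not how any real airline or hotel system works.

```python
def accept_booking(bookings, passenger, booking_limit):
    """Accept a booking as long as the (deliberately inflated) limit allows."""
    if len(bookings) < booking_limit:
        bookings.append(passenger)
        return True
    return False


def settle_flight(bookings, no_shows, seats):
    """At departure, seat whoever showed up and compensate the overflow."""
    present = [p for p in bookings if p not in no_shows]
    boarded = present[:seats]
    compensated = present[seats:]  # refunds, upgrades, and so on
    return boarded, compensated


bookings = []
for passenger in ["a", "b", "c", "d"]:
    accept_booking(bookings, passenger, booking_limit=3)  # 2 seats, 3 bookings allowed

everyone_shows = settle_flight(bookings, no_shows=set(), seats=2)
one_no_show = settle_flight(bookings, no_shows={"b"}, seats=2)
```

When one passenger misses the flight, nobody needs compensation; when everyone shows up, the compensation process handles the one extra passenger.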

"Another way of looking at coordination and constraints: they reduce the number of apologies you have to make due to inconsistencies, but potentially also reduce the performance and availability of your system, and thus potentially increase the number of apologies you have to make due to outages. You cannot reduce the number of apologies to zero, but you can aim to find the best trade-off for your needs—the sweet spot where there are neither too many inconsistencies nor too many availability problems."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

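Interesting. Tightening coordination and constraints may reduce the number of apologies caused by inconsistencies, but it may also increase the number of apologies caused by performance and availability problems.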

"I hope that in the future we will see more self-validating or self-auditing systems that continually check their own integrity, rather than relying on blind trust [68]."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

Rather than blindly trusting other components, keep monitoring them continuously through validation and auditing.
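One way I picture such a self-check is to re-derive a view from the event log and compare it with what is actually stored. The Python below is only a sketch under that assumption; the account-balance example and the names `rebuild_balances` and `audit` are made up for illustration.

```python
def rebuild_balances(log):
    """Re-derive the balances view from scratch by replaying the event log."""
    balances = {}
    for account, delta in log:
        balances[account] = balances.get(account, 0) + delta
    return balances


def audit(log, stored_view):
    """Return {account: (expected, stored)} for every entry that disagrees."""
    expected = rebuild_balances(log)
    accounts = expected.keys() | stored_view.keys()
    return {
        account: (expected.get(account), stored_view.get(account))
        for account in accounts
        if expected.get(account) != stored_view.get(account)
    }


log = [("alice", 100), ("bob", 50), ("alice", -30)]
clean = audit(log, {"alice": 70, "bob": 50})
corrupted = audit(log, {"alice": 70, "bob": 40})
```

An empty result means the stored view matches the log; a non-empty one points at the accounts that need investigation. Running this on a schedule would be the "continually check" part.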

"Cryptographic auditing and integrity checking often relies on Merkle trees [74], which are trees of hashes that can be used to efficiently prove that a record appears in some dataset (and a few other things)."

 

via Check out this quote from Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/-/9781491903063/ch12.html

Merkle trees are one technique for integrity checking.
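To see how a tree of hashes can prove that a record appears in a dataset, I wrote a small Python sketch. It is a simplified version: it assumes a non-empty list of records, pads odd levels by duplicating the last hash, and uses no separate prefixes for leaves and inner nodes, so it should not be taken as a production design. The function names are my own.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(records):
    """Return all tree levels, leaves first; the last level holds the root."""
    level = [h(r) for r in records]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels (simplification)
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hash at each level on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(record, proof, root):
    """Recompute the root from the record and its proof, then compare."""
    node = h(record)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


records = [b"a", b"b", b"c"]
levels = build_levels(records)
root = levels[-1][0]
proof_for_c = prove(levels, 2)
```

The verifier only needs the record, the root, and one hash per tree level, which is why the proof stays small even when the dataset is large.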

 

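The Summary at the end is really well put together. I want to lead an organization with this kind of technical vision, and I want us to be an organization that can produce people and engineers who think this way!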