三津石智巳

👦🏻👦🏻👧🏻 Father of 3 | 🗺️ Service Reliability Engineering Manager at Rakuten Travel | 📚 Avid Reader | 👍 Wagashi | 👍 Caffe Latte | 👍 Owarai

[Review] Database Internals


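I heard that a Database Internals reading group was starting, so I finally read the copy that had been sitting unread on my shelf. For context, my background is as follows.

  • I have a fair amount of experience developing applications with MongoDB and Redis and have studied their internals a little, but I still don't really understand WiredTiger.
  • I am a novice when it comes to RDBMS internals.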

https://www.linkedin.com/in/mitsuishitomomi/

Preface

"Knowing the history helps to understand differences and motivation better."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/preface01.html

Learning the history matters.

"To collect material for this book, I studied over 15 books, more than 300 papers, countless blog posts, source code, and the documentation for several open source databases."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/preface01.html

This is what a model engineer looks like.

"The book is arranged into parts that discuss the subsystems and components responsible for storage (Part I) and distribution (Part II)."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/preface01.html

This reminded me of 「24時間365日 サーバ/インフラを支える技術」 (roughly, "Technologies That Support Servers and Infrastructure 24/7/365").

Part I. Storage Engines

"One way to look at this is that database management systems are applications built on top of storage engines"

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/part01.html

A DBMS is built on top of a storage engine.

"MongoDB allows switching between WiredTiger, In-Memory, and the (now-deprecated) MMAPv1 storage engines."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/part01.html

I still don't understand what was wrong with MMAPv1. As I recall, MongoDB University at the time explained that its strength was how well it made use of the OS.

"These methods can be used only for a high-level comparison and can be as coarse as choosing between HBase and SQLite, so even a superficial understanding of how each database works and what’s inside it can help you land a more weighted conclusion."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/part01.html

I took this to mean that, as a developer, you should not try to persuade people with something like the ThoughtWorks Technology Radar.

"it’s usually much better to use a database that slowly saves the data than one that quickly loses it."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/part01.html

True enough.

"Does the database support the required queries?

Is this database able to handle the amount of data we’re planning to store?

How many read and write operations can a single node handle?

How many nodes should the system have?

How do we expand the cluster given the expected growth rate?

What is the maintenance process?"

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/part01.html

It is by no means comprehensive, but it looks usable as a basic checklist.

"It’d be great if we could use databases as black boxes and never have to take a look inside them, but the practice shows that sooner or later a bug, an outage, a performance regression, or some other problem pops up, and it’s better to be prepared for it."

 

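via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/part01.html

Agreed. I occasionally clash with others over this, but I am also in the camp that believes you should dig as deep as necessary, including by asking other people. A workplace where you never need to dig deep is a lucky one.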

"Benchmarks can be useful to define and test details of the service-level agreement, understanding system requirements, capacity planning, and more."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/part01.html

I am embarrassed to admit that I have never used benchmarks...

Chapter 1. Introduction and Overview

I found the Memory- Versus Disk-Based DBMS section a little thin. The following passage from a similar book, Designing Data-Intensive Applications, is instructive.

"Counterintuitively, the performance advantage of in-memory databases is not due to the fact that they don’t need to read from disk. Even a disk-based storage engine may never need to read from disk if you have enough memory, because the operating system caches recently used disk blocks in memory anyway. Rather, they can be faster because they can avoid the overheads of encoding in-memory data structures in a form that can be written to disk [44]."

 

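via Designing Data-Intensive Applications - https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/ch03.html

It is a ten-year-old article, but 「キャッシュの大きいRDB vs インメモリデータベース、性能がどれだけ違うのか調べてみると - Publickey」 (an RDB with a large cache vs. an in-memory database: how much does performance really differ?) also touches on this. Looking through List of in-memory databases - Wikipedia reminds me once again that I like Redis. It also made me wonder what has become of Tokyo Cabinet/Tyrant. Someday I would like to develop middleware with a name like Nikotama XXX.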

"It is unfair to say that the in-memory database is the equivalent of an on-disk database with a huge page cache (see “Buffer Management”). Even though pages are cached in memory, serialization format and data layout incur additional overhead and do not permit the same degree of optimization that in-memory stores can achieve."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/ch01.html

Or so I thought, but the book does address this point in a Note.
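
To make the "serialization format and data layout incur additional overhead" point concrete for myself, here is a minimal sketch. The 12-byte record layout and the function names are my own assumptions, not anything from either book. It only shows where the extra decoding step sits when a page is already cached in memory; it is not a benchmark.

```python
import struct

# Hypothetical on-page layout: 4-byte unsigned key, 8-byte unsigned value.
RECORD = struct.Struct("<IQ")


def build_page(rows):
    """Serialize (key, value) rows, sorted by key, into one contiguous page image."""
    return b"".join(RECORD.pack(k, v) for k, v in sorted(rows))


def lookup_in_cached_page(page, key):
    """Binary search over a cached page image; every probe decodes a record."""
    lo, hi = 0, len(page) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        k, v = RECORD.unpack_from(page, mid * RECORD.size)
        if k == key:
            return v
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None


rows = [(i, i * 10) for i in range(1000)]
page = build_page(rows)   # disk-based engine: the page image sits in the cache
table = dict(rows)        # in-memory store: a native structure, nothing to decode

print(lookup_in_cached_page(page, 42), table[42])
```

Both lookups return the same value without touching disk, but the cached page still has to be interpreted through its on-disk format on every access, while the in-memory structure is free to use whatever layout suits it.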

"Data files (sometimes called primary files) can be implemented as index-organized tables (IOT), heap-organized tables (heap files), or hash-organized tables (hashed files)."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/ch01.html

I learned about this from "Clustered indexes are index-organized tables"! There is still plenty left for me to learn from Use The Index, Luke. Incidentally, as I remember it, Use The Index, Luke argued that IOTs should be avoided, whereas this book takes a neutral stance.
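
As a note to self, the difference between a heap-organized table and an index-organized table can be sketched roughly as follows. This is a toy model under my own assumptions: a sorted Python list stands in for a B-Tree, a dict stands in for the separate index, the class names are hypothetical, and duplicate keys are not handled.

```python
import bisect


class HeapTable:
    """Heap file: rows kept in insertion order, with a separate key -> position index."""

    def __init__(self):
        self.records = []   # the data file, append-only in this sketch
        self.index = {}     # separate index: key -> offset into self.records

    def insert(self, key, row):
        self.index[key] = len(self.records)
        self.records.append(row)

    def get(self, key):
        # Two steps: find the position in the index, then visit the data file.
        return self.records[self.index[key]]

    def scan(self):
        return list(self.records)


class IndexOrganizedTable:
    """IOT: the sorted index itself holds the rows, so there is no second hop."""

    def __init__(self):
        self.keys = []   # kept sorted
        self.rows = []   # rows[i] belongs to keys[i]

    def insert(self, key, row):
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.rows.insert(pos, row)

    def get(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos == len(self.keys) or self.keys[pos] != key:
            raise KeyError(key)
        return self.rows[pos]

    def scan(self):
        return list(self.rows)


heap, iot = HeapTable(), IndexOrganizedTable()
for key, row in [(3, "c"), (1, "a"), (2, "b")]:
    heap.insert(key, row)
    iot.insert(key, row)

print(heap.scan())   # insertion order
print(iot.scan())    # key order
```

The heap table returns rows in insertion order and needs the extra hop from index to data file, while the index-organized table returns them in key order because the rows live inside the index.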

Found in the references:

Chapter 5. Transaction Processing and Recovery

"Uncached pages are said to be paged in when they’re loaded from disk. If any changes are made to the cached page, it is said to be dirty, until these changes are flushed back on disk.

 

Since the memory region where cached pages are held is usually substantially smaller than an entire dataset, the page cache eventually fills up and, in order to page in a new page, one of the cached pages has to be evicted."

 

via Database Internals - https://learning.oreilly.com/library/view/database-internals/9781492040330/ch05.html

For some reason, I can never remember the page cache terminology.
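
To help the terms stick, here is a minimal sketch that maps "paged in", "dirty", "flushed", and "evicted" onto code. The class and method names are my own, a dict stands in for the disk, and I assume LRU eviction because the quoted passage does not name a policy.

```python
from collections import OrderedDict


class PageCache:
    """Toy page cache: fixed capacity, LRU eviction, write-back of dirty pages."""

    def __init__(self, disk, capacity):
        self.disk = disk             # dict page_id -> bytes, standing in for the data file
        self.capacity = capacity     # maximum number of cached pages
        self.pages = OrderedDict()   # page_id -> bytearray, least recently used first
        self.dirty = set()           # ids of cached pages with unflushed changes

    def _page_in(self, page_id):
        """Load an uncached page from disk, evicting one first if the cache is full."""
        if len(self.pages) >= self.capacity:
            self._evict()
        self.pages[page_id] = bytearray(self.disk[page_id])

    def _evict(self):
        """Drop the least recently used page, flushing it first if it is dirty."""
        victim, data = self.pages.popitem(last=False)
        if victim in self.dirty:
            self.disk[victim] = bytes(data)
            self.dirty.discard(victim)

    def read(self, page_id):
        if page_id not in self.pages:
            self._page_in(page_id)
        self.pages.move_to_end(page_id)   # mark as most recently used
        return bytes(self.pages[page_id])

    def write(self, page_id, data):
        if page_id not in self.pages:
            self._page_in(page_id)
        self.pages[page_id][:] = data     # change only the cached copy
        self.pages.move_to_end(page_id)
        self.dirty.add(page_id)           # dirty until flushed back to disk

    def flush(self):
        """Write every dirty page back to disk and mark it clean."""
        for page_id in list(self.dirty):
            self.disk[page_id] = bytes(self.pages[page_id])
        self.dirty.clear()


disk = {1: b"aaaa", 2: b"bbbb", 3: b"cccc"}
cache = PageCache(disk, capacity=2)
cache.write(1, b"AAAA")   # page 1 is paged in, modified, and now dirty
cache.read(2)             # page 2 is paged in; the cache is now full
cache.read(3)             # page 1 (least recently used) is evicted and flushed
```

After the last call, the change to page 1 has reached the stand-in disk only because eviction forced a flush; until then it existed only in the cached copy.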