Understanding How Databases Store Data on Disk
What is a Database?​
A database is a systematically organized collection of related data, typically stored electronically within a computer system.
Databases are structured into tables, with each table defined by columns (fields) specifying data types and rows (records) containing the actual data.
In non-relational databases, data is stored in collections and documents rather than traditional tables and rows. While this guide focuses on relational databases, non-relational concepts may be addressed in a future system design course.
Example Tables in a Relational Database​
User Table
UserID | Name | PhoneNo |
---|---|---|
1 | Alice | 9090909090 |
2 | Bob | 8080808080 |
Product Table
ProductID | Name | Price |
---|---|---|
1 | IPhone | 1000 |
2 | Chair | 50 |
How Does a Database Store Data on Disk?​
When working with a database management system (DBMS), it's important to understand how data is physically stored on disk. While we interact with data in a structured, tabular format, the underlying storage mechanism involves multiple layers and components.
Disk Structure: Tracks, Sectors, and Blocks​
A disk, whether it's a Hard Disk Drive (HDD) or a Solid-State Drive (SSD), serves as secondary memory. Here's how it's organized:
-
Tracks: Concentric circular paths on the disk surface.
-
Sectors: Pie-shaped divisions within each track.
-
Blocks: The smallest units of storage, defined at the intersection of tracks and sectors. Blocks are the basic units of data transfer to and from the disk.
Storing Table Rows in Disk Blocks​
Let's consider a simplified scenario of storing a row's data:
-
Integer Field: 4 bytes
-
String Field: Assume 60 bytes
-
Another Integer Field: 4 bytes
Each row might occupy around 68 bytes in total.
If a disk block can store 258 bytes, approximately 3 rows could fit into a single block (258 / 72 ≈ 3). When more rows are added to the table, they are stored in additional blocks, ideally placed contiguously to enhance read and write performance.
Typically, a disk block can store around 4 kB of data.
Data Operations and Memory Management​
When an operation is performed on a specific row (e.g., row 2), the entire block containing rows 1, 2, and 3 is loaded into RAM for processing. This approach ensures efficient data retrieval and manipulation by minimizing disk I/O operations.
Databases typically store data in a structured manner, with entire rows stored consecutively to optimize access.
For more in-depth information on data structures used for indexing and retrieval, check out our indexing guide.