How disk access impacts DBMS Performance

From the discussion of disk structure, we can see that DBMS performance is shaped by three facts:

  1. Data has to be in memory for the DBMS to operate on it. This is a limitation of the computer system.
  2. The unit for data transfer between disk and main memory is a block; if a single item on a block is needed, the entire block has to be transferred. Reading or writing a disk block is called an I/O (for input/output) operation.
  3. The time to read or write a block varies, depending on the location of the data:
    access time = seek time + rotational delay + transfer time
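
As a rough illustration of the formula above, the following Python sketch adds up the three components for a single block. The function name, its parameters, and the sample drive figures are assumptions chosen for illustration; they do not come from any particular disk or DBMS. The average rotational delay is taken to be half a revolution, a common simplification.

```python
def access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    """Estimate the time to read or write one block, in milliseconds.

    All parameters are illustrative assumptions:
      seek_ms            average seek time in milliseconds
      rpm                spindle speed in revolutions per minute
      block_bytes        size of one block in bytes
      transfer_mb_per_s  sustained transfer rate in MB/s (1 MB = 10**6 bytes)
    """
    # On average the desired block is half a revolution away from the head.
    rotational_delay_ms = 0.5 * 60_000 / rpm
    # Time to move the block's bytes once the head is positioned.
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_delay_ms + transfer_ms


# Hypothetical drive: 9 ms average seek, 7200 RPM, 4 KB blocks, 100 MB/s.
estimate = access_time_ms(seek_ms=9, rpm=7200, block_bytes=4096,
                          transfer_mb_per_s=100)
print(f"estimated access time: {estimate:.2f} ms")
```

With these assumed figures the estimate comes to about 13.2 ms, of which the transfer itself is only about 0.04 ms. Seek time and rotational delay account for nearly all of the cost.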

The way data is stored on disk significantly affects the time taken by database operations. The time spent moving blocks to or from disk usually dominates that total. To minimize it, data records must be placed strategically on disk, taking the geometry and mechanics of disks into account.

For instance, if two records are frequently used together, we should place them close together on the disk. By “close” we mean on the same block or on two consecutive blocks. If that is not possible, they could be on the same track, the same cylinder, or an adjacent cylinder.
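
The ordering of "closeness" described above can be sketched as a small ranking function. This is only an illustration: the `BlockAddress` type, its fields, and the numeric ranks are hypothetical, and the sketch treats consecutive blocks as neighbours on the same track, which is a simplifying assumption.

```python
from typing import NamedTuple


class BlockAddress(NamedTuple):
    """Hypothetical physical address of a block (illustrative only)."""
    cylinder: int
    track: int   # track (surface) within the cylinder
    block: int   # block index within the track


def closeness_rank(a: BlockAddress, b: BlockAddress) -> int:
    """Return a rank for how close two blocks are; lower means closer."""
    if a == b:
        return 0  # same block: one I/O fetches both records
    if a.cylinder == b.cylinder and a.track == b.track:
        # Same track: consecutive blocks rank ahead of other blocks.
        return 1 if abs(a.block - b.block) == 1 else 2
    if a.cylinder == b.cylinder:
        return 3  # same cylinder: no arm movement needed
    if abs(a.cylinder - b.cylinder) == 1:
        return 4  # adjacent cylinder: shortest possible seek
    return 5      # anywhere else on the disk


first = BlockAddress(cylinder=10, track=2, block=7)
print(closeness_rank(first, BlockAddress(10, 2, 8)))   # consecutive blocks
print(closeness_rank(first, BlockAddress(11, 0, 0)))   # adjacent cylinder
```

A placement routine could use such a rank to choose, among the free blocks, the one that is closest to a record's frequent companion.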

This idea of closeness should be considered both when storing and when retrieving data, in order to minimize disk I/O. Database processing engines sometimes define the block, cylinder, and track to suit the nature of their data. We will discuss data distribution mechanisms later.