Data storage on Magnetic Disk: DBMS perspective

Why read about the disk structure?

When data is queried from a database, the server has to optimize the access plan for each database object. The plan must be generated with the disk structure in mind, because the way data is laid out on disk significantly affects read time once data volumes reach gigabytes.

Magnetic disks are widely used for database applications. They support direct access to a desired location on disk. The job of the DBMS is to provide seamless access to data on disk, so that applications need not worry about whether data is in main memory or on disk.

Disk Structure:

Data is stored on disk in units called blocks. A block is a contiguous sequence of bytes and is the unit in which data is read from or written to disk. Even if we need only a single row of 1 kilobyte, we have to read the whole block that contains it. Blocks are arranged in concentric rings called tracks. Tracks can be recorded on one or both surfaces of a platter, and accordingly we refer to platters as single-sided or double-sided. The set of all tracks with the same diameter is called a cylinder, because the space occupied by these tracks is shaped like a cylinder; a cylinder contains one track per platter surface. Each track is divided into arcs called sectors. When the disk is initialized, we can set the size of a disk block to a multiple of the sector size.
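To make the "read the whole block" point concrete, here is a minimal sketch of how a byte range maps onto blocks. The sector size (512 bytes) and the choice of 8 sectors per block are assumptions picked for illustration, and the function names are hypothetical.

```python
# Assumed geometry, for illustration only.
SECTOR_SIZE = 512                              # bytes per sector
SECTORS_PER_BLOCK = 8                          # chosen when the disk is initialized
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK   # block size is a multiple of the sector size


def blocks_touched(offset, length):
    """Block numbers that must be read to fetch `length` bytes starting at byte `offset`."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return range(first, last + 1)


def bytes_transferred(offset, length):
    """Bytes actually moved from disk: always whole blocks."""
    return len(blocks_touched(offset, length)) * BLOCK_SIZE


# A 1 KB row that sits inside one block still costs a full 4 KB block read.
print(bytes_transferred(offset=10_000, length=1024))
# The same row straddling a block boundary costs two block reads.
print(list(blocks_touched(offset=4_000, length=1024)))
```

This is one reason a DBMS tries to pack related rows into the same block: the cost is paid per block, not per row.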

An array of disk heads, one per recorded surface, is moved as a unit. When one head is positioned over a block, the other heads are in identical positions with respect to their platters. To read or write a block, a disk head must be positioned on top of the block. As the size of a platter decreases, seek times also decrease, since the disk head has to move a smaller distance. A typical platter diameter is 3.5 inches.
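Because the heads move together, all blocks on one cylinder can be reached without moving the arm. The sketch below illustrates this with a hypothetical numbering scheme that fills a whole cylinder before moving to the next one. The geometry constants, the `locate` and `needs_seek` helpers, and the numbering order itself are assumptions for illustration, not a description of any particular drive.

```python
from typing import NamedTuple

# Assumed geometry, for illustration only.
SURFACES = 4            # recorded surfaces, one head each
BLOCKS_PER_TRACK = 16   # blocks on each track


class BlockAddress(NamedTuple):
    cylinder: int   # which arm position
    surface: int    # which head
    block: int      # position of the block within the track


def locate(block_no):
    """Map a linear block number to (cylinder, surface, block), filling one cylinder at a time."""
    blocks_per_cylinder = SURFACES * BLOCKS_PER_TRACK
    cylinder, rest = divmod(block_no, blocks_per_cylinder)
    surface, block = divmod(rest, BLOCKS_PER_TRACK)
    return BlockAddress(cylinder, surface, block)


def needs_seek(block_a, block_b):
    """The arm has to move only when the two blocks lie on different cylinders."""
    return locate(block_a).cylinder != locate(block_b).cylinder


print(locate(70))
print(needs_seek(0, 63))    # same cylinder, different surfaces
print(needs_seek(63, 64))   # crosses into the next cylinder
```

Under this assumed layout, blocks 0 to 63 share a cylinder, so reading them in sequence involves switching heads but no arm movement.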

Disk Controller:

A disk controller interfaces a disk drive to the computer. It carries out commands to read or write a sector by moving the arm assembly and transferring data to and from the disk surfaces. A checksum is computed when data is written to a sector and is stored with the sector. The checksum is computed again when the data on the sector is read back. If the sector is corrupted or the read is faulty for some reason, the two checksums will differ. When the controller detects such a mismatch, it tries to read the sector again.
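The write, verify, and retry cycle described above can be sketched as follows. This is a simplified model, not how any specific controller is implemented: CRC-32 from Python's `zlib` stands in for whatever code the hardware uses, and `write_sector`, `read_sector`, and the retry limit are hypothetical names and values.

```python
import zlib


def write_sector(data):
    """Return the data together with the checksum that is stored alongside it."""
    return data, zlib.crc32(data)


def read_sector(read_raw, stored_checksum, max_retries=3):
    """Read a sector via `read_raw()`, retrying while the checksum does not match.

    Returns the verified data and the number of attempts it took.
    """
    for attempt in range(1, max_retries + 1):
        data = read_raw()
        if zlib.crc32(data) == stored_checksum:
            return data, attempt
    raise IOError(f"sector unreadable after {max_retries} attempts")


payload, checksum = write_sector(b"row data")

# Simulate one faulty read followed by a clean one.
responses = iter([b"row dat\x00", b"row data"])
data, attempts = read_sector(lambda: next(responses), checksum)
print(data, attempts)
```

In the simulated run the first read fails verification and the second succeeds, so the caller sees correct data at the cost of one extra read.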

Direct access to any desired location in main memory takes approximately the same time, but determining the time to access a location on disk is more complicated. The time to access a disk block has several components:

  1. Seek time is the time taken to move the disk heads to the track on which a desired block is located.
  2. Rotational delay is the waiting time for the desired block to rotate under the disk head. It is the time required for half a rotation on average and is usually less than seek time.
  3. Transfer time is the time to actually read or write the data in the block once the head is positioned, that is, the time for the disk to rotate over the block.
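The three components above add up to the access time for one block. The sketch below works through the arithmetic. The drive parameters (7200 RPM, 9 ms average seek, 512 KB per track, 4 KB blocks) are assumed values chosen for illustration, and the function names are hypothetical.

```python
def rotation_time_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000 / rpm


def avg_rotational_delay_ms(rpm):
    """On average the desired block is half a rotation away."""
    return rotation_time_ms(rpm) / 2


def transfer_time_ms(rpm, block_size, track_capacity):
    """Time for one block to pass under the head: its share of a full rotation."""
    return rotation_time_ms(rpm) * block_size / track_capacity


def access_time_ms(seek_ms, rpm, block_size, track_capacity):
    """Access time = seek time + rotational delay + transfer time."""
    return (seek_ms
            + avg_rotational_delay_ms(rpm)
            + transfer_time_ms(rpm, block_size, track_capacity))


# Assumed drive parameters, for illustration only.
total = access_time_ms(seek_ms=9.0, rpm=7200, block_size=4096, track_capacity=512 * 1024)
print(f"rotational delay: {avg_rotational_delay_ms(7200):.2f} ms")
print(f"transfer time:    {transfer_time_ms(7200, 4096, 512 * 1024):.3f} ms")
print(f"total access:     {total:.2f} ms")
```

With these assumed numbers, seek time and rotational delay dominate and the transfer itself is a small fraction of the total, which is why placing blocks that are read together close to each other on disk matters so much.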