Made in USA: Enterprise Application Services

Cost Estimation for SELECT Operations

Cost Estimation for SELECT Operations

For a given SELECT query, the DBMS has a number of possible execution strategies. Here are a few that we will discuss (the list is not complete):

  1. Linear search (brute force)
  2. Binary search
  3. Using a primary database index / hash key to retrieve a single record


Number of EMPLOYEE records (r)100,000
Number of disk blocks (b)10,000
Blocking factor (bfr) (records per block)10

Linear Search

The DBMS must use a linear search when no database index exists on the selection condition (e.g., the EmpID). This is precisely the type of operation that we want to prevent with proper indexing.

Given our assumptions, the cost of a linear search for this query would be:

  • C = b/2 on average if the record exists
  • C = b if the record does not exist

So, for the query above, C = b/2 = 5,000 block accesses.

Binary Search

The DBMS might employ a binary search on an index with non-unique entries. Say, for instance, that the database developer has a nonclustered index that groups by city, then first name, then last name, and they need to retrieve all customers who live in a certain city. The DBMS can use a binary search to find the first matching city record and then retrieve all subsequent city records by traversing down the database index row by row.

The cost of a performing a binary search is exactly the same as performing a binary search anywhere else, namely, C = log2b.

For the query above, the database developer has C = log2b = log210,000 = 14 block accesses.

Primary Index / Hash

If the column is unique (like a primary key, for example) then the database index can be implemented as a hash table. Such an index on the EmpID column would allow the database developer to hash directly to the correct employee record in constant time.

In general, static and linear hashes have a cost of C = 1.

For SELECT queries with composite selection conditions, the cost must be evaluated based on the database index structure which exists for each column involved in the join condition—which goes into more detail than this paper is designed to cover. Nevertheless, even for our very simple example query, we can mathematically verify an enormous performance improvement by using a good database index.