For the following problems, assume that a disk has block size B = 1024 bytes.

  1. A file has r = 20,000 BOOK records of fixed length. Each record has the following fields: Book_id (4 bytes), Title (45 bytes), and Publisher_name (45 bytes). Book_id is the primary key; assume the values are 1-20000. Titles are not required to be unique but there are few duplicates; assume there are 19950 unique values. There are 50 unique publisher names.

    1. Calculate the record size R in bytes.

      	  R = 4 + 45 + 45 = 94 bytes
      	
    2. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.

      	  bfr = B/R = 1024/94 = 10  (rounded down)
      	  b = r/bfr = 20000/10 = 2000  (rounded up)
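      The arithmetic above can be checked with a short Python sketch (B, R, and r are taken from the problem statement):

      ```python
      import math

      B = 1024          # block size in bytes
      R = 4 + 45 + 45   # record size: Book_id + Title + Publisher_name
      r = 20000         # number of BOOK records

      bfr = B // R              # blocking factor, rounded down (unspanned)
      b = math.ceil(r / bfr)    # number of file blocks, rounded up
      print(R, bfr, b)          # 94 10 2000
      ```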
      	
    3. Assume an unordered file organization. Calculate the number of blocks accessed (read and written) in the following cases:

      1. Searching for book ID 1000.
        	      unordered, so must scan from the beginning of the file
        	      book ID is primary key, so only a single match
        
        		average: b/2 = 1000 blocks read
        		worst case: b = 2000 blocks read
        	    
      2. Searching for books published by DAW.
        	      unordered, so must scan from the beginning of the file
        	      publisher is not unique, so may be multiple matches
        
        		must read whole file: 2000 blocks read
        	    
      3. Searching for books with ID >= 15000.
        	      unordered, so must scan from the beginning of the file
        	      range search so multiple matches
        
        		must read whole file: 2000 blocks read
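        All three unordered-file cases above reduce to two formulas, sketched here in Python (b from part (b)):

        ```python
        b = 2000  # file blocks, from part (b)

        avg_key_search = b // 2   # equality search on a key: expect to scan half the file
        worst_key_search = b      # worst case: the record is in the last block
        full_scan = b             # non-key or range search: must read every block
        print(avg_key_search, worst_key_search, full_scan)  # 1000 2000 2000
        ```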
        	    
    4. Assume an ordered file organization with book ID as the ordering field. Calculate the number of blocks accessed (read and written) in the following cases:

      1. Searching for book ID 1000.
        	      ordered on book ID, so can use binary search
        	      book ID is primary key, so only a single match
        
        	        read log2(b) blocks, rounded up: 11
        	    
      2. Searching for books with the title "Foreigner".
        	      unordered on title, so must scan from the beginning of the file
        	      title is not unique, so may be multiple matches
        
        		must read whole file: 2000 blocks read
        	    
      3. Searching for books published by DAW.
        	      unordered on publisher, so must scan from the beginning of the file
        	      publisher is not unique, so may be multiple matches
        
        		must read whole file: 2000 blocks read
        	    
      4. Searching for books with ID >= 15000.
        	      ordered on book ID, so can use binary search
        	      book ID is primary key, so only one record per value
        	      range search
        
        	        ceiling(log2(b)) = 11 blocks to locate book ID 15000
        
        	        expected number of matches: 5000 because book ID values are 1-20000 and book IDs are unique
        	        additional blocks read: 5000/bfr = 500 (rounded up)
        	           ... -1 because one of those was read as the last one in the binary search
        
        	        total: 11+500-1 = 510 blocks read
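        The binary-search-plus-range arithmetic above, as a Python sketch (b and bfr from parts (a)-(b); 5000 is the estimated match count used in the answer):

        ```python
        import math

        b, bfr = 2000, 10
        search = math.ceil(math.log2(b))      # blocks read by the binary search: 11
        matches = 5000                        # estimated records with ID >= 15000
        extra = math.ceil(matches / bfr) - 1  # consecutive data blocks; first already read
        total = search + extra
        print(total)                          # 510
        ```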
        	    
    5. Assume an ordered file organization with publisher as the ordering field. Calculate the number of blocks accessed (read and written) in the following cases:

      1. Searching for book ID 1000.
        	      unordered on book ID, so must scan from the beginning of the file
        	      book ID is key, so only one match
        
        	        average: b/2 = 1000 blocks read
        	        worst case: b = 2000 blocks read
        	    
      2. Searching for books with the title "Foreigner".
        	      unordered on title, so must scan from the beginning of the file
        	      title is not unique, so may be multiple matches
        
        		must read whole file: 2000 blocks read
        	    
      3. Searching for books published by DAW.
        	      ordered on publisher, so can use binary search
        	      publisher is not unique, so may be multiple matches
        
        	        ceiling(log2(b)) = 11 blocks to locate the first record with publisher DAW
        
        	        expected number of matches: r/# distinct values = 20000/50 = 400
           	        additional blocks read: 400/bfr = 40 (rounded up)
        	           ... -1 because one of these was read as the last one in the binary search
        
        	        total blocks read: 11+40-1 = 50
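        The same pattern in Python for the publisher search (d = 50 distinct publisher names):

        ```python
        import math

        b, bfr, r, d = 2000, 10, 20000, 50
        search = math.ceil(math.log2(b))   # 11 blocks to locate the first DAW record
        matches = r // d                   # 400 expected matches per publisher
        total = search + math.ceil(matches / bfr) - 1
        print(total)                       # 50
        ```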
        	    
      4. Searching for books with ID >= 15000.
        	      unordered on book ID, so must scan from the beginning of the file
        	      range search, so may be multiple matches
        
        		must read whole file: 2000 blocks read
        	    
    6. Assume a hashed file organization with book ID as the hash field. Calculate the number of blocks accessed (read and written) in the following cases:

      1. Searching for book ID 1000.
        	      hashed on book ID
        	      book ID is key, so only one match
        
          	        total blocks read: 1
        	    
      2. Searching for books published by DAW.
        	      unordered on publisher, so must scan from the beginning of the file
        	      publisher is not unique, so may be multiple matches
        
        		must read whole file: 2000 blocks read
        	    
      3. Searching for books with ID >= 15000.
        	      hashed on book ID
        	      book ID is primary key, so only one record per value
        	      range search
        
        	        expected number of values: 5000 because book IDs are unique and range 1-20000
        
        	        total blocks read: 2000 (with 5000 book ID values to check, better to just read the whole file)
        	    
      4. Searching for books with ID >= 19950.
        	      hashed on book ID
        	      book ID is primary key, so only one record per value
        	      range search
        
        	        expected number of matches: 51 because book IDs are unique and the range 19950-20000 covers 51 values
        
        	        total blocks read: 51 (one hash lookup per value 19950-20000)
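        Assuming one block access per hash lookup (no overflow chains), the range can be probed value by value; note that 19950-20000 inclusive contains 51 key values:

        ```python
        ids = range(19950, 20001)  # book IDs in the range, inclusive of both endpoints
        lookups = len(ids)         # one hash-bucket read per candidate key
        print(lookups)             # 51
        ```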
        	    
  2. Use the same file properties as the previous question. A block pointer is 6 bytes long.

    1. Assume there is a primary index on Book_id. Calculate the number of block accesses needed to search for and retrieve a record given its Book_id.

      	  A primary index has one index record per data block, thus ri = b = 2000 (as calculated in the previous problem).
      
      	  An index record is 4+6 = 10 bytes (Ri), and bfri = floor(B/Ri) = 102.
      
      	  There are thus bi = ceiling(ri/bfri) = 20 index blocks.
      	  Binary search on the index requires reading ceiling(log2(bi)) = 5 blocks.
      	  An equality search on a primary index has one match, requiring 1 data block to be read.
      
      	  Total blocks read = 5+1 = 6.
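      A Python sketch of the primary-index calculation above:

      ```python
      import math

      B, b = 1024, 2000
      Ri = 4 + 6                            # index record: key (4) + block pointer (6)
      bfri = B // Ri                        # 102 index records per block
      bi = math.ceil(b / bfri)              # 20 index blocks (one entry per data block)
      total = math.ceil(math.log2(bi)) + 1  # binary search on the index + one data block
      print(bfri, bi, total)                # 102 20 6
      ```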
      	
    2. Assume there is a clustering index on Publisher_name. Calculate the number of block accesses needed to search for and retrieve records for a particular Publisher_name.

      	  A clustering index has one index record per distinct value, thus ri = 50 (because there are 50 distinct publisher names).
      
      	  An index record is 45+6 = 51 bytes (Ri), and bfri = floor(B/Ri) = 20.
      
      	  There are thus bi = ceiling(ri/bfri) = 3 index blocks.
      	  Binary search on the index requires reading ceiling(log2(bi)) = 2 blocks.
      	  An equality search on a clustering index may have r/d = 400 matches.  They will be consecutive in the file,
      	  so ceiling(400/bfr) = 40 data blocks are read.
      
      	  Total blocks read = 2+40 = 42.
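      The clustering-index calculation above, as a Python sketch:

      ```python
      import math

      B, bfr, r, d = 1024, 10, 20000, 50
      Ri = 45 + 6               # index record: Publisher_name (45) + pointer (6)
      bfri = B // Ri            # 20 index records per block
      bi = math.ceil(d / bfri)  # 3 index blocks (one entry per distinct value)
      s = r // d                # 400 expected matches, consecutive in the file
      total = math.ceil(math.log2(bi)) + math.ceil(s / bfr)
      print(total)              # 42
      ```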
      	
    3. Assume there is a clustering index on Title. Calculate the number of block accesses needed to search for and retrieve records with titles starting with A-D.

      	  A clustering index has one index record per distinct value, thus ri = 19950 (because there are 19950 distinct titles).
      
      	  An index record is 45+6 = 51 bytes (Ri), and bfri = floor(B/Ri) = 20.
      
      	  There are thus bi = ceiling(ri/bfri) = 998 index blocks.
      	  Binary search on the index requires reading ceiling(log2(bi)) = 10 blocks.
      	  A range search will have s matches.  Assuming titles are equally likely to start with each letter (not a
      	  terribly accurate assumption), it is expected that s = r*4/26 = 3077.  The matches will be consecutive in the file,
      	  so ceiling(3077/bfr) = 308 data blocks are read.
      	  
      	  Total blocks read = 10+308 = 318.
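      The same arithmetic in Python, keeping the uniform-first-letter assumption from the answer:

      ```python
      import math

      B, bfr, r = 1024, 10, 20000
      ri = 19950                  # distinct titles
      bfri = B // (45 + 6)        # 20 index records per block
      bi = math.ceil(ri / bfri)   # 998 index blocks
      s = math.ceil(r * 4 / 26)   # 3077 expected A-D matches (uniform-letter assumption)
      total = math.ceil(math.log2(bi)) + math.ceil(s / bfr)
      print(total)                # 318
      ```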
      	
    4. Assume there is a secondary index on Publisher_name. Calculate the number of block accesses needed to search for and retrieve records for a particular Publisher_name.

      	  A secondary index has one index record per distinct value (assuming option 3), thus ri = 50 (because there are 50 distinct publisher names).
      
      	  An index record is 45+6 = 51 bytes (Ri), and bfri = floor(B/Ri) = 20.
      
      	  There are thus bi = ceiling(ri/bfri) = 3 index blocks.
      	  Binary search on the index requires reading ceiling(log2(bi)) = 2 blocks.
      
      	  The blocking factor for indirect blocks is floor(B/6) = 170.  There are s = r/d = 400 matches per publisher name,
      	  so there are ceiling(400/170) = 3 indirect blocks per index record.
      
      	  s matches per publisher name means s = 400 data blocks read.
      	  
      	  Total blocks read = 2+3+400 = 405.
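      A Python sketch of the secondary-index (option 3, with indirect pointer blocks) calculation above:

      ```python
      import math

      B, r, d = 1024, 20000, 50
      bi = math.ceil(d / (B // (45 + 6)))     # 3 index blocks
      index_reads = math.ceil(math.log2(bi))  # 2 blocks for the binary search
      ptrs = B // 6                           # 170 record pointers per indirect block
      s = r // d                              # 400 matches per publisher name
      indirect = math.ceil(s / ptrs)          # 3 indirect blocks
      total = index_reads + indirect + s      # each match may be in its own data block
      print(total)                            # 405
      ```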
      	
    5. For each of the preceding problems, how many levels would be needed if a multilevel index was used?

      	  (a) - the first level index has 20 blocks.  The second level index is a primary index with one index record per
      	  first-level block (20), Ri = 10 and bfri = 102.  Only one block is needed for the second level index, so only two levels are needed.
      
      	  (b) - the first level index has 3 blocks.  The second level index is a primary index with one index record per
      	  first-level block (3), Ri = 51 and bfri = 20.  Only one block is needed for the second level index, so only two levels are needed.
      
      	  (c) - the first level index has 998 blocks.  The second level index is a primary index with one index record per
      	  first-level block (998), Ri = 51 and bfri = 20.  50 blocks are needed for the second level index.  The third level index is a
      	  primary index with one index record per second-level block (50) and bfri = 20, so 3 blocks are needed.  The fourth level index
      	  is a primary index with one index record per third-level block (3) and bfri = 20, so only 1 block is needed.  There are four levels total.
      
      	  (d) - same as (b).
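      The level counts above follow one rule: keep building a level over the previous one until a level fits in a single block. A Python sketch:

      ```python
      import math

      def index_levels(first_level_blocks, bfri):
          """Levels in a multilevel index: add a level over the previous
          one until the topmost level fits in a single block."""
          blocks, levels = first_level_blocks, 1
          while blocks > 1:
              blocks = math.ceil(blocks / bfri)
              levels += 1
          return levels

      # (a) primary index: 20 first-level blocks, bfri = 102
      # (b)/(d) publisher indexes: 3 first-level blocks, bfri = 20
      # (c) title index: 998 first-level blocks, bfri = 20
      print(index_levels(20, 102), index_levels(3, 20), index_levels(998, 20))  # 2 2 4
      ```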
      	
