For the following problems, assume that a disk has block size B = 1024 bytes.
A file has r = 20,000 BOOK records of fixed length. Each record has the following fields: Book_id (4 bytes), Title (45 bytes), and Publisher_name (45 bytes). Book_id is the primary key; assume the values are 1-20000. Titles are not required to be unique but there are few duplicates; assume there are 19950 unique values. There are 50 unique publisher names.
Calculate the record size R in bytes.
R = 4 + 45 + 45 = 94 bytes
Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
bfr = B/R = 1024/94 = 10 (rounded down) b = r/bfr = 20000/10 = 2000 (rounded up)
Assume an unordered file organization. Calculate the number of blocks accessed (read and written) in the following cases:
unordered, so must scan from the beginning of the file book ID is primary key, so only a single match average: b/2 = 1000 blocks read worst case: b = 2000 blocks read
unordered, so must scan from the beginning of the file publisher is not unique, so may be multiple matches must read whole file: 2000 blocks read
unordered, so must scan from the beginning of the file range search so multiple matches must read whole file: 2000 blocks read
Assume an ordered file organization with book ID as the ordering field. Calculate the number of blocks accessed (read and written) in the following cases:
ordered on book ID, so can use binary search book ID is primary key, so only a single match read log_{2}(b) blocks, rounded up: 11
unordered on title, so must scan from the beginning of the file title is not unique, so may be multiple matches must read whole file: 2000 blocks read
unordered on publisher, so must scan from the beginning of the file publisher is not unique, so may be multiple matches must read whole file: 2000 blocks read
ordered on book ID, so can use binary search book ID is primary key, so only one record per value range search log_{2}(b) blocks = 11 (round up) blocks to locate book ID 15000 expected number of matches: 5000 because book ID values are 1-20000 and book IDs are unique additional blocks read: 5000/bfr = 500 (rounded up) ... -1 because one of those was read as the last one in the binary search total: 11+500-1 = 510 blocks read
Assume an ordered file organization with publisher as the ordering field. Calculate the number of blocks accessed (read and written) in the following cases:
unordered on book ID, so must scan from the beginning of the file book ID is key, so only one match average: b/2 = 1000 blocks read worst case: b = 2000 blocks read
unordered on title, so must scan from the beginning of the file title is not unique, so may be multiple matches must read whole file: 2000 blocks read
ordered on publisher, so can use binary search publisher is not unique, so may be multiple matches log_{2}(b) blocks = 11 (round up) blocks to locate book ID 15000 expected number of matches: r/# distinct values = 20000/50 = 400 additional blocks read: 400/bfr = 40 (rounded up) ... -1 because one of these was read as the last one in the binary search total blocks read: 11+40-1 = 50
unordered on book ID, so must scan from the beginning of the file range search, so may be multiple matches must read whole file: 2000 blocks read
Assume an hashed file organization with book ID as the hash field. Calculate the number of blocks accessed (read and written) in the following cases:
hashed on book ID book ID is key, so only one match total blocks read: 1
unordered on publisher, so must scan from the beginning of the file publisher is not unique, so may be multiple matches must read whole file: 2000 blocks read
hashed on book ID book ID is primary key, so only one record per value range search expected number of values: 5000 because book IDs are unique and range 1-20000 total blocks read: 2000 (with 5000 book ID values to check, better to just read the whole file)
hashed on book ID book ID is primary key, so only one record per value range search expected number of values: 50 because book IDs are unique and range 1-20000 total blocks read: 50 (look up each value 19950-20000)
Use the same file properties as the previous question. A block pointer is 6 bytes long.
Assume there is a primary index on Book_id. Calculate the number of block accesses needed to search for and retrieve a record given its Book_id.
A primary index has one index record per data block, thus r_{i} = b = 2000 (as calculated in the previous problem). An index record is 4+6 = 10 bytes (R_{i}), and bfr_{i} = floor(B/R_{i}) = 102. There are thus b_{i} = ceiling(r_{i}/bfr_{i}) = 20 index blocks. Binary search on the index requires reading ceiling(log_{2}(b_{i})) = 5 blocks. An equality search on a primary index has one match, requiring 1 data block to be read. Total blocks read = 5+1 = 6.
Assume there is a clustering index on Publisher_name. Calculate the number of block accesses needed to search for and retrieve records for a particular Publisher_name.
A clustering index has one index record per distinct value, thus r_{i} = 50 (because there are 50 distinct publisher names). An index record is 45+6 = 51 bytes (R_{i}), and bfr_{i} = floor(B/R_{i}) = 20. There are thus b_{i} = ceiling(r_{i}/bfr_{i}) = 3 index blocks. Binary search on the index requires reading ceiling(log_{2}(b_{i})) = 2 blocks. An equality search on a clustering index may have r/d = 400 matches. They will be consecutive in the file, so ceiling(400/bfr) = 40 data blocks are read. Total blocks read = 2+40 = 42.
Assume there is a clustering index on Title. Calculate the number of block accesses needed to search for and retrieve records with titles starting with A-D.
A clustering index has one index record per distinct value, thus r_{i} = 19950 (because there are 19950 distinct titles). An index record is 45+6 = 51 bytes (R_{i}), and bfr_{i} = floor(B/R_{i}) = 20. There are thus b_{i} = ceiling(r_{i}/bfr_{i}) = 998 index blocks. Binary search on the index requires reading ceiling(log_{2}(b_{i})) = 10 blocks. A range search will have s matches. Assuming titles are equally likely to start with each letter (not a terribly accurate assumption), it is expected that s = r*4/26 = 3077. The matches will be consecutive in the file, so ceiling(3077/bfr) = 308 data blocks are read. Total blocks read = 10+308 = 318.
Assume there is a secondary index on Publisher_name. Calculate the number of block accesses needed to search for and retrieve records for a particular Publisher_name.
A secondary index has one index record per distinct value (assuming option 3), thus r_{i} = 50 (because there are 50 distinct publisher names). An index record is 45+6 = 51 bytes (R_{i}), and bfr_{i} = floor(B/R_{i}) = 20. There are thus b_{i} = ceiling(r_{i}/bfr_{i}) = 3 index blocks. Binary search on the index requires reading ceiling(log_{2}(b_{i})) = 2 blocks. The blocking factor for indirect blocks is ceiling(B/6) = 170. There are s = r/d = 400 matches per publisher name, so there are ceiling(400/170) = 3 indirect blocks per index record. s matches per publisher name means s = 400 data blocks read. Total blocks read = 2+3+400 = 405.
For each of the preceding problems, how many levels would be needed if a multilevel index was used?
(a) - the first level index has 20 blocks. The second level index is a primary index with one index record per first-level block (20), R_{i} = 10 and bfr_{i} = 102. Only one block is needed for the second level index, so only two levels are needed. (b) - the first level index has 3 blocks. The second level index is a primary index with one index record per first-level block (3), R_{i} = 51 and bfr_{i} = 20. Only one block is needed for the second level index, so only two levels are needed. (c) - the first level index has 998 blocks. The second level index is a primary index with one index record per first-level block (998), R_{i} = 51 and bfr_{i} = 20. 50 blocks are needed for the second level index. The third level index is a primary index with one index record per second-level block (50) and bfr_{i} = 20, so 3 blocks are needed. The fourth level index is a primary index with one index record per third-level block (3) and bfr_{i} = 20, so only 1 blocks is needed. There are four levels total. (d) - same as (b).