For the following problems, assume that a disk has block size B = 1024 bytes.

1. A file has r = 20,000 BOOK records of fixed length. Each record has the following fields: Book_id (4 bytes), Title (45 bytes), and Publisher_name (45 bytes). Book_id is the primary key; assume the values are 1-20000. Titles are not required to be unique but there are few duplicates; assume there are 19950 unique values. There are 50 unique publisher names.

1. Calculate the record size R in bytes.

```	  R = 4 + 45 + 45 = 94 bytes
```
2. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.

```	  bfr = B/R = 1024/94 = 10  (rounded down)
b = r/bfr = 20000/10 = 2000  (rounded up)
```
3. Assume an unordered file organization. Calculate the number of blocks accessed (read and written) in the following cases:

1. Searching for book ID 1000.
```	      unordered, so must scan from the beginning of the file
book ID is primary key, so only a single match

average: b/2 = 1000 blocks read
worst case: b = 2000 blocks read
```
```	      unordered, so must scan from the beginning of the file
publisher is not unique, so may be multiple matches

```
3. Searching for books with ID >= 15000.
```	      unordered, so must scan from the beginning of the file
range search so multiple matches

```
4. Assume an ordered file organization with book ID as the ordering field. Calculate the number of blocks accessed (read and written) in the following cases:

1. Searching for book ID 1000.
```	      ordered on book ID, so can use binary search
book ID is primary key, so only a single match

read log2(b) blocks, rounded up: 11
```
2. Searching for books with the title "Foreigner".
```	      unordered on title, so must scan from the beginning of the file
title is not unique, so may be multiple matches

```
```	      unordered on publisher, so must scan from the beginning of the file
publisher is not unique, so may be multiple matches

```
4. Searching for books with ID >= 15000.
```	      ordered on book ID, so can use binary search
book ID is primary key, so only one record per value
range search

log2(b) blocks = 11 (round up) blocks to locate book ID 15000

expected number of matches: 5000 because book ID values are 1-20000 and book IDs are unique
... -1 because one of those was read as the last one in the binary search

total: 11+500-1 = 510 blocks read
```
5. Assume an ordered file organization with publisher as the ordering field. Calculate the number of blocks accessed (read and written) in the following cases:

1. Searching for book ID 1000.
```	      unordered on book ID, so must scan from the beginning of the file
book ID is key, so only one match

average: b/2 = 1000 blocks read
worst case: b = 2000 blocks read
```
2. Searching for books with the title "Foreigner".
```	      unordered on title, so must scan from the beginning of the file
title is not unique, so may be multiple matches

```
```	      ordered on publisher, so can use binary search
publisher is not unique, so may be multiple matches

log2(b) blocks = 11 (round up) blocks to locate book ID 15000

expected number of matches: r/# distinct values = 20000/50 = 400
... -1 because one of these was read as the last one in the binary search

total blocks read: 11+40-1 = 50
```
4. Searching for books with ID >= 15000.
```	      unordered on book ID, so must scan from the beginning of the file
range search, so may be multiple matches

```
6. Assume an hashed file organization with book ID as the hash field. Calculate the number of blocks accessed (read and written) in the following cases:

1. Searching for book ID 1000.
```	      hashed on book ID
book ID is key, so only one match

```
```	      unordered on publisher, so must scan from the beginning of the file
publisher is not unique, so may be multiple matches

```
3. Searching for books with ID >= 15000.
```	      hashed on book ID
book ID is primary key, so only one record per value
range search

expected number of values: 5000 because book IDs are unique and range 1-20000

total blocks read: 2000 (with 5000 book ID values to check, better to just read the whole file)
```
4. Searching for books with ID >= 19950.
```	      hashed on book ID
book ID is primary key, so only one record per value
range search

expected number of values: 50 because book IDs are unique and range 1-20000

total blocks read: 50 (look up each value 19950-20000)
```
2. Use the same file properties as the previous question. A block pointer is 6 bytes long.

1. Assume there is a primary index on Book_id. Calculate the number of block accesses needed to search for and retrieve a record given its Book_id.

```	  A primary index has one index record per data block, thus ri = b = 2000 (as calculated in the previous problem).

An index record is 4+6 = 10 bytes (Ri), and bfri = floor(B/Ri) = 102.

There are thus bi = ceiling(ri/bfri) = 20 index blocks.
Binary search on the index requires reading ceiling(log2(bi)) = 5 blocks.
An equality search on a primary index has one match, requiring 1 data block to be read.

Total blocks read = 5+1 = 6.
```
2. Assume there is a clustering index on Publisher_name. Calculate the number of block accesses needed to search for and retrieve records for a particular Publisher_name.

```	  A clustering index has one index record per distinct value, thus ri = 50 (because there are 50 distinct publisher names).

An index record is 45+6 = 51 bytes (Ri), and bfri = floor(B/Ri) = 20.

There are thus bi = ceiling(ri/bfri) = 3 index blocks.
Binary search on the index requires reading ceiling(log2(bi)) = 2 blocks.
An equality search on a clustering index may have r/d = 400 matches.  They will be consecutive in the file,
so ceiling(400/bfr) = 40 data blocks are read.

Total blocks read = 2+40 = 42.
```
3. Assume there is a clustering index on Title. Calculate the number of block accesses needed to search for and retrieve records with titles starting with A-D.

```	  A clustering index has one index record per distinct value, thus ri = 19950 (because there are 19950 distinct titles).

An index record is 45+6 = 51 bytes (Ri), and bfri = floor(B/Ri) = 20.

There are thus bi = ceiling(ri/bfri) = 998 index blocks.
Binary search on the index requires reading ceiling(log2(bi)) = 10 blocks.
A range search will have s matches.  Assuming titles are equally likely to start with each letter (not a
terribly accurate assumption), it is expected that s = r*4/26 = 3077.  The matches will be consecutive in the file,
so ceiling(3077/bfr) = 308 data blocks are read.

Total blocks read = 10+308 = 318.
```
4. Assume there is a secondary index on Publisher_name. Calculate the number of block accesses needed to search for and retrieve records for a particular Publisher_name.

```	  A secondary index has one index record per distinct value (assuming option 3), thus ri = 50 (because there are 50 distinct publisher names).

An index record is 45+6 = 51 bytes (Ri), and bfri = floor(B/Ri) = 20.

There are thus bi = ceiling(ri/bfri) = 3 index blocks.
Binary search on the index requires reading ceiling(log2(bi)) = 2 blocks.

The blocking factor for indirect blocks is ceiling(B/6) = 170.  There are s = r/d = 400 matches per publisher name,
so there are ceiling(400/170) = 3 indirect blocks per index record.

s matches per publisher name means s = 400 data blocks read.

Total blocks read = 2+3+400 = 405.
```
5. For each of the preceding problems, how many levels would be needed if a multilevel index was used?

```	  (a) - the first level index has 20 blocks.  The second level index is a primary index with one index record per
first-level block (20), Ri = 10 and bfri = 102.  Only one block is needed for the second level index, so only two levels are needed.

(b) - the first level index has 3 blocks.  The second level index is a primary index with one index record per
first-level block (3), Ri = 51 and bfri = 20.  Only one block is needed for the second level index, so only two levels are needed.

(c) - the first level index has 998 blocks.  The second level index is a primary index with one index record per
first-level block (998), Ri = 51 and bfri = 20.  50 blocks are needed for the second level index.  The third level index is a
primary index with one index record per second-level block (50) and bfri = 20, so 3 blocks are needed.  The fourth level index
is a primary index with one index record per third-level block (3) and bfri = 20, so only 1 blocks is needed.  There are four levels total.

(d) - same as (b).
```

 last updated: --Mon Nov 20 01:02:20 EST 2017-- page owned by: bridgeman@hws.edu