For each of the following scenarios, compute the cost of the execution plan shown in terms of the number of blocks read/written. Show your work - it should be clear how you calculated the result, including what formulas you used. Assume 5 blocks of memory are available.

  • For the SELECTs, use an index if there is one, otherwise exploit the file ordering if it is relevant, otherwise use SL.
  • For the CROSS PRODUCTs, use JNL.

No Pipelining

Without pipelining, results are written out after every operation.

σBdate>'1957-12-31' SL (EMPLOYEE is ordered by Ssn and there is no index on Bdate) SL cost: b = 2000 blocks read
selectivity sl = 12/45 (condition covers 12 years of the 45-year range; assuming uniform distribution)
selection cardinality s = sl*r = (12/45)*10000 = 2600 rows selected
blocks written = ⌈(rows/bfrEMPLOYEE)⌉ = ⌈(rows/(r/b))⌉ = ⌈(2600/(10000/2000))⌉ = 520
σBdate>'1957-12-31' × WORKS_ON JNL JNL cost: bR+(bRbS)/(n-2) = 520+(520*1200)/(5-2) = 208520 blocks read
selection cardinality = |R||S| = 2600*12000 = 31200000 rows result
blocking factor for the combined rows bfrEMPLOYEE,WORKS_ON = ⌈bfrEMPLOYEEbfrWORKS_ON/(bfrEMPLOYEE+bfrWORKS_ON)⌉ = ⌈(10000/2000)*(12000/1200)/((10000/2000)+(12000/1200))⌉ = 3
blocks written = ⌈(rows/bfrEMPLOYEE,WORKS_ON)⌉ = ⌈31200000/3⌉ = 10400000
σEssn=Ssn SL (condition doesn't involve a constant value, so binary search or an index wouldn't help even if available) SL cost: b (may be many matches) = 10400000 blocks read
join selectivity js = min(sB/|S|,sA/|R|) = min((12000/10000)/10000,1/2600) = 1/2600 (this condition is really a join condition rather than a equality or range condition, so we consider the inputs to the preceding CROSS PRODUCT to determine selectivity - Ssn is PK for EMPLOYEE, but Essn is only part of a PK for WORKS_ON and thus not a key by itself)
selection cardinality s = js*r = (1/2600)*31200000 = 12000 rows selected
blocks written = ⌈(rows/bfrEMPLOYEE,WORKS_ON)⌉ = ⌈12000/3⌉ = 4000
σPname='Aquarius' SL SL cost: b (Pname is not known to be UNIQUE) = 100 blocks read
selectivity sl = 1/d = 1/1800
selection cardinality s = sl*r = (1/1800)*2000 = 2 rows selected (round up)
blocks written = ⌈(rows/bfrPROJECT)⌉ = ⌈(rows/(r/b))⌉ = ⌈(2/(2000/100))⌉ = 1
σEssn=Ssn × σPname='Aquarius' JNL JNL cost: bR+(bRbS)/(n-2) = 400+(400*1)/(5-2) = 534 blocks read
selection cardinality = |R||S| = 12000*2 = 24000 rows result
blocking factor for the combined rows bfrEMPLOYEE,WORKS_ON,PROJECT = ⌈bfrEMPLOYEE,WORKS_ONbfrPROJECT/(bfrEMPLOYEE,WORKS_ON+bfrPROJECT)⌉ = ⌈3*(2000/100)/(3+(2000/100))⌉ = 3
blocks written = ⌈(rows/bfrEMPLOYEE,WORKS_ON,PROJECT)⌉ = ⌈24000/3⌉ = 12000
σPnumber=Pno SL (condition doesn't involve a constant value, so binary search or an index wouldn't help even if available) SL cost: b (may be many matches) = 12000 blocks read
join selectivity js = min(sB/|S|,sA/|R|) = min(1/2,(12000/2000)/12000) = 1/2000 (this condition is really a join condition rather than a equality or range condition, so we consider the inputs to the preceding CROSS PRODUCT to determine selectivity - Pnumber is PK for PROJECT, but there are 12000/2000 rows in WORKS_ON for each Pno)
selection cardinality s = js*r = (1/2000)*24000 = 12 rows selected
blocks written = ⌈(rows/bfrEMPLOYEE,WORKS_ON,PROJECT)⌉ = ⌈12/3⌉ = 4
πLname cost: b = 4 blocks read
we don't know the size of the Lname column and thus can't compute the blocking factor for the final set of rows written, but we do know that one Lname record doesn't take up more space than an entire EMPLOYEE record so bfrEMPLOYEE can be used as a worst-case estimate
blocks written: ⌈(rows/bfrLname)⌉ ≤ ⌈(rows/bfrEMPLOYEE)⌉ = ⌈12/(10000/2000)⌉ = 3
total blocks read: 2000+208520+10400000+100+534+12000+4 = 10623158
blocks written: 520+10400000+4000+1+12000+4+3 = 10416528
total: 21039686 blocks read/written

With Pipelining

With pipelining, intermediate results are only written if necessary. PROJECT, SL, and the left side of JNL and JSL can be pipelined.

Differences from the unpipelined case are underlined italics.

σBdate>'1957-12-31' SL (EMPLOYEE is ordered by Ssn and there is no index on Bdate) SL cost: b = 2000 blocks read
selectivity sl = 12/45 (condition covers 12 years of the 45-year range; assuming uniform distribution) selection cardinality s = sl*r = (12/45)*10000 = 2600 rows selected
blocks produced = ⌈(rows/bfrEMPLOYEE)⌉ = ⌈(rows/(r/b))⌉ = ⌈(2600/(10000/2000))⌉ = 520
blocks written = 0 (pipelined)
σBdate>'1957-12-31' × WORKS_ON JNL JNL cost: bR+(bRbS)/(n-2) = 0+(520*1200)/(5-2) = 208000 blocks read (R does not need to be read, but S is still read bR/(n-2) times)
selection cardinality = |R||S| = 2600*12000 = 31200000 rows result
blocking factor for the combined rows bfrEMPLOYEE,WORKS_ON = ⌈bfrEMPLOYEEbfrWORKS_ON/(bfrEMPLOYEE+bfrWORKS_ON)⌉ = ⌈(10000/2000)*(12000/1200)/((10000/2000)+(12000/1200))⌉ = 3
blocks produced = ⌈(rows/bfrEMPLOYEE,WORKS_ON)⌉ = ⌈31200000/3⌉ = 10400000
blocks written = 0 (pipelined)
σEssn=Ssn SL (pipelined, so can't binary search or use index even if otherwise applicable) SL cost: 0 (pipelined)
join selectivity js = min(sB/|S|,sA/|R|) = min((12000/10000)/10000,1/2600) = 1/2600 (this condition is really a join condition rather than a equality or range condition, so we consider the inputs to the preceding CROSS PRODUCT to determine selectivity - Ssn is PK for EMPLOYEE, but Essn is only part of a PK for WORKS_ON and thus not a key by itself)
selection cardinality s = js*r = (1/2600)*31200000 = 12000 rows selected
blocks produced = ⌈(rows/bfrEMPLOYEE,WORKS_ON,PROJECT)⌉ = ⌈24000/3⌉ = 12000
blocks written = 0 (pipelined)
σPname='Aquarius' SL SL cost: b (Pname is not known to be UNIQUE) = 100 blocks read
selectivity sl = 1/d = 1/1800
selection cardinality s = sl*r = (1/1800)*2000 = 2 rows selected (round up)
blocks produced = ⌈(rows/bfrPROJECT)⌉ = ⌈(rows/(r/b))⌉ = ⌈(2/(2000/100))⌉ = 1
blocks written: 1 (can't pipline right side of JNL)
σEssn=Ssn × σPname='Aquarius' JNL JNL cost: bR+(bRbS)/(n-2) = 0+(400*1)/(5-2) = 534 blocks read (R does not need to be read, but S is still read bR/(n-2) times)
selection cardinality = |R||S| = 12000*2 = 24000 rows result
blocking factor for the combined rows bfrEMPLOYEE,WORKS_ON,PROJECT = ⌈bfrEMPLOYEE,WORKS_ONbfrPROJECT/(bfrEMPLOYEE,WORKS_ON+bfrPROJECT)⌉ = ⌈3*(2000/100)/(3+(2000/100))⌉ = 3
blocks produced = ⌈(rows/bfrEMPLOYEE,WORKS_ON,PROJECT)⌉ = ⌈24000/3⌉ = 12000
blocks written = 0 (pipelined)
σPnumber=Pno SL (pipelined, so can't binary search or use index even if otherwise applicable) SL cost: 0 (pipelined)
join selectivity js = min(sB/|S|,sA/|R|) = min(1/2,(12000/2000)/12000) = 1/2000 (this condition is really a join condition rather than a equality or range condition, so we consider the inputs to the preceding CROSS PRODUCT to determine selectivity - Pnumber is PK for PROJECT, but there are 12000/2000 rows in WORKS_ON for each Pno)
selection cardinality s = js*r = (1/2000)*24000 = 12 rows selected
blocks produced = ⌈(rows/bfrEMPLOYEE,WORKS_ON,PROJECT)⌉ = ⌈12/3⌉ = 4
blocks written = 0 (pipelined)
πLname cost: 0 (pipelined)
we don't know the size of the Lname column and thus can't compute the blocking factor for the final set of rows written, but we do know that one Lname record doesn't take up more space than an entire EMPLOYEE record so bfrEMPLOYEE can be used as a worst-case estimate
blocks written: ⌈(rows/bfrLname)⌉ ≤ ⌈(rows/bfrEMPLOYEE)⌉ = ⌈12/(10000/2000)⌉ = 3 blocks written
total blocks read: 2000+208000+0+100+534+0+0 = 210634
blocks written: 0+0+0+1+0+0+3 = 4
total: 210638 blocks read/written (pipelining can make a big difference!)