Uploaded by norah khan

cs4347

advertisement
1. A clustering index is used for numerous data pieces that have the same value for the
ordering field.
A primary index has records that are the same fixed length; it always has the
same first field called the primary key; there is one index entry for every block of
data.
- Clustering is used when there is not a distinct size like the primary key. It has two
fields in the ordered file; a disk pointer and a clustering index.
- A secondary index is only used when there is already some way to primarily
access the data. Its also an ordered file with two fields. Several secondary
indexes can exist in one data set.
2.
0. Fd: propertyid -> country name, lot#, area, price, tax_rate
- Area is common
1. Countryname, lot# -> propertyid, area, price, tax_rate
2. Countryname -> tax_rate
3.Area -> price
● Lossless join have at least 1 key relation
● LOTS1A and LOTS1B has common attribute 'Area' , but 'Area' is super key in
LOTS1B and LOTS1A and LOTS2 has common attribute 'Country_name', it
is superkey in LOTS2 , hence it is lossless join decomposition .
3. A set of inferences that is complete means that other logical conclusions cna come
from it. The soundness means that you cannot derive dependencies that are not sound/
that do not hold true.
“By sound, we mean that given a set of functional dependencies F specified on a relation
schema R, any dependency that we can infer from F by using IR1 through IR3
holds in every relation state r of R that satisfies the dependencies in F. By complete,
we mean that using IR1 through IR3 repeatedly to infer dependencies until no more
dependencies can be inferred results in the complete set of all possible dependencies
that can be inferred from F.”
4. ABD is a candidate key because there is no subset of the attributes that is key. Ab is
not a candidate key.
5. All the attributes are atomic so it is in 1nf currently. It isnt 2nf because there is a
dependency between key and non key (in this case salesman and commission). The is
not a 3nf relation because theres dependencies that are transiitive.
2nf:
CAR_SALE1(Car#, Salesman#, DateSold, DiscountAmount)
CAR_SALE2(Salesman#, Commission%)
3nf:
CAR_SALES1A(Car#, Salesman#, DateSold)
CAR_SALES1B(DateSold, DiscountAmount)
CAR_SALE3(Salesman#, Commission%)
**6. Find left and right redundancies; find extraneous attributes there is no
redundancies so: minimal cover is E ={P QRS, RS T}
7a. It is easy to insert and retrieve ; can take time and cause blanks
7b. More efficient because of order ; difficult to rearrange
7c. Good with lots of data because it is fast; there is a fixed number of buckets which is
an issue if the system gets bigger or smaller
- Hashing is the most difficult and expensive to impelement
8. There is the functional dependency of tripid and start date
- Cities and cards visited can be used multiple times
- The MVDS are:
- (Start_date --> Cities_visited | Cards_used) (Trip_id --> Cities_visited | Cards_used)
- There are no independencies in the relation so its 4nf ]
9a. M doesnt determine y and p - M Y does - M C doesnt determine y or p
9b. Not in bcnf or 3nf because the two FD m and mp are not superkeys, it is niot 2nf either so it is
1nf . It is lossless
Download