Objective
Record information:
- User information: name, address
- Oyster card information: card id, create date, card type, balance
- Trips: travel from, touch in date and time, travel to, touch out date and time, fare Dummy data
3rd Normal Form is used below to normalise the data. It is a 3 steps process, each step depends on the completion of the previous step and progressively transforming the data.
1st Normal Form: Atomicity
The principle of 1st normal form is each field should only have singular value (avoid delimiters). From the data above, the address column and trips column are not atomic.
For the address column, it can be consistently split into 3 columns: street, city, postcode.
For the trips column, it will be hard to split into fixed amount of columns, as the trips are ever increasing. Therefore, it needs more flexibility. The solution is to split the trips into different rows, so it can be flexibly growing (insert a new row when there is a new trip occurred). Then split each trip information into multiple columns: entry_station, entry_time, exit_station, exit_time and fare.
After the 1st normal form transformation, the table will look like below:
trip
2nd Normal Form: No partial dependency
Firstly, let’s decide what is the primary key for the table above, which can uniquely distinguish one row from another. Intuitively, it seems should be each trip. However, there is no one individual column which can identify a trip, so we produce a surrogate key trip_id, and allocate unique ids to each trip.
The principle of 2nd normal form is no partial dependency, which means each column should be direct attribute of the primary key. Oyster card (oyster_card_id), entry_station, entry_time, exit_station, exit_time and fare are attributes of a trip. However, type, created_at, name, street, city, postcode seem more related to oyster card rather than the trip. We shall take them out of the trip table, create an oyster card table for them. Furthermore, street, city and postcode are attributes of the person who owns the oyster card, rather than attribute of oyster card, we can further take them out and put them into a person table.
After the transformation, we will have 3 tables:
trip
oyster_card
*Notice the balance is now the snapshot of the latest (historic balance not required).
person
3rd Normal Form: No transitive dependency
No transitive dependency means each column should be an independent attribute of the table, not multiple columns form an attribute. From above, the street, city, postcode are part of an address. They should be moved to a separate address table.
After the 3rd normal form transformation, the person table is replaced by two tables:
person
address
Now, we have our ER model.
There are further normalisation techniques available, i.e. Boyce–Codd normal form (3.5 normal form), 4th normal form, 5th normal form… Usually, 3rd normal form can get our operational database into a good state. Also, people rarely need to follow each step 100% in reality. The more normalised the data, the more joins needed, the less comprehensible the schema becomes.