Basic Concepts that every Data Modeler should know
Data modeling is an art as well as science. Data modeling is very important in the initial stage of the project. A good data model is assets for the project while the bad data model can become the liability for the project
Peter Chen has started data modeling with his 1970 paper. He has discussed the basic concept of the Entity-relationship model. Thanks to Peter Chen we are living in a much better world now. In simple terms, the entity is a table containing the data and attributes are the fields in the table.
Entity:- In simple terms, the entity is a table containing the data and attributes are the fields in the table. There are two different type of entity.
a. Strong Entity:- A strong entity is an independent entity in the schema. Strong entities are represented by a rectangle box
b. Weak entity:- A weak entity is a dependent entity. Its existence depends on the strong entity. Weak entities are represented by a double rectangle box
Relationship:- Relationship is the heart of data modeling. The entity is related to other and different type of relationship is defined using different diagrams.
Entity keys:- Keys are ways to uniquely identify the candidate (record) in the dataset. The keys are different for a different candidate in the entity. There are two popular keys. We can represnt the keys by a simple key symbol
i. Primary Key:- The key is chosen by the data modeler to uniquely identify the candidate in the entity.
ii. Foreign Key:- Use to identify the relationship with a primary key
Hierarchy: Hierarchy is like creating subcategories. When two entities have more common attributes than different attributes, we use hierarchy.If the organization is having three types of employees permanent, contract and intern, we can hierarchy to make modeling simple inspite of creating seperate entities. In the below diagram, there are three different entity (Permanent, Contract, Intern) which are the sub type of Employees.
Cardinality:- Basically it tells how two entities have a relationship in numerical terms
a. One to one: When every candidate of the first entity is connected to the only one candidate of the second entity.
b. One to many:- When every candidate of the first entity is connected to the one or more candidate of the second entity.
c. Mandatory relationship:- When every candidate of the first entity is connected to at least one candidate of the second entity. In general, a solid line represents the mandatory relationship
d. Optional relationship:- When is the relationship between the first entity is not required with the second entity. A dashed line represents the optional relationship
e. Many to Many:- When one or more candidate from the first entity can be connected to one or more candidate from the second entity
Permissible Values:- Very useful technique to limit the data inside the fields. Permissible values are based on business context. Assume an attribute “date of birth” in the Employee table. We can limit this by putting condition like date of birth should not greater than today’s date