
Part 1: Basics of database design, ER model, normalization

Basic course databases: basics and design of database models

A database system is part of almost every business application. Designing such applications requires sound knowledge of database design. The basics are usually taught during studies or training and perhaps refreshed through seminars and courses, but the overall picture is easily lost. The keywords are: relational database, normalization, ER model and SQL.

Basic course databases - part 1

When developing software, a database system usually has to be integrated. In the ideal case, an existing database can be used and only the connection to it has to be implemented. At the other extreme, the project scope can include building the database from scratch. Knowledge of database design is required in both cases.

As so often, the motto is: do not start creating tables straight away, but first work out a concept calmly with pencil and paper (or on the computer, if you prefer). This four-part tutorial tries to give you the necessary tools: for the newcomer it is a guide to acquiring the knowledge, for the experienced developer a chance to close one gap or another. Let's start with an overview of the phases of database development.

Basic course databases - overview

Database development phases

A database cannot be set up in a single step. Instead, several phases must be run through (Fig. 1):

  • External phase: Determination of the information structure: The database should represent an excerpt from the real world (also referred to as "mini world") in the computer. This mapping is done through the description of the data. For this purpose, the information needs of the users must be determined and structured. The result of this first step - also known as a specification and requirements analysis - is an informal description of the technical problem.
  • Conceptual phase: Establishment of the semantic model: The aim of the conceptual design is the formalized description of the facts under consideration. There are various approaches to generating such an overall view. The best-known model is the so-called Entity Relationship Model (ER model). An explanation of this ER model follows later. The result of this step is the technical concept of the database.
  • Logical phase: Creation of the logical data model: The aim is to transfer the semantic data model into a logical data model, e.g. a relational data model (tabular form). This phase comprises two steps: In the first step, the conceptual schema (ER model) is transformed into the database schema. This step can be automated using software. In the second step, the relational schema is optimized, e.g. by normalizing the table structure (section: "Optimization: normalization of the data model").
  • Physical phase: Implementation of the database: At the end of this phase the empty database should exist. For this purpose, the logical model is translated into a concrete database schema using a data definition language (e.g. SQL); data types, value ranges, relations and views must be specified. A minimal SQL sketch follows this list.
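
To make the physical phase more tangible, here is a minimal sketch in standard SQL. The table and column names (Student, StudentNo, ...) are invented for illustration and are not taken from the figures.

  -- Hypothetical example: translating a small logical model into a concrete schema
  CREATE TABLE Student (
      StudentNo   INTEGER      PRIMARY KEY,          -- unique identifier (primary key)
      Name        VARCHAR(50)  NOT NULL,             -- mandatory field
      FirstName   VARCHAR(50)  NOT NULL,
      DateOfBirth DATE,                              -- optional field (NULL allowed)
      Semester    INTEGER      CHECK (Semester > 0)  -- value range restricted by a rule
  );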

Excursus: It depends on the type - (standard) data types
The data type must be specified for each attribute of a table. The available data types differ between individual database systems, so full compatibility cannot be guaranteed. This is particularly important when changing the database system: it must be ensured that importing external data does not lead to data loss or corrupted data. Particular attention must be paid during data transfer to "non-standardized" data types, e.g. date values. Table 1 gives an overview of important data types that are available in practically every database.
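
As a hedged illustration of such differences, the following column definitions use standard SQL types; the comments name a few vendor-specific counterparts that are commonly encountered. Check the manual of the target system before relying on them.

  CREATE TABLE TypeDemo (
      Id        INTEGER,          -- integer type, widely supported
      Price     DECIMAL(10,2),    -- fixed-point number, e.g. for currency amounts
      Name      VARCHAR(100),     -- variable-length character string (Oracle: VARCHAR2)
      CreatedAt TIMESTAMP,        -- date and time; MySQL also offers DATETIME,
                                  -- SQL Server DATETIME2 - a typical portability pitfall
      Active    BOOLEAN           -- not available everywhere; often emulated with 0/1
  );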

Relational data model

There are different data models (Fig. 2). The starting point was the hierarchical data model, which today is rarely used in database management systems (DBMS) due to its limited applicability. The network model is used for large amounts of data in the mainframe area. The most widely used model is the relational data model. The relational model is based on the following elements:

  • Tables (relations)
  • Operators, i.e. operations on tables
  • Rules, i.e. conditions on the values of certain columns

The structure of a table for a relational database can be seen in Figure 3. The following properties apply to a relation [1]:

  1. No tuple occurs twice, i.e. at no point in time are there two tuples with the same attribute values.
  2. The tuple order is not defined, i.e. you cannot rely on a particular order of the rows in the table.
  3. The attribute order is not defined (order of the columns).
  4. Attribute values are atomic. The values of an attribute are subject to a domain, i.e. they come from a defined range of values. Since all elements of a domain are atomic, this also applies to attribute values.

The most important operations on relations (relational algebra) that a database management system must provide are the following (a brief SQL sketch follows the list):

  • Selection of rows (selection)
  • Selection of columns (projection)
  • Joining tables (join)
  • Union of tables (union)
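
Expressed in SQL (anticipating part 2), the four operations could look roughly like this; all table and column names are assumptions made for illustration only.

  SELECT * FROM Student WHERE SubjectArea = 'Business Administration';  -- selection of rows
  SELECT Name, FirstName FROM Student;                                  -- selection of columns (projection)
  SELECT s.Name, c.Designation                                          -- joining tables
    FROM Student s
    JOIN Occupancy o ON o.StudentNo = s.StudentNo
    JOIN Courses   c ON c.CourseNo  = o.CourseNo;
  SELECT Name FROM Student UNION SELECT Name FROM Alumni;               -- union of tables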

This is explained by example in Figure 4. Rules or constraints ensure that the database does not contain contradictory data. Local constraints apply to a single table; for the primary key, for example, the following applies: each value in this column may appear only once. Global constraints (referential integrity), in contrast, apply across multiple tables. An example of global integrity: for every value that occurs as a foreign key in one table, there is exactly one matching value in another table, where it is the primary key (Fig. 5).
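
A sketch of how such rules can be declared (the names are again purely illustrative): the primary key is a local constraint of the Courses table, the foreign key a global constraint tying Occupancy to Courses.

  -- local constraint: primary-key values must be unique within the table
  ALTER TABLE Courses ADD CONSTRAINT pk_courses PRIMARY KEY (CourseNo);

  -- global constraint (referential integrity): every CourseNo used in Occupancy
  -- must exist as a primary-key value in Courses
  ALTER TABLE Occupancy ADD CONSTRAINT fk_occupancy_course
      FOREIGN KEY (CourseNo) REFERENCES Courses (CourseNo);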

The Entity Relationship Model (ER model)

The model in which a database structure is designed is called the Entity Relationship Model (ER model). The basic table and relationship structure of a database is designed and mapped in an ER model. The (graphical) representation in the ER model and the mapping of the database structure with the help of tables are equivalent to one another, i.e. a mutual transfer is possible (and is also carried out automatically by development environments). In order to be able to work with the ER model, some terms have to be defined:

  • Attributes: These are the fields of a data record. In the graphic model, the properties are shown as rectangles with rounded corners.
  • Entity: To put it very simply, it is an object of the respective table, i.e. a data record. Entities differ from one another through different values of their properties. An entity is represented by a rectangle.
  • Entity sets: Several entities with the same attributes (not attribute values) are combined to form an entity set. An entity set corresponds to a table in the database.
  • Relationships: The individual tables are not separate from one another. Links are established between the entities. A relationship can be assigned to what is known as a relationship type. The graphical visualization in the ER model takes place by means of diamonds that are connected to the associated entities.

The most important relationship types are the 1:1, 1:n and n:m relationships:

  • 1:1 relationship: There is exactly one data record in Table 2 for each data record in Table 1. In theory, this type of relationship indicates that the tables can be merged. However, there can be reasons that justify a 1:1 relationship, such as security and performance. If certain data are to be specially protected against access, they can be stored in a separate table; entire tables can be better protected with regard to user rights than individual table columns. With very large databases, it can also make sense to outsource rarely used data, which increases the speed of access to the remaining data.
  • 1:n relationship: For each data record in one table there can be any number (0 ... n) of data records in the other table. An example: the relationship between the tables Lecture hall and Student. There can be no (0), one (1) or several (n) students in a lecture hall.
  • n:m relationship (many-to-many): Each data record from Table 1 is assigned 0..m data records in Table 2; conversely, each data record from Table 2 is assigned 0..n data records in Table 1. Again an example: Table 1 contains the information about the students and Table 2 the information about the courses available at the university. Each student can take no, one or more courses; conversely, no, one or more students can be enrolled in a course. Such relationships must be resolved using an auxiliary table (see the sketch after this list).
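
A sketch of how these relationship types typically end up in tables (all names are illustrative assumptions): a 1:n relationship is mapped with a foreign key on the "n" side, an n:m relationship with an auxiliary table whose composite primary key combines two foreign keys.

  -- 1:n - a lecture hall holds many students, each student sits in at most one hall
  CREATE TABLE LectureHall (HallNo INTEGER PRIMARY KEY);
  CREATE TABLE Student (
      StudentNo INTEGER PRIMARY KEY,
      HallNo    INTEGER REFERENCES LectureHall (HallNo)   -- foreign key on the "n" side
  );

  -- n:m - resolved via the auxiliary table Enrollment
  CREATE TABLE Course (CourseNo INTEGER PRIMARY KEY);
  CREATE TABLE Enrollment (
      StudentNo INTEGER REFERENCES Student (StudentNo),
      CourseNo  INTEGER REFERENCES Course (CourseNo),
      PRIMARY KEY (StudentNo, CourseNo)                   -- composite key of the auxiliary table
  );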

There are several ways to represent relationship types in an ER diagram. The original notation for ER models is the Chen notation [3]; because it only allows limited statements about a relationship, it has largely been replaced by the MIN-MAX notation. In MIN-MAX notation, an ordered pair with a minimum and a maximum value is specified for each entity type involved in a relationship. The UML notation can also be used for ER models. Figure 6 shows the same situation in both the Chen and the MIN-MAX notation.

Key questions and other basic concepts

Database theory also includes the following terms: primary key, secondary key, foreign key and null values. An entity can be uniquely identified with the help of the primary key (Latin primus: the first). It can be composed of several attributes. In the graphical representation, primary keys are marked by underlining the respective attribute. A secondary key helps to find records in a table more quickly. If no secondary index is created and a specific attribute value has to be searched for within a table, all data records have to be searched sequentially (one after the other). In the case of large databases, this can mean very long response times. If a secondary index is created, the database system builds a new index table for it, sorted according to the desired criterion. For a search, access to this index table is sufficient; it contains a cross-reference to the actual data record in the main table. The disadvantage is that creating a large number of secondary indexes considerably increases the storage space requirements. The rule of thumb: secondary indexes should be created for frequently used search queries; for rarer evaluations they should be dispensed with to save storage space. As a rule, database systems also allow secondary indexes to be created later, so that new search requirements can be incorporated at any time.
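
In SQL, a secondary index is created with CREATE INDEX. The example below assumes a Customer table with a frequently searched Name column; the exact syntax (especially for DROP INDEX) varies slightly between systems.

  -- without the index, a search by name scans all rows sequentially
  CREATE INDEX idx_customer_name ON Customer (Name);

  -- queries like this one can now use the index table instead of a full scan
  SELECT * FROM Customer WHERE Name = 'Miller';

  -- an index that is no longer worth its storage cost can be removed again
  DROP INDEX idx_customer_name;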

A foreign key is a key that is stored in the current table but refers to a key in a "foreign" table (usually the primary key of that foreign table). The concept of the null value is also of particular importance. Null values are (intentionally) missing values for an attribute in a table, and it has to be determined how they are dealt with. In the case of numeric data fields in particular, a distinction must be made between a null value and the value 0 (nothing is not zero!). In connection with null values, it must be specified for each database field whether data entry is mandatory (mandatory field) or optional (optional field).
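
The difference between a null value and the number 0, as well as the distinction between mandatory and optional fields, could be sketched as follows (table and column names are assumed for illustration):

  CREATE TABLE OrderItem (
      OrderNo  INTEGER NOT NULL,   -- mandatory field: an entry is required
      Discount DECIMAL(5,2)        -- optional field: may remain empty (NULL)
  );

  -- 0 means "a discount of zero was entered", NULL means "no value was entered"
  SELECT * FROM OrderItem WHERE Discount = 0;        -- finds explicit zero discounts
  SELECT * FROM OrderItem WHERE Discount IS NULL;    -- finds missing entries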

Optimization: normalization of the data model

An essential step in defining the database structure is normalization, in which redundancies in the data are eliminated. This process is described below using an example. When developing a database, one of the first steps is to establish the basic table and relationship structures. Particular care must be taken here, as errors in this phase can only be eliminated later with a great deal of effort (and expense); in the worst case, data inconsistencies or data loss can occur. Usually there are existing data collections, or there are first drafts of which data are to be managed. As a rule, however, these are not in the form required for a database. One of the most important steps in defining the database structure is therefore what is known as normalization, which serves to avoid redundancies. Redundancy means that the same information is stored several times. This can lead to so-called anomalies, i.e. contradictory database contents.

A simple example: a database stores not only the customer number for an order, but also the customer's address data. If the customer has placed several orders, this information is stored multiple times (redundantly). The problem arises when customer data changes: the address data in all of the customer's data records then have to be adjusted, and it is obvious that something can be forgotten in the process. It would be better to split up the data structure and manage the customer data separately from the order information. In order to resolve such relationships when creating the database structure, the table structure is normalized. The normalization process simplifies the maintenance of a database and helps to ensure the consistency of the data.
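
The customer example could look like this in table form; the redundant variant repeats the address in every order, the split variant stores it exactly once (all names are illustrative):

  -- redundant: the address is repeated for every order of the same customer
  CREATE TABLE OrderRedundant (
      OrderNo    INTEGER PRIMARY KEY,
      CustomerNo INTEGER,
      Street     VARCHAR(100),
      City       VARCHAR(100)
  );

  -- better: customer data is stored once and referenced via the customer number
  CREATE TABLE Customer (
      CustomerNo INTEGER PRIMARY KEY,
      Street     VARCHAR(100),
      City       VARCHAR(100)
  );
  CREATE TABLE CustomerOrder (
      OrderNo    INTEGER PRIMARY KEY,
      CustomerNo INTEGER REFERENCES Customer (CustomerNo)
  );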

The normalization process will be demonstrated with the help of a simple example; it can be transferred to more complex situations, only the number of attributes increases. After data collection, the data is available in "raw form" (Fig. 7). It is typical that attributes contain several values - here Course No. and Designation each contain multiple values. This is the first thing to eliminate. To reach the first normal form, the so-called non-atomic attributes have to be resolved; to do this, new rows (data records) are inserted into the table (Fig. 8). The following applies: a table is in the first normal form (1NF) when it contains only atomic (simple) attribute values.

A first problem in the database structure has been eliminated, but there are still some disadvantages: the redundancy has increased - the table now contains, for example, three data records for student number 1 that differ only in a few attributes. If a change is necessary, all of these data records must be adapted. In addition, the previous primary key (Student No.) is no longer sufficient for unique identification. The new primary key consists of Student No. and Course No.; it is a composite primary key. The next goal is to reach the second normal form.

A relation is in the second normal form (2NF) if it is in the first normal form and every non-key attribute is fully functionally dependent on the primary key.

How can one now determine whether this condition has been violated? If an attribute that does not belong to the key is already uniquely determined by part of the key, then 2NF is not fulfilled. In our example, the values of the column Designation are uniquely determined by the partial key attribute Course No., i.e. specifying a Course No. is enough to obtain the designation. If the primary key consists of only one attribute, 2NF is automatically fulfilled. If the condition is violated, a new relation must be created. It contains the relevant partial key attribute and all non-key attributes that depend on it. In the new relation, this partial key attribute becomes the primary key, and the outsourced non-key attributes are deleted from the original relation. It sounds more complicated than it is. For our example, these steps have been carried out (Fig. 9). In 2NF there are now several relations (tables): the attributes Course No. and Designation have been moved from the relation Student into a separate relation Courses, whose primary key is Course No. The connection between the two relations Student and Courses is established by means of the relation Occupancy.
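
Translated into table definitions, the 2NF decomposition of Figure 9 could look roughly like this; the column names follow the example, the data types are assumptions:

  CREATE TABLE Student (
      StudentNo         INTEGER PRIMARY KEY,
      Name              VARCHAR(50),
      FirstName         VARCHAR(50),
      DateOfBirth       DATE,
      SubjectArea       VARCHAR(50),    -- still here in 2NF, moved out in 3NF
      NumberOfSemesters INTEGER         -- still here in 2NF, moved out in 3NF
  );

  CREATE TABLE Courses (
      CourseNo    INTEGER PRIMARY KEY,
      Designation VARCHAR(100)
  );

  CREATE TABLE Occupancy (                                 -- links Student and Courses
      StudentNo INTEGER REFERENCES Student (StudentNo),
      CourseNo  INTEGER REFERENCES Courses (CourseNo),
      PRIMARY KEY (StudentNo, CourseNo)
  );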

Even now we have not eliminated all redundancies. A look at the relation Student confirms this: the attribute combination Subject area / Number of semesters occurs several times with the same values. If, for example, the number of semesters for the business administration degree program has to be changed, several data records still need to be adjusted. To prevent this, the transition to the third normal form must be carried out. It is defined as follows: a relation is in the third normal form (3NF) if it is in 2NF and no non-key attribute is transitively dependent on the primary key, i.e. no non-key attribute follows from another non-key attribute. For our example and the relation Student, the following applies:

StudentNr → Name
StudentNr → First name
StudentNr → Date of birth
StudentNr → Subject area
StudentNr → Number of semesters

This means that all other attributes follow from the primary key. However, the following also applies: Subject area → Number of semesters.

The Number of semesters, however, already follows from the non-key attribute Subject area. This is a so-called transitive dependency, which has to be eliminated. To do this, another relation is created; it contains the attributes Subject area and Number of semesters, and the attribute Subject area becomes its primary key. Figure 10 shows the result: the 3NF has been reached.

After this step, all tables in the database (Student, Education, Courses and Occupancy) are in 3NF. The database is then said to be normalized. In theory there are definitions for higher normal forms, but for most practical applications reaching 3NF can be considered sufficient. If the normalization process is pushed too far (over-normalization), complexity increases unnecessarily. Response times also deteriorate, since a large number of key references (foreign keys) have to be evaluated for queries and the actual information is scattered over too many tables. The following therefore applies: with the help of normalization, a good compromise between system performance and freedom from redundancy should be sought. The design tools supplied with database systems occasionally lead to the over-normalization described; it is the task of the database designer to recognize this and to "slow down" the work of these tools!
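
As a closing sketch, a query that reassembles the original flat view from the four normalized tables; it illustrates the foreign-key lookups mentioned above (names follow the example, data types and join details are assumptions):

  SELECT s.StudentNo, s.Name, s.FirstName,
         e.SubjectArea, e.NumberOfSemesters,                -- moved to Education in 3NF
         c.CourseNo, c.Designation
    FROM Student   s
    JOIN Education e ON e.SubjectArea = s.SubjectArea
    JOIN Occupancy o ON o.StudentNo   = s.StudentNo
    JOIN Courses   c ON c.CourseNo    = o.CourseNo;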

Basic course databases: conclusion and outlook

The article gave an introduction to the basics of database development, and this is by no means the end of the topic. When working with databases, the query language SQL (Structured Query Language) plays an essential role: with it, all operations - from creating the database structure to targeted searches in the database - can be carried out. Part 2 will give an overview of SQL and show how you can work with it using practical examples.
