Database Normalisation
Database Normalisation is a process of organizing data in a database to reduce redundancy and dependency. It involves breaking down a large table into smaller tables and defining relationships between them. This helps to improve data consistency, reduce data duplication, and increase database efficiency.
Written by Perlego with AI-assistance
12 Key excerpts on "Database Normalisation"
- Jocelyn O. Padallan(Author)
- 2020(Publication Date)
- Arcler Press(Publisher)
Normalization refers to a technique for deciding which attributes should be grouped together in a relation. It is a tool for improving and validating a logical design so that it avoids redundancy of data. In addition, normalization refers to the process of decomposing relations with anomalies in order to produce smaller, well-organized relations. Thus, it can be said that in the normalization process, a relation with redundancy can be refined by replacing it with, or decomposing it into, smaller relations that contain the identical information but without redundancy. In this chapter, the major area of focus is on studying the informal design guidelines for relation schemas and functional dependency, as well as its types. 7.6. DENORMALIZATION Denormalization is an optimization technique used in databases in which redundant data is added to the tables. Costly joins in relational databases are reduced with the help of denormalization. Denormalization does not mean skipping normalization; it is a process performed after normalization to optimize the database. Normalization converts tables into more efficient ones by decomposing one table into several tables, with one form of data present in every single table. But normalization raises one issue: as the number of tables increases, the number of joins also increases, and joins can have a severe impact on the performance of the database. Thus, to deal with this issue, denormalization is performed, which helps optimize the database. It does so by adding redundant data and grouping data together in the database. When users need to extract information from the database, they submit queries, and in turn the database provides them with the required results.
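As a rough illustration of the decomposition described above, the sketch below splits one redundant relation into two smaller relations holding the same information. The table and column names (employee_dept, departments, employees) are invented for the example and do not come from the excerpt.

```sql
-- Redundant relation: department details repeat on every employee row.
CREATE TABLE employee_dept (
    emp_id     INTEGER PRIMARY KEY,
    emp_name   VARCHAR(100),
    dept_code  CHAR(4),
    dept_name  VARCHAR(100),  -- repeated for every employee of the department
    dept_phone VARCHAR(20)    -- repeated as well
);

-- Decomposition: the same information, stored only once.
CREATE TABLE departments (
    dept_code  CHAR(4) PRIMARY KEY,
    dept_name  VARCHAR(100),
    dept_phone VARCHAR(20)
);

CREATE TABLE employees (
    emp_id    INTEGER PRIMARY KEY,
    emp_name  VARCHAR(100),
    dept_code CHAR(4) REFERENCES departments (dept_code)
);
```

Reassembling the original rows now requires a join of employees and departments, which is exactly the cost that the denormalization discussed in the excerpt trades back against redundancy.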
- Gavin JT Powell(Author)
- 2011(Publication Date)
- Digital Press(Publisher)
The sequence of steps involved in the normalization process is called normal forms. Essentially, normal forms applied during a process of normalization allow creation of a relational database model as a step-by-step progression. Tuning of a database installation, and particularly making SQL coding perform well, is heavily dependent on effective entity design, so a good understanding of normalization is essential. 1.1 The Formal Definition of Normalization The most commonly applied normal forms are first, second, and third normal forms. Additionally, there are the rarely commercially implemented Boyce-Codd, fourth, fifth, and Domain Key normal forms. The normal form steps are cumulative. In other words, each one is applied to the previous step, and the next step cannot be applied until the previous one is implemented. The following examples apply: a database model can only have third normal form applied when it is in second normal form; if a database model is only in first normal form, then third normal form cannot be applied. That is what is meant by cumulative. The overall result of normalization is removal of duplication and minimization of redundant chunks of data. The result is better organization and more effective use of physical space, among other factors. Normalization is not always the best solution. In data warehouses, there is often a completely different approach to database model design. Normalization is not the only solution. The formal approach to normalization insists on expecting a designer to apply every normal form layer in every situation. In a commercial environment this is often an overzealous application of detail. The trouble with the deeper and more precisely refined aspects of normalization is that normalization tends to overdefine itself simply for the sake of defining itself further.
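To picture the cumulative progression the excerpt describes, here is an illustrative sketch that carries one hypothetical order-lines design through first, second, and third normal form. All table and column names are invented for the example; the excerpt itself does not supply them.

```sql
-- First normal form: atomic values and a declared key, but item_price depends
-- only on item_no, and cust_id/cust_city depend only on order_no, so there are
-- partial dependencies on the composite key.
CREATE TABLE order_lines_1nf (
    order_no   INTEGER,
    item_no    INTEGER,
    quantity   INTEGER,
    item_price DECIMAL(10,2),
    cust_id    INTEGER,
    cust_city  VARCHAR(60),
    PRIMARY KEY (order_no, item_no)
);

-- Second normal form: partial dependencies removed by splitting out items and
-- orders; cust_city still rides on orders although it really depends on cust_id.
CREATE TABLE items_2nf  (item_no INTEGER PRIMARY KEY, item_price DECIMAL(10,2));
CREATE TABLE orders_2nf (order_no INTEGER PRIMARY KEY, cust_id INTEGER, cust_city VARCHAR(60));
CREATE TABLE order_lines_2nf (
    order_no INTEGER REFERENCES orders_2nf (order_no),
    item_no  INTEGER REFERENCES items_2nf (item_no),
    quantity INTEGER,
    PRIMARY KEY (order_no, item_no)
);

-- Third normal form: the transitive dependency order_no -> cust_id -> cust_city
-- is removed by giving customers their own table; the order-lines table is
-- already in 3NF and stays as it was.
CREATE TABLE customers_3nf (cust_id INTEGER PRIMARY KEY, cust_city VARCHAR(60));
CREATE TABLE orders_3nf    (order_no INTEGER PRIMARY KEY,
                            cust_id INTEGER REFERENCES customers_3nf (cust_id));
```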
- Lee Chao(Author)
- 2013(Publication Date)
- Auerbach Publications(Publisher)
The next section discusses table normalization. 3.3 Normalization A relational database consists of tables and relationships. A properly designed table structure will greatly improve database performance, reduce the size of data storage, and ensure correct database operations. In this section, you will learn about the concepts of table structure and the process of normalizing tables. [Figure 3.4: (a) PUBLISHERS table, (b) EMPLOYEES table.] 3.3.1 Why Table Normalization A normalization process is a process for eliminating anomalies that occur in database operations. To see what those anomalies are, let us consider the CLASS_REGISTRATON_INFO table shown in Table 3.1. In Table 3.1, information about students, faculty members, courses, and classes is stored in a single table, CLASS_REGISTRATON_INFO. This fact indicates that the primary key has to be a combination of several columns. Indeed, StudentID alone cannot determine the column Course. For example, the value 11 in the StudentID column cannot determine the values in the Course column: the same 11 relates to two different values, Database and E-Commerce. By the same argument, StudentID cannot determine ClassID. On the other hand, the column Course alone cannot determine StudentID and some other columns. Some students may take the same course more than once; in such a case, the values of ClassID cannot be determined by the combination (StudentID, Course). Therefore, the primary key for the table CLASS_REGISTRATON_INFO is a combination of three columns (StudentID, Course, ClassID). The table displayed in Table 3.1 is in fact a relation: each cell has only a single data value, the values in each column have the same data type, and there are no duplicated rows. Unfortunately, a table that satisfies the requirements of a relation is far from ideal in a relational database. The table CLASS_REGISTRATON_INFO has a data redundancy problem.
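A rough sketch of the situation the excerpt analyses: the registration table with its three-column key, and one possible decomposition that removes the repeated student and class details. Only StudentID, Course, and ClassID come from the excerpt; the remaining columns and the decomposition itself are illustrative assumptions.

```sql
-- The single registration table: the key must combine three columns, and any
-- descriptive columns (assumed here) repeat on every registration row.
CREATE TABLE class_registration_info (
    StudentID   INTEGER,
    StudentName VARCHAR(100),   -- assumed column, repeated per registration
    Course      VARCHAR(60),
    ClassID     INTEGER,
    FacultyName VARCHAR(100),   -- assumed column, repeated per class
    PRIMARY KEY (StudentID, Course, ClassID)
);

-- One possible decomposition.
CREATE TABLE students (
    StudentID   INTEGER PRIMARY KEY,
    StudentName VARCHAR(100)
);

CREATE TABLE classes (
    ClassID     INTEGER PRIMARY KEY,
    Course      VARCHAR(60),
    FacultyName VARCHAR(100)
);

CREATE TABLE registrations (
    StudentID INTEGER REFERENCES students (StudentID),
    ClassID   INTEGER REFERENCES classes (ClassID),
    PRIMARY KEY (StudentID, ClassID)
);
```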
- Mark L. Gillenson(Author)
- 2012(Publication Date)
- Wiley(Publisher)
THE DATA NORMALIZATION PROCESS Data normalization was the earliest formalized database design technique and at one time was the starting point for logical database design. Today, with the popularity of the Entity-Relationship model and other such diagramming tools and the ability to convert its diagrams to database structures, data normalization is used more as a check on database structures produced from E-R diagrams than as a full-scale database design technique. [Figure 7.25: The Lucky Rent-A-Car relational database, comprising the tables RENTAL, CUSTOMER, MAINTENANCE, CAR, and MANUFACTURER.] That's one of the reasons for learning about data normalization. Another reason is that the data normalization process is another way of demonstrating and learning about such important topics as data redundancy, foreign keys, and other ideas that are so central to a solid understanding of database management. Data normalization is a methodology for organizing attributes into tables so that redundancy among the non-key attributes is eliminated. Each of the resultant tables deals with a single data focus, which is just another way of saying that each resultant table will describe a single entity type or a single many-to-many relationship. Furthermore, foreign keys will appear exactly where they are needed. In other words, the output of the data normalization process is a properly structured relational database. Introduction to the Data Normalization Technique The input required by the data normalization process has two parts.
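To make "foreign keys appear exactly where they are needed" concrete, here is a tentative rendering of two of the Lucky Rent-A-Car tables from Figure 7.25. The column names loosely follow the figure's listing (lightly adapted to SQL-friendly identifiers); the data types and key choices are assumptions for the sketch.

```sql
CREATE TABLE manufacturer (
    manufacturer_name    VARCHAR(100) PRIMARY KEY,
    manufacturer_country VARCHAR(60),
    sales_rep_name       VARCHAR(100),
    sales_rep_telephone  VARCHAR(20)
);

CREATE TABLE car (
    car_serial_number VARCHAR(20) PRIMARY KEY,
    model             VARCHAR(40),
    model_year        INTEGER,
    car_class         VARCHAR(20),
    -- The foreign key sits on CAR because each car has exactly one manufacturer:
    manufacturer_name VARCHAR(100) REFERENCES manufacturer (manufacturer_name)
);
```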
- Peter ter Braake(Author)
- 2021(Publication Date)
- Packt Publishing(Publisher)
The final reason for avoiding redundancy is to minimize the risk of inconsistencies in your data. Let's suppose, once again, that addresses are stored in three different locations in the database and a customer moves. You now risk changing the address in two of the locations while forgetting about the third address. You now have two different addresses for the same customer in your database. When you query the database for the customer's address, it now depends on how you write your query as to whether you get the current address or the old address. And different people will write different queries. Your database can no longer be trusted. You will read more about this in the section regarding normalization steps. You may think that the problems may not be that bad. Unfortunately, Murphy's law applies to databases. Everything that can go wrong will go wrong. Redundancy in databases, combined with applications where people working in primary processes enter data in the database, will always lead to data quality issues in the database. Note: By avoiding redundancy in a database, an OLTP workload will be faster and cheaper, while, at the same time, the data quality will increase. How to avoid redundancy
After learning why redundancy should be avoided for operational databases, let's now look at how to avoid redundancy. Normalizing data is a process in which we split data from a report (or a form, or a screen of an application) step by step to divide the data elements between different tables. The result of the process of normalization is that we eliminate redundancy. In our everyday processes, we work with information on a constant basis. For example, in a stockroom, people operating forklift trucks use picklists. A teacher of a course might use a presentation list. A call center employee might use a list of people to call. A support engineer might use a list of the most common issues. And when an issue is solved, they need to enter the details pertaining to the problem, the solution, and who the problem belonged to. What all these lists and forms have in common is that they show information coming from a database. It doesn't matter whether you print a list, use an app, or fill out a paper form. The information comes from, or needs to be stored in, a database. In order to design a database, we need to know what data people work with and how they use this data. We need to know the lists and screens and forms that people will work with. Back to normalizing data and redundancy. Data is divided between multiple tables. This leads to "narrow" tables. By "narrow," I mean that a table consists of a relatively small number of columns. If you need to store 100 different pieces of data and you do that in a single table, it will consist of 100 columns. If this data is divided between 10 different tables, you have an average of just 10 columns per table.
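As a rough sketch of the splitting process just described, here is one way the data behind a stockroom picklist might be divided between several narrow tables. All table and column names are invented for the illustration; the excerpt does not define a schema.

```sql
-- A wide table holding everything one picklist line shows.
CREATE TABLE picklist_wide (
    pick_id       INTEGER PRIMARY KEY,
    order_no      INTEGER,
    customer_name VARCHAR(100),  -- repeats for every line of the same order
    product_code  VARCHAR(20),
    product_name  VARCHAR(100),  -- repeats for every pick of the same product
    quantity      INTEGER
);

-- The same data divided between narrower tables, each with only a few columns.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name VARCHAR(100));
CREATE TABLE products  (product_code VARCHAR(20) PRIMARY KEY, product_name VARCHAR(100));
CREATE TABLE orders    (order_no INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers (customer_id));
CREATE TABLE picks     (pick_id INTEGER PRIMARY KEY,
                        order_no INTEGER REFERENCES orders (order_no),
                        product_code VARCHAR(20) REFERENCES products (product_code),
                        quantity INTEGER);
```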
- Graeme Simsion, Graham Witt(Authors)
- 2004(Publication Date)
- Morgan Kaufmann(Publisher)
Our principal tool is normalization, a set of rules for allocating data to tables in such a way as to eliminate certain types of redundancy and incompleteness. In practice, normalization is usually one of the later activities in a data modeling project, as we cannot start normalizing until we have established what columns (data items) are required. In the approach described in Part 2, normalization is used in the logical database design stage, following requirements analysis and conceptual modeling. We have chosen to introduce normalization at this early stage of the book so that you can get a feeling for what a well-designed logical data model looks like. You will find it much easier to understand (and undertake) the earlier stages of analysis and design if you know what you are working toward. Normalization is one of the most thoroughly researched areas of data modeling, and you will have little trouble finding other texts and papers on the subject. Many take a fairly formal, mathematical approach. Here, we focus more on the steps in the process, what they achieve, and the practical problems you are likely to encounter. We have also highlighted areas of ambiguity and opportunities for choice and creativity. The majority of the chapter is devoted to a rather long example. We encourage you to work through it. By the time you have finished, you will have covered virtually all of the issues involved in basic normalization and encountered many of the most important data modeling concepts and terms. [Footnote: Most texts follow the sequence in which activities are performed in practice (as we do in Part 2). However, over many years of teaching data modeling to practitioners and college students, we have found that both groups find it easier to learn the top-down techniques if they have a concrete idea of what a well-structured logical model will look like. See also comments in Chapter 3, Section 3.3.1.]
Database Modeling and Design
Logical Design
- Toby J. Teorey, Sam S. Lightstone, Tom Nadeau, H.V. Jagadish(Authors)
- 2010(Publication Date)
- Morgan Kaufmann(Publisher)
6 Normalization This chapter focuses on the fundamentals of normal forms for relational databases and the database design step that normalizes the candidate tables [step II(d) of the database design life cycle]. It also investigates the equivalence between the conceptual data model (e.g., the ER model) and normal forms for tables. As we go through the examples in this chapter, it should become obvious that good, thoughtful design of a conceptual model will result in databases that are either already normalized or can be easily normalized with minor changes. This truth illustrates the beauty of the conceptual modeling approach to database design, in that the experienced relational database designer will develop a natural gravitation toward a normalized model from the beginning. For most database practitioners, Sections 6.1 through 6.4 cover the critical normalization needed for everyday use, through Boyce-Codd normal form (BCNF). Section 6.5 covers the higher normal forms of mostly theoretical interest; however, we do show the equivalency between higher normal forms and ternary relationships in the ER model and UML for the interested reader. 6.1 Fundamentals of Normalization Relational database tables, whether they are derived from ER or UML models, sometimes suffer from some rather serious problems in terms of performance, integrity, and maintainability. For example, when the entire database is defined as a single large table, it can result in a large amount of redundant data and lengthy searches for just a small number of target rows. It can also result in long and expensive updates, and deletions in particular can result in the elimination of useful data as an unwanted side effect. [Figure 6.1: Single table database.] Such a situation is shown in Figure 6.1, where products, salespersons, customers, and orders are all stored in a single table called Sales. In this table, we see that certain product and customer information is stored redundantly, wasting storage space.
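An illustrative sketch of the kind of single-table design the excerpt warns about; the column names are assumed, loosely following the Sales table described around Figure 6.1, and are not taken from the book.

```sql
-- Everything in one wide Sales table.
CREATE TABLE sales (
    order_no         INTEGER PRIMARY KEY,
    order_date       DATE,
    product_name     VARCHAR(100),   -- product details repeat on every order
    salesperson_name VARCHAR(100),
    customer_name    VARCHAR(100),
    customer_address VARCHAR(200),   -- customer details repeat on every order
    credit_rating    CHAR(1)
);

-- Finding a handful of target rows means scanning the whole table (absent extra
-- indexes), and a simple address change touches many rows:
UPDATE sales
SET    customer_address = '12 New Street'   -- hypothetical value
WHERE  customer_name    = 'A. Customer';    -- hypothetical value
```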
Beginning Database Design Solutions
Understanding and Implementing Database Design Concepts for the Cloud and Beyond
- Rod Stephens(Author)
- 2023(Publication Date)
- Wiley(Publisher)
Need to add a Birthdate field? Just do it! That doesn't mean you can't create a rule saying there can be only one Customer element for any given customer. You might have to do some extra work to enforce that rule because the document won't do it for you. Similarly, you can use a graph database to define a network. If you like, you can make a rule that all nodes shall have between zero and four neighbors, but you'll have to enforce that rule yourself because the database won't do it for you. SUMMARY Normalization is the process of rearranging a database's table designs to prevent certain kinds of data anomalies. Different levels of normalization protect against different kinds of errors. If every table represents a single, clearly defined entity, then you've already gone a long way toward making your database safe from data anomalies. You can use normalization to further safeguard the database. In this chapter, you learned about: ➤ Different kinds of anomalies that can afflict a database ➤ Different levels of normalization and the anomalies that they prevent ➤ Methods for normalizing database tables The next chapter discusses another way that you can reduce the chances of errors entering a database. It explains design techniques other than normalization that can make it safer for a software application to manipulate the database. Before you move on to Chapter 8, however, use the following exercises to test your understanding of the material covered in this chapter. You can find the solutions to these exercises in Appendix A. EXERCISES 1. Suppose a student contact list contains the fields Name, Email1, Email2, Phone1, PhoneType1, Phone2, PhoneType2, and MajorOrSchool. The student's preferred email address is listed in the first Email field and a backup address is stored in the second Email field. Similarly, the preferred phone number is in the Phone1 field and a backup number is in the Phone2 field.
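One possible answer to the exercise, sketched in SQL: the repeating email and phone columns are moved into a separate contact-methods table. The field names come from the exercise; the student_id key, the contact_methods table, and its columns are assumptions made for this sketch, not the book's solution.

```sql
-- The flat contact list as described in the exercise.
CREATE TABLE student_contacts_flat (
    name            VARCHAR(100),
    email1          VARCHAR(100),
    email2          VARCHAR(100),
    phone1          VARCHAR(20),
    phone_type1     VARCHAR(10),
    phone2          VARCHAR(20),
    phone_type2     VARCHAR(10),
    major_or_school VARCHAR(60)
);

-- One way to normalize it: the repeating group becomes rows in its own table.
CREATE TABLE students (
    student_id      INTEGER PRIMARY KEY,
    name            VARCHAR(100),
    major_or_school VARCHAR(60)
);

CREATE TABLE contact_methods (
    student_id   INTEGER REFERENCES students (student_id),
    method       VARCHAR(10),        -- 'email' or 'phone'
    value        VARCHAR(100),       -- the address or number itself
    phone_type   VARCHAR(10),        -- NULL for email rows
    is_preferred BOOLEAN,
    PRIMARY KEY (student_id, method, value)
);
```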
- Joy Starks, Philip Pratt, Mary Last(Authors)
- 2018(Publication Date)
- Cengage Learning EMEA(Publisher)
CHAPTER 5 DATABASE DESIGN 1: NORMALIZATION. LEARNING OBJECTIVES: • Discuss functional dependence and primary keys • Define first normal form, second normal form, third normal form, and fourth normal form • Describe the problems associated with tables (relations) that are not in first normal form, second normal form, or third normal form, along with the mechanism for converting to all three • Discuss the problems associated with incorrect conversions to third normal form • Describe the problems associated with tables (relations) that are not in fourth normal form and describe the mechanism for converting to fourth normal form • Understand how normalization is used in the database design process. INTRODUCTION You have examined the basic relational model, its structure, and the various ways of manipulating data within a relational database. In this chapter, you will learn about the normalization process and its underlying concepts and features. The normalization process is a series of steps that enable you to identify the existence of potential problems or anomalies in the database, along with methods for correcting these problems. An update anomaly is a data inconsistency that results from data redundancy, the use of inappropriate nulls, or a partial update. A deletion anomaly is the unintended loss of data due to deletion of other data. An insertion anomaly results when you cannot add data to the database due to the absence of other data. To correct anomalies in a database, you must convert tables to various types of normal forms. A table in a particular normal form possesses a certain desirable collection of properties. The most common normal forms are first normal form (1NF), second normal form (2NF), third normal form (3NF), and the lesser-used fourth normal form (4NF).
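To ground the three anomaly types just defined, here is an illustrative sketch against a single unnormalized enrolments table. The table, its columns, and the sample values are all invented for the illustration and are not from the textbook.

```sql
-- A single unnormalized table of course enrolments (hypothetical design).
CREATE TABLE enrolments (
    student_id   INTEGER,
    student_name VARCHAR(100),
    course_code  CHAR(6),
    course_title VARCHAR(100),
    PRIMARY KEY (student_id, course_code)
);

-- Update anomaly: renaming a course means updating many rows; miss one and
-- the data becomes inconsistent.
UPDATE enrolments SET course_title = 'Databases II' WHERE course_code = 'CS2020';

-- Deletion anomaly: dropping a student's only enrolment also deletes the last
-- record of that course's title.
DELETE FROM enrolments WHERE student_id = 11 AND course_code = 'CS2020';

-- Insertion anomaly: a brand-new course cannot be recorded until at least one
-- student enrols, because student_id is part of the primary key and may not be NULL.
-- INSERT INTO enrolments (course_code, course_title) VALUES ('CS3010', 'Data Mining');  -- would fail
```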
Database Modeling and Design
Logical Design
- Toby J. Teorey, Sam S. Lightstone, Tom Nadeau, H.V. Jagadish(Authors)
- 2011(Publication Date)
- Morgan Kaufmann(Publisher)
Relational database tables, whether they are derived from ER or Unified Modeling Language (UML) models, sometimes suffer from some rather serious problems in terms of performance, integrity, and maintainability. For example, when the entire database is defined as a single large table, it can result in a large amount of redundant data and lengthy searches for just a small number of target rows. It can also result in long and expensive updates, and deletions in particular can result in the elimination of useful data as an unwanted side effect. Such a situation is shown in Figure 6.1, where products, salespersons, customers, and orders are all stored in a single table called Sales. In this table we see that certain product and customer information is stored redundantly, wasting storage space. Certain queries, such as "Which customers ordered vacuum cleaners last month?", would require a search of the entire table. Also, updates, such as changing the address of the customer Dave Bachmann, would require changing many rows. Finally, deleting an order by a valued customer, such as Qiang Zhu (who bought an expensive computer), if that is his only outstanding order, deletes the only copy of his address and credit rating as a side effect. Such information may be difficult (or sometimes impossible) to recover. These problems also occur for situations in which the database has already been set up as a collection of many tables, but some of the tables are still too large. [Figure 6.1: Single table database.] If we had a method of breaking up such a large table into smaller tables so that these types of problems would be eliminated, the database would be much more efficient and reliable. Classes of relational database schemes or table definitions, called normal forms, are commonly used to accomplish this goal. The creation of a normal form database table is called normalization.
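A tentative sketch of the kind of decomposition the excerpt is leading up to: the single Sales table split into smaller tables so that each fact is stored once. The names and types below are assumptions; only the idea of separating customers, products, salespersons, and orders comes from the excerpt.

```sql
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(100),
    address       VARCHAR(200),
    credit_rating CHAR(1)
);

CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100)
);

CREATE TABLE salespersons (
    salesperson_id   INTEGER PRIMARY KEY,
    salesperson_name VARCHAR(100)
);

CREATE TABLE orders (
    order_no       INTEGER PRIMARY KEY,
    order_date     DATE,
    customer_id    INTEGER REFERENCES customers (customer_id),
    product_id     INTEGER REFERENCES products (product_id),
    salesperson_id INTEGER REFERENCES salespersons (salesperson_id)
);

-- Deleting a customer's only order no longer destroys the customer's address
-- and credit rating; they remain in the customers table.
DELETE FROM orders WHERE order_no = 1001;   -- hypothetical order number
```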
- Mark L. Gillenson, Paulraj Ponniah, Alex Kriegel, Boris M. Trukhnov, Allen G. Taylor, Gavin Powell, Frank Miller(Authors)
- 2012(Publication Date)
- Wiley(Publisher)
You will often find that this is the least complicated and most efficient way to handle the normalization process. Even though some tables require additional normalization, others will already be at 3NF and not require any additional effort. 4.3.4 Denormalizing Data The smaller tables created by the normalization process are typically the most efficient design for data entry and data modification. One reason for this is that you have eliminated duplicate data, reducing the amount of data that must be written to the database. However, there are two potential problems. One problem is that this process can be taken to the extreme. Consider customer addresses, for example. It's likely that you will have several customers in the same state. You could create a separate STATE table to contain this information and create a relationship between the CUSTOMER and STATE tables, but it's probably less work and requires less storage space to just store the 2-character state abbreviation with each customer. The other problem is that a design that makes for efficient data entry does not always make for efficient data retrieval. During normalization, you tend to break a table down into smaller, related tables. There is a good possibility that at least some of your queries will require you to recombine that data into a single result. This is done through a query process known as joining, where you combine the data from two tables based on a linking column or columns. Typically, you will combine two related tables based on the foreign key, but that's not the only possibility. This can be a resource-intensive process, becoming more intensive as more tables are added. This can be an issue in any database, but tends especially to be a problem in decision support databases where the majority of the database activity relates to data retrieval. FOR EXAMPLE: Finding New Tables. It's fairly common to "discover" new tables during the normalization process.
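A rough sketch of the trade-off the excerpt describes: the fully normalized design with a separate STATE table and the join it forces, versus simply keeping the two-character abbreviation on the customer row. The table names follow the excerpt; the columns and types are assumed.

```sql
-- Fully normalized: state details live in their own table and must be joined in.
CREATE TABLE state (
    state_code CHAR(2) PRIMARY KEY,
    state_name VARCHAR(40)
);

CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(100),
    state_code    CHAR(2) REFERENCES state (state_code)
);

-- Every address listing now needs a join:
SELECT c.customer_name, s.state_name
FROM   customer c
JOIN   state    s ON s.state_code = c.state_code;

-- Denormalized alternative from the excerpt: keep the 2-character abbreviation
-- on the customer row and skip both the extra table and the join.
-- CREATE TABLE customer (
--     customer_id   INTEGER PRIMARY KEY,
--     customer_name VARCHAR(100),
--     state         CHAR(2)
-- );
```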
- B. Walraet(Author)
- 2014(Publication Date)
- North Holland(Publisher)
As usual, that is overdone. Normalization is a process by which data consistency is obtained at the price of having to manage many inter-related tables. The management of many tables is, however, harder, since more joins are needed, which may cause performance problems. Thus, one should keep in mind that compromise is the reality of daily work. Many anomalies can be avoided by means of discipline. If sufficient other ways to enforce such discipline can be installed, then one can very well envisage not going that far in normalization. This idea is called selective denormalization... In fact, many believe that the first normal form is much too rigorous. Are all repeating groups as bad as Codd said? What about an array, specifically one with fixed bounds? No harm in such a field, provided all elements of the array have the same semantic weight, i.e. no element of the array is used as a key (or part thereof). Of course, such an array (the whole array) can be a key or part of a key. Even an array with varying bounds is acceptable, under the same semantic conditions; however, such arrays cause physical representation problems if our system has no list structures (and this is the case for all conventional programming languages, unfortunately). The argument can be extended to group fields, which can certainly not harm provided no single element of a group field is itself a key (or part of a key). Conversely, a group (the whole group) could be a key (or part thereof). Obviously, the elements of a group are visible, i.e. they can be used by the programs. Of course, groups can be nested, in which case only the outer group could be a key. This being said, there are other ways to decompose relations. Normalization, as presented, decomposes a given table by projection only.
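The excerpt's point about fixed-bound repeating groups can be pictured with an array-valued column, as supported by, for example, PostgreSQL. The sketch below contrasts a strictly first-normal-form design with the array variant the excerpt considers harmless when no array element serves as a key; all names are invented for the illustration.

```sql
-- Strict first normal form: one row per monthly reading.
CREATE TABLE meter_readings_1nf (
    meter_id INTEGER,
    month    INTEGER CHECK (month BETWEEN 1 AND 12),
    reading  NUMERIC(10,2),
    PRIMARY KEY (meter_id, month)
);

-- Fixed-bound repeating group kept on one row (PostgreSQL array syntax).
-- Per the excerpt's argument, this is tolerable because no single element of
-- the array is used as a key or part of one.
CREATE TABLE meter_readings_array (
    meter_id INTEGER PRIMARY KEY,
    readings NUMERIC(10,2)[12]   -- one slot per month
);
```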
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.