1: Concept: Introduction to Data Modelling
In the past, the term data modelling was often unfamiliar to business users as data modelling tended to be the domain of IT BI professionals. But this is no longer the case, thanks to the introduction of self-service BI tools such as Power BI and Power Pivot for Excel.
What Is Data Modelling?
Data modelling is the process of taking data from various sources; loading, structuring, and relating data logically to other data; and enhancing, embellishing, and generally preparing the data for use. The objective is to allow the data to be used without having to write a custom query every time you want to look at a different subset of data.
The data modelling process includes:
- Determining the optimal structure and shape of the source data to analyse, including whether to bring in all the data, a subset of the data, or summary data.
- Loading the data from the source into the data model (Power BI in this case).
- Defining the logical relationships between the various tables (which is similar to what you do with VLOOKUP() in Excel, except the data stays in the source table in Power BI).
- Defining data types (e.g., specifying whether a column of data is a column of decimal values or a column of currency values, a column of text, etc.).
- Creating new insights from the source data so that you can analyse concepts that donât exist natively in the source data but that can be calculated or created inside the data model. For example, if you have a table of transactional data with cost price and selling price, you can extend the data model to include calculations for margin, margin percentage, etc., even though these concepts are not explicitly in the source data. Once you have modelled these new facts in the data model, they can be reused over and over by people using your workbook.
- Giving meaningful names to your new business insights (i.e., to your measures).
Power BI, Power Pivot, and SSAS Tabular
The data modelling engine that is used inside Power BI is the same one used in Power Pivot for Excel and SQL Server Analysis Services (SSAS) Tabular. The engine has quite a few names, including xVelocity, VertiPaq, SSAS Tabular, and Power Pivot. Despite the different names, the engine is essentially the same across all of these software products (with the exception of any version differences related to version release timing). It therefore follows that virtually everything you learn in this book about data modelling and DAX can be applied to Power Pivot for Excel and SSAS Tabular. In this book I most commonly refer to Power BI, but you should keep in mind that most of the content also applies to these other products.
Remember: Power BI Is a Database Tool, and Excel Is a Spreadsheet Tool
One very important concept you simply must understand is that Power BI is a database tool, and Excel is a spreadsheet tool. A database and a spreadsheet are not the same. A database has structure; it consists of one or more tables of data. Each table has zero, one, or more rows of data and one or more columns. Each column has a defined data format, and the data in each column must conform to that format. A database does not have a column-and-row reference system. A spreadsheet (e.g., an Excel worksheet) lacks the structure of a database. A spreadsheet, like a database, has columns and rows, but these columns and rows lack structure. You can put any data you want into any cell in a spreadsheet, and you can refer to a cell by using a cell reference. You have to follow a lot of rules with a database that donât apply to a spreadsheet.
Remember that Power BI is not Excel. You need to think differently if you are to master Power BI than you need to think to master Excel.
Power BI Is a Data ModelâBased Tool
Power BI is a data modelâbased BI reporting tool. Not all BI tools are data model based. One example of a non-data modelâbased BI tool is SQL Server Reporting Services (SSRS). Non-data modelâbased BI reporting tools require the report writer to first generate a query to fetch the data from a database (typically SQL Server) and return the results of that query to SSRS so the results can be rendered in a report. With a non-data modelâbased reporting tool, you can typically use a user interface that helps with the generation of the query, or you can use a scripting language such as T-SQL to fetch the data that you need for each report.
Traditional Excelâthat is, the spreadsheet tool without the modern BI add-ins of Power Query and Power Pivotâis also a non-data modelâbased BI tool. In the case of traditional Excel, the user loads data into the spreadsheet and then logically relates and aggregates the data using Excel formulas and builds a report (often on a new sheet) to summarise and present the results (the report).
Now donât confuse âa toolâ here with âa data modelâbased tool.â Excel is definitely a tool; it is a very flexible tool that lets you build virtually any report without being a programmer. In fact, I think Excel is probably the best and most popular BI tool ever invented. But it is not a data modelâbased tool because traditional Excel doesnât have a data model.
There is nothing wrong with using a non-data modelâbased BI tool; it is just a different approach from using a data modelâbased tool. The biggest issue with non-data modelâbased reporting tools is that every time you need a new report, you have to start again, often from scratch. Each report has a single purpose, and there is very little reusability or extensibility.
A data modelâbased tool like Power BI has many benefits, including the following:
- The author of the data model builds a reusable model that can be used to solve the current reporting requirements as well as (often) future requirements without the need to write further queries to retrieve a new subset of data.
- The author can often be a business user (normally with good Excel skills) and doesnât have to be a professionally trained database administrator or SQL professional.
- The data model is conceptual in nature, supported by a user-friendly interface that lets you build the data model logically, with minimal coding. Keep in mind that you do need to do some coding (formula writing) for a good data model, but it is fairly easyâno harder than building a typical spreadsheet in Excelâwhen you have the skills, as you will see later in this book.
The term data modelling can be a little bit scary, but there is no reason to be concerned. When you learn the DAX language and join your tables of data in Power BI, you are actually learning data modelling. By the time you have finished this book, you will be well on your way to being an accomplished data modeller using Power BI. Just use the techniques covered in this book and keep in mind that what you are actually doing is learning to be a data modeller.
With all this in mind, itâs time to build your first simple data model.
2: Concept: Loading Data
The first step in data modelling is to load data into Power BI Desktop. The image below shows the data connector that appears when you connect to a SQL Server database from Power BI Desktop. (There is a different data connector for each data source. Youâll see how to get to the various data connector screens later in this chapter.) There are two modes that you can use in Power BI Desktop when loading data from a database tool such as SQL Server...