I challenge you. I challenge you to think of a report that doesn’t rely on a date somewhere in it. Thought of one? If you did, is the information in the report actually valuable without some type of date range boundary? Even in academia and clinical research, where groundbreaking reports are published and stand up to scrutiny for decades, researchers still clearly call out the dates in which the data was collected. They do this because they know the results of their experiments were valid for the date and time they were collected, but could still change at any time. Remember: it was a scientifically held belief that the Earth was the center of the universe for a long time.
This same principle spreads throughout many aspects of our lives, and when building out a business intelligence infrastructure, we need to make sure we’re creating this reference point in a dependable and scalable way.
This post assumes that you have some experience with Microsoft Power BI, and it’ll be even more valuable if you’ve worked with DAX expressions previously. If you’re new to business intelligence altogether, take a look at my previous post to help get you started.
What is a Date Dimension?
You’re already familiar with the concept of a dimension: length, width, height, depth, etc. These are all measurable properties and can be applied to all physical objects (e.g. the screen you’re reading this on). Most people, apart from data scientists and astrophysicists, don’t think too much about time also being a dimension. Don’t worry, this doesn’t get too sci-fi from here on out.
Let’s jump back to grade school and take a look at a simple chart:
Woah, there it is! This line chart that simply represents the relationship between two dimensions, one of which (in this case) is time. Let’s take a look at another simple chart to reiterate this concept:
Here we’ve got another two-dimensional chart, this time with sales figures and the respective country. This information might be useful, but there’s no concept of time, which poses a problem. Are these year-to-date sales? Are they yesterday’s sales? In order to get the most out of this data, we need to represent the time dimension in some way.
Years ago, Microsoft introduced the multi-dimensional model (MDX) as a way to store analytical business data. They solved the problem we’re having by adding a third dimension, resulting in something like this:
At first glance, this can be fairly intimidating, so let’s apply our previous chart to this model.
Now, this isn’t necessarily the best visual for representing this type of data (can you imagine if we replaced year with week and had 156 little bars crammed in there?), but it helps get your mind around what the date dimension is.
Over the past decade, Microsoft has started to lean away from MDX due to its technical complexity and limited capability in favor of the data analysis expressions (DAX) syntax. This is one of the core languages in Power BI and SQL Server Analysis Services (Tabular), and is advertised as the easy-to-use, simplified data modeling language. Even though we’re transitioning away from MDX “cubes,” it’s still imperative to keep a date dimension in the picture.
Building a Date Dimension
I’m going to walk you through building a basic date dimension table in Power BI Desktop. This is the easiest way to bootstrap your data model with a date hierarchy.
- Let’s add a new table to our data model. In Power BI Desktop, select Modeling → New Table. This will allow us to enter a DAX expression in order to generate a one-column table.
- We’re going to leverage some built-in DAX functions that are cognizant of the date data type. We have two options: CALENDAR and CALENDARAUTO.
- CALENDAR allows us to specify a start and end date. This is the safest way to get a list of contiguous dates between a date range.
- CALENDARAUTO will automatically create a table of dates based on the oldest date in your data model and either the most recent date in your data model or the last day in the month specified.
CALENDAR is a safer choice to get a list of contiguous dates between a date range because we have more control over the data that is returned. Depending on the data you’re working with, you can still get smart with how you select the start and end dates for your date table. For instance, if you know that you don’t have any data with dates marked in the future, we can limit our end date to TODAY. Similarly, if you want your date table to be dynamically generated based on your data, you can set the start and end dates to the minimum or maximum date from another column in your data.
In this example, we’re going to assume that the “beginning of time” is the beginning of 2015, and that we don’t have any dates in the future, therefore no dates will be past TODAY.
- Once you’ve entered your CALENDAR expression and hit enter, a new table will appear in your data model. The greyed-out icon means that it was created in the DAX layer, rather than pulled from your data source.
Inspecting the data in this table reveals the start of our date dimension (DimDate) table. Since we’re working with dates specifically, the time will automatically be set to midnight. It’s worth changing the data type of this column to a Date, rather than DateTime, to save yourself a headache if you ever end up keying off of this Date column and have issues.
- We can now start building out some additional columns that can be used as labels on chart and table visualizations, as well as some Boolean columns to make filtering a bit easier. While viewing the DimDate table, select Modeling →New Column.
Take a look at the Reference section at the bottom of this post to see a few useful columns and their respective DAX expressions.
- And just like that, you’ve got a date dimension table! Next, we’ll go over how to use it in measurements and charts.
Using a Date Dimension
Now that we have a date dimension table, we can implement it in one of two ways: relationships and measure filters. If you’re working with data that contains one date (e.g. a sale date), relationships are the easiest way to get started. However, as soon as you start needing to represent multiple metrics that are based on different dates (e.g. a sale date and shipped date), you’ll want to consider measure filters.
If you’ve never used measures before, it’s important to understand what a measure is and how to use one. Measures allow us to define a metric that can then be used on visualizations. They grant more flexibility over the built-in aggregations that Power BI provides.
Let’s say I have a list of sales, with a corresponding dollar amount and date. I can represent the “Total Sales” measure with the following expression:
This measure by itself doesn’t respect dates. It will calculate the sum of all of our sales amounts for all eternity. Charting this over time will result in a strange looking chart rather than the trend line we were hoping for. We need to explicitly define our date bounds in the measure.
Here, we’ve explicitly stated that we want to calculate the sum of our sales where the SalesDate is between the minimum and maximum date in our date table. By itself, this will behave exactly the same as the measure without the FILTER condition. We can now start constraining our date dimension using charts and slicers.
The date table will naturally be constrained as soon as it’s plotted on an axis (as seen above). Each month can be seen as a “bucket of dates” that the measure will be calculated within. For the month of January, the measure will be evaluated with a minimum and maximum date for just that month. This is repeated for each “bucket” on the x-axis (February, March, etc).
This is just one of many patterns using the date dimension in coordination with measures. For more examples, take a look at all of the Time Intelligence Functions available.
- Add your date dimension table in to a shared database. If you rely on a date table that’s created entirely in your reporting environment (i.e. Power BI Desktop), you’ll have to recreate it for every report. It’s a best practice to have a single source of truth when it comes to dimension tables.
- Does your business deal with holidays and working days? These are things that can’t necessarily be dynamically calculated, so it might make sense to create a date table by hand.
Now that you’re considering implementing a date dimension, imagine all of the other dimensions that your data model might consist of. Business intelligence applications benefit from a common structure—normalized star or snowflake schemas. The closer you get to implementing these patterns, the easier and more robust the tooling becomes.