Akhil Narayanan Nair home

DW - Dimension Design Consideration

1) Surrogate Key

A system generated incremental integer, aka Surrogate Key, is recommended in the dimension. Surrogate key is used to:

Kimball Tips: Every join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate key. You should avoid using the natural operational production codes. None of the data warehouse keys should be smart, where you can tell something about the row just by looking at the key.

2) Star Schema

Star Schema is the most natural way to model a data warehouse, where only one join establishes the relationship between the fact table and any one of the dimension tables. Although OBIEE tool supports both Star Schema and Snowflake Schema, Star schema is recommended for data modeling. Fact tables in either star or snowflake schema is identical, however, the dimension tables are modeled differently.

Why star schema?

Oracle Database Data Warehousing Guide: Oracle recommends that you choose a star schema unless you have a clear reason not to.

3) Permissible Snowflake

The snowflake schema is a variation of the star schema in which the redundant attributes are removed from the dimension tables and placed in normalized tables.

Limitations

There are scenarios that snowflake schema could be useful:

4) Slow Changing Dimension

Slow changing dimension refers to the dimension whose attributes evolve over time, e.g. Persons change their names, work location, job code etc. For every slow changing dimension attributes, it needs to identify “change” history.

Depending on business requirement, a slowly changing dimension can be designed in three basic options.

Type 1 SCD

Handling: Overwrite the Changed Attribute value. Advantage: Easy and fast Disadvantage: Lose history of changes

Type 2 SCD

Handling: Add a new Dimension Record. Advantage: Reflect history. Disadvantage: Dimension table growth; Requires Surrogate key.

Type 3 SCD

Handling: Add a new column for prior attribute. Advantage: Before and after views. Disadvantage: Lose intermediate values.

A recap of Type 2 SCD processing (from Kimball Design Tips)

Suppose that an individual employee changes office location, affecting the values of five attributes in the employee record. Here are the steps for what we call Type 2 SCD processing:

5) Degenerate Dimension

A degenerate dimension (DD) acts as a dimension key in the fact table, however, it does not join to a corresponding dimension table. Degenerate dimensions commonly occur when the fact table’s grain is a single transaction.

Keep operational control numbers such as order numbers, invoice numbers in fact table and use it as degenerate dimension. Degenerate dimensions often play an integral role in the fact table’s primary key. Instead of combining all foreign keys in fact table (e.g. surrogate key columns) for primary key, the degenerate dimension guarantees the uniqueness of a fact table row. Degenerate dimension is a good candidate for fact record counts.

Kimball Tips: We typically do not implement a surrogate key for DD. Usually the values for the degenerate dimension are unique and reasonably sized. However, if the operational identifier is an unwieldy alpha-numeric, a surrogate key might conserve significant space, especially if the fact table has a large number of rows.

6) Role-Playing Dimension

Role playing occurs when a single physical dimension table appears several times in a fact table, each represented as a separate logical table. E.g. Order Date Dimension, Purchase Date Dimension, Service Date Dimension, Hire Date Dimension can all be sourced from Time Date Dimension. Do not create multiple physical tables for all Time Date related dimension, instead, a database view or OBIEE view can be created from the role playing Date dimension.

Kimball Tips: Role-playing in a data warehouse occurs when a single dimension simultaneously appears several times in the fact table. The underlying dimension may exist as a single physical table, but each of the roles should be presented to the data access tools in a separately labeled view.

7) Junk Dimensions

Instead of storing multiple indicators and flags in the fact table, they could be grouped together to form a junk dimension.

Kimball Tips: A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By creating an abstract dimension, we remove the flags from the fact table while placing them into a useful dimensional framework.

8) Rolled-up Dimension

Rolled-up Dimension refers to the dimension that is created from the base dimension table. It is used strictly for aggregated fact. As we do not need to build all potential aggregation combination of a fact table, neither do we need to create all levels of base dimension into rolled-up dimension tables.

Rolled-up Dimension has its own surrogate key and need to carry attributes for its level from base dimension. Consider Rolled-up dimension as an independent dimension table.

Kimball Tips: When aggregating facts, we either eliminate dimensionality or associate the facts with a roll-up dimension. These rolled-up, aggregated dimension tables should be shrunken versions of the dimension associated with the granular base fact table. In this way, aggregated dimension conform to the base dimension table.

Note: Slow changing is a business requirement, before making a dimension into a SCD; confirm the needs with business user as slow changing dimension could have performance impact on both ETL job and BI query.