SCD2 (version number) – I need my ancestors!
To implement SCD2 using the wizard, perform the following steps:
- In the Designer, navigate to Tools | Mapping Designer | Mapping | Wizard | Slowly Changing Dimensions, as shown in the following screenshot:
A new window will pop up, asking you the name of the new SCD mapping (
m_SCD2_VERSION_NUMBER
). Also, select the type of SCD you wish to implement. Select Type 2 Dimension - keep a full history of the changes in the target, as we are implementing SCD2 using a version number. Click on Next. - The next screen will ask you to select the source. Select a source from the drop-down list. We are using
EMP_FILE.txt
as the source file for our reference. Also, specify the name of the target you wish to create. We will name the targetEMPLOYEE_SCD2_VERSION_NUMBER
in this book for our reference. Then, click on Next. - In the next window, select
EMPLOYEE_ID
as Logical Key Field. Also, addLOCATION
under Fields to compare the changes, and then click on Next. - The next screen asks you to choose the option to maintain the history in the target. Select Keep the 'version' number in separate column and click on Finish.
- The wizard creates a complete mapping in your Mapping Designer Workspace. Make the necessary changes in the mapping, if required.
- Change the target data type from the flat file to
Oracle
table, as shown in the following screenshot:
When we create an SCD2 mapping using a version number, the wizard creates two additional columns in the target table:
PM_PRIMARY_KEY
: The wizard generates the primary key for each row to be inserted into target. Note thatEMPLOYEE_ID
will not be the primary key in the table.PM_VERSION_NUMBER
: The wizard generates a version number for each row inserted into table; this allows us to differentiate between current and historical records.
The Informatica PowerCenter SCD2 mapping uses a 0
lookup transformation to look up the data in the target table and uses the EXP_Detect_Changes
expression transformation to compare the target data with the source data. Based on the comparison, the expression transformation marks a record as NewFlag
or ChangedFlag
using a flag. The mapping is divided into the following two flows:
- The
FIL_InsertNewRecord
filter transformation allows only theNewFlag
record to pass further and filters theChangedFlag
record from the first flow. It passes new records toUPD_ForceInserts
, which inserts these records into the target. The sequence generator, which isSEQ_generateKeys
, generates the primary key for eachNewFlag
record. TheEXP_KeyProcessing_InsertNew
expression transformation multiplies the primary key value by1000
and loads0
as the version number for each new row into the target, which isEMPLOYEE_SCD2_VERSION_NUMBER
. - The
FIL_InsertChangedRecord
filter transformation allows only theChangedFlag
record to pass further and filters the records marked asNewFlag
from second flow. It passes the changed records toUPD_ChangedUpdate
, which replaces existing rows in the target to reflect the latest changes. The expression transformation, which isEXP_KeyProcessing_InsertChanged
, increments both the primary key and version number by1
and loads them into the target instance, which isEMPLOYEE_SCD2_VERSION_NUMBER1
.
Let's work through each transformation that is used in the SCD2 mapping:
- Source qualifier (
SQ_EMP_FILE
): This extracts the data from the file or table that you used as a source in the mapping. It passes data to the downstream transformations, that is, lookup, expression, and filter transformation. - Lookup (
LKP_GetData
): This is used to look up the target table. It caches the existing data fromEMPLOYEE_SCD2_VERSION_NUMBER
. TheEMPLOYEE_ID=IN_EMPLOYEE_ID
condition will compare the data with the source table and target table. It passes the data based on the comparison with the expression transformation. - Expression (
EXP_DetectChanges
): This receives the data from an upstream transformation and based on that, it creates two flags, which areNewFlag
andChangedFlag
:- Condition for NewFlag:
IIF(ISNULL(PM_PRIMARYKEY), TRUE, FALSE)
- Condition for ChangedFlag:
IIF(NOT ISNULL(PM_PRIMARYKEY) AND (DECODE(LOCATION,PM_PREV_LOCATION,1,0)=0), TRUE, FALSE)
It passes the data to downstream filter transformations.
- Condition for NewFlag:
- Filter (
FIL_InsertNewRecord
): This filters the records that come from an upstream expression transformation and are marked asChangedFlag
; it only allows records withNewFlag
to get passes to theUPD_ForceInserts
update strategy. - Filter (
FIL_UpdateChangedRecord
): This filters the records that come from an upstream expression transformation and are marked asNewFlag
; it only allows records withChangedFlag
to get passed to theUPD_ChangedInserts
update strategy. - Update strategy (
UPD_ForceInserts
): This uses theDD_INSERT
condition to insert data into the target, which isEMPLOYEE_SCD2_VERSION_NUMBER
. - Update strategy (
UPD_ChangedInserts
): This uses theDD_UPDATE
condition to overwrite existingLOCATION
value into the target instance, which isEMPLOYEE_SCD2_VERSION_NUMBER1
. - Sequence generator (
SEQ_GenerateKeys
): This generates a sequence of values for each new row marked asNewFlag
, which incrementally comes into the target by1
. It populates the value into thePM_PRIMARYKEY
column in theEMPLOYEE_SCD2_VERSION_NUMBER
target. - Expression (
EXP_KeyProcessing_InsertNew
): This multipliesNEXTVAL
generated by the sequence generator by1000
using theNEXTVAL*1000
condition. Note that you can change this number as per your requirement. Using1000
here means that we can maintain a1000
history of a particular record. - Expression (
EXP_KeyProcessing_InsertChanged
): This is used to increment the primary key by1
and also increment the version number by1
for every changed record. - Target (
EMPLOYEE_SCD2_VERSION_NUMBER
): This is the target table instance that accepts new records into the target table. - Target (
EMPLOYEE_SCD2_VERSION_NUMBER1
): This is the target table instance that accepts changed records into the target table.