Sunday, April 19, 2015

Maintain Historical and Current data in data Warehouse

When you want to keep history in Data Warehouse then need to manipulate END_DT along with EFF_DT in the table to find out active rows and inactive rows.
For example source table Customer look like below:

                                     
                                       Fig: 1.1 Source table ‘Customer’

The above source table doesn’t have any end date but when you will get the date from source to target you need to add end date. As we can see from the above data, we have two customers and each customer has many rows, sometimes they changes address and sometimes name.

When you get data into target table only one row should  have active for one customer by saying end_date as ‘9999-12-31’ and all other rows for the customer will have end_date as previous row’s EFF_DT. The data should look like below:


 
Fig: 1.2 How End_date works in  ‘Customer’ table

Why we care about End date (END_DT) in data warehouse?

In data warehouse initially we take full load and then we take delta (each day’s data) every day.  You may ask, can’t we do exact copy from source table?  Do we need to add End Date (END_DT)??!!
And simple answer of your question will be YES and one of the reasons of having data warehouse is isolate active data from the historical data by putting END_DT.  

Take an example, if you want to make query and get the latest attributes of customer_id=101 then how will you find the latest information of the customer. If you make query like below :

Select * from Customer
where customer_id=101;

--Result: you will find all five rows like above

And to get latest information of the customer ‘PAUL’, please run below query:

Select * from Customer
Where customer_id=101 and date between EFF_DT and END_DT

Result:

Customer_id
CUSTOMER_NAME
Address
EFF_DT
END_DT
101
PAULAS
Bondhev 52
10-04-2015
31-12-9999

             Fig 1.3: Find out the latest information of the customer

Without having END_DT your data warehouse will behave same like as OLTP database where you have to make complex query to find out the latest information.


Now we will find out how we can populate END_DT from EFF_DT by using Window function: Rows between 1 preceding and 1 preceding (Coming in next post.)

No comments: