Posts

Showing posts from September, 2011

Comparing Master Data Services instances

As I have mentioned in previous posts, I've been working on a data warehouse project. Within the project we decided to use SQL Server 2008 R2 Master Data Services (MDS) to store all the warehouse specific reference data. Here are some reasons why MDS was used: Allowed us to manage the reference data. Reference data could be loaded into the warehouse like a source system Would allow the client’s data governance team to easily update the reference data and keep the warehouse up-to-date For active development of entities and data loads we use a sandpit instance. When the ETL was ready to use the new entities or loaded data, a cut of the sandpit instance would be promoted to the development environment. We came across a problem when we needed to identify some changes which were accidentally made on the development instance. I came up with a method which helped to identify the changes fairly easily, below is the method I used. I will say that it’s not a perfect solution and migh

Choosing the right CDC tool for the job

Introduction I have, as mentioned in a previous blog , been working on a data warehouse project using CDC for extracting the source system data. I would like to share some of the experiences and evaluation criteria used for selecting a CDC tool for our project. The reason for using a specialist tool was as follows: Couldn't move all the source databases to SQL Server 2008 Needed the capture change data to be sent to another server instance A consistent management experience for CDC between all our source systems. The two database vendors the tool needed to support was SQL Server and IBM AS400 Timescales First we underestimated how long it would take to select a CDC tool. We plan for 1 month, for both evaluation criteria and running the tests, and it took us about 2/3 months. The time was taken in evaluating the tools against the criteria and gaining access to database systems from other projects. However our development of the ETL solution was able to continue, while t

Source system data to warehouse via CDC.

Image
Introduction I have been working on a data warehouse project with a difference. The difference is that the ETL is not doing the classic extract of source system data, instead the source system is going to send its data to the data warehouse by using change data capture (CDC). The decision for using CDC was as follows: To have the ETL only process the data that has changed within the source system between each run of the ETL. This would help the ETL perform as it would only have to process the change data and not work out what the changes were first. Not to have the ETL processes impact the central source system database. The CDC would be responsible for delivering the changes of the source system to another database server. Then the ETL could be run at any time within the day and wouldn't be responsible for blocking the transactional source system. As there was plans to have the ETL run at the end of the day for two time zones e.g. US and UK. I would like to share some o