Thursday, November 3, 2016

BI Data 201A: Assignment 05: ETL Coding

Tim Pauley
BIDATA 201b Assignment 05
Assignment due: Nov 7

Blog location




Assignment 05: ETL Coding

In this “hands on” activity you will learn about creating ETL code to fill a Data Warehouse using Microsoft’s SQL Server. This activity will take you about 3 to 6 hours, so plan accordingly!

The activity consists of the follow steps:

  1. Watch the module videos.
  2. Read the 6th chapter of your textbook.
  3. Perform the Exercises in the textbook.
  4. Review the Instructor’s BI solution spreadsheet.
  5. Update the Instructor’s BI solution spreadsheet.
  6. Create your own SQL ETL script.
  7. Update your Blog. (optional)
  8. Submit your work to the Canvas web site.

Watch the module videos
Please watch the video I created about performing this homework. This can be found under Module 05 on the Canvas website.


Read the sixth chapter of your textbook.
In chapter 6 you will learn how to create an ETL Script. This script will then be used to fill a data warehouse.

Perform the Exercises in the textbook.

EXERCISE 6-1. CREATING AN ETL SCRIPT
In this exercise, you create the ETL code needed to fill the Publication Industries data warehouse. You can choose either to type the SQL code presented here or to use the Query Designer tool to accomplish your goal. We recommend using both.


Review the Instructor’s BI solution spreadsheet.
Please review the current data warehouse design given to you by your project manager (me, your instructor).  I have provided you with my new version of the spreadsheet, called “Instructors BISolutionWorksheets.xlsx”, so please use that version for the assignment. Make special note of the new column I have added.

Update the Instructor’s BI solution spreadsheet.
Now that you have examples of the code you will need to perform the ETL process, you need to update the “Instructors BISolutionWorksheets.xlsx” with the new information as show here.


Review the instructors STARTER ETL script
To help you get started with the ETL scripting process I have created a starter script called “04_StarterETLScript.sql.” Open this file and add your own code ETL code to fill up the DimCustomers, DimDates, and FactOrders table.

IMPORTANT: Run the scripts 01_StarterScript Create the OLTP DB.sql and 02_SetupScript Create the DW DB.sql first, so that your data warehouse and OLTP database are consistent with your instructor’s version of these databases!


Create your own SQL ETL script
Create a SQL script that will create all the dimension tables, fact tables, and foreign key constraints as outlined in the instructor’s Mod5_BISolutionWorksheets.xlsx file.

Remember that your code will look very similar to what was shown in the book.



Update your Blog (OPTIONAL).
If you have been blogging, now it the time to update your Blog on how you created your SQL ETL script. If not, you can skip this part!


Submit a SQL final ETL script to the class web site
After you have created your final SQL ETL Script file, upload it to the Canvas web site.

Note: For you to get full points, I must be able to run your script on my PC using the provided source and destination databases. So, test that your code runs as a complete script!

Step 1 Update excel spreadsheet with transformation and etl functions


To complete this task I needed to transform the Object name OLTP into ETL function that would be used for the OLAP. Exercise 6 helped with this task. My conversion worksheet is below


Step 2 Ran create the oltp db


The instructor provided a script that created tables and relations. Ran in SSMS.

/**************************************************************
 Data Warehouse Design LAB
(STEP 1) Create a lite version of the Northwind database
(STEP 2) Create a new Data Warehouse called DWNorthwindLite
        a) The DW should have 3 regular dimensions and 1 fact dimensions
        b) The DW should have one fact table and 2 measures
 *************************************************************/

/***** (STEP 1) Create a lite version of the Northwind database *****/
-- Run this script to create the database you will need for the lab

If Exists(Select name from master.dbo.sysdatabases Where Name = 'NorthwindLite')
Begin
      USE [master];
      ALTER DATABASE [NorthwindLite] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
      DROP DATABASE [NorthwindLite];
End;
Go

Create Database NorthwindLite;
Go

USE NorthwindLite;
Go

SELECT
 [ProductID]
,[ProductName]
,[CategoryID]
INTO [Products]
FROM [Northwind].[dbo].[Products];
Go

SELECT
 [CategoryID]
,[CategoryName]
INTO [Categories]
FROM [Northwind].[dbo].[Categories];
Go

SELECT
 [CustomerID]
,[CompanyName]
,[ContactName]
,[Address]
,[City]
,[Country]
INTO [Customers]
FROM [Northwind].[dbo].[Customers];
Go

SELECT
  [OrderID]
 ,[CustomerID]
 ,[OrderDate]
INTO [Orders]
FROM [Northwind].[dbo].[Orders];
Go

SELECT [OrderID]
,[ProductID]
,[UnitPrice]
,[Quantity]
INTO [OrderDetails]
FROM [Northwind].[dbo].[Order Details];
Go

-- Add Primary keys
ALTER TABLE dbo.Products ADD CONSTRAINT
      PK_Products PRIMARY KEY CLUSTERED (ProductID );

ALTER TABLE dbo.Categories ADD CONSTRAINT
      PK_Categories PRIMARY KEY CLUSTERED ( CategoryID );

ALTER TABLE dbo.Customers ADD CONSTRAINT
      PK_Customers PRIMARY KEY CLUSTERED ( CustomerID );

ALTER TABLE dbo.Orders ADD CONSTRAINT
      PK_Orders PRIMARY KEY CLUSTERED ( OrderId );

ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
      PK_OrderDetais PRIMARY KEY CLUSTERED ( OrderId, ProductId );
Go

-- Add For
ALTER TABLE dbo.Products ADD CONSTRAINT
      FK_Products_Categories FOREIGN KEY(   CategoryID )
      REFERENCES dbo.Categories( CategoryID );

ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
      FK_OrderDetails_Products FOREIGN KEY ( ProductID )
      REFERENCES dbo.Products  ( ProductID );

ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
      FK_OrderDetails_Orders FOREIGN KEY ( OrderID )
      REFERENCES dbo.Orders ( OrderID ) ;

ALTER TABLE dbo.Orders ADD CONSTRAINT
      FK_Orders_Customers FOREIGN KEY ( CustomerID )
      REFERENCES dbo.Customers (     CustomerID  );
Go
Select 'New database made with these objects:'
Select Name From SysObjects Where xtype in ('u', 'pk', 'f')
SELECT [TABLE_NAME]
      ,[COLUMN_NAME]
      ,[IS_NULLABLE]
      ,[DATA_TYPE]
      ,[CHARACTER_MAXIMUM_LENGTH]
      ,[NUMERIC_PRECISION]
      ,[NUMERIC_SCALE]
FROM [NorthwindLite].[INFORMATION_SCHEMA].[COLUMNS];
Go

/* (STEP 2) Create a new Data Warehouse DWNorthwindLite
        a) The DW should have 3 regular dimensions and 1 fact dimensions
        b) The DW should have one fact table and 2 measures
*/

Step 3 Ran create the dw db


The instructor provided a script that created tables and relations. Ran in SSMS.

/**************************************************************
 Data Warehouse Design LAB
(STEP 1) Create a lite version of the Northwind database
(STEP 2) Create a new Data Warehouse called DWNorthwindLite
        a) The DW should have 3 regular dimensions and 1 fact dimensions
        b) The DW should have one fact table and 2 measures
 *************************************************************/

/***** (STEP 1) Create a lite version of the Northwind database *****/
-- Run this script to create the database you will need for the lab

If Exists(Select name from master.dbo.sysdatabases Where Name = 'NorthwindLite')
Begin
      USE [master];
      ALTER DATABASE [NorthwindLite] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
      DROP DATABASE [NorthwindLite];
End;
Go

Create Database NorthwindLite;
Go

USE NorthwindLite;
Go

SELECT
 [ProductID]
,[ProductName]
,[CategoryID]
INTO [Products]
FROM [Northwind].[dbo].[Products];
Go

SELECT
 [CategoryID]
,[CategoryName]
INTO [Categories]
FROM [Northwind].[dbo].[Categories];
Go

SELECT
 [CustomerID]
,[CompanyName]
,[ContactName]
,[Address]
,[City]
,[Country]
INTO [Customers]
FROM [Northwind].[dbo].[Customers];
Go

SELECT
  [OrderID]
 ,[CustomerID]
 ,[OrderDate]
INTO [Orders]
FROM [Northwind].[dbo].[Orders];
Go

SELECT [OrderID]
,[ProductID]
,[UnitPrice]
,[Quantity]
INTO [OrderDetails]
FROM [Northwind].[dbo].[Order Details];
Go

-- Add Primary keys
ALTER TABLE dbo.Products ADD CONSTRAINT
      PK_Products PRIMARY KEY CLUSTERED (ProductID );

ALTER TABLE dbo.Categories ADD CONSTRAINT
      PK_Categories PRIMARY KEY CLUSTERED ( CategoryID );

ALTER TABLE dbo.Customers ADD CONSTRAINT
      PK_Customers PRIMARY KEY CLUSTERED ( CustomerID );

ALTER TABLE dbo.Orders ADD CONSTRAINT
      PK_Orders PRIMARY KEY CLUSTERED ( OrderId );

ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
      PK_OrderDetais PRIMARY KEY CLUSTERED ( OrderId, ProductId );
Go

-- Add For
ALTER TABLE dbo.Products ADD CONSTRAINT
      FK_Products_Categories FOREIGN KEY(   CategoryID )
      REFERENCES dbo.Categories( CategoryID );

ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
      FK_OrderDetails_Products FOREIGN KEY ( ProductID )
      REFERENCES dbo.Products  ( ProductID );

ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
      FK_OrderDetails_Orders FOREIGN KEY ( OrderID )
      REFERENCES dbo.Orders ( OrderID ) ;

ALTER TABLE dbo.Orders ADD CONSTRAINT
      FK_Orders_Customers FOREIGN KEY ( CustomerID )
      REFERENCES dbo.Customers (     CustomerID  );
Go
Select 'New database made with these objects:'
Select Name From SysObjects Where xtype in ('u', 'pk', 'f')
SELECT [TABLE_NAME]
      ,[COLUMN_NAME]
      ,[IS_NULLABLE]
      ,[DATA_TYPE]
      ,[CHARACTER_MAXIMUM_LENGTH]
      ,[NUMERIC_PRECISION]
      ,[NUMERIC_SCALE]
FROM [NorthwindLite].[INFORMATION_SCHEMA].[COLUMNS];
Go

/* (STEP 2) Create a new Data Warehouse DWNorthwindLite
        a) The DW should have 3 regular dimensions and 1 fact dimensions
        b) The DW should have one fact table and 2 measures
*/

Step 4 Review ETL script

The instructor provided a script that created tables and relations. Ran in SSMS.
**Important: Test from Data Warehouse NOT FROM OLTP
Testing examples in YouTube video at 6:49 mark

/**************************************************************
 Create the Data Warehouse
*************************************************************/

--****************** [DWNorthwindLite] *********************--
-- This file contains ETL code for use with the
-- [DWNorthwindLite] database.
--****************** Your Version ***************************--

USE [DWNorthwindLite];
Go

--********************************************************************--
-- 1) Drop the FOREIGN KEY CONSTRAINTS and Clear the tables
--********************************************************************--
Alter Table [DWNorthwindLite].dbo.FactOrders -- The Many Child Table
      Drop Constraint [FK_FactOrders_DimProducts];

Truncate Table [DWNorthwindLite].dbo.DimProducts; -- The Single Parent Table

--********************************************************************--
-- 2) FILL the Tables
--********************************************************************--

/****** [dbo].[DimProducts] ******/
INSERT INTO [DWNorthwindLite].dbo.DimProducts
SELECT
  [ProductID] = Products.ProductID
 ,[ProductName] = CAST(Products.ProductName as nVarchar(100))
 ,[ProductCategoryID] = Products.CategoryID
 ,[ProductCategoryName] = CAST(Categories.CategoryName as nVarchar(100))
 ,[StartDate] = 20000101
 ,[EndDate] = Null
 ,[IsCurrent] = 'Yes'
FROM [NorthwindLite].dbo.Categories
INNER JOIN [NorthwindLite].dbo.Products
ON Categories.CategoryID = Products.CategoryID;
Go

/****** [dbo].[DimCustomers] ******/

Go

/****** [dbo].[DimDates] ******/

Go

/****** [dbo].[FactOrders] ******/

Go

--********************************************************************--
-- 3) Re-Create the FOREIGN KEY CONSTRAINTS
--********************************************************************--
ALTER TABLE DWNorthwindLite.dbo.FactOrders -- The Many Child Table
  ADD CONSTRAINT FK_FactOrders_DimProducts
  FOREIGN KEY (ProductKey) REFERENCES DimProducts(ProductKey); -- The Single Parent Table

--********************************************************************--
-- Review the results of this script
--********************************************************************--
Select * from [dbo].[DimProducts];
Select * from [dbo].[DimCustomers];
Select * from [dbo].[DimDates];
Select * from [dbo].[FactOrders];

Step 5 Drop the FOREIGN KEY CONSTRAINTS and Clear the tables

I needed to fill in this code. Than execute
/**************************************************************
 Create the Data Warehouse
*************************************************************/

--****************** [DWNorthwindLite] *********************--
-- This file contains ETL code for use with the
-- [DWNorthwindLite] database.
--****************** Tim Pauley Assignment 05 ***************************--

USE [DWNorthwindLite];
Go


--********************************************************************--
-- 1) Drop the FOREIGN KEY CONSTRAINTS and Clear the tables
--********************************************************************--
Alter Table [DWNorthwindLite].dbo.FactOrders -- The Many Child Table
       Drop Constraint [FK_FactOrders_DimProducts];

Alter Table [DWNorthwindLite].dbo.FactOrders -- The Many Child Table
       Drop Constraint [FK_FactOrders_DimCustomers];

Alter Table [DWNorthwindLite].dbo.FactOrders -- The Many Child Table
       Drop Constraint [FK_FactOrders_DimDates];

Truncate Table [DWNorthwindLite].dbo.DimProducts; -- The Single Parent Table
Truncate Table [DWNorthwindLite].dbo.DimCustomers; -- The Single Parent Table
Truncate Table [DWNorthwindLite].dbo.DimDates; -- The Single Parent Table

Step 6 FILL the Tables

--********************************************************************--
-- 2) FILL the Tables
--********************************************************************--

/****** [dbo].[DimProducts] ******/
INSERT INTO [DWNorthwindLite].dbo.DimProducts
SELECT
  [ProductID] = Products.ProductID
 ,[ProductName] = CAST(Products.ProductName as nVarchar(100))
 ,[ProductCategoryID] = Products.CategoryID
 ,[ProductCategoryName] = CAST(Categories.CategoryName as nVarchar(100))
 ,[StartDate] = 20000101
 ,[EndDate] = Null
 ,[IsCurrent] = 'Yes'
FROM [NorthwindLite].dbo.Categories
INNER JOIN [NorthwindLite].dbo.Products
ON Categories.CategoryID = Products.CategoryID;
Go
--SELECT * FROM [DWNorthwindLite].dbo.DimProducts --test DimProducts Table

/****** [dbo].[DimCustomers] ******/
INSERT INTO [DWNorthwindLite].dbo.DimCustomers
SELECT
  [CustomerID] = Customers.CustomerID
 ,[CustomerName] = CAST(Customers.CompanyName as nVarchar(100))
 ,[CustomerCity] = CAST(Customers.City as nVarchar(100))
 ,[CustomerCountry] = CAST(Customers.Country as nVarchar(100))
 ,[StartDate] = -1
 ,[EndDate] = Null
 ,[IsCurrent] = 'Yes'
FROM [NorthwindLite].dbo.Customers
Go

--SELECT * FROM [DWNorthwindLite].dbo.DimCustomers  --test DimCustomers Table

/****** [dbo].[DimDates] ******/
-- Create variables to hold the start and end date
Declare @StartDate datetime = '01/01/1990'
Declare @EndDate datetime = '12/31/1999'

-- Use a while loop to add dates to the table
Declare @DateInProcess datetime
Set @DateInProcess = @StartDate

While @DateInProcess <= @EndDate
 Begin
 -- Add a row into the date dimension table for this date
 Insert Into [DWNorthwindLite].dbo.DimDates
 ( [DateKey], [USADateName], [MonthKey], [MonthName], [QuarterKey],
[QuarterName], [YearKey], [YearName] )
 Values (
    Cast(Convert(nvarchar(50), @DateInProcess , 112) as int) -- [DateKey]
  , DateName( weekday, @DateInProcess ) + ', ' + Convert(nvarchar(50), @DateInProcess , 112) -- [USADateName]
  , Left(Cast(Convert(nvarchar(100), @DateInProcess , 112) as int), 6) -- [MonthKey]  
  , DateName( MONTH, @DateInProcess ) + ', ' + Cast( Year(@DateInProcess ) as nVarchar(100) ) -- [MonthName]
  ,  Cast(Cast(YEAR(@DateInProcess) as nvarchar(50))  + '0' + DateName( QUARTER, @DateInProcess) as int) -- [QuarterKey]
  , 'Q' + DateName( QUARTER, @DateInProcess ) + ', ' + Cast( Year(@DateInProcess) as nVarchar(100) ) -- [QuarterName]
  , Year( @DateInProcess ) -- [YearKey]
  , Cast( Year(@DateInProcess ) as nVarchar(100) ) -- [YearName]
  )
 -- Add a day and loop again
 Set @DateInProcess = DateAdd(d, 1, @DateInProcess)
 End

-- 2e) Add additional lookup values to DimDates
Insert Into [DWNorthwindLite].[dbo].[DimDates]
  ( [DateKey]
  , [USADateName]
  , [MonthKey]
  , [MonthName]
  , [QuarterKey]
  , [QuarterName]
  , [YearKey]
  , [YearName] )
  Select
    [DateKey] = -1
  , [DateName] = Cast('Unknown Day' as nVarchar(50) )
  , [Month] = -1
  , [MonthName] = Cast('Unknown Month' as nVarchar(50) )
  , [Quarter] =  -1
  , [QuarterName] = Cast('Unknown Quarter' as nVarchar(50) )
  , [Year] = -1
  , [YearName] = Cast('Unknown Year' as nVarchar(50) )
  Union
  Select
    [DateKey] = -2
  , [DateName] = Cast('Corrupt Day' as nVarchar(50) )
  , [Month] = -2
  , [MonthName] = Cast('Corrupt Month' as nVarchar(50) )
  , [Quarter] =  -2
  , [QuarterName] = Cast('Corrupt Quarter' as nVarchar(50) )
  , [Year] = -2
  , [YearName] = Cast('Corrupt Year' as nVarchar(50) )
Go
 --SELECT * FROM [DWNorthwindLite].[dbo].[DimDates] -- Test Dim Dates


/****** [dbo].[FactOrders] ******/
INSERT INTO [DWNorthwindLite].dbo.FactOrders
SELECT
  [OrderID] = Orders.OrderID
, [CustomerKey] = DimCustomers.CustomerKey
, [OrderDateKey] = DimDates.DateKey
, [ProductKey] = DimProducts.ProductKey
, [ActualOrderUnitPrice] = OrderDetails.UnitPrice
, [ActualOrderQuantity] = OrderDetails.Quantity
FROM  [DWNorthwindLite].dbo.DimCustomers
INNER JOIN  [NorthwindLite].dbo.Orders
ON DimCustomers.CustomerID = Orders.CustomerID
INNER JOIN [NorthwindLite].dbo.OrderDetails
ON Orders.OrderID = OrderDetails.OrderID
INNER JOIN [DWNorthwindLite].dbo.DimProducts
ON OrderDetails.ProductID = DimProducts.ProductID
INNER JOIN [DWNorthwindLite].dbo.DimDates

On DimDates.DateKey = isNull(Convert(nvarchar(50), Orders.OrderDate, 112),

'-1')
Go

Step 7 Re-Create the FOREIGN KEY CONSTRAINTS

I needed to fill in this code. Than execute

--********************************************************************--
-- 3) Re-Create the FOREIGN KEY CONSTRAINTS
--********************************************************************--
ALTER TABLE DWNorthwindLite.dbo.FactOrders -- The Many Child Table
  ADD CONSTRAINT FK_FactOrders_DimProducts
  FOREIGN KEY (ProductKey) REFERENCES DimProducts(ProductKey); -- The Single Parent Table

 
ALTER TABLE DWNorthwindLite.dbo.FactOrders
  ADD CONSTRAINT FK_FactOrders_DimCustomers
  FOREIGN KEY (CustomerKey) REFERENCES DimCustomers(CustomerKey)

ALTER TABLE DWNorthwindLite.dbo.FactOrders
  ADD CONSTRAINT FK_FactOrders_DimDates
  FOREIGN KEY (OrderDateKey) REFERENCES DimDates(DateKey)

Step 8 Test



--********************************************************************--
-- Review the results of this script
--********************************************************************--
Select * from [dbo].[DimProducts];
Select * from [dbo].[DimCustomers];
Select * from [dbo].[DimDates];
Select * from [dbo].[FactOrders];




Additional Resources:
Here are a few additional resources you may want to check out when you have time, but they are not required.


No comments:

Post a Comment