Tim Pauley
BIDATA 201b Assignment 05
Assignment due: Nov 7
Blog location
Assignment
05: ETL Coding
In this “hands on” activity you will learn about creating ETL
code to fill a Data Warehouse using Microsoft’s SQL Server. This activity will take you about 3 to 6
hours, so plan accordingly!
The activity consists of the follow steps:
- Watch the module videos.
- Read the 6th chapter of your textbook.
- Perform the Exercises in the textbook.
- Review the Instructor’s BI solution spreadsheet.
- Update the Instructor’s BI solution spreadsheet.
- Create your own SQL ETL script.
- Update your Blog. (optional)
- Submit your work to the Canvas web site.
Watch the
module videos
Please watch the video I created about performing this
homework. This can be found under Module 05 on the Canvas website.
Read the sixth
chapter of your textbook.
In chapter 6 you will learn how to create an ETL Script.
This script will then be used to fill a data warehouse.
Perform the
Exercises in the textbook.
EXERCISE 6-1. CREATING AN ETL
SCRIPT
In this exercise, you create the ETL code
needed to fill the Publication Industries data warehouse. You can choose either
to type the SQL code presented here or to use the Query Designer tool to
accomplish your goal. We recommend using both.
Review the
Instructor’s BI solution spreadsheet.
Please review the current data warehouse design given to you
by your project manager (me, your instructor). I have provided you with my new version of the
spreadsheet, called “Instructors BISolutionWorksheets.xlsx”, so please use that
version for the assignment. Make special note of the new column I have added.
data:image/s3,"s3://crabby-images/61b78/61b7801beaf0e747d310c3707a57d5847c79f01e" alt=""
Update the
Instructor’s BI solution spreadsheet.
Now that you have examples of the code you will need to
perform the ETL process, you need to update the “Instructors
BISolutionWorksheets.xlsx” with the new information as show here.
data:image/s3,"s3://crabby-images/927ac/927aca8e7c367dc5a9f4853d7fccc59147b69c54" alt=""
Review the
instructors STARTER ETL script
To help you get started with the ETL scripting process I
have created a starter script called “04_StarterETLScript.sql.” Open this
file and add your own code ETL code to fill up the DimCustomers, DimDates, and
FactOrders table.
IMPORTANT: Run the scripts 01_StarterScript
Create the OLTP DB.sql and 02_SetupScript Create the DW DB.sql
first, so that your data warehouse and OLTP database are consistent with your
instructor’s version of these databases!
data:image/s3,"s3://crabby-images/199e6/199e633011f80d8dd333337b0aac947ba62df560" alt=""
Create your
own SQL ETL script
Create a SQL script that will create all the dimension
tables, fact tables, and foreign key constraints as outlined in the
instructor’s Mod5_BISolutionWorksheets.xlsx file.
Remember that your code will look very similar to what was
shown in the book.
data:image/s3,"s3://crabby-images/eea34/eea34baac617bf6ecb004cdb1d57f54759d4123b" alt=""
Update your
Blog (OPTIONAL).
If you have been blogging, now it the time to update your
Blog on how you created your SQL ETL script. If not, you can skip this part!
Submit a SQL
final ETL script to the class web site
After you have created your final SQL ETL Script file,
upload it to the Canvas web site.
Note: For you to
get full points, I must be able to run your script on my PC using the provided
source and destination databases. So, test that your code runs as a complete
script!
Step 1 Update excel spreadsheet with transformation and etl functions
To complete this task I needed to transform the Object name
OLTP into ETL function that would be used for the OLAP. Exercise 6 helped with
this task. My conversion worksheet is below
Step 2 Ran create the oltp db
The instructor provided a script that created tables and
relations. Ran in SSMS.
/**************************************************************
Data Warehouse Design LAB
(STEP 1) Create a lite version of the Northwind database
(STEP 2) Create a new Data Warehouse called DWNorthwindLite
a) The DW should have 3 regular dimensions
and 1 fact dimensions
b) The DW should have one fact table and 2
measures
*************************************************************/
/***** (STEP 1) Create a lite version of the Northwind database
*****/
-- Run this script to create the database you will need for the
lab
If Exists(Select name from master.dbo.sysdatabases Where Name = 'NorthwindLite')
Begin
USE
[master];
ALTER DATABASE [NorthwindLite] SET
SINGLE_USER WITH
ROLLBACK IMMEDIATE;
DROP DATABASE [NorthwindLite];
End;
Go
Create Database
NorthwindLite;
Go
USE NorthwindLite;
Go
SELECT
[ProductID]
,[ProductName]
,[CategoryID]
INTO [Products]
FROM [Northwind].[dbo].[Products];
Go
SELECT
[CategoryID]
,[CategoryName]
INTO [Categories]
FROM [Northwind].[dbo].[Categories];
Go
SELECT
[CustomerID]
,[CompanyName]
,[ContactName]
,[Address]
,[City]
,[Country]
INTO [Customers]
FROM [Northwind].[dbo].[Customers];
Go
SELECT
[OrderID]
,[CustomerID]
,[OrderDate]
INTO [Orders]
FROM [Northwind].[dbo].[Orders];
Go
SELECT [OrderID]
,[ProductID]
,[UnitPrice]
,[Quantity]
INTO [OrderDetails]
FROM [Northwind].[dbo].[Order Details];
Go
-- Add Primary keys
ALTER TABLE dbo.Products ADD CONSTRAINT
PK_Products PRIMARY
KEY CLUSTERED (ProductID );
ALTER TABLE dbo.Categories ADD CONSTRAINT
PK_Categories PRIMARY
KEY CLUSTERED ( CategoryID );
ALTER TABLE dbo.Customers ADD CONSTRAINT
PK_Customers PRIMARY
KEY CLUSTERED ( CustomerID );
ALTER TABLE dbo.Orders ADD CONSTRAINT
PK_Orders PRIMARY
KEY CLUSTERED ( OrderId );
ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
PK_OrderDetais PRIMARY
KEY CLUSTERED ( OrderId, ProductId );
Go
-- Add For
ALTER TABLE dbo.Products ADD CONSTRAINT
FK_Products_Categories FOREIGN KEY( CategoryID )
REFERENCES
dbo.Categories(
CategoryID );
ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
FK_OrderDetails_Products FOREIGN KEY ( ProductID )
REFERENCES
dbo.Products (
ProductID );
ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
FK_OrderDetails_Orders FOREIGN KEY ( OrderID )
REFERENCES
dbo.Orders ( OrderID ) ;
ALTER TABLE dbo.Orders ADD CONSTRAINT
FK_Orders_Customers FOREIGN KEY (
CustomerID )
REFERENCES
dbo.Customers ( CustomerID );
Go
Select 'New database made with
these objects:'
Select Name From SysObjects Where
xtype in ('u', 'pk', 'f')
SELECT [TABLE_NAME]
,[COLUMN_NAME]
,[IS_NULLABLE]
,[DATA_TYPE]
,[CHARACTER_MAXIMUM_LENGTH]
,[NUMERIC_PRECISION]
,[NUMERIC_SCALE]
FROM [NorthwindLite].[INFORMATION_SCHEMA].[COLUMNS];
Go
/* (STEP 2) Create a new Data Warehouse DWNorthwindLite
a) The DW should have 3 regular dimensions
and 1 fact dimensions
b) The DW should have one fact table and 2
measures
*/
Step 3 Ran create the dw db
The instructor provided a script that created tables and
relations. Ran in SSMS.
/**************************************************************
Data Warehouse Design LAB
(STEP 1) Create a lite version of the Northwind database
(STEP 2) Create a new Data Warehouse called DWNorthwindLite
a) The DW should have 3 regular dimensions
and 1 fact dimensions
b) The DW should have one fact table and 2
measures
*************************************************************/
/***** (STEP 1) Create a lite version of the Northwind database
*****/
-- Run this script to create the database you will need for the
lab
If Exists(Select name from master.dbo.sysdatabases Where Name = 'NorthwindLite')
Begin
USE
[master];
ALTER DATABASE [NorthwindLite] SET
SINGLE_USER WITH
ROLLBACK IMMEDIATE;
DROP DATABASE [NorthwindLite];
End;
Go
Create Database
NorthwindLite;
Go
USE NorthwindLite;
Go
SELECT
[ProductID]
,[ProductName]
,[CategoryID]
INTO [Products]
FROM [Northwind].[dbo].[Products];
Go
SELECT
[CategoryID]
,[CategoryName]
INTO [Categories]
FROM [Northwind].[dbo].[Categories];
Go
SELECT
[CustomerID]
,[CompanyName]
,[ContactName]
,[Address]
,[City]
,[Country]
INTO [Customers]
FROM [Northwind].[dbo].[Customers];
Go
SELECT
[OrderID]
,[CustomerID]
,[OrderDate]
INTO [Orders]
FROM [Northwind].[dbo].[Orders];
Go
SELECT [OrderID]
,[ProductID]
,[UnitPrice]
,[Quantity]
INTO [OrderDetails]
FROM [Northwind].[dbo].[Order Details];
Go
-- Add Primary keys
ALTER TABLE dbo.Products ADD CONSTRAINT
PK_Products PRIMARY
KEY CLUSTERED (ProductID );
ALTER TABLE dbo.Categories ADD CONSTRAINT
PK_Categories PRIMARY
KEY CLUSTERED ( CategoryID );
ALTER TABLE dbo.Customers ADD CONSTRAINT
PK_Customers PRIMARY
KEY CLUSTERED ( CustomerID );
ALTER TABLE dbo.Orders ADD CONSTRAINT
PK_Orders PRIMARY
KEY CLUSTERED ( OrderId );
ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
PK_OrderDetais PRIMARY
KEY CLUSTERED ( OrderId, ProductId );
Go
-- Add For
ALTER TABLE dbo.Products ADD CONSTRAINT
FK_Products_Categories FOREIGN KEY( CategoryID )
REFERENCES
dbo.Categories(
CategoryID );
ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
FK_OrderDetails_Products FOREIGN KEY ( ProductID )
REFERENCES
dbo.Products (
ProductID );
ALTER TABLE dbo.OrderDetails ADD CONSTRAINT
FK_OrderDetails_Orders FOREIGN KEY ( OrderID )
REFERENCES
dbo.Orders ( OrderID ) ;
ALTER TABLE dbo.Orders ADD CONSTRAINT
FK_Orders_Customers FOREIGN KEY (
CustomerID )
REFERENCES
dbo.Customers ( CustomerID );
Go
Select 'New database made with
these objects:'
Select Name From SysObjects Where
xtype in ('u', 'pk', 'f')
SELECT [TABLE_NAME]
,[COLUMN_NAME]
,[IS_NULLABLE]
,[DATA_TYPE]
,[CHARACTER_MAXIMUM_LENGTH]
,[NUMERIC_PRECISION]
,[NUMERIC_SCALE]
FROM [NorthwindLite].[INFORMATION_SCHEMA].[COLUMNS];
Go
/* (STEP 2) Create a new Data Warehouse DWNorthwindLite
a) The DW should have 3 regular dimensions
and 1 fact dimensions
b) The DW should have one fact table and 2
measures
*/
Step 4 Review ETL script
The instructor provided a script that created tables and
relations. Ran in SSMS.
**Important: Test from Data Warehouse NOT FROM OLTP
Testing examples in YouTube video at 6:49 mark
/**************************************************************
Create the Data Warehouse
*************************************************************/
--****************** [DWNorthwindLite] *********************--
-- This file contains ETL code for use with the
-- [DWNorthwindLite] database.
--****************** Your Version ***************************--
USE [DWNorthwindLite];
Go
--********************************************************************--
-- 1) Drop the FOREIGN KEY CONSTRAINTS and Clear the tables
--********************************************************************--
Alter Table
[DWNorthwindLite].dbo.FactOrders
-- The Many Child Table
Drop Constraint [FK_FactOrders_DimProducts];
Truncate Table
[DWNorthwindLite].dbo.DimProducts; -- The Single Parent
Table
--********************************************************************--
-- 2) FILL the Tables
--********************************************************************--
/****** [dbo].[DimProducts] ******/
INSERT INTO
[DWNorthwindLite].dbo.DimProducts
SELECT
[ProductID] =
Products.ProductID
,[ProductName]
= CAST(Products.ProductName
as nVarchar(100))
,[ProductCategoryID]
= Products.CategoryID
,[ProductCategoryName]
= CAST(Categories.CategoryName
as nVarchar(100))
,[StartDate] = 20000101
,[EndDate] = Null
,[IsCurrent] = 'Yes'
FROM [NorthwindLite].dbo.Categories
INNER JOIN
[NorthwindLite].dbo.Products
ON Categories.CategoryID
= Products.CategoryID;
Go
/****** [dbo].[DimCustomers] ******/
Go
/****** [dbo].[DimDates] ******/
Go
/****** [dbo].[FactOrders] ******/
Go
--********************************************************************--
-- 3) Re-Create the FOREIGN KEY CONSTRAINTS
--********************************************************************--
ALTER TABLE
DWNorthwindLite.dbo.FactOrders
-- The Many Child Table
ADD CONSTRAINT FK_FactOrders_DimProducts
FOREIGN KEY (ProductKey) REFERENCES
DimProducts(ProductKey);
-- The Single Parent Table
--********************************************************************--
-- Review the results of this script
--********************************************************************--
Select * from [dbo].[DimProducts];
Select * from [dbo].[DimCustomers];
Select * from [dbo].[DimDates];
Select * from [dbo].[FactOrders];
Step 5 Drop the FOREIGN KEY CONSTRAINTS and Clear the tables
I needed to fill in this code. Than execute
/**************************************************************
Create the Data Warehouse
*************************************************************/
--****************** [DWNorthwindLite] *********************--
-- This file contains ETL code for use with the
-- [DWNorthwindLite] database.
--****************** Tim Pauley Assignment 05
***************************--
USE [DWNorthwindLite];
Go
--********************************************************************--
-- 1) Drop the FOREIGN KEY CONSTRAINTS and Clear the tables
--********************************************************************--
Alter Table
[DWNorthwindLite].dbo.FactOrders
-- The Many Child Table
Drop Constraint [FK_FactOrders_DimProducts];
Alter Table
[DWNorthwindLite].dbo.FactOrders
-- The Many Child Table
Drop Constraint [FK_FactOrders_DimCustomers];
Alter Table
[DWNorthwindLite].dbo.FactOrders
-- The Many Child Table
Drop Constraint [FK_FactOrders_DimDates];
Truncate Table
[DWNorthwindLite].dbo.DimProducts; -- The Single Parent
Table
Truncate Table
[DWNorthwindLite].dbo.DimCustomers; -- The Single Parent
Table
Truncate Table
[DWNorthwindLite].dbo.DimDates; -- The Single Parent Table
Step 6 FILL the Tables
--********************************************************************--
-- 2) FILL the Tables
--********************************************************************--
/****** [dbo].[DimProducts] ******/
INSERT INTO
[DWNorthwindLite].dbo.DimProducts
SELECT
[ProductID] =
Products.ProductID
,[ProductName]
= CAST(Products.ProductName
as nVarchar(100))
,[ProductCategoryID]
= Products.CategoryID
,[ProductCategoryName]
= CAST(Categories.CategoryName
as nVarchar(100))
,[StartDate] = 20000101
,[EndDate] = Null
,[IsCurrent] = 'Yes'
FROM [NorthwindLite].dbo.Categories
INNER JOIN
[NorthwindLite].dbo.Products
ON Categories.CategoryID
= Products.CategoryID;
Go
--SELECT * FROM [DWNorthwindLite].dbo.DimProducts --test
DimProducts Table
/****** [dbo].[DimCustomers] ******/
INSERT INTO
[DWNorthwindLite].dbo.DimCustomers
SELECT
[CustomerID] =
Customers.CustomerID
,[CustomerName]
= CAST(Customers.CompanyName
as nVarchar(100))
,[CustomerCity]
= CAST(Customers.City as nVarchar(100))
,[CustomerCountry]
= CAST(Customers.Country as nVarchar(100))
,[StartDate] = -1
,[EndDate] = Null
,[IsCurrent] = 'Yes'
FROM [NorthwindLite].dbo.Customers
Go
--SELECT * FROM [DWNorthwindLite].dbo.DimCustomers --test DimCustomers Table
/****** [dbo].[DimDates] ******/
-- Create variables to hold the start and end date
Declare @StartDate datetime
= '01/01/1990'
Declare @EndDate datetime
= '12/31/1999'
-- Use a while loop to add dates to the table
Declare @DateInProcess datetime
Set @DateInProcess =
@StartDate
While @DateInProcess <=
@EndDate
Begin
-- Add a row into the date dimension table for this date
Insert Into
[DWNorthwindLite].dbo.DimDates
( [DateKey], [USADateName],
[MonthKey], [MonthName],
[QuarterKey],
[QuarterName], [YearKey],
[YearName] )
Values (
Cast(Convert(nvarchar(50), @DateInProcess ,
112) as int) -- [DateKey]
, DateName( weekday, @DateInProcess ) + ', ' + Convert(nvarchar(50), @DateInProcess , 112) -- [USADateName]
, Left(Cast(Convert(nvarchar(100), @DateInProcess ,
112) as int), 6) -- [MonthKey]
, DateName( MONTH, @DateInProcess ) + ', ' + Cast( Year(@DateInProcess ) as nVarchar(100) ) -- [MonthName]
, Cast(Cast(YEAR(@DateInProcess) as nvarchar(50)) + '0' + DateName( QUARTER, @DateInProcess) as int) -- [QuarterKey]
, 'Q' + DateName( QUARTER, @DateInProcess ) + ', ' + Cast( Year(@DateInProcess) as nVarchar(100) ) -- [QuarterName]
, Year( @DateInProcess ) -- [YearKey]
, Cast( Year(@DateInProcess ) as nVarchar(100) ) -- [YearName]
)
-- Add a day and loop again
Set @DateInProcess = DateAdd(d, 1, @DateInProcess)
End
-- 2e) Add additional lookup values to DimDates
Insert Into
[DWNorthwindLite].[dbo].[DimDates]
( [DateKey]
, [USADateName]
, [MonthKey]
, [MonthName]
, [QuarterKey]
, [QuarterName]
, [YearKey]
, [YearName] )
Select
[DateKey]
= -1
, [DateName] = Cast('Unknown Day' as nVarchar(50) )
, [Month] = -1
, [MonthName] = Cast('Unknown Month' as nVarchar(50) )
, [Quarter] = -1
, [QuarterName] = Cast('Unknown Quarter' as nVarchar(50) )
, [Year] = -1
, [YearName] = Cast('Unknown Year' as nVarchar(50) )
Union
Select
[DateKey]
= -2
, [DateName] = Cast('Corrupt Day' as nVarchar(50) )
, [Month] = -2
, [MonthName] = Cast('Corrupt Month' as nVarchar(50) )
, [Quarter] = -2
, [QuarterName] = Cast('Corrupt Quarter' as nVarchar(50) )
, [Year] = -2
, [YearName] = Cast('Corrupt Year' as nVarchar(50) )
Go
--SELECT * FROM
[DWNorthwindLite].[dbo].[DimDates] -- Test Dim Dates
/****** [dbo].[FactOrders] ******/
INSERT INTO
[DWNorthwindLite].dbo.FactOrders
SELECT
[OrderID]
= Orders.OrderID
, [CustomerKey] =
DimCustomers.CustomerKey
, [OrderDateKey] =
DimDates.DateKey
, [ProductKey] =
DimProducts.ProductKey
, [ActualOrderUnitPrice] =
OrderDetails.UnitPrice
, [ActualOrderQuantity] =
OrderDetails.Quantity
FROM [DWNorthwindLite].dbo.DimCustomers
INNER JOIN [NorthwindLite].dbo.Orders
ON DimCustomers.CustomerID
= Orders.CustomerID
INNER JOIN
[NorthwindLite].dbo.OrderDetails
ON Orders.OrderID = OrderDetails.OrderID
INNER JOIN
[DWNorthwindLite].dbo.DimProducts
ON OrderDetails.ProductID
= DimProducts.ProductID
INNER JOIN
[DWNorthwindLite].dbo.DimDates
On DimDates.DateKey
= isNull(Convert(nvarchar(50), Orders.OrderDate, 112),
'-1')
Go
Step 7 Re-Create the FOREIGN KEY CONSTRAINTS
I needed to fill in this code. Than execute
--********************************************************************--
-- 3) Re-Create the FOREIGN KEY CONSTRAINTS
--********************************************************************--
ALTER TABLE
DWNorthwindLite.dbo.FactOrders
-- The Many Child Table
ADD CONSTRAINT FK_FactOrders_DimProducts
FOREIGN KEY (ProductKey) REFERENCES
DimProducts(ProductKey);
-- The Single Parent Table
ALTER TABLE
DWNorthwindLite.dbo.FactOrders
ADD CONSTRAINT
FK_FactOrders_DimCustomers
FOREIGN KEY (CustomerKey) REFERENCES DimCustomers(CustomerKey)
ALTER TABLE
DWNorthwindLite.dbo.FactOrders
ADD CONSTRAINT
FK_FactOrders_DimDates
FOREIGN KEY (OrderDateKey) REFERENCES DimDates(DateKey)
Step 8 Test
--********************************************************************--
-- Review the results of this script
--********************************************************************--
Select * from [dbo].[DimProducts];
Select * from [dbo].[DimCustomers];
Select * from [dbo].[DimDates];
Select * from [dbo].[FactOrders];
Additional
Resources:
Here are a few additional resources you may want to check
out when you have time, but they are not required.
No comments:
Post a Comment