Hyperion Essbase 9.3.1 – OBE Series – Designing Essbase Databases – BSO Outlines & Data Load – Global Schema – Part 4 – Loading Data
Posted by Venkatakrishnan J on December 17, 2008
In the last three blog entries, we saw how to design and load the dimensions for a block storage cube. Today we shall see how to load data. A block storage cube can accept input data at any level; the data need not always be at the lowest level. Loading data into Essbase is again done through rule files. As a general rule, let's create one rule file for each measure (UNITS, UNIT_PRICE and UNIT_COST). In practice, the number of rule files is determined by how your SQL source is structured. The SQL for the rule file (for UNITS) in our case is given below
SELECT   MONTH_DSC,
         'Units',
         CHANNEL_DSC,
         ITEM_DSC,
         SHIP_TO_DSC,
         SUM(UNITS) DATA
FROM     UNITS_FACT UF,
         TIME_DIM TD,
         CHANNEL_DIM CD,
         PRODUCT_DIM PD,
         CUSTOMER_DIM CD1
WHERE    UF.MONTH_ID = TD.MONTH_ID
AND      UF.CHANNEL_ID = CD.CHANNEL_ID
AND      UF.ITEM_ID = PD.ITEM_ID
AND      UF.SHIP_TO_ID = CD1.SHIP_TO_ID
GROUP BY MONTH_DSC, CHANNEL_DSC, ITEM_DSC, SHIP_TO_DSC
As a general rule, it is recommended to arrange the columns in the SELECT statement in the same order as the dimensions in the outline. This keeps the data load fast, since each block can then be created and written once instead of being revisited. The process of creating the rule file remains the same as for the dimensions; the major difference lies in the property settings. Let's start by creating a new rule file. Go to Open SQL, enter the above SQL, and then ensure that the Data Load property is set.
The next step is to make sure that each column is properly tagged to its corresponding dimension, as shown below
This needs to be done for all the columns apart from the last one, which must be set with the Data property. Also, by default in a BSO cube, cells that already contain data are overwritten during a load. There are rule file properties that can change this default behavior (for example, to add to or subtract from existing values). This is shown below
Once this is done, validate the rule file and save it. Now, let's load the data through this rule file using the same approach as before. The only difference is that, while doing the data load, we need to choose the Load only option.
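If you would rather script the load than click through Administration Services, the same load can be issued through MaxL. This is only a sketch: the application/database names, the relational credentials and the rule file name below are hypothetical, not taken from this example.

```
/* MaxL sketch: run a server-side rule file against the relational source. */
/* 'Global'.'Global', dbuser/dbpassword and UnitsLd are assumed names.     */
import database 'Global'.'Global' data
    connect as 'dbuser' identified by 'dbpassword'
    using server rules_file 'UnitsLd'
    on error write to 'units_load.err';
```

The on error clause writes rejected records to a file so a partially failed load can be diagnosed and re-run.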
This loads all the data into the Essbase cube, which can be verified from the database properties. You would see that the input (level 0) blocks now have data.
The various statistics above are very important while loading an Essbase cube. In our case, we have a block size of 3 KB, which is on the lower side; since a block's size is the number of cells it holds (the product of the stored members of the dense dimensions) multiplied by 8 bytes, it could be increased by making another dimension dense. I would not do that in this blog entry, though. Our block density is also OK; anything around 25% block density is considered very good. This density will change once we aggregate the cube, as calculating the dense dimension fills in more cells per block. The index hit ratio of 1 is very good too: it means every index page request is being served from the index cache, so the cache is sized adequately. Before we look at calculation scripts, let's create the remaining two rule files to load the UNIT_PRICE and UNIT_COST measures. The SQL for both rule files is provided below.
SELECT   MONTH_DSC,
         'Unit Price',
         'Channels',
         ITEM_DSC,
         'Customer',
         SUM(UNIT_PRICE) DATA
FROM     PRICE_AND_COST_FACT UF,
         TIME_DIM TD,
         PRODUCT_DIM PD
WHERE    UF.MONTH_ID = TD.MONTH_ID
AND      UF.ITEM_ID = PD.ITEM_ID
GROUP BY MONTH_DSC, ITEM_DSC

SELECT   MONTH_DSC,
         'Unit Cost',
         'Channels',
         ITEM_DSC,
         'Customer',
         SUM(UNIT_COST) DATA
FROM     PRICE_AND_COST_FACT UF,
         TIME_DIM TD,
         PRODUCT_DIM PD
WHERE    UF.MONTH_ID = TD.MONTH_ID
AND      UF.ITEM_ID = PD.ITEM_ID
GROUP BY MONTH_DSC, ITEM_DSC
Since UNIT_PRICE and UNIT_COST have the grain of only two dimensions (Time and Product), we need to load their data against the topmost member of every other (non-grain) dimension. That's why the topmost members of the non-grain dimensions have been hardcoded as literals in the queries above. The screenshots below serve as a reference for creating the rule files.
Now, let's load the data from these two rule files into our cube. Once the data load is done, let us look at the statistics again
As you can see, the number of upper-level blocks has increased, since we are loading data directly against the topmost members of the Channels and Customer dimensions. The next step is to aggregate the cube. Aggregation in block storage cubes is done through calculation scripts. The major advantage of these scripts is that they provide the flexibility to aggregate any portion of the cube. So, we shall create a single calculation script that aggregates UNITS across all the dimensions, and UNIT_PRICE and UNIT_COST across only their two grain dimensions. The calculation script is provided below
SET CACHE HIGH;
SET CALCPARALLEL 2;

FIX ("Units")
    CALC DIM ("Time");
    AGG ("Channels");
    AGG ("Product");
    AGG ("Customer");
ENDFIX

FIX ("Unit Price", "Unit Cost", "Channels", "Customer")
    CALC DIM ("Time");
    AGG ("Product");
ENDFIX
This script aggregates all the dimensions for the Units measure, but for the Unit Price and Unit Cost measures it calculates only the Time and Product dimensions, fixing on the topmost members of Channels and Customer where that data was loaded.
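The script can be run from the console, or through MaxL once it has been saved on the server. Again a sketch only: the application, database and script names below are hypothetical.

```
/* MaxL sketch: execute the saved calculation script.          */
/* 'Global'.'Global' and AggAll are assumed names, not actual. */
execute calculation 'Global'.'Global'.'AggAll';
```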
Now, let's execute this calculation and look at the database properties.
As you can see, the number of upper-level blocks has increased further, and the block density has gone up, which is good. Now, as the last step, let us look at the data.