
Saturday, December 28, 2013

Data mart and dwh

Data Warehouse and Data Mart: Definition & Difference

Data warehouses and data marts are tools used for data storage. With the passage of time, small companies become big, and that is when they realize they have amassed huge amounts of data across the various departments of the organization. Every department has its own database that works well for that department. But when organizations want to sort and combine data from various departments for sales, marketing, or planning, the process of analyzing that combined data is referred to as data mining. Data warehouses and data marts are two tools that help companies in this regard. Just what the difference between a data warehouse and a data mart is, and how they compare with each other, is what this article intends to explain.

Data Warehousing
A data warehouse is a collection of data marts representing historical data from different operations in the company. This data is stored in a structure optimized for querying and data analysis. Table design, dimensions, and organization should be consistent throughout a data warehouse so that reports or queries across the warehouse are consistent. A data warehouse can also be viewed as a database of historical data from different functions within a company. It is the place where all the data of a company is stored, typically on a very fast system with large storage capacity. It contains data from all the departments of the company and is constantly updated, with redundant data removed. This tool can answer complex queries pertaining to the data.

Data Mart
A data mart is a segment of a data warehouse that provides data for reporting and analysis on a section, unit, department, or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete individual data warehouses, usually smaller than the corporate data warehouse. A data mart is essentially an indexing and extraction system: instead of putting the data from all the departments of a company into one warehouse, it holds the database of a single department and can combine information from multiple databases when asked.
IT managers of growing companies are often unsure whether they should make use of data marts or instead switch over to the more complex and more expensive data warehouse. These tools are readily available in the market, but the choice poses a dilemma to IT managers.

Difference between Data Warehousing and Data Mart
It is important to note that there are significant differences between these two tools even though they may serve the same purpose. Firstly, a data mart contains the programs, data, software, and hardware of a specific department of a company. There can be separate data marts for finance, sales, production, or marketing. All these data marts are different, but they can be coordinated. The data mart of one department is different from the data mart of another department, and although indexed, such a system is not suitable for a huge database because it is designed to meet the requirements of a particular department.
A data warehouse is not limited to a particular department; it represents the database of a complete organization. The data stored in a data warehouse is more detailed, though indexing is lighter since it has to store huge amounts of information. It is also more difficult to manage and takes longer to process. It follows that data marts are quick and easy to use, since they work with small amounts of data, and that data warehousing is more expensive for the same reason.
Summary
• Data marts and data warehouses are tools that help management obtain relevant information about the organization at any point in time
• While data marts are limited to use by a single department, a data warehouse covers an entire organization
• Data marts are easy to design and use, while a data warehouse is complex and difficult to manage

Friday, December 27, 2013

DWH Questions

What is an ODS?
An Operational Data Store is a construct designed to meet the need for EIS-style (Executive Information System) reporting in the operational area. It is similar to a data warehouse but differs in many respects. Sitting between the operational, transaction-processing environment and the enterprise data warehouse is this structure called the ODS. The ODS provides a foundation for achieving tangible, integrated operational results in a short time frame. While the data warehouse offers no relief to an organization struggling with non-integrated operational systems, the ODS offers immediate relief.

What is a Federated Data Warehouse?
Most techniques used by organizations to build a data warehousing system employ either a top-down or a bottom-up development approach. In the top-down approach, an enterprise data warehouse (EDW) is built in an iterative manner and the underlying dependent data marts are created as required. In the bottom-up approach, independent data marts are created with the view to integrating them into an enterprise data warehouse at some time in the future. There are many pros and cons to the two approaches, but there is a steady trend toward the use of independent data marts, especially with the move toward turnkey analytic application packages.

A solution must offer the low cost and rapid ROI advantages of the independent data mart approach without the data-integration problems down the road. Such a solution is called the federated data warehouse. The two components of a federated data warehouse are the common business model and shared information staging areas.

What is a Virtual Data Warehouse?
When end users access the “system of record” (the OLTP system) directly and generate “summarized data” reports, and are thereby given the feel of a “data warehouse”, such an arrangement is known as a “virtual data warehouse”.

What is Top-Down architecture of Warehouse development?
The interaction associated with this architecture begins with an Extraction, Transformation, Migration, and Loading (ETML) process working from legacy and/or external data sources. The extraction, transformation, and migration steps process data from these sources and output it to a centralized Data Staging Area. Following this, data and metadata are loaded into the Enterprise Data Warehouse and the centralized metadata repository. Once these are constituted, Data Marts are created from summarized data warehouse data and metadata.
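To make the flow concrete, here is a toy sketch in Python. It is purely illustrative: the function names, the in-memory "staging area", and the dummy ERP/CRM sources are all invented for the example and are not part of any Oracle tooling.

```python
def extract(sources):
    # Pull raw rows from the legacy/external source systems.
    return [row for src in sources for row in src()]

def transform(rows):
    # Cleanse/conform the data (trivial example: drop rows with no key).
    return [r for r in rows if r.get("id") is not None]

def run_top_down(sources):
    staging = transform(extract(sources))          # centralized Data Staging Area
    edw = {r["id"]: r for r in staging}            # Enterprise Data Warehouse (EDW)
    sales_mart = [r for r in edw.values()          # summarized departmental Data Mart
                  if r.get("department") == "sales"]
    return edw, sales_mart

# Example usage with two dummy source systems.
erp = lambda: [{"id": 1, "department": "sales", "amount": 100}]
crm = lambda: [{"id": 2, "department": "finance", "amount": 50}, {"id": None}]
edw, sales_mart = run_top_down([erp, crm])
print(len(edw), len(sales_mart))   # prints: 2 1
```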

What is Bottom-up architecture of Warehouse development?
The second data warehousing systems architecture, the "Bottom-up" architecture became popular because 
the Top-down architecture took too long to implement, was often politically unacceptable, 
and was too expensive.

What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?
1. Normalization is the process of assigning attributes to entities. It reduces data redundancies, helps eliminate data anomalies, and produces controlled redundancies to link tables.
2. Normalization is also the analysis of functional dependencies between attributes/data items of user views. It reduces a complex user view to a set of small, stable subgroups of fields/relations.
1NF: Repeating groups are eliminated, dependencies can be identified, and all key attributes are defined; there are no repeating groups in the table.
2NF: The table is already in 1NF and includes no partial dependencies, i.e. no attribute depends on only a portion of the primary key. It may still exhibit transitive dependencies, where attributes are functionally dependent on non-key attributes.
3NF: The table is already in 2NF and contains no transitive dependencies.
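A short worked example may help. The sketch below uses hypothetical order data, written in Python purely for illustration: a repeating group is removed for 1NF, a partial dependency on a composite key is split out for 2NF, and a transitive dependency is split out for 3NF.

```python
# Unnormalized: a repeating group of items inside one order record (violates 1NF).
orders_raw = [
    {"order_id": 1, "customer": "Acme", "customer_city": "Pune",
     "items": [("P10", "Widget", 2), ("P20", "Gadget", 1)]},
]

# 1NF: eliminate the repeating group -- one row per order line.
order_lines = [
    {"order_id": o["order_id"], "customer": o["customer"],
     "customer_city": o["customer_city"],
     "product_id": pid, "product_name": pname, "qty": qty}
    for o in orders_raw for pid, pname, qty in o["items"]
]

# 2NF: remove partial dependencies on the composite key (order_id, product_id).
# product_name depends only on product_id, so it moves to its own table.
products = {r["product_id"]: r["product_name"] for r in order_lines}
order_items = [{"order_id": r["order_id"], "product_id": r["product_id"],
                "qty": r["qty"]} for r in order_lines]

# 3NF: remove transitive dependencies -- customer_city depends on customer,
# not on order_id, so customer attributes move to a customer table.
customers = {r["customer"]: {"city": r["customer_city"]} for r in order_lines}
orders = {r["order_id"]: {"customer": r["customer"]} for r in order_lines}

print(products)   # {'P10': 'Widget', 'P20': 'Gadget'}
print(customers)  # {'Acme': {'city': 'Pune'}}
```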

Why do we need a different schema for Data warehousing/Business Intelligence?
A data warehouse stores historical data collected from various sources, and the data is time variant. The data warehouse is updated in a scheduled and controlled manner. An online (OLTP) system usually does not store historical data; if historical data were kept there, it would pose severe performance issues. Furthermore, because it is a live environment, two users accessing the same set of information at different moments are likely to derive different results. An online system typically provides data from a single source, whereas a data warehouse usually has data coming in from various sources.
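To illustrate the kind of query a warehouse schema is designed for, here is a tiny, made-up example in Python of a time-variant aggregation over a fact table joined to a date dimension. The table contents and key names are invented for the sketch.

```python
# A miniature dimensional layout: one fact table, one date dimension.
sales_fact = [
    {"date_key": 20130101, "product_key": 1, "amount": 100.0},
    {"date_key": 20130101, "product_key": 2, "amount": 250.0},
    {"date_key": 20140101, "product_key": 1, "amount": 300.0},
]
date_dim = {20130101: {"year": 2013}, 20140101: {"year": 2014}}

# "Revenue by year" -- a historical query a warehouse schema is built for,
# rather than the single-row, current-state lookups typical of OLTP.
revenue_by_year = {}
for row in sales_fact:
    year = date_dim[row["date_key"]]["year"]
    revenue_by_year[year] = revenue_by_year.get(year, 0.0) + row["amount"]
print(revenue_by_year)   # {2013: 350.0, 2014: 300.0}
```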

What are the various features of Enterprise Data Warehouse?
The various features of an EDW are:
An EDW may or may not interact with a Data Mart. The most important aspect of any Data Warehouse is the consistency of its information.
An EDW may have the same level of granularity as a Data Mart. This means that data from the Data Mart can be dumped into the EDW as-is.
An EDW may hold a more detailed level of data than a Data Mart. This means the data requirement is at a more granular level that may not be catered for by the Data Mart.

Tuesday, December 17, 2013

What are the primary differences between the OBIEE 10g and 11g security models

What are the primary differences between the OBIEE 10g and 11g security models and what happens during upgrade?
For each security task/object, the comparison below lists the OBIEE 10g behavior, the OBIEE 11g behavior, and what happens during upgrade from 10g to 11g.

Define Users and Groups in RPD file using OBIEE Admin Tool
• 10g: Default.
• 11g: N/A. By default, users are defined in the embedded WLS LDAP via the FMW EM Console, or alternatively in an external LDAP.
• Upgrade: By default, existing users and groups are migrated to the embedded WLS LDAP. Existing groups are automatically mapped to an Application Role.

Defining security policies
• 10g: Policies in the catalog and repository can be defined to reference groups within a directory.
• 11g: Policies are defined in terms of application roles, which map to users and groups in a directory.
• Upgrade: 10g catalog groups are automatically migrated in the upgraded catalog and assigned the same privileges, access, and membership.

“Administrator” user
• 10g: Unique user with full administrative privileges.
• 11g: No single named user has full administrative privileges. Administration can be performed by any user who is a member of the BIAdministrators group.
• Upgrade: The “Administrator” user is automatically added as a member of the “BIAdministrators” group in the embedded WLS LDAP and granted the Administrator role. The user specified during OBIEE 11g installation (e.g. “weblogic”, “biadmin”) is also a member of the BIAdministrators group.

Repository Encryption
• 10g: Available on sensitive elements only, i.e. user passwords, connection pool passwords, etc.
• 11g: The entire RPD is encrypted via a password.
• Upgrade: You are prompted to set a repository password while running the upgrade assistant. Do not lose this password, as there is no feature to recover a lost password.

External Authentication and OBIEE Initialization (Init) Blocks
• 10g: Init blocks are required for external LDAP or external table authentication.
• 11g: Init blocks are not required for the WLS embedded LDAP. Init blocks are required for external LDAP or external table authentication.
• Upgrade: The upgraded RPD will continue to point to the 10g LDAP or external tables. Init blocks may need to be modified to ensure that deprecated, or reserved word, variable names are renamed. NOTE: If you intend to use another LDAP server, such as Oracle Identity Management (OID), then you must upgrade to the embedded LDAP server. Please see the Upgrade Guide for further details.

Catalog Groups
• 10g: Defined via the Presentation Server Administration link.
• 11g: Available for backward compatibility. Use of Application Roles in the FMW EM Console is recommended.
• Upgrade: Existing groups will be migrated. The recommendation is to use application roles instead. Privileges on catalog objects may be granted to an application role via the BI Presentation Server Administration link.

SA System Subject Area
• 10g: Optional.
• 11g: Available for backward compatibility; requires init blocks and external tables. Use of the embedded LDAP is recommended.
• Upgrade: The upgraded 10g RPD will point to external tables. Init blocks may need to be modified to ensure that deprecated, or reserved word, variable names are renamed.

“Everyone” Presentation Server Group
• 10g: Default.
• 11g: Replaced with the AuthenticatedUser role.
• Upgrade: The “Everyone” group is migrated to the AuthenticatedUser role.

Wednesday, December 11, 2013

DAC FAQ...

Oracle DAC is an essential part of BI Apps, yet it is seldom covered in a systematic training course even though we use it all the time. There can be quite a lot to ask about when it comes to working with DAC, especially during interviews for BI Apps related projects. So I am going to gather some of the common interview questions with regard to DAC.

1. Name some of the DAC source system parameters:
TYPE2_FLAG, GLOBAL1_CURR_CODE, INITIAL_EXTRACT_DATE, etc. (The goal is just to name a few; of course, nobody remembers the exact spelling.)

2. To configure for an initial full load, what are the things that need to be done?
A. In DAC, set the value for INITIAL_EXTRACT_DATE to avoid loading far too much data into the target.
B. To load the base table W_DAY_D, nullify all of the refresh dates to enable a full load. Do the same for all other aggregated time tables such as W_WEEK_D. For each task that loads the day dimension (e.g. SIL_DayDimension), set the $$START_DATE and $$END_DATE parameter values at the task level to determine how long a period your day dimension should cover.
C. If your company has multiple currencies, configure currency in DAC by assigning currency codes and exchange rate types to DAC parameters such as GLOBAL1(2,3)_CURR_CODE and GLOBAL1(2,3)_CURR_RATE_TYPE. BI Apps supports up to three global currencies.
D. Configure the GL hierarchy so the information is stored in W_HIERARCHY_D. No DAC configuration is needed for this.
E. DATASOURCE_NUM_ID is a DAC parameter that determines which source system the extraction is taking place from. In the Physical Data Sources tab under the Setup view, this field can be edited with an integer from 1 to 10 to represent different source databases.

3. Explain how to set up metadata in DAC to load data into the target

For a basic intro on how DAC works in terms of executing tasks, see the separate post on that topic.

4. How to configure incremental loading in DAC?
A. The refresh date under the physical data source stores the last ETL run time. DAC will run a full load if this is nullified, or an incremental load based on the refresh date value.
B. Under the task, there is an 'incremental load' command; by setting this, the task will do either a full load or an incremental load regardless of the refresh date.


-------------------------------------------------------------------------------

Below is a list of DAC questions found through googling. Since these questions were not provided with answers, I have written my own; feel free to read them for your reference:

1. Overall architecture of DAC?
The DAC Server and the DAC Client. They must be co-located with the Informatica Integration Service, the Repository Service, and the Informatica repository.


2. Why should we use DAC and not control all execution through Informatica?
For better performance management, such as creating indexes, dropping indexes, and truncating tables before a load. Without DAC, a custom ETL orchestration process would be needed, and it would have to survive upgrades.

3. Can we run multiple execution plans at the same time in DAC?
Yes, but only if the execution plans are not loading into the same table or using the same physical table source.

4. Explain DAC export/import.
A way to import or export DAC repository metadata for upgrade or backup. Logical, system, and run-time objects can be imported/exported.

5. Have you changed any of the DAC parameters? If so, which one and why?
You have to understand what the DAC parameters are and the purpose of each. For example, INITIAL_EXTRACT_DATE can be modified when configuring the initial full load; its value is used to filter out records from the source that are older than this date.

6. How do you determine the Informatica Server Maximum Sessions parameter setting in DAC?
It is set when you register the Informatica server in the DAC Client (the Maximum Sessions field on the Informatica Servers tab of the Setup view).

7. Can DAC send an email in case of any failures?
Yes. In the DAC Client, click Email Recipients in the toolbar, then configure email under Tools --> DAC Server Setup --> Email Configuration.

8. Can you execute a SQL script through DAC? If yes, how?

Yes: at the task level, under Execution Type, select SQL File. As a bonus to this answer, a separate article explains how to run stored procedures in DAC.

9. In DAC, how can you disable table indexes before loading and enable the indexes once the load is complete?
DAC does this by dropping the indexes before the load and re-creating them after the load completes.

10. Let's say you are running the normal incremental load, but just for today you want to extract data from AP_INVOICES_ALL starting from 12/12/2011. How can you achieve this?

Modify the refresh date to be 12/12/2011.


11. How does DAC determine the order of task execution within an execution plan?
Based on each task's source/target tables, the task phase (extract dimension, load fact, etc.), and the 'Truncate Always' property. To force tasks to run in a particular order, create a task group.
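As a rough illustration only (this is not DAC's actual algorithm, and the task and table names below are invented, loosely following BI Apps naming conventions), the dependency part of this ordering can be sketched as a topological sort over tasks' source and target tables:

```python
# A task that loads a table should run before any task that reads that table.
from graphlib import TopologicalSorter   # Python 3.9+

tasks = {
    "SDE_Extract_Customers": {"sources": {"SRC_CUSTOMERS"}, "targets": {"W_CUSTOMER_DS"}},
    "SIL_Load_CustomerDim":  {"sources": {"W_CUSTOMER_DS"}, "targets": {"W_CUSTOMER_D"}},
    "SIL_Load_SalesFact":    {"sources": {"W_CUSTOMER_D", "W_SALES_DS"}, "targets": {"W_SALES_F"}},
}

# Build dependencies: task B depends on task A if B reads a table that A writes.
deps = {name: set() for name in tasks}
for a, ta in tasks.items():
    for b, tb in tasks.items():
        if a != b and ta["targets"] & tb["sources"]:
            deps[b].add(a)

print(list(TopologicalSorter(deps).static_order()))
# ['SDE_Extract_Customers', 'SIL_Load_CustomerDim', 'SIL_Load_SalesFact']
```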


12.What are Micro ETL Execution Plans ? How can you Build and run them ?

According to Oracle document:
Micro ETL execution plans are ETL processes that you schedule at very frequent intervals, such as hourly or half-hourly. They usually handle small subject areas or subsets of larger subject areas. The DAC tracks refresh dates for tables in micro ETL execution plans separately from other execution plans and uses these refresh dates in the change capture process.

In the Design view, under Subject Areas, create a copy of the subject area, inactivate the unwanted tasks, and create a new execution plan for this subject area.

13. From your past experience, explain a scenario where micro ETL execution plans produced wrong results on reports.

According to Oracle Document:
CAUTION:  Micro ETL processes can cause issues with data inconsistencies, data availability, and additional load on the transactional database. Therefore, you should consider the following factors before implementing a micro ETL process:

For related star schemas, if one schema is omitted from a micro ETL execution plan, the cross-star reports may be inaccurate. For example, if the Person fact table is refreshed more frequently than the Revenue fact table, a report that spans the Person and Revenue dimensional schemas may produce inconsistent results.
If you omit dimension tables from a micro ETL execution plan, the foreign keys for the fact tables will point to Unspecified rows for the new dimension records. The foreign key references will be resolved when the Complete ETL execution plan is run, but users of the reports should be aware of such inconsistencies.
If you do not include aggregate tables in micro ETL execution plans, the reports that use data from these tables will be inconsistent with the reports that use data from the detailed fact tables. However, if aggregate tables are included in the micro ETL execution plan, the aggregate calculations are performed for each ETL process, which will take a constant amount of time and may be inefficient to perform at such frequent intervals.
Hierarchy tables are rebuilt during every ETL execution plan by querying the base dimension tables. This operation takes a constant amount of time. If the base tables are big, this operation may take a long time and may be inefficient if the micro ETL execution plan runs several times a day. However, if you avoid populating the hierarchy tables during micro ETL processes, data inconsistencies will occur.
With micro ETL execution plans, caching will occur more frequently, which may have performance implications.
Micro ETL execution plans will put more load on the transactional database because of the frequent extracts.


14. Let's say you cannot use the DAC scheduler to schedule your execution plan. What other options do you have? How can you achieve this?

Use Informatica scheduler.

15. Does DAC keep track of refresh dates for all the source/target tables?

According to Oracle Document:
Refresh dates are tracked only for tables that are either a primary source or a primary target on tasks in a completed run of an execution plan. The DAC runs the full load command for tasks on which a table is a primary source or target if the refresh date against the table is null. When there are multiple primary sources, the earliest of the refresh dates will trigger a full load or an incremental load. If any one of the primary source tables has no refresh date, then the DAC will run the full load command.
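The rule quoted above can be summarized in a short sketch. This is only an illustration of the stated logic, not DAC's code, and the table names are made up:

```python
from datetime import datetime

# refresh_dates maps table name -> last refresh datetime, or None if never loaded.
def load_type(primary_sources, primary_targets, refresh_dates):
    tables = list(primary_sources) + list(primary_targets)
    if any(refresh_dates.get(t) is None for t in tables):
        return "FULL"              # any primary table without a refresh date -> full load
    return "INCREMENTAL"

def incremental_since(primary_sources, refresh_dates):
    # With multiple primary sources, the earliest refresh date drives change capture.
    return min(refresh_dates[t] for t in primary_sources)

refresh_dates = {"SRC_ORDERS": datetime(2013, 12, 1), "W_ORDERS_F": datetime(2013, 12, 10)}
print(load_type(["SRC_ORDERS"], ["W_ORDERS_F"], refresh_dates))   # INCREMENTAL
print(incremental_since(["SRC_ORDERS"], refresh_dates))           # 2013-12-01 00:00:00
```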


16. Consider the following scenario for task T1:
The primary source has a non-null last refresh date.
The primary target has a null last refresh date.
Will task T1 execute as a full or an incremental load?

Based on the answer provided for question 15, what do you think?


17. Explain the upgrade/merge options for DAC 7.8.4 and below versus newer versions.

Use the Upgrade/Merge Wizard, which offers the following options:
1. Repository Upgrade (DAC 784) --- upgrade the DAC repository from releases 7.8.4 and earlier
2. Refresh Base --- for upgrading BI Apps to a new release
3. Simplified Refresh From Base --- similar to the Refresh Base option; it allows you to upgrade the DAC Repository from an older release of Oracle BI Applications to a new release without comparing repositories and creating a Difference Report
4. Replace Base --- upgrade when phasing out an older transactional source system in favor of a newer one
5. Peer to Peer Merge --- merge different DAC repository instances

18. Using the DAC command line, write a script to check whether the Informatica services are up or not.

Use dacCmdLine InformaticaStatus. The full list of dacCmdLine commands is documented by Oracle.
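A minimal sketch of such a script is shown below, assuming it is run from the DAC directory that contains dacCmdLine.sh (dacCmdLine.bat on Windows). The exact output of InformaticaStatus varies by DAC version, so the keyword check is an assumption to adjust for your environment:

```python
import subprocess
import sys

def informatica_is_up():
    # Invoke the DAC command line utility and capture whatever it prints.
    result = subprocess.run(
        ["./dacCmdLine.sh", "InformaticaStatus"],   # use dacCmdLine.bat on Windows
        capture_output=True, text=True
    )
    output = result.stdout + result.stderr
    print(output)
    # Assumed check: a non-zero exit code or a failure keyword means "down".
    return result.returncode == 0 and "FAIL" not in output.upper()

if __name__ == "__main__":
    sys.exit(0 if informatica_is_up() else 1)
```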




19. Can we have two DAC servers on the same machine?
Yes, you can run two DAC servers on the same machine as long as they listen on different ports and point to two different repositories.

20. Explain briefly what kinds of DAC repository objects are held in source system containers.

Subject Areas -- A logical grouping of tables related to a particular subject or application context. A subject area also includes the tasks associated with those tables, as well as the tasks required to load them. Subject areas are assigned to execution plans, which can be scheduled for full or incremental loads.

Tables -- Physical tables in the database.

Indexes -- Just like your physical database indexes.

Tasks -- Units of work for loading tables.

Task groups -- Groupings of tasks that can be bundled to run as a group.

Execution plans -- Data transformation plans defined on subject areas that need to be transformed at certain frequencies.

Schedules -- Determine how often an execution plan runs.


21. What is an authentication file? If you have the DAC Client installed, can you access the DAC repository without an authentication file?

According to Oracle Document:
When you configure a connection to the DAC Repository, the configuration process includes creating a new authentication file or selecting an existing authentication file. The authentication file authenticates the database in which the repository resides. If you create a new authentication file, you will specify the table owner and password for the database.

22. Explain index, table, and task actions in DAC.

According to Oracle Document:
Index action: Overrides the default behavior for dropping and creating indexes.

Table action: Overrides the default behavior for truncating and analyzing tables.

Task action: Adds new functionality to task behavior, such as a preceding action, success action, failure action, or restart upon failure.



23. How does DAC handle parameters at runtime?

According to Oracle Document:
During an ETL execution, DAC reads and evaluates all parameters associated with that ETL run, including static and runtime parameters defined in DAC, parameters held in flat files, and parameters defined externally to DAC. DAC consolidates all the parameters for the ETL run, deduplicates any redundant parameters, and then creates an individual parameter file for each Informatica session. This file contains the evaluated name-value pairs for all parameters, both static and runtime, for each workflow that DAC executes. The parameter file contains a section for each session under a workflow. DAC determines the sessions under a workflow during runtime by using the Informatica pmrep function ListObjectDependencies.

The naming convention for the parameter file is

....txt

DAC writes this file to a location specified in the DAC system property InformaticaParameterFileLocation. The location specified by the property InformaticaParameterFileLocation must be the same as the location specified by the Informatica parameter property $PMSourcefileDir.

24. How does DAC determine the tasks required for any given subject area?

According to Oracle Document:
You define a subject area by specifying a fact table or set of fact tables to be the central table or tables in the subject area. When a subject area is defined, DAC performs the following process to determine the relevant tasks:

DAC identifies the dimension tables associated with the facts and adds these tables to the subject area.

DAC identifies the related tables, such as aggregates, associated with the fact or dimension tables and adds them to the subject area definition.

DAC identifies the tasks for which the dimension and fact tables listed in the two processes above are target tables and adds these tasks to the subject area.

Tasks that DAC automatically assigns to a subject area are indicated with the Autogenerated flag (in the Tasks subtab of the Subject Areas tab).

You can inactivate a task from participating in a subject area by selecting the Inactive check box (in the Tasks subtab of the Subject Areas tab). When the Inactive check box is selected, the task remains inactive even if you reassemble the subject area.

You can also remove a task from a subject area using the Add/Remove command in the Tasks subtab of the subject Areas tab, but when you remove a task it is only removed from the subject area until you reassemble the subject area.

DAC identifies the source tables for the tasks identified in the previous process and adds these tables to the subject area.

DAC performs this process recursively until all necessary tasks have been added to the subject area. A task is added to the subject area only once, even if it is associated with several tables in the subject area. DAC then expands or trims the total number of tasks based on the configuration rules, which are defined as configuration tags. This process can be resource intensive because DAC loads all of the objects in the source system container into memory before parsing.
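A simplified sketch of that assembly loop may help picture the recursion. It is only an illustration of the behavior described above, not DAC's actual implementation, and all names are made up:

```python
def assemble_subject_area(fact_tables, related_tables, tasks):
    """related_tables: table -> set of associated dims/aggregates;
    tasks: list of {"name", "sources", "targets"} dicts."""
    tables = set(fact_tables)
    for f in fact_tables:
        tables |= related_tables.get(f, set())      # add dims/aggregates for the facts

    selected = set()
    changed = True
    while changed:                                  # repeat until nothing new is added
        changed = False
        for t in tasks:
            if t["name"] not in selected and t["targets"] & tables:
                selected.add(t["name"])             # task loads a table in the area
                tables |= t["sources"]              # its source tables join the area too
                changed = True
    return tables, selected

# Tiny example (made-up names).
related = {"W_SALES_F": {"W_CUSTOMER_D"}}
tasks = [
    {"name": "Load_SalesFact", "sources": {"W_SALES_DS"}, "targets": {"W_SALES_F"}},
    {"name": "Extract_Sales",  "sources": {"SRC_SALES"},  "targets": {"W_SALES_DS"}},
]
print(assemble_subject_area(["W_SALES_F"], related, tasks))
```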


25. Difference between homogeneous and heterogeneous execution plans.

According to Oracle Document:

Homogeneous

This type of execution plan extracts data from multiple instances of the same source system. For example, a business might have an instance of Oracle EBS 11i in one location and time zone and another instance of Oracle EBS 11i in another location and time zone. In such cases, the timing of data extraction from the different instances can be staggered to meet your business requirements.

Heterogeneous

This type of execution plan extracts data from one or more instances of dissimilar source systems. For example, a business might have an instance of Siebel 7.8 in one location, an instance of Oracle EBS 11i in another location, and a second instance of Oracle EBS 11i in yet a third location. You can also stagger the timing of data extraction when you use this type of execution plan.