Understanding Characteristics of Data Warehouse In Detail!
Table of content:
- What's A Data Warehouse?
- Characteristics Of A Data Warehouse
- Data Warehouse Architecture
- Data Warehouse Views
- Components Of A Data Warehouse
- Layers Of Data Warehousing
- Types Of Data Warehousing
- Stages Of Data Warehousing
- Dimensions Of A Data Warehouse
- Need Of Data Warehousing
- Advantages Of Data Warehousing
- Disadvantages Of Data Warehousing
- Frequently Asked Questions
A data warehouse is a centralized repository where huge volumes of data from multiple sources are stored and managed for analysis and reporting. Because it offers a single source of data truth, it is optimised for quick querying and analysis, allowing organisations to make educated choices.
A data warehouse should be subject-oriented, time-variant, integrated, and analysis-optimized among other things. Instead of supplying information about the company's present activities, a data warehouse concentrates on delivering information on a certain theme or issue. Large volumes of historical data are stored and managed by it, allowing for the tracking of data over time and facilitating time-related analyses like trend analyses and forecasting.
What's A Data Warehouse?
Data warehousing is the central process of the Business Intelligence (BI) system, responsible for storing, collecting, and managing data from different heterogeneous sources of business data, like flat files, mainframe, relational databases, etc. in order to analyze the data for business-informed decisions.
- This is an efficient method essential for the decision-making process and the forecasting process of an organization.
- It is a form of a Relational Database Management System (RDBMS Technology), comprising an entire set of relational databases, which gives an idea about strategies that need to be planned for the future of a company.
- The data warehouse is used for research and collection of data from different sources, such that there is a minimal number of clients yet interactions are wide. This can help boost business agility.
- There are various functions of a data warehouse. It is not widely used in the transaction process but has the ability to act as an analytical tool with few tables and improve business processes.
What is the difference between a database and a data warehouse?
Database consists of a repository of real intervals of time information which is specific to a particular application. This is a transactional system as it is used in the transaction process.
Data Warehouse consists of a repository of a huge volume of structured data, which compiles relational databases and information centers from a wide variety of sources which is of huge help to a company for decision making. This is not used in transactions, but is beneficial from a historical perspective.
For example, a database will give information regarding a person's current phone number while a data warehouse will contain information about all the phone numbers used by a person in a predefined period of time in the past. This is because it will compile all the data from all the databases.
History of Data Warehouse
The historical insights of data warehousing date back to the late 1980s when Paul Murphy and Barry Devlin of IBM came up with the idea of 'Business Data Warehouse'. This was initially built as an architectural model for the information flow from operational systems to decision support environments. This helped in cost analysis and business analysis and hence was developed further in the next few years to make it what it is now.
Characteristics Of A Data Warehouse
The four characteristics of a data warehouse, also called features of a data warehouse are: subject-oriented, time-variant, integrated, and non-volatile.
These features of a data warehouse differentiate it from any other set of databases or data by characterization and help in robust data analysis.
I. SUBJECT ORIENTED
Decision makers of a business can analyze data easily, by constricting to a particular subject area of the data warehouse. This makes understanding and analysis of the data concise and straightforward by excluding unwanted information on some subject that is not needed for decision-making. This means that the ongoing operations of an organization are not taken into consideration.
II. INTEGRATED
Data warehouses consist of data from different variable sources integrated under one platform. This data obtained is extracted and transformed maintaining uniformity without depending on the source it was obtained from. This is the integrated feature. Standards are established which are universally acceptable for the data present in the warehouse.
III. TIME VARIANT
One of the important properties of the data warehouse is the historical perspective it holds. It keeps the huge volume of data from all databases stored in accordance with the elements of time. It consists of a temporal element and an extensive time horizon. The inability to change the element of time is an essential aspect of time variance. The record key is used to display time variance.
IV. NON-VOLATILE
Data is updated by uploading data in the data warehouse to protect data from momentary changes. This means that once data is fed, there can be no alterations or changes. The inability to be erased is the non-volatile character of the data warehouse environment. Data is read-only and allows only two functions to be performed: Access and Loading.
What is the difference between operational application and data warehouse?
Operational application codes for complex programs with normalized data are based on complex technology. Data warehouse doesn't allow updating of the data which is not stored in normalized form. Technology is simple here.
Data Warehouse Architecture
The architecture of a data warehouse can be divided into three main approaches: Single-Tier, Double-Tier or Two-Tier, and Three-Tier approaches.
- SINGLE-TIER APPROACH: A single-tier warehouse architecture minimizes storage and redundancy.
- DOUBLE-TIER OR TWO-TIER APPROACH: It's the simplest type but has multiple limitations and thus, is not used to avoid network limitations. There's also restricted expansion ability and an inability to support end users.
- THREE-TIER APPROACH: This approach is user-friendly and is the most widely used.
Three-Tier Architecture Of Data Warehouse
The architecture of a data warehouse consists of three tiers, which are as follows:
1. Bottom Tier-
- The bottom tier includes the Data Warehouse database server where the back-end tools and utilities perform the function of data compilation.
- The back-end tools extract, clean and transform the data and then load the data.
- It finally refreshes the system.
- Examples include Open-Linking and Embedding for Databases (OLE-DB), Open Database Connection (ODBC), and Java Database Connection (JDBC).
2. Middle Tier-
- The middle tier consists of an OLAP (Online Analytical Processing) server which acts as an intermediary between the single database and the end user.
- This server has the capacity to present a multidimensional database (MDDB) to the user.
- The two models that come under OLAP are the ROLAP model and the MOLAP model.
- ROLAP refers to Relational OLAP which converts multidimensional databases to relational databases to carry out analytical processing and decision process.
- MOLAP refers to Multidimensional OLAP which carries out all functions in the multidimensional database itself.
3. Top Tier- This tier consists includes the front-end client layer.
- It consists of API and various data warehouse tools to manage the data in the operational databases. These tools are Query tools, Reporting tools, Analysis tools and Data mining tools.
- While Query tools and Reporting tools produce a report, Analysis Tools analyze the data and plot a graph or chart and Data mining tools observe patterns and trends.
Data Warehouse Views
There are four views:
- Top-down view- This includes the relevant information present in the warehouse.
- Data source view- This involves the use of operational systems to manage, store and collect the data to form individual data source tables and integrated data source tables.
- Data warehouse view- This represents the data in the form of fact tables and dimension tables.
- Business Query view- This includes representing the data from the end user's viewpoint.
Components Of A Data Warehouse
The five components of a data warehouse are:
I. DATA WAREHOUSE DATABASE-
- The data warehouse database is the central database and core component that forms the base for the data warehousing environment.
- This is based on RDBMS technology, though the fact that this is better optimized for transactional systems can prove to be a challenge. For example, ad-hoc query, multi-table joins, aggregates, etc.
- For better functioning of the data warehouse, relational databases are also allowed to function in parallel, known as Parallel Relational Databases, which provides easier and better scalability on parallel processing or multiprocessor configurations. Multidimensional databases are also combined with relational databases along with new index structures to increase speed. For example, Essbase from Oracle database.
II. SOURCING, ACQUISITION, CLEAN-UP, AND TRANSFORMATION TOOLS (ETL TOOLS)-
- ETL (Extract, Transform and Load) Tools are another component that includes tools for sourcing, migration, transformation and acquisition to convert, summarize and transform the data present into a unified format.
- ETL tools help in dealing with the problems faced by the system due to Database and data heterogeneity.
-
Functions:
- DUPLICATION: It removes the duplicated or repeated values obtained from multiple data sources.
- SUMMARIES: It is used in calculating summaries, deriving data and generating plots.
- DEFAULT VALUE: Default value addition in case of absence of any data.
- UNWANTED DATA: Elimination of unwanted data present in operational databases of the data warehouse.
- COMMON NAMES AND SEARCHES: For data from heterogeneous sources, common names and searches can be found and replaced for efficient strategic informed decisions.
- ANONYMOUS DATA: Based on regulatory affairs, some data can be made anonymous using ETL tools.
III. METADATA-
Metadata uses high-level technology to build, maintain and manage the data warehouse. It provides information regarding the source, the use, values involved, and features of a data warehouse along with changing and processing of the data.
It is of two types:
1. Technical Meta Data- This is used by the designers and administrators to gather information and maintain it.
2. Business Meta Data- This is used by end-users to interpret the data stored in the warehouse.
IV. QUERY TOOLS-
Query tools allow the interaction of users with the data warehouse systems to make strategic decisions for a company. There are four categories of Query Tools:
1. Query and reporting tools: This can further be divided into 2 categories-
a. Reporting Tools- This includes desktop report writer which allows end users to carry out their analysis and production reporting tools which generate regular basis operational reports for business performance. Examples of reporting tools consist of Brio, Business Objects, Oracle database, PowerSoft, and SAS Institute.
b. Managed Query Tools- This is an access tool that performs the function of resolving snags in database, SQL and single database structure by the insertion of a meta-layer between the user and the database.
2. Application development tools- Application Development tools develop custom analytical reports for situations where built-in graphical and analytical tools cannot satisfy the needs of the user.
3. Data mining tools- These tools automatically collect large amounts of data and carry out mining of the data to find correlations, patterns and trends in the data mined.
4. OLAP tools- OLAP tools comprise multidimensional databases that a user uses to analyze the data using multidimensional views.
V. DATA MARTS-
- A data mart is a subordinate of a data warehouse that helps in providing output for large-sized data by partitioning the data.
- It is less time-consuming and cheaper.
- They can be created in the same database or a separate database.
Layers Of Data Warehousing
Based on the properties of the data warehouse, it can be divided into three main layers:
1. STAGING- This involves storing the data for future use by the administrators, designers and developers.
2. INTEGRATION- This involves integrating the data such that a level of abstraction can be seen in the information. It involves establishing a standard unit of measurement for all related data.
3. ACCESS- Accessing refers to producing the relevant information or output for the user as specified by the user.
Types Of Data Warehousing
There are three types of Data Warehousing:
1. CLASSIC DATA WAREHOUSE- The classic data warehouse is the traditional form.
2. VIRTUAL DATA WAREHOUSE- Virtual or computer data warehouse is utilized in place of a conventional warehouse.
3. CLOUD DATA WAREHOUSE- Cloud Warehouse platform is a secure, scalable type useful for carrying out the daily operations of an organization or company.
Stages Of Data Warehousing
There are four stages of the Data Warehousing process:
1. OFFLINE DATABASE- First stage of warehousing where data is copied from operational systems to external servers.
2. OFFLINE DATA WAREHOUSE- Second stage involves scheduled updating of the data in the central repositories.
3. REAL-TIME DATA WAREHOUSE- Third stage where data is now updated in a real-time limit format, into the warehouse.
4. INTEGRATED DATA WAREHOUSE- This stage involves transfer from the data warehouse to the operational databases and the data is allowed to flow between systems.
Dimensions Of A Data Warehouse
Dimension is an element of a data warehouse that provides a structure to the unstructured data. It plays an important role in filtering, grouping and labeling. Various hierarchies can be applied to a Dimension. The user can filter, categorize or report the data.
It can be divided into the following types:
A. Conformed Dimension- This type of dimension provides a set of attributes for a multiple database table which can either be identical or be a subset of the other. They come in different flavors. At the basic level, they are joined to the fact table.
B. Junk Dimension- Junk Dimension refers to a dimension table comprising of low-cardinality data (e.g. non-generic comments) that doesn't belong to either a fact table or any other dimension table. Junk dimensions can hold a lot of independent binary indicators that can be converted into a describing attribute. This helps in creating a fact table with many foreign keys. For a blank comment field, a single row can act as a surrogate key in the fact table.
C. Degenerate Dimension- Degenerate Dimension comprises a dimension key without any attributes in a fact table. They represent a unique identifier that is not a part of the actual dimension table. Hence, they can be used to identify the primary key.
D. Role-Playing Dimension- They represent recyclable dimensions that can be used for carrying out multiple applications in the same database
Need Of Data Warehousing
Here are five reasons why we need data warehousing:
1. For use by non-technical people- Business users are nontechnical people who need to gather information in a summarized, elementary fashion. This function is fulfilled by Data Warehousing.
2. Storing the historical data- The time variable related data from the past needs to be stored for future use.
3. Strategic decision-making- Data warehousing helps in making strategic decisions based on the data given in the warehouse. This data helps draw valuable insights.
4. Data consistency and quality of data- Data warehousing helps in maintaining the consistency and uniformity of the data, even though it has been derived from heterogeneous sources.
5. Response time is fast- Data warehousing provides a significant degree of flexibility and faster response time that helps it deal with a lot of load and queries.
Advantages Of Data Warehousing
The advantages of Data warehousing includes:
1. ANALYSIS: It helps in effective analysis. The valuable data analytics it provides help make the decision-making process better. It extracts information from a raw collection in an efficient manner to provide actionable insights.
2. DATA ACCESS: It makes accessing data fast and simple.
3. DATA QUALITY: It helps maintain consistency in the quality of the data.
4. TRENDS: It helps in understanding trends better and forecasting decisions easily, leading to an increase in productivity.
5. DATA VOLUME: It includes repositories consisting of huge volumes of data and helps in managing such data.
6. DEMANDS: It is an efficient method to handle the demands of different users.
7. QUERY PERFORMANCE: It is designed to be able to easily handle complex queries.
8. DATA MAINTENANCE: It helps in maintaining the data with respect to the history of the information provided.
9. STORAGE: It is valuable for merger organizations since it allows the storage of heterogeneous data.
10. EXTRA FUNCTIONS: Coding, Descriptions, Flagging, and Fixations are all possible due to data warehousing.
11. RESTRUCTURING DATA: Data can be restructured, though it is not erased, based on the operations desired by the business user.
12. OPERATIONAL BUSINESS APPLICATIONS: It adds value to operational business applications like customer relationship management (CRM) systems.
13. DATA MODEL: It can merge data to form a common data model.
14. LESS TIME CONSUMING AND CHEAP: It takes less time to provide output and is cheaper than other programs.
15. USE OF BI ANALYTICS: It uses better BI analytics (Business Intelligence tools) called enterprise data warehouse (EDW) - a centralized data repository to analyze and generate reports.
Disadvantages Of Data Warehousing
The disadvantages of data warehousing are:
- FLEXIBILITY REDUCED: Homogenization of data makes working a little less flexible and also leads to loss of data. This problem can be solved by monitoring the data cleaning process.
- COPYRIGHT ISSUES: Since data is added to a centralized warehouse, copyright issues may occur that make data insecure.
- INCREASED REPORTS: A large volume of data means an increased amount of report generation and use of resources. Categorization of data can prevent this.
- MAINTENANCE PROBLEMS: Problems in this system remain hidden and hence, need proper maintenance.
Conclusion
This article focuses on the way warehousing solutions, and data warehouse in general, helps an organization in making accurate decisions. The following points are brought into light using the article:
- Data warehouse is an efficient central information repository which is a component of a Business Intelligence system based on RDBMS technology.
- It differs from a database since a Data Warehouse compiles data from heterogeneous sources while a Database provides real-time data.
- Data warehousing is essential to make strategic decisions, store data history, maintain quality and consistency of data, and enable faster response to a query. It also provides actionable insights and makes it easier for non-technical people to work using various databases.
- The architecture of a data warehouse can be created using three approaches- Single-tier, Two-tier, and Three-tier. Among these, the three-tier approach is widely used and consists of bottom tier, middle tier, and top tier.
- Data warehouse allows four views: top-down, data source, data warehouse, and business query view based on different viewpoints of data representation.
- There are five components used for the set-up - Database server, ETL tools, Meta Data, Query tools, and Data mart.
- Data warehousing comprises three layers - Staging, Integration and Access. And has three types - Classic data warehouse, Virtual data warehouse and Cloud data warehouse.
- The process of data warehousing, developed by Murphy and Devlin in the 1980s, can be divided into four stages - Offline database, Offline Data warehouse, Real-time analytics, and finally Integrated Data warehouse.
- There are four dimensions of data warehousing- Junk dimension, Conformed, Degenerate, and Role-playing. It also has four characteristics- Subject-oriented, Integrated, Time-variant, and Non-Volatile.
Frequently Asked Questions
1. Tell some reasons data warehouse projects fail.
Some reasons data warehouse projects fail are:
Forgetting about long-term maintenance: Data warehouse projects often overlook the ongoing maintenance required, such as data format changes, increasing data velocity, and the time cost of adding or fixing data connections.
Underestimating data transformation requirements: Data transformation is a crucial step in data warehouse projects, and underestimating the complexity and effort required can lead to failure.
Treating it purely as a technology project: Data warehouse projects should not be treated solely as technology projects. Focusing on the end users and delivering value should be a priority.
It's crucial to note that this list of causes is not complete and that other problems and parameters could contribute to the failure of data warehouse projects.
2. What are the 4 characteristics of a data warehouse?
Four characteristics of a data warehouse are:
Subject-Oriented: A data warehouse is designed to provide information about a specific subject or theme, rather than focusing on an organization's current operations.
Integrated: A data warehouse's ability to integrate data is one of its important features. Data from diverse sources, including transaction processing, must be gathered, cleaned, transformed, and then consolidated into a single, cohesive picture.
Time-Variant: A data warehouse allows for the examination of historical data changes over time. Users may look at trends, patterns, and changes in the data over different time frames since it saves and retains track of past data.
Non-Volatile: The data in a data warehouse is not volatile since no real-time updates or modifications are made to it. Data that has been placed into a data warehouse is normally not changed or removed.
3. Why do organizations need a data warehouse?
There are many reasons why an organization needs a data warehouse. Some of the reasons are:
Fast access to information: A data warehouse enables quick access to information regarding previous performance, assisting businesses in forecasting future trends and making wise decisions.
Enhanced business intelligence: A data warehouse delivers enhanced business intelligence by providing a single platform for accessing data from different sources.
Saves time: A data warehouse saves time by providing a single source of truth for data analysis.
4. What is data aggregation in a data warehouse?
The act of collecting and compiling data from many sources into a more consumable and thorough format is known as data aggregation. In the process, groupings of observed aggregates are replaced with summary statistics based on those observations. Atomic data rows are also replaced with totals or summary statistics. Data aggregation is frequently seen in a data warehouse because it can quickly and effectively answer analytical queries about vast amounts of data.
5. What's the difference between data warehousing and data mining?
The differences between data warehousing and data mining are:
DATA WAREHOUSING | DATA MINING |
A data warehouse is created to assist with management tasks | Data mining is a technique used to uncover patterns and meaningful information in data. |
Information collection in a data warehouse is known as data warehousing. | Data mining is a process of finding hidden information and patterns in different data sets. |
Data warehouses combine information from several sources while maintaining the accuracy, consistency, and quality of the data. | Data mining extracts useful information and insights from a large amount of data. |
6. How is data transformed in a data warehouse?
Data transformation in a data warehouse is the process of transforming raw data into a format appropriate for analysis and modeling. In order to extract useful knowledge and insights from the data, data transformation seeks to make the data ready for data mining.
Data transformation often entails a number of stages, including,
- Data Cleaning
- Data Integration
- Data Aggregation
- Data Normalization
- Data Encoding
- Data Scaling
- Data Derivation
7. How does a data warehouse support business intelligence (BI)?
Business intelligence is aided by data warehouses because they combine and integrate data from many sources, clean and transform the data, store historical data, increase query efficiency, uphold data security and governance, and provide a platform for business analytics.
It helps in company operations as businesses can extract useful information from their data and make decisions that will lead to financial success.
You might also be interested in reading:
- Cohesion Vs Coupling: Know The Differences Between The Two For A Better Software Design!
- What is the difference between Class and Object? Explained
- What Is Programming? A Guide to Computer Programming
- Advantages And Disadvantages Of SQL: A Popular Choice For Databases!
- Advantages And Disadvantages Of Internet| Know How Internet Is Affecting Human Brain
Login to continue reading
And access exclusive content, personalized recommendations, and career-boosting opportunities.
Comments
Add comment