Day to day, our brain collects and processes a lot of information, so we can make decisions needed in our work and routine life. Like people do, companies also require the unit for storing and organizing the information; this tool is named a data warehouse.
Let’s take a deeper look at what the enterprise data warehouse is, how it works, and what features and types it has.
If you are interested in how much information сan be placed in the EDW, this fact is for you: the number of data US Netflix subs have used in the last month is 527 million terabytes. So, an EDW is not just a database.
EDW is a corporate repository that stores and manages all historical business data of the enterprise. This data goes from the CRM, ERP, flat files, and physical recordings, and it should be in the one storage for a versatile analysis. So, the enterprise can manage large data sets without several databases, just an EDW.
This kind of data storing is used in business intelligence, a scope of technologies and methods of transforming raw data into actionable insights.
Every warehouse, first of all, is the database, connected with sources of raw data with the help of data integration tools and analytical interfaces. The warehouse is the place for data storing, transforming, and moving it to the end-user.
The main difference between the usual warehouse and enterprise warehouse is in its features and architectural diversity. EDW can include small databases that are more suitable for customer’s requests.
Let’s consider what functions an enterprise warehouse provides and with what models of data it works.
The pillars of an enterprise data warehouse as a technological phenomenon are its functions.
Data warehouses can differ for different kinds of business, data scope, analysis level, budget, security, etc. Also, you can always customize your data warehouse according to your company’s needs. Let’s consider the most popular data warehouse types.
This is unified physical storage with its own specialized hardware and software. If you are using a classic ETL, you don’t need to implement integration tools between several databases. EDW can connect with data sources via API, so it allows ETL to get information and transform it without interruptions. So, in the role of place of actions can be the staging area (data transforming place before the loading to the data warehouse), or the warehouse itself.
The work with the classic data warehouse is not complicated by additional abstraction layers so that data scientists can manage data flow on the preprocessing side without problems.
The classic data warehouse has its disadvantages, as other data warehouse types do, but they can vary according to its realization in the company:
Where to use:
The classic data warehouse can be used in all kinds of enterprises that need data processing. This kind of data warehouse can transform into different architectural styles of data platforms and scale up and down.
This is an alternative solution that is used instead of a classic data warehouse. In simple words, it contains several databases connected virtually, so they work as a single system.
This approach is convenient for companies: data stays in their sources but can be pulled with the help of analytics tools. You can use a virtual data warehouse if you don’t want to work with basic infrastructure, or when your data is easy to manage.
Disadvantages of the virtual data warehouse:
Where to use:
It is a solution for companies whose raw data is in standard form and doesn’t require complex analytics. Also, it suits enterprises that just begin or use BI from time to time.
Cloud data warehouses have become a usual thing for enterprises in the last years. So, you can choose one of the most popular solutions of the warehouse-as-a-service on the market:
Snowflake suggests EDW as a standalone service, other providers allow companies to use fully-managed warehousing as part of their BI tools. If you choose a cloud data warehouse, you don’t need to build your physical servers, databases, and managing tools: its infrastructure is maintained for you. Such kind of warehouse’s pricing depends on the quantity of needed memory and computing capabilities for processing requests.
Disadvantages of the cloud data warehouse:
The main drawback of this solution is data security. While you should check your cloud warehouse vendor and be sure that you can trust this company. Business-data is a very sensitive thing, and it can be a risky moment when your information isn’t in your hands.
Where to use:
Cloud data warehouses can be a great solution for any size of enterprises. With a warehouse, you also get the integration of managing data, warehouse maintenance, and BI support.
Let’s consider the most popular approaches to enterprise data warehouse architecture that improve warehouse capabilities. The data pipeline is divided into three layers:
Tools for extraction, transforming and data loading belong to the separate layer, named ETL. The ETL stage stands between the raw data layer and the moment when data comes to the warehouse.
Data also can be transformed at the moment of loading to the warehouse. So, the warehouse requires some functionality for cleaning, standardization, and dimensionalization; and these tasks can define the kind of warehouse or architecture. Let’s take a deeper look at how the requirements of the organization can influence the warehouse architecture.
One-tier architecture is the simplest type of warehouse. In this case, a warehouse is a relational database with modules for the usage of multidimensional data. Also, it can be a database that performs elementary actions and separates domain-specific information
to make easy access.
EDW with one-tier architecture means a database that is directly connected with analytical interfaces for end-users requests. Such kind of connection can be tricky:
One-tier architecture approach demonstrates slowness and unpredictability of work, so it doesn’t suit large-scale data platforms. It can be enhanced with low-level instances if it is needed to simplify data access and perform advanced data queries.
A two-tier architecture is complemented with the data mart layer between the user interface and EDW. A data mart contains information related to the particular domain, so it is a small database, a part of EDW, with dedicated information for sales departments, marketing, etc.
A data mart implementation requires additional resources: establishing hardware and integration with another part of an architecture. These costs are reasonable for several moments:
The three-tier architecture approach allows companies to use in their work data online analytical processing (OLAP) cube. This type of database provides data from several dimensions. It makes it possible to compile data in several dimensions and move between these dimensions.
In simple words, the OLAP cube looks like several Excel tables connected with each other. Data in the OLAP cube is segmented in multiple ways at the same time, for example, by locations and periods of time, so it is called multidimensional data.
OLAP cube is a perfect solution for detailed reporting. With its help, customers can select and mix needed data. OLAP cubes optimized for work with warehouses; they can be used both with EDW and provide access to all corporate data scope. It is an available and common solution; there are no difficulties with implementation, as almost all warehouse vendors offer OLAP as a service.
Let’s take a deeper look at the fundamental difference between a data warehouse, a data mart, and a data lake.
Let’s list all components that constitute a data warehouse platform and build some kind of EDW glossary.
DW database: as a rule, data warehouses are relational databases, they contain a database management system and additional storage for metadata.
ETL (extraction, transforming, and loading) layer: tools that fulfill an actual connection with the source data, its extraction, and loading to the place for transforming. It is essential to distinguish ETL from the ELT approach. The ELT means data loading before transforming and transforming performs in the warehouse.
Meta-data module: meta-data are explanations and hints for users or administrators about data subject or domain. It can be technical meta-data, for example, the initial source, or business meta-data, for instance, a region of sales. So, all meta-data stores in a separate module that is managed by the meta-data manager.
Reporting level or BI-interface: tools that provide access to the data for end-users. It is needed for data visualization, building reports, and extraction of particular parts of data.
Sources: databases that store raw data.
Staging area: it is a place where data is loading before going to the EDW. In the staging area, data is cleared and transformed into the needed data model. There also can be tools for data quality management.
The planning of the EDW deployment is a complicated and time-consuming process: you should figure out with data warehouse construction, its tools, features, your business requirements, and end-users needs. While, it will be a good solution to turn to the experienced specialists in warehousing, BI, and ETL. They will help you to choose the most suitable technical solution that meets your company’s size and specialization.