How to use the data warehouse? Master it in seven steps
Building a data warehouse always involves the entire company. At the outset, therefore, top management must make the important decision whether it is strategic for the company to have a data warehouse. The function of a data warehouse is to support strategic and operational decision-making based on dashboards, reports, data analytics and meeting the reporting requirements of regulators. So when to decide on a data warehouse and how to proceed?
1. Define your key metrics
First, each department in the company needs to define the key metrics it needs to assess its own performance - whether in terms of finance and employee productivity, in terms of commissions and efficiency of production and business processes, or in terms of meeting regulatory requirements. The goal is for the organization to get a framework idea of what business questions the data warehouse should know the answers to, which reports and data sets should be automated, and which data should be historicized.
After all, in every company there is a set of data outputs or reports that are essential to the running of the entire organization. However, often these outputs are processed in a partial or completely manual way. Therefore, the automation of output creation and the requirement to historicize source data can be the first motive for defining and formalizing data warehouse requirements. An example would be the creation of financial statements for an annual report tagged with XBRL tags.
2. Identify key business requirements and give them a clear structure
Formalising requirements is often the point at which an external supplier should step in to prepare the data warehouse development. This will help the organization create structured documentation of user requirements with mapping to business processes and existing data sources. The resulting document should be structured enough to be linked to other parts of the structured documentation of the overall work - i.e. the other metadata of the data warehouse.
A good vendor should have reference models and best-practices for the business area. Only in this way can a company be sure that no area of its business is forgotten in the design of the data warehouse and that the proposed data model will be functional and can be managed and easily extended.
Formalizing the requirements for a data warehouse is also a step that can reveal inefficiencies in some business processes or parts of them and can become the impetus for modifying them. A structured definition of data warehouse requirements thus creates a synergistic effect with high added value for the entire organization.
3. Choose the right approach to the development and further development of the data warehouse
A data warehouse is not an enterprise application whose development is definitively completed after all requirements have been resolved. At this point, the organization should determine, with the help of the vendor, what approach to take in order to develop the data warehouse in line with the growth of its activities.
The selected data warehouse process, data and technical architecture and design patterns must then reflect the requirements for availability and speed of output processing, the ability to implement change quickly, the need for coherence of business and technical documentation and security. A poorly designed data warehouse architecture in terms of process, technical and data increases the overall cost of the project.
4. Configure the infrastructure to reflect the technical architecture
Data warehouse technologies reflect the requirements for availability of outputs, ease of data warehouse management, and speed of change. However, the development, testing, and production environments must also be set up well.
At this stage, the concept of separate data warehouse environments that mimic proven practices in software development has proven successful. It ensures the quality of the data warehouse and differentiates the development from the production environment. Moreover, the separation of the environments is appropriate both from a security point of view and to ensure the stability of the production environment.
The test environment is live and replicates production processing to some extent. When a newly developed increment is run through a simulated production run on it, it reduces the risk of errors in "live", production processing.
5. Establish basic user requirements in the development process
The development process typically takes an approach that reflects a prioritised strategy for handling user requirements. Specifically, the user requirements reflect the data model, which, hand in hand with the data transformation mapping, represents the main resource for the developer. And based on these definitions, it creates transformation procedures in the development environment.
Before committing the finished increment, it is imperative to ensure that the user requirements documentation is linked to the specific business processes, the data model, the data transformation mapping, and the technical implementation of the requirement.
The feasibility of impact analysis across the entire solution - from business definitions to technical implementation - is also key in data warehouse development. Otherwise, there is a risk that the business user will not trust the output, and the business and IT development worlds will not get along.
6. Test data warehouse outputs
If the business does not have confidence in the data warehouse outputs, the whole investment will go to waste. Often, the trust of the customer disappears when the presented values of key metrics in the newly deployed solution do not match the expected values. To avoid this crisis of confidence, it is necessary to test the entire solution on a properly set up test environment before deploying it into a production environment. This should in a sense be a mirror image of the production environment.
However, test scenarios should not be limited to a selection of a few examples. Ideally, the values of the metrics should be tested on the entire portfolio of facts under consideration, and the differences must always be justified. It must be remembered that it is not only the data presented that needs to be tested, but also the security of access to it. Therefore, it is advisable to test the correct setting of access permissions on the deployed outputs.
7. Consider further development and change management
The data warehouse occupies a somewhat special place in the IT world. Unlike other applications, it needs to have a long lifespan, as design patterns for data consolidation and especially data historization cannot be easily changed. Therefore, the architectural decisions made in the initial stages of data warehouse development apply throughout its lifetime.
No data is deleted from the data warehouse unless it is a strict requirement of the regulator, for example in the context of GDPR. The underlying assumptions therefore also determine how development and change management within the data warehouse is implemented. The golden rule applies here: every additional requirement and its solution must be documented, new increment of the data warehouse must always follow the set design pattern, and the data warehouse data model should preferably be extended according to the reference data model with adherence to defined conventions.