
Agile Data Warehousing - It’s time.

The goal of this article is to make the case for adopting agile software methodologies in data warehouse development and to demonstrate that it is possible.

Background

Data warehouses have been employed as a means of providing a single source of truth about what’s occurring in a business since the late 80s.

For a long time these systems have been hosted in data centers on monolithic servers. If the database administrators started struggling with the size of the data set, they had to break up the data warehouse and create physically separate data marts. Data is loaded into the data warehouse overnight in large batch operations that check for duplicates, perform data cleaning activities and then transform the data into a clean dimensional model for easy access by end users.
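As a rough illustration of what such a nightly batch does, here is a minimal Python sketch of the three steps: duplicate check, cleaning, and transformation into a dimension and a fact table. The field names (`order_id`, `email`, `amount`) and the function name are hypothetical and chosen only for this example; a real batch job would run inside the database or an ETL tool rather than in plain Python.

```python
def run_nightly_batch(raw_orders):
    """Deduplicate, clean, and split raw order rows into a dimension and a fact table."""
    seen = set()      # order ids already processed in this batch
    customers = {}    # dimension lookup: cleaned email -> surrogate key
    fact_orders = []  # fact rows that reference the dimension by key

    for row in raw_orders:
        # 1. Duplicate check on the (hypothetical) natural key "order_id".
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])

        # 2. Cleaning: normalise the email and coerce the amount to a number.
        email = row["email"].strip().lower()
        amount = round(float(row["amount"]), 2)

        # 3. Transform: assign a surrogate key and emit a fact row.
        customer_key = customers.setdefault(email, len(customers) + 1)
        fact_orders.append(
            {"order_id": row["order_id"], "customer_key": customer_key, "amount": amount}
        )

    dim_customer = [{"customer_key": key, "email": email} for email, key in customers.items()]
    return dim_customer, fact_orders
```

The output is a small star schema: one row per customer in the dimension, and one row per unique order in the fact table pointing at that customer by key.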

These systems are commonplace today and represent decades of development… without a whole lot of innovation.

The Cloud

The cloud opens up new options for data warehousing. No longer do businesses have to pay huge upfront costs for gigantic servers; now we can simply spin up a cloud database and start loading data into it. Furthermore, scaling is much less of a challenge, as it becomes largely the vendor’s problem. The joys of Software as a Service!

There are not many cloud options for data warehousing yet, but there will be. The main players are Amazon Redshift, Snowflake Data Warehouse, Teradata Cloud, IBM dashDB and Microsoft SQL Data Warehouse, and many of these are so new in 2016 that they are not yet proven enterprise products. A number of Business Intelligence vendors are getting into the space too, with integrated data warehouses sitting underneath their cloud BI solutions.

Data warehousing in the cloud is only just beginning, and for that reason not many companies are leveraging these options yet. They present an opportunity for new ways of thinking about how to manage data, and at THE ICONIC we are obsessed with doing things a better way.

For this reason we selected Amazon Redshift as our data warehouse platform. We already had a footprint in AWS and were using Redshift as a data lake without doing anything useful with it. Some people called it a “data warehouse” but nobody cared as it wasn’t particularly useful without sensible names and intuitive structures.

For a while we considered using a hybrid DWH/BI platform like Birst, but decided against it because of the complications of using that data fluidly in other connected systems.

Agile Data Warehousing

Everyone knows the virtues of being agile in software development. Database developers have often had their own set of rules as it simply hasn’t been possible to be agile with huge data sets. Data warehouses tend to all use fairly similar infrastructure and designs, so normal software methodologies have been too hard to apply in the past.

With the advent of cloud technologies it is now possible to apply agile techniques. I will run you through how we are now able to conform to The Twelve Principles of Agile Software Development as stipulated by the Agile Manifesto:

1. Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

  • Early delivery of a new DWH means delivering one data mart at a time. We struck a balance: our initial project release covered 3 data marts, but we released to the business as soon as our 2nd data mart was completed.

2. Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.

  • Metric definitions may change as the DWH is built, because the build is done alongside the business and new questions will come up. This is great, as it means we can refine the definitions as we go.
  • As we proceeded we discovered shortcomings in our dimensional model. We simply discussed what would be clearer for consumers to understand and improved the model.

3. Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.

  • With our new approach it is possible to do better than people have in traditional DWH construction. We operated on 3-week sprints and were able to deliver value as we went.
  • Our entire core data warehouse took four 3-week sprints (12 weeks) to complete, with an average of 180 hours of effort invested across all our developers each sprint, or roughly 720 hours in total. We allowed our Business Intelligence team to start using each of the data marts as they were created.

4. Business people and developers must work together daily throughout the project.

  • In the first sprint we focussed on building the platform and the ETL to import and structure the historical data. After that we started designing the dimensional model, and this was the point at which we started reaching out to business people and integrating them into the design process.

5. Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.

  • Our Data team is absolutely pumped to be building the data warehouse of the future! We’re doing something that few others in the world are doing and want to set a global standard for how to do data warehousing in the cloud.
  • Our business people have been hobbling along for a long time and are very excited about a data platform. Few of them understand what a data warehouse really means for them, but they know that it means easier access to data, and that is enough to garner attention and cooperation. Our Business Intelligence team is working with analysts across the business to educate them on how to use it and why it’s important.

6. The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

  • We are lucky enough to be co-located with most of our business users and as such interaction is easy. We have a warehouse at a separate location which has meant a little travel but it’s hardly an inconvenience.

7. Working software is the primary measure of progress.

  • We released data marts as the data warehouse project progressed. This allowed us to verify our approach was useful.
  • With a data warehouse in use by many analysts across the business, we’re very happy with our progress.

8. Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.

  • As we started reaching out to business users for input on the design, we established channels for them to continue working with us into the future. We work with them in a way that means they are not privy to the details of the project delivery timelines but instead experience this as an ongoing activity.

9. Continuous attention to technical excellence and good design enhances agility.

  • From the start of this project I made it clear that I wanted everyone to think things through carefully and work together on arriving at the best solutions. We value quality over speed and are willing to invest heavily up-front on quality design and incremental implementation rather than compromising quality design to deliver something sooner.
  • Our goal was to create a data warehouse that requires less effort to build and maintain, and this has inherently required good design.

10. Simplicity — the art of maximizing the amount of work not done — is essential.

  • We have done this by taking a different approach to the build. Instead of hand-writing a load for each table, we load metadata that describes the table definitions and use it to drive the data migration.
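To make the metadata-driven idea more concrete, here is a minimal sketch of how table definitions held as metadata could generate both the table and the statement that migrates data into it. This is not our actual implementation: the metadata layout, the table and column names, and the helper functions are all hypothetical, and SQLite stands in for the warehouse so the example can run anywhere.

```python
import sqlite3

# Hypothetical metadata: each target table names its source table and maps
# target column -> (SQL type, source expression).
TABLE_METADATA = {
    "dim_customer": {
        "source": "raw_customers",
        "columns": {
            "customer_id": ("INTEGER", "id"),
            "customer_name": ("TEXT", "TRIM(name)"),
        },
    },
}


def build_ddl(table, meta):
    """Generate a CREATE TABLE statement from one metadata entry."""
    cols = ", ".join(f"{name} {sql_type}" for name, (sql_type, _) in meta["columns"].items())
    return f"CREATE TABLE {table} ({cols})"


def build_migration(table, meta):
    """Generate an INSERT ... SELECT that copies data from the source table."""
    targets = ", ".join(meta["columns"])
    exprs = ", ".join(expr for _, expr in meta["columns"].values())
    return f"INSERT INTO {table} ({targets}) SELECT {exprs} FROM {meta['source']}"


def migrate(conn, metadata):
    """Create and populate every table described in the metadata."""
    for table, meta in metadata.items():
        conn.execute(build_ddl(table, meta))
        conn.execute(build_migration(table, meta))


def demo():
    """Run the migration against an in-memory database with two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?)", [(1, "  Ann "), (2, "Bob")])
    migrate(conn, TABLE_METADATA)
    query = "SELECT customer_id, customer_name FROM dim_customer ORDER BY customer_id"
    return conn.execute(query).fetchall()
```

The point of the design is that adding a new table means adding a metadata entry rather than writing new load code, which is where the "work not done" comes from.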

11. The best architectures, requirements, and designs emerge from self-organizing teams.

  • The initiative to build a data warehouse was conceived by a technical team. As such there were no business users who naturally fell into the project team. Instead we sought out the people who would be key users in each of the departments across the business and involved them in the design process.

12. At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

  • We use sprint retrospective meetings to track ourselves against our estimates and evaluate better ways to work.

Conclusion

Adopting an agile methodology has resulted in a great data warehouse. As can be seen above, we focussed on conforming to the Agile Manifesto and it has worked well.

There are new data warehouse technologies available that greatly improve the way data is stored and managed, and when combined with modern software development techniques they can prove powerful assets for an organisation.

In my next blog post I will walk you through some of the differentiating characteristics of our application that set a new standard for how data warehouses are built.