Using Data Containers To Boot Your Development Environment In Seconds

Published on Jul 7, 2020

One of the most time-consuming parts of booting a Docker development environment is initializing databases. The data container pattern gets around this obstacle by taking advantage of some lesser-known features of volumes. With data containers, you can easily distribute, maintain, and load your database’s seed data.

Data containers are a commonly overlooked tool for building the nirvana of development environments: booting the environment with a single command that works every time.

You’re probably already using volumes to save time when working with databases during development; if one of your development containers crashes, a volume keeps you from losing the database’s state. But interestingly, Docker volumes have some cool quirks that we can leverage for the data container pattern.

In this post, I’ll:

- Walk through the standard techniques for initializing databases, and why they fall short.
- Show how the data container pattern works, using a real example from the Magda data catalog system.
- Cover the benefits and downsides of the approach.

Skip Directly to the Example

Boot it with docker-compose up (or blimp up!). It takes advantage of some quirks of Docker volumes to copy the data from the container into the database, so that the database is fully initialized when it starts. Check out the repo →


Standard Techniques for Initializing Databases

When developing with Docker, developers commonly take one of three approaches to setting up their databases. All of them have serious drawbacks.

1) Initialize Your Database By Hand

Most people start by setting up their databases by hand. But this has several serious drawbacks:

- It’s tedious and error-prone, and every developer on the team has to repeat the same steps.
- The work has to be redone whenever the environment is reset or the volume is deleted.
- The data drifts out of date as the schema evolves.

2) Initialize Your Database Using a Script

Using a script can save you a lot of manual work. But it comes with its own set of headaches:

- Someone has to keep the script in sync with the schema, or the seed data goes stale.
- The script runs every time the environment boots, which is slow for any significant amount of data.
- Hand-written seed data rarely reflects what’s actually in production.

3) Use a Remote Database

Using a remote database – typically your staging database – is certainly faster than running scripts or initializing your database by hand. But there’s a big downside: you’re sharing the database with other developers. That means you don’t have a stable development environment. All it takes is one developer mucking up the data to ruin your day.

A Better Way: Data Containers

Data containers are containers that store your database’s state, and are deployed like any other container in your Docker Compose file. They take advantage of some quirks of Docker volumes to copy the data from the container into the database, so that the database is fully initialized when it starts.

To see how volumes can speed up your development work with databases, let’s take an example from the Magda data catalog system. Here’s a snippet from the Magda Docker Compose file:

services:
  postgres:
    image: "gcr.io/magda-221800/magda-postgres:0.0.50-2"
    volumes:
      - 'db-data:/data'
    environment:
      # Point Postgres's data directory at the shared volume.
      - "PGDATA=/data"

  # The data container: it exists only to seed the db-data volume
  # with the database files baked into its image.
  postgres-data:
    image: "gcr.io/magda-221800/magda-postgres-data:0.0.50-2"
    # Nothing to run, so just idle; Compose treats it like any other service.
    entrypoint: "tail -f /dev/null"
    volumes:
      - 'db-data:/data'

volumes:
  db-data:

When you run docker-compose up in the Magda repo, all the Magda services start, and the Postgres database is automatically initialized.
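
If you want to sanity-check the seed data, something like this should work once the stack is up (a sketch; it assumes the image’s default postgres superuser, which may not match Magda’s actual configuration):

docker-compose up -d

# The tables should already exist -- no init scripts ran at boot.
docker-compose exec postgres psql -U postgres -c '\dt'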

How It Works

This setup takes advantage of two features of Docker volumes:

1) Docker copies any files masked by volumes into the volume. The Magda example has the following in its Docker Compose file.

  postgres-data:
    image: "gcr.io/magda-221800/magda-postgres-data:0.0.50-2"
    entrypoint: "tail -f /dev/null"
    volumes:
      - 'db-data:/data'

When postgres-data starts, it mounts the db-data volume at /data. Because we built the gcr.io/magda-221800/magda-postgres-data image to already have database files at /data, Docker copies those files into the volume. (Docker only does this when the volume is empty, i.e. the first time it’s mounted, so an existing volume’s contents are never clobbered.)
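
This copy-on-first-mount behavior is easy to see outside of Compose, too. Here’s a minimal sketch using a throwaway image (the volume-copy-demo name and seed.txt file are made up for illustration):

# Bake a file into an image at /data (hypothetical demo image).
printf 'FROM alpine\nRUN mkdir /data && echo seeded > /data/seed.txt\n' | docker build -t volume-copy-demo -

# First mount: demo-vol is empty, so Docker copies /data from the image into it.
docker run --rm -v demo-vol:/data volume-copy-demo cat /data/seed.txt

# The data now lives in the volume; even a completely different image sees it.
docker run --rm -v demo-vol:/data alpine cat /data/seed.txt

The second run also previews the sharing behavior described next: another container sees the seeded file through the volume.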

2) Volumes can be shared between containers. So any files written to db-data by postgres-data are visible in the postgres container because the postgres container also mounts the db-data volume:

  postgres:
    image: "gcr.io/magda-221800/magda-postgres:0.0.50-2"
    environment:
      - "PGDATA=/data"
    volumes:
      - 'db-data:/data'
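
To convince yourself the two services really share files, you can poke at the running containers by service name (a quick check, not from the Magda docs):

docker-compose exec postgres-data touch /data/proof
# The file written by postgres-data is immediately visible in postgres.
docker-compose exec postgres ls /data/proof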

Putting this all together, when you run docker-compose up, the following happens:

1) Docker Compose creates the db-data volume.
2) The postgres-data container starts. Because the volume is empty, Docker copies the database files baked into the image at /data into the volume.
3) The postgres container starts with the same volume mounted at /data, finds a fully initialized data directory there, and begins serving immediately.

In short, instead of repeatedly initializing your database by hand or building and maintaining init scripts, you get a fully automated setup that works every time, with remarkably little effort.

Benefits

This approach has three major benefits:

- Fast boots: the data is preloaded into the volume, so the database is fully initialized seconds after it starts.
- Easy distribution: the seed data ships as an ordinary image that anyone can docker pull from the registry.
- Easy maintenance: updating the whole team’s seed data is just a matter of pushing a new image tag.

Downsides

The main downside of this approach is that the data container itself can be hard to maintain. Maintaining it by hand has the same downsides as initializing the database manually or with scripts – the data gets stale as the database schema changes.

Teams that use this approach tend to generate their data containers using CI. The CI job snapshots and sanitizes the data from production or staging, and pushes it to the Docker registry. This way, the container generation is fully automated, and developers don’t have to worry about it.
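
The details vary by team, but a nightly job might look roughly like this sketch (STAGING_DB_URL, SCRATCH_DB_URL, REGISTRY, sanitize.sql, and Dockerfile.data are all placeholders, not from the post):

#!/bin/sh -e
# Hypothetical nightly CI job that regenerates the data container image.

# 1) Snapshot the staging database.
pg_dump "$STAGING_DB_URL" > dump.sql

# 2) Load the snapshot into a scratch database and scrub sensitive fields
#    (sanitize.sql is an app-specific script you'd write).
psql "$SCRATCH_DB_URL" -f dump.sql
psql "$SCRATCH_DB_URL" -f sanitize.sql
pg_dump "$SCRATCH_DB_URL" > seed.sql

# 3) Bake the sanitized data into an image and push it. Dockerfile.data
#    would run initdb and load seed.sql at build time, so the resulting
#    image has ready-to-use database files at /data.
docker build -f Dockerfile.data -t "$REGISTRY/postgres-data:$(date +%F)" .
docker push "$REGISTRY/postgres-data:$(date +%F)"

Developers then pick up fresh data just by pulling the new tag.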

Conclusion

Data containers are a cool example of how Docker Compose does so much more than just boot up containers. Used properly, it can substantially increase developer productivity.

We’re excited to share these developer productivity tips because we’ve noticed that the development workflow has become an afterthought during the move to containers. The complexity of modern applications requires new development workflows. We built Blimp so that development teams can quickly build and test containerized software, without having to reinvent a development environment approach.

Resources

Check out another trick for increasing developer productivity by using host volumes to get rid of container rebuilds.

Try an example with Blimp to see how easily development on Docker Compose can be scaled into the cloud.

Read common Docker Compose mistakes for more tips on how to make development faster.


By: Kevin Lin
