Catalog deployment runbook

Tip

For more information on how deployments work, please see the general deployment guide.

Setup

Check the running DAGs in Airflow to make sure no DAGs are running.

Caution

It is possible to perform a deploy if the image and audio refresh DAGs are running, but only if they are currently waiting on an HttpSensor step. If that is the case, you should pause the DAG, complete the deploy, and then unpause it back.

Publish the release

Publish the drafted catalog release in the GitHub release page of the monorepo.

Deployment

The catalog only exists in production, so there is no staging deployment. After the app is built and tagged, deploy production:

  1. Checkout the infrastructure repository and bump the catalog version with the just bump production airflow command.

  2. just ansible/playbook production airflow.yml -t airflow and verify the plan before deploying. Unless configuration variables are changing along with the docker image version, the only change should be to the docker image tag in the compose file. Run the playbook with -e airflow_apply=true to instruct the playbook to actually apply any changes.

  • If any DAGs are running, the playbook will not apply the changes and will let you know that. If this happens, visit Airflow and confirm the list of running DAGs. If they can be stopped, stop them. If they need to be waited for, wait until they are done, then run the playbook again. If you must deploy and cannot wait for the DAGs to finish (or, if they are deferred and cannot finish), run the playbook with -e airflow_force=true to ignore the running DAGs check.

  • See the setup section above for more information about when to decide if it is okay to deploy when DAGs are running.

Post-deployment steps

  1. Check for any Sentry errors in the maintainer’s #openverse-alerts channel, or in the Sentry UI.

  2. Ensure that Airflow is accessible at https://airflow.openverse.org.

  3. If an Airflow version upgrade was deployed, ensure that the version is correct in the Airflow UI (bottom left of the footer on any page).

  4. Review and Approve the automatically-generated changelog pull request in the repository.

  5. Push up a PR to the infrastructure repository with the Ansible group var changes you pushed.

  6. In the event of errors or problems, rollback the application by running the appropriate deployment workflow from the WordPress/openverse-infrastructure repository using the tag of the latest stable version. You can find the release version number in the changelogs, and then the tag to pass to the action is the version number prefixed with “rel-”, for example “rel-2023.07.03.17.52.00”.

  7. If anything else goes wrong or service is disrupted, consider this a Production Incident and notify the team.