Deployment runbook
Setup
Check Airflow to make sure no DAGs are running.
Caution
It is possible to deploy while the image and audio refresh DAGs are running, but only if they are currently waiting on an HttpSensor step. In that case, pause the DAG, complete the deploy, and then unpause it.
Publish the drafted catalog release on the GitHub releases page of the monorepo.
There you can preview the changes included in the catalog release, decide whether a release is necessary, and adjust monitoring during the deployment accordingly.
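The "no DAGs are running" rule and its HttpSensor exception can be written down as a small decision helper. This is a minimal sketch, not part of the Openverse tooling. The DAG IDs and the input shape (a mapping from each running DAG to the operator class of the task it is currently executing) are assumptions made for illustration. That information would have to be collected from the Airflow UI or API first.

```python
from typing import Dict, List

# Hypothetical DAG IDs for the image and audio refresh DAGs; check the
# real IDs in the Airflow UI before relying on these.
REFRESH_DAGS = {"image_data_refresh", "audio_data_refresh"}


def deploy_blockers(running: Dict[str, str]) -> List[str]:
    """Return the running DAGs that block a deploy, sorted by DAG ID.

    `running` maps each currently running DAG ID to the operator class
    name of the task it is executing (an assumed input shape).
    A refresh DAG waiting on an HttpSensor does not block the deploy,
    but it still has to be paused first and unpaused afterwards.
    """
    blockers = []
    for dag_id, operator in sorted(running.items()):
        waiting_on_sensor = dag_id in REFRESH_DAGS and operator == "HttpSensor"
        if not waiting_on_sensor:
            blockers.append(dag_id)
    return blockers
```

An empty result means the deploy can go ahead. Any refresh DAG that was waiting on a sensor should still be paused for the duration of the deploy.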
Deployment
The catalog only exists in production, so there is no staging deployment. After the app is built and tagged, deploy production:
Check out the infrastructure repository and bump the catalog version with the just bump prod catalog-airflow command.
Once you’ve verified that no DAGs are running, update the value of running_dags_cleared to true in the production module declaration.
Run just apply prod catalog-airflow and verify the plan before deploying.
Restore the value of running_dags_cleared to false.
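Because the order of these steps matters (the flag must be set before applying and reset afterwards), it can help to walk them as an explicit checklist. The following is a minimal sketch that only prompts for confirmation and runs nothing itself. The step wording is paraphrased from this runbook, and the function and variable names are hypothetical.

```python
from typing import Callable, List

# Steps paraphrased from the runbook, in the order they must happen.
STEPS = [
    "Run: just bump prod catalog-airflow",
    "Confirm no DAGs are running, then set running_dags_cleared to true",
    "Run: just apply prod catalog-airflow and verify the plan",
    "Set running_dags_cleared back to false",
]


def walk_checklist(confirm: Callable[[str], bool]) -> List[str]:
    """Ask `confirm` about each step in order and stop at the first refusal.

    Returns the steps that were confirmed, so the caller can see exactly
    where the deployment stopped.
    """
    done = []
    for step in STEPS:
        if not confirm(step):
            break
        done.append(step)
    return done
```

For interactive use, `confirm` could be something like `lambda step: input(f"{step} [y/N] ").strip().lower() == "y"`.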
Post-deployment steps
Check for any Sentry errors in the maintainers’ #openverse-alerts channel, or in the Sentry UI.
Ensure that Airflow is accessible at https://airflow.openverse.engineering.
If an Airflow version upgrade was deployed, ensure that the version is correct in the Airflow UI (bottom left of the footer on any page).
Review and approve the automatically generated changelog pull request in the repository.
Open a PR against the infrastructure repository with the Terraform changes you made (the version bump for the relevant module).
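The accessibility check in the steps above can be partly scripted against Airflow's /health endpoint, which reports a status per component. This is a minimal sketch. It assumes the endpoint is reachable without authentication from wherever the script runs, and the helper names are illustrative rather than part of any Openverse tooling.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen

AIRFLOW_URL = "https://airflow.openverse.engineering"


def unhealthy_components(health: Dict[str, Any]) -> List[str]:
    """Return the names of components whose status is not "healthy"."""
    return sorted(
        name
        for name, info in health.items()
        if isinstance(info, dict) and info.get("status") != "healthy"
    )


def fetch_health(base_url: str = AIRFLOW_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch and decode the /health payload (performs a network request)."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)
```

Calling `unhealthy_components(fetch_health())` lists the components that need a closer look. A component that is not deployed may report an empty status and appear in the list, so treat the result as a prompt to investigate rather than a definitive failure.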
In the event of errors or problems, roll back the application by running the appropriate deployment workflow from the WordPress/openverse-infrastructure repository using the tag of the latest stable version. You can find the release version number in the changelogs; the tag to pass to the action is the version number prefixed with “rel-”, for example “rel-2023.07.03.17.52.00”.
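Deriving the rollback tag from a changelog version is a simple prefixing rule, sketched below. The version pattern is an assumption inferred from the single example in this runbook, and the function name is hypothetical.

```python
import re

# Version format inferred from the one example in the runbook
# (YYYY.MM.DD.HH.MM.SS); this pattern is an assumption.
VERSION_PATTERN = re.compile(r"\d{4}(\.\d{2}){5}")


def rollback_tag(version: str) -> str:
    """Build the tag for a release version by prefixing it with "rel-"."""
    if not VERSION_PATTERN.fullmatch(version):
        raise ValueError(f"unexpected version format: {version!r}")
    return f"rel-{version}"
```

The format check is there to catch a value that already carries the prefix, or a mistyped version, before it is passed to the deployment workflow.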
If anything else goes wrong or service is disrupted, consider this a Production Incident and notify the team.