Index migration runbook
From time to time, we need to update our Elasticsearch (ES) indices. These modifications fall into two broad categories, depending on whether the changes affect the main consumer of the indices, the API.
API-free changes are safe modifications to the ES schema that do not affect the API. As such, they do not need any migration process. Examples:
addition of new fields or subfields
removal of fields that are not referenced or used by the API
changing the type to another compatible type
For API-free changes, we deploy the ingestion server and perform one of the two:
standard data-refresh (either triggered manually or as scheduled)
The indices will be updated to the new schema and will be made available to the API.
API-involved changes are modifications to fields that are already in use by the API, and they involve code changes in both the ingestion server and the API. Examples:
removal of a field
changing the type to an incompatible type
renaming of a field
Such changes require us to deploy the API in precise coordination with the promotion of the new index, for these reasons:
If the API deployment lags behind index promotion, the old field that the API uses will disappear.
If the API deployment runs ahead of index promotion, the new field that the API uses will not yet be present.
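The two failure modes can be illustrated with a minimal Python sketch that reduces both the index and the API to sets of field names. The helper missing_fields and the single-step rename of a hypothetical field foo to bar are assumptions for illustration only; real mappings also carry types and subfields, which this model ignores.

```python
# Sketch only: the live index and the deployed API are each reduced to a set
# of field names. Renaming "foo" to "bar" in one step is a hypothetical change.

def missing_fields(index_fields: set[str], api_fields: set[str]) -> set[str]:
    """Fields the API references that the live index does not contain."""
    return api_fields - index_fields


OLD_INDEX, NEW_INDEX = {"foo"}, {"bar"}  # index before/after promotion
OLD_API, NEW_API = {"foo"}, {"bar"}      # fields used before/after API deploy

# API deployed after the new index is promoted: the old field is already gone.
print(missing_fields(NEW_INDEX, OLD_API))  # {'foo'}

# API deployed before the new index is promoted: the new field is not there yet.
print(missing_fields(OLD_INDEX, NEW_API))  # {'bar'}

# An additive intermediate index containing both fields works with either API.
BOTH = OLD_INDEX | NEW_INDEX
print(missing_fields(BOTH, OLD_API), missing_fields(BOTH, NEW_API))  # set() set()
```

The last check is the idea behind the multi-step process: while the index temporarily contains both the old and the new fields, the API can switch over without ever referencing a missing field.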
This runbook documents guidelines and processes for API-involved migrations.
Our goal is to break an API-involved change down into multiple small, atomic changes, each affecting only one of the ingestion server or the API, so that the API and ES remain compatible throughout the process.
Pull request guidelines
A change that modifies both the ES index and its usage in the API requires at least three steps. Each step is associated with exactly one PR, and each PR modifies exactly one of the ingestion server or the API so that the two can be deployed independently.
Change the ES index mapping in the ingestion server. Ensure that the change is purely additive, keeping the old fields unchanged and creating new fields that contain the data the API will need.
This PR should make changes only within the ingestion_server/ directory, more specifically in the two files concerned with ES mappings and document schemas.
Update the API code to reference and use the new ES fields added in the previous step. Ensure that the old fields become unreferenced.
The PR should make changes only within the
Change the ES index mapping in the ingestion server to remove the old, now-unreferenced fields.
Like PR number 1, this PR should also make changes only within the
Get all the PRs reviewed in advance so that the team has vetted the entire process and there are no surprises or delays once the plan is set in motion.
Each PR in the chain should branch from, and point to, its predecessor in the chain so that CI continues to pass for each PR.
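The rule that each PR modifies exactly one of the ingestion server or the API lends itself to a simple automated check. The Python sketch below is hypothetical: the ingestion_server directory comes from this runbook, but the api directory name, both helper functions, and the example file paths are assumptions, and nothing here implies such a check already exists in CI.

```python
from pathlib import PurePosixPath

# "ingestion_server" is the directory named in this runbook; "api" is an
# ASSUMED directory name for the API, used only for illustration.
COMPONENT_DIRS = {"ingestion_server", "api"}


def touched_components(changed_files: list[str]) -> set[str]:
    """Return the top-level component directories that a PR's files live in."""
    touched: set[str] = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in COMPONENT_DIRS:
            touched.add(parts[0])
    return touched


def modifies_exactly_one_component(changed_files: list[str]) -> bool:
    """True if the PR touches exactly one of the ingestion server or the API."""
    return len(touched_components(changed_files)) == 1


# Hypothetical file lists, for illustration only.
print(modifies_exactly_one_component(["ingestion_server/hypothetical_mapping.py"]))  # True
print(modifies_exactly_one_component(
    ["ingestion_server/hypothetical_mapping.py", "api/hypothetical_search.py"]
))  # False
```

Files outside both directories (documentation, for instance) are ignored by this sketch, so a PR that touches only such files is reported as not belonging to the chain.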
Assume we have a field foo with type text in the index. It has a subfield keyword with type keyword. The API uses foo.keyword for all purposes. We want the foo field to have type keyword and the API to use foo directly instead of foo.keyword. To accomplish this without downtime, we need three PRs:
Changing the type of foo from text to keyword would be an API-free change because it is a type change between two compatible types and does not affect the nested field foo.keyword that is in use by the API. Technically, the outer field can be considered “new” because it was not being used at all.
Then we make an API change to use foo directly instead of foo.keyword. Any other accommodations needed to make use of foo can be made in this step. In this case, foo will hold the same values as foo.keyword, so no other changes will be needed.
Removal of the foo.keyword field would now also be an API-free change because the field would no longer be in use.
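For reference, the mappings at each step of the example might look like the Python dicts below, written in Elasticsearch's "properties" format, where "fields" declares subfields (multi-fields). The dicts, the field_paths flattener, and the exact_match_query helper are illustrative assumptions: the runbook does not show the real mapping files or how the API queries foo, and a term query is used here only as a plausible exact-match lookup.

```python
# Illustrative "properties" blocks for the foo example (assumed, not copied
# from the ingestion server). "fields" declares subfields (multi-fields).
MAPPINGS = {
    "original": {
        "foo": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    },
    "after PR 1": {
        "foo": {"type": "keyword", "fields": {"keyword": {"type": "keyword"}}}
    },
    "after PR 3": {"foo": {"type": "keyword"}},
}


def field_paths(properties: dict, prefix: str = "") -> dict[str, str]:
    """Flatten a "properties" block into {"dotted.path": "type"}."""
    paths: dict[str, str] = {}
    for name, spec in properties.items():
        path = prefix + name
        if "type" in spec:
            paths[path] = spec["type"]
        # Recurse into multi-fields ("fields") and object sub-properties.
        for key in ("fields", "properties"):
            if key in spec:
                paths.update(field_paths(spec[key], prefix=path + "."))
    return paths


def exact_match_query(field: str, value: str) -> dict:
    """Hypothetical API helper: a search body with a term query on one field."""
    return {"query": {"term": {field: value}}}


for name, properties in MAPPINGS.items():
    print(name, field_paths(properties))
# original {'foo': 'text', 'foo.keyword': 'keyword'}
# after PR 1 {'foo': 'keyword', 'foo.keyword': 'keyword'}
# after PR 3 {'foo': 'keyword'}

# PR 2 only switches the queried field from "foo.keyword" to "foo". Both
# versions of the query resolve against the index produced by PR 1.
before_pr2 = exact_match_query("foo.keyword", "example")
after_pr2 = exact_match_query("foo", "example")
index_after_pr1 = field_paths(MAPPINGS["after PR 1"])
for body in (before_pr2, after_pr2):
    (field,) = body["query"]["term"]  # the single field this query targets
    print(field, index_after_pr1[field])
# foo.keyword keyword
# foo keyword
```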
The entire migration process can be divided into three phases.
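The ordering constraint behind the phases can be checked mechanically. The Python sketch below is an illustrative model, not part of the ingestion server or the API: each state pairs the live index's fields with the fields the production API expects (both as path to type), and the hypothetical check_plan helper asserts that each step changes only one side and reports any state in which the API would reference a missing or retyped field. Staging verification is not modeled.

```python
# Each state pairs the live index's flattened fields (path -> type) with the
# fields the production API expects (path -> type), following the foo example.
Fields = dict[str, str]


def is_compatible(index: Fields, api: Fields) -> bool:
    """The API is safe if every field it uses exists with the expected type."""
    return all(index.get(path) == ftype for path, ftype in api.items())


def check_plan(states: list[tuple[str, Fields, Fields]]) -> list[str]:
    """Return the names of states in which the API and index are incompatible."""
    for (_, i1, a1), (_, i2, a2) in zip(states, states[1:]):
        # One PR per step: only the index or only the API may change.
        assert (i1 != i2) + (a1 != a2) <= 1, "a step changes both sides"
    return [name for name, index, api in states if not is_compatible(index, api)]


OLD = {"foo": "text", "foo.keyword": "keyword"}
ADDITIVE = {"foo": "keyword", "foo.keyword": "keyword"}
FINAL = {"foo": "keyword"}

PLAN = [
    ("start", OLD, {"foo.keyword": "keyword"}),
    ("phase 1: create the new fields", ADDITIVE, {"foo.keyword": "keyword"}),
    ("phase 2: use the new fields", ADDITIVE, {"foo": "keyword"}),
    ("phase 3: remove the old fields", FINAL, {"foo": "keyword"}),
]
print(check_plan(PLAN))  # []
```

Skipping phase 2, for example promoting the FINAL index while the API still expects foo.keyword, makes check_plan report that state as incompatible.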
Create the new fields
At the close of this phase, we have all the new information for the API to use.
Use the new fields instead of the old
Merge PR number 2. This will automatically deploy the API to staging.
Verify that the staging API continues to work.
Deploy the API to production.
Verify that the production API continues to work.
At the close of this phase, the API uses only the new fields, and the old ones have become unreferenced.
Remove the old fields