2024-03-25 Implementation Plan: Document all media properties in the API

Author: @dhruvkb

Reviewers

Overview

This implementation plan devises a method for documenting the media properties for media models defined in the API.

Expected Outcomes

The outcome of this implementation plan will be a documentation page for the media properties pertaining to the models in the Django API, similar to the documentation for the media properties for the catalog.

This implementation plan will make changes to the location of the catalog media properties documentation so that it can be extended to include information about the API and, in the future, the frontend.

Step-by-step plan

The plan consists of the following steps:

  • Define management command and just recipe.

  • Introspect Django models.

  • Merge with additional notes.

  • Generate documentation page.

  • Verify freshness using CI.

Step details

Define management command and just recipe

The complete process of generating the documentation will be encapsulated in a Django management command. This command will be named documentmedia and will be defined in the api app in the API project.

On its own, the command cannot write the documentation page into the documentation project, because it runs inside the Django context and does not have access to the filesystem beyond the API project. The management command will therefore write the documentation to the API project root.

We will invoke this command from a just recipe which will move the file to its destination, lint it and also function as the CI test which returns a non-zero exit code when the file is modified in the process. This recipe will be named generate-docs with the doc argument set to "media-props" to allow for more autogenerated documentation in the future.

Introspect Django models

All media models in the Django API inherit from the AbstractMedia class. This class can be used to obtain a list of all media models using the __subclasses__() method. Django models can be introspected using the _meta property on the model class, and their field list can be obtained using _meta.get_fields().
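
As a rough illustration of the subclass discovery described above, here is a minimal sketch. The classes are plain-Python stand-ins (not the real API models, and SpecialImage is hypothetical), and the recursive walk is an assumption on our part: `__subclasses__()` only returns direct subclasses, so nested models would otherwise be missed.

```python
class AbstractMedia:
    """Stand-in for the API's abstract base model (not the real class)."""


class Image(AbstractMedia):
    pass


class Audio(AbstractMedia):
    pass


class SpecialImage(Image):
    """Hypothetical nested subclass, only here to show the recursion."""


def all_subclasses(cls):
    """Return every direct and indirect subclass of ``cls``."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        # __subclasses__() is not recursive, so descend manually.
        found.extend(all_subclasses(sub))
    return found


media_models = all_subclasses(AbstractMedia)
print(sorted(model.__name__ for model in media_models))
```

In the real command, each discovered model class would then be passed through `_meta.get_fields()` to list its fields.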

Using introspection we can obtain a list of all fields on the model, and then extract the following data points for each field.

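| Data-point    | Type | Description                                   |
| ------------- | ---- | --------------------------------------------- |
| name          | str  | the name of the field                         |
| internal_type | str  | the class name of the field                   |
| db_type       | str  | the database type of the field                |
| is_relation   | bool | whether the field is a relation to other models |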

The following data points are only present (and relevant) if the field is a relationship.

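| Data-point | Type | Description                                          |
| ---------- | ---- | ---------------------------------------------------- |
| nature     | str  | the one/many to one/many nature of the relationship  |
| to         | str  | the name of the related model                        |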

The following data points are only present (and relevant) if the field is not a relationship.

| Data-point   | Type | Description                                    |
| ------------ | ---- | ---------------------------------------------- |
| allows_blank | bool | whether the field accepts blank strings        |
| allows_null  | bool | whether the field accepts null values          |
| is_pk        | bool | whether the field has a primary key constraint |
| needs_unique | bool | whether the field has a unique index           |
| default      | Any  | the value to use if one is not provided        |
| help_text    | str  | a human-readable description of the field      |

Note

  • The is_relation data-point is used to render two separate tables for relation and value fields.

  • The allows_null data-point is not relevant for relations as they are always nullable.

  • The max_length data-point is not captured because it is automatically covered under the db_type data-point.

  • is_pk, needs_unique, allows_null and allows_blank are presented collectively as “constraints” although not technically DB-level constraints.

The information extracted from this introspection will be stored in memory for the next stage.
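
To make the extraction step more concrete, the following sketch maps a field object to the data points listed above. It is illustrative only: the two "fields" are SimpleNamespace stand-ins with made-up values, `describe_field` and `relation_nature` are hypothetical helper names, and the attributes read from each field (`name`, `get_internal_type()`, `db_type(connection)`, `is_relation`, `blank`, `null`, `primary_key`, `unique`, `default`, `help_text`, `related_model` and the `one_to_one`-style flags) are assumed to mirror what Django exposes on its model fields.

```python
from types import SimpleNamespace


def relation_nature(field):
    """Return the name of the first relation flag that is set on the field."""
    for flag in ("one_to_one", "one_to_many", "many_to_one", "many_to_many"):
        if getattr(field, flag, False):
            return flag
    return "unknown"


def describe_field(field, connection=None):
    """Collect the data points for one field into a plain dict."""
    info = {
        "name": field.name,
        "internal_type": field.get_internal_type(),
        "db_type": field.db_type(connection),
        "is_relation": bool(field.is_relation),
    }
    if info["is_relation"]:
        info["nature"] = relation_nature(field)
        info["to"] = field.related_model.__name__
    else:
        info.update(
            allows_blank=field.blank,
            allows_null=field.null,
            is_pk=field.primary_key,
            needs_unique=field.unique,
            default=field.default,
            help_text=field.help_text,
        )
    return info


class Owner:
    """Hypothetical related model, used only by the stand-in below."""


# Stand-in fields with invented values; real fields come from _meta.get_fields().
title_field = SimpleNamespace(
    name="title", get_internal_type=lambda: "CharField",
    db_type=lambda connection: "varchar(100)", is_relation=False,
    blank=True, null=True, primary_key=False, unique=False,
    default=None, help_text="The name of the media.",
)
owner_field = SimpleNamespace(
    name="owner", get_internal_type=lambda: "ForeignKey",
    db_type=lambda connection: "integer", is_relation=True,
    many_to_one=True, related_model=Owner,
)

for field in (title_field, owner_field):
    print(describe_field(field))
```

Keeping the result as a plain dict per field makes it straightforward to hold everything in memory and to split the records into relation and value tables later.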

Merge with additional notes

Additional notes are hardcoded pieces of extra information about a field, such as known inconsistencies or special cases. This information, if documented using a consistent format, can be used to augment the information extracted in the previous step.

We will document this information in the docstring of the model classes from where it can be accessed using inspect.getdoc. To ensure that these notes are co-located with the fields’ code, we will parse docstrings of all ancestors and mixins of a model class.

Additional information about fields will be specified inside the model docstring as valid YAML under the “Properties” heading. Note that the format should be exactly as follows, with “Properties” being the final subheading.

class Media(AbstractMedia):
    """
    This is a generic media model. In the real world, assume this to be
    ``Image`` or ``Audio``. It can have its own docstring here.

    Properties
    ==========
    title: >-
        the title of the media; This documentation can span multiple lines and
        _can contain Markdown_ as it will be injected as-is into the doc page.
    """

    pass
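
One possible way to pull the “Properties” block out of each class in a model's ancestry is sketched below. The helper names and stand-in classes are hypothetical. The sketch reads each class's own `__doc__` and cleans it with `inspect.cleandoc`, rather than calling `inspect.getdoc` on every ancestor, because `getdoc` falls back to an inherited docstring when a class has none, which would yield duplicate notes. Parsing the extracted text as YAML is left to a YAML library and is not shown.

```python
import inspect

HEADING = "Properties"


def properties_block(doc):
    """Return the raw text under the 'Properties' heading, or '' if absent."""
    lines = inspect.cleandoc(doc).splitlines()
    for i in range(len(lines) - 1):
        underline = lines[i + 1].strip()
        if lines[i].strip() == HEADING and underline and set(underline) == {"="}:
            # 'Properties' is the final subheading, so take everything after it.
            return "\n".join(lines[i + 2:]).strip()
    return ""


def collect_property_notes(model):
    """Map class name to its raw Properties text, walking base classes first."""
    notes = {}
    for klass in reversed(model.__mro__):
        doc = vars(klass).get("__doc__")  # the class's own docstring only
        if doc:
            block = properties_block(doc)
            if block:
                notes[klass.__name__] = block
    return notes


class AbstractMedia:
    """
    Stand-in base class (not the real model).

    Properties
    ==========
    identifier: the unique ID of the media item
    """


class Media(AbstractMedia):
    """
    Stand-in concrete model.

    Properties
    ==========
    title: >-
        the title of the media
    """


class Undocumented(Media):
    pass


print(collect_property_notes(Undocumented))
```

Walking the MRO from the base class downwards means notes defined on a subclass can later override those inherited from a parent or mixin, if that is the desired precedence.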

Generate documentation page

With all information in-memory about the media properties, the output will be written to a page by the management command. This requires some restructuring in the pages created for the catalog media properties documentation.

The final page will be written to a Markdown file at the path documentation/meta/media_properties/api.md. To include this Markdown file an index page will be required at documentation/meta/media_properties/index.md, which will itself be included in the documentation/meta/index.md file.

Note

If the catalog media properties documentation is already created at documentation/meta/media_properties.md by the time this implementation plan is implemented, it will need to be moved one level deeper to documentation/meta/media_properties/catalog.md.

The final documentation hierarchy would look as follows.

documentation
└── meta
    ├── index.md
    └── media_properties
        ├── index.md (new, not autogenerated)
        ├── api.md (new, autogenerated)
        ├── catalog.md (not in this IP)
        └── frontend.md (not in this IP)

Similar to the catalog documentation, the output document will include the preamble.md file from api/api/docs/media_properties/preamble.md. This will be followed by the Markdown generated from the media properties as a table.
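
The assembly of the page (preamble followed by a table) could look roughly like the sketch below. The function names, column names and sample row are illustrative assumptions; the real command may organise its output differently.

```python
def escape_cell(value):
    """Escape pipes so that a value cannot break the Markdown table layout."""
    return str(value).replace("|", "\\|")


def render_table(rows, columns):
    """Render a list of dicts as a Markdown table with the given columns."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(escape_cell(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


def render_page(preamble, rows, columns):
    """Join the hand-written preamble and the generated table into one page."""
    return preamble.rstrip() + "\n\n" + render_table(rows, columns) + "\n"


# Made-up sample data, shaped like the records collected during introspection.
sample_rows = [{"name": "title", "db_type": "varchar(100)"}]
print(render_page("# Media properties", sample_rows, ["name", "db_type"]))
```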

Verify freshness using CI

Following the existing precedent of API recipes being used in the API tests of the CI + CD workflow (django-checks), we will invoke the generate-docs recipe with the doc argument set to "media-props" and check for a zero return status.
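
The freshness check amounts to "regenerate, then fail if the committed file differs". The plan does this inside the just recipe; the Python sketch below only illustrates the same idea in isolation and is not the recipe itself. The function names are hypothetical.

```python
from pathlib import Path


def is_fresh(path, generated):
    """Return True if the file on disk already matches the generated text."""
    path = Path(path)
    return path.exists() and path.read_text(encoding="utf-8") == generated


def freshness_exit_code(path, generated):
    """Return 0 when the committed page is up to date, 1 otherwise."""
    return 0 if is_fresh(path, generated) else 1
```

A CI step would pass the resulting code to `sys.exit`, so that a stale documentation page fails the workflow.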

Workstreams

The entire plan can be implemented in one stream. A lot of components of the plan are shared with the catalog media properties documentation but code reuse between the two projects will be limited as they follow different idioms.

The project assumes that the implementation of the catalog documentation project will be completed before this project begins, and includes some reorganisation of the output of that project. If that is not the case, the reorganisation, described in section “Generate documentation page” above, can be incorporated into the ongoing work on that project.

Dependencies

We will be using Django’s own introspection capabilities. No additional libraries other than the already installed Django packages will be needed.

The generated documentation will be added as a page to the documentation site without needing any new packages.

Alternatives

The approach for the catalog uses SQL files. Unlike that, we will be using Django’s introspection capabilities which are quite powerful and allow for document generation from the models themselves. With Django’s migration system, we can trust that the output of the introspection will be synced to the state of the database.

The approach for the catalog uses a media_props.md file for additional notes. Unlike that, we will document notes inside the model docstrings because this keeps the documentation close to the code and allows it to be updated alongside the code.

Although we have used utility scripts for the catalog documentation, for the API we will use a management command as that is more idiomatic and gives us access to the Django ORM models.

Future improvements

This implementation plan does not document the schema for Django’s JSONField (jsonb) in much detail. This is because of several complications regarding JSON data.

  • JSON fields do not impose or enforce a schema. We can set expectations about the schema coming out of the field but that currently does not exist and will be unenforceable at the DB layer or in Python.

  • JSON can contain different representations of emptiness, such as a field being null or being absent in the JSON. We need to define a standard and stick to it or convert them into a standard version in Python.

  • Introspecting serializers to get the JSON field documentation is not accurate because serializers describe our expectations about the data and not the reality in the database. This defeats our ability to prevent errors that occur when the DB does not contain what the serializer expects.
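
As an example of converting different representations of emptiness into one standard form in Python, the sketch below treats a null value the same as an absent key. This particular convention is only an assumption for illustration; the plan does not choose a standard, and `drop_nulls` is a hypothetical helper name.

```python
def drop_nulls(value):
    """Recursively remove dict keys whose value is None.

    List elements are left in place, since removing them would shift positions.
    """
    if isinstance(value, dict):
        return {key: drop_nulls(item) for key, item in value.items() if item is not None}
    if isinstance(value, list):
        return [drop_nulls(item) for item in value]
    return value


print(drop_nulls({"a": None, "b": {"c": None, "d": 1}}))
```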

For now, it is possible to document the shape of the JSON fields manually in the additional notes. In future iterations, we can try techniques such as the following to manage the data going into the JSON field during data-refresh:

  • validation in clean/save

  • validation in subclass of JSONField

  • CHECK constraints in the DB

  • pre-(insert/update) triggers in PostgreSQL

These techniques can be used to ensure consistency and to document the shape of these fields. Since that goes beyond “documenting” media properties, we can treat it as a separate project or, if the scope of this project is redefined, as a separate implementation plan.
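
For a sense of what Python-side validation might involve, here is a minimal sketch of a shape check that a model's clean or save step could call. The function name and the expected-shape format (a mapping of key to Python type) are assumptions made for illustration, not a proposed design.

```python
def shape_errors(value, expected):
    """Return a list of problems found when comparing a JSON value to a shape.

    ``expected`` maps each required key to the Python type its value must have.
    """
    if not isinstance(value, dict):
        return ["expected a JSON object"]
    errors = []
    for key, expected_type in expected.items():
        if key not in value:
            errors.append(f"missing key: {key}")
        elif not isinstance(value[key], expected_type):
            errors.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return errors


# Hypothetical shape and payload, only to show the call.
print(shape_errors({"url": "https://example.com"}, {"url": str, "count": int}))
```

An empty list means the value matches the expected shape; a caller could raise a validation error when the list is non-empty.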

Prior art