2024-03-25 Implementation Plan: Document all media properties in the API
Author: @dhruvkb
Reviewers
Project links
Overview
This implementation plan devises a method for documenting the media properties for media models defined in the API.
Expected Outcomes
The outcome of this implementation plan will be a documentation page for the media properties pertaining to the models in the Django API, similar to the documentation for the media properties for the catalog.
This implementation plan will make changes to the location of the catalog media properties documentation so that it can be extended to include information about the API and, in the future, the frontend.
Step-by-step plan
The plan consists of the following steps:
1. Define management command and `just` recipe.
2. Introspect Django models.
3. Merge with additional notes.
4. Generate documentation page.
5. Verify freshness using CI.
Step details
Define management command and just recipe
The complete process of generating the documentation will be encapsulated in a
Django management command. This command will be named `documentmedia` and will
be defined in the `api` app in the API project.
On its own, the command is insufficient to write the documentation page into the documentation project: it runs inside the Django context of the API project and does not have access to files outside that project. The management command will therefore write the documentation to the API project root.
We will invoke this command from a just recipe which will move the file to its
destination, lint it and also function as the CI test which returns a non-zero
exit code when the file is modified in the process. This recipe will be named
generate-docs with the doc argument set to "media-props" to allow for more
autogenerated documentation in the future.
Introspect Django models
All media models in the Django API inherit from the `AbstractMedia` class. This
class can be used to obtain a list of all media models using the `__subclasses__()`
method. Django models can be introspected using the `_meta` attribute on the
model class, and their field list can be obtained using `_meta.get_fields()`.
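`__subclasses__()` returns only direct subclasses, so a model that inherits from `AbstractMedia` through an intermediate class would be missed by a single call. A minimal sketch of a full walk, using plain-Python stand-ins (hypothetical, not the real models):

```python
from __future__ import annotations


def all_subclasses(base: type) -> list[type]:
    """Return every class that inherits from ``base``, directly or not."""
    found: list[type] = []
    # __subclasses__() lists only direct children, so walk the whole tree.
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        if cls not in found:  # multiple inheritance can reach a class twice
            found.append(cls)
            stack.extend(cls.__subclasses__())
    return found


# Plain-Python stand-ins for the Django models (illustration only).
class AbstractMedia: ...


class Image(AbstractMedia): ...


class Audio(AbstractMedia): ...
```

On real Django models, the result would likely also be filtered with `not cls._meta.abstract` so that only concrete models are documented.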
Using introspection we can obtain a list of all fields on the model, and then extract the following data points for each field.
| Data-point | Type | Description |
|---|---|---|
| | `str` | the name of the field |
| | `str` | the class name of the field |
| `db_type` | `str` | the database type of the field |
| `is_relation` | `bool` | whether the field is a relation to other models |
The following data points are only present (and relevant) if the field is a relationship.
| Data-point | Type | Description |
|---|---|---|
| | `str` | the one/many to one/many nature of the relationship |
| | `str` | the name of the related model |
The following data points are only present (and relevant) if the field is not a relationship.
| Data-point | Type | Description |
|---|---|---|
| `allows_blank` | `bool` | whether the field accepts blank strings |
| `allows_null` | `bool` | whether the field accepts null values |
| `is_pk` | `bool` | whether the field has a primary key constraint |
| `needs_unique` | `bool` | whether the field has a unique index |
| | `bool` | the value to use if one is not provided |
| | `str` | a human-readable description of the field |
Note
- The `is_relation` data-point is used to render two separate tables for relation and value fields.
- The `allows_null` data-point is not relevant for relations as they are always nullable.
- The `max_length` data-point is not captured because it is automatically covered under the `db_type` data-point.
- `is_pk`, `needs_unique`, `allows_null` and `allows_blank` are presented collectively as “constraints”, although they are not technically DB-level constraints.
The information extracted from this introspection will be stored in memory for the next stage.
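A minimal sketch of the extraction step. It uses the data-point names the plan gives (`db_type`, `is_relation`, `allows_blank`, `allows_null`, `is_pk`, `needs_unique`); the names `name`, `internal_type`, `relation_type`, `related_model`, `default` and `help_text` are hypothetical placeholders for the unnamed rows. The field attributes read here (`is_relation`, `related_model`, the cardinality flags such as `many_to_one`, `null`, `blank`, `primary_key`, `unique`, `has_default()`, `db_type(connection)`) are Django's documented field API; the code is duck-typed so it can be exercised with stand-ins, and in the API `connection` would be `django.db.connection`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Django's cardinality flags; on a relation field exactly one is True.
CARDINALITY_FLAGS = ("one_to_one", "many_to_one", "one_to_many", "many_to_many")


@dataclass
class FieldInfo:
    name: str  # hypothetical data-point name
    internal_type: str  # hypothetical data-point name
    db_type: str | None
    is_relation: bool
    # Relation-only data points (names are hypothetical).
    relation_type: str | None = None
    related_model: str | None = None
    # Value-only data points.
    allows_blank: bool | None = None
    allows_null: bool | None = None
    is_pk: bool | None = None
    needs_unique: bool | None = None
    default: Any = None  # hypothetical data-point name
    help_text: str | None = None  # hypothetical data-point name


def describe_field(field, connection) -> FieldInfo:
    """Extract the data points from one entry of ``_meta.get_fields()``."""
    info = FieldInfo(
        name=field.name,
        internal_type=type(field).__name__,
        # Entries without a column (e.g. reverse relations) may lack db_type.
        db_type=field.db_type(connection) if hasattr(field, "db_type") else None,
        is_relation=field.is_relation,
    )
    if field.is_relation:
        flag = next((f for f in CARDINALITY_FLAGS if getattr(field, f, False)), None)
        info.relation_type = flag.replace("_", " ") if flag else None
        info.related_model = getattr(field.related_model, "__name__", None)
    else:
        info.allows_blank = field.blank
        info.allows_null = field.null
        info.is_pk = field.primary_key
        info.needs_unique = field.unique
        info.default = field.default if field.has_default() else None
        info.help_text = str(field.help_text) or None
    return info


def describe_model(model, connection) -> list[FieldInfo]:
    """Describe every field; the list is kept in memory for rendering."""
    return [describe_field(f, connection) for f in model._meta.get_fields()]
```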
Merge with additional notes
Additional notes (or hardcoded information) provide extra information about a field, such as known inconsistencies or special cases, that cannot be obtained through introspection. If documented in a consistent format, this information can be used to augment the information extracted in the previous step.
We will document this information in the docstring of the model classes from
where it can be accessed using inspect.getdoc. To ensure that these notes are
co-located with the fields’ code, we will parse docstrings of all ancestors and
mixins of a model class.
Additional information about fields will be specified inside the model docstring as valid YAML under the “Properties” heading. The format must be exactly as follows, with “Properties” as the final subheading.
```python
class Media(AbstractMedia):
    """
    This is a generic media model. In the real world, assume this to be
    ``Image`` or ``Audio``. It can have its own docstring here.

    Properties
    ==========
    title: >-
      the title of the media. This documentation can span multiple lines and
      _can contain Markdown_ as it will be injected as-is into the doc page.
    """
```
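A minimal sketch of collecting the “Properties” sections from a model, its ancestors and its mixins. It walks the class's `__mro__` and reads each class's own `__doc__` (normalised with `inspect.cleandoc`) instead of calling `inspect.getdoc`, because `getdoc` falls back to an inherited docstring when a class has none, which would count the same notes twice. The function names are hypothetical. Each returned section is YAML text that would then be parsed, for instance with PyYAML's `yaml.safe_load`, and merged in order with `dict.update`, so that a model's own notes override those of its ancestors.

```python
from __future__ import annotations

import inspect
import re

# The "Properties" heading and its "=" underline, at the start of a line.
PROPERTIES_HEADING = re.compile(r"^Properties\n=+\n", re.MULTILINE)


def own_properties_section(klass: type) -> str | None:
    """Return the YAML text under "Properties" in ``klass``'s own docstring."""
    raw = klass.__dict__.get("__doc__")  # None if the class has no docstring
    if not raw:
        return None
    doc = inspect.cleandoc(raw)  # strip the common indentation
    match = PROPERTIES_HEADING.search(doc)
    return doc[match.end():] if match else None


def properties_sections(model: type) -> list[tuple[str, str]]:
    """Collect (class name, YAML text) pairs from base classes to ``model``."""
    sections = []
    for klass in reversed(model.__mro__):  # most basic class first
        section = own_properties_section(klass)
        if section:
            sections.append((klass.__name__, section))
    return sections
```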
Generate documentation page
With all the information about the media properties in memory, the management command will write the output to a page. This requires some restructuring of the pages created for the catalog media properties documentation.
The final page will be written to a Markdown file at the path
`documentation/meta/media_properties/api.md`. To include this Markdown file, an
index page will be required at `documentation/meta/media_properties/index.md`,
which will itself be included in the `documentation/meta/index.md` file.
Note
If the catalog media properties documentation is already created at
`documentation/meta/media_properties.md` by the time this implementation plan is
implemented, it will need to be moved one level deeper to
`documentation/meta/media_properties/catalog.md`.
The final documentation hierarchy would look as follows.
```
documentation
└── meta
    ├── index.md
    └── media_properties
        ├── index.md (new, not autogenerated)
        ├── api.md (new, autogenerated)
        ├── catalog.md (not in this IP)
        └── frontend.md (not in this IP)
```
Similar to the catalog documentation, the output document will start with the
preamble from `api/api/docs/media_properties/preamble.md`. This will be followed
by the Markdown tables generated from the media properties.
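Rendering could be as simple as the sketch below, assuming Markdown pipe tables as the output format. The function names, the `##` heading level and any column headers passed in are illustrative assumptions; following the note on `is_relation`, each model would get one table for relation fields and one for value fields.

```python
from __future__ import annotations


def markdown_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render rows as a Markdown pipe table."""

    def render_row(cells: list[str]) -> str:
        # Escape pipes so cell content cannot split a cell in two.
        return "| " + " | ".join(cell.replace("|", "\\|") for cell in cells) + " |"

    lines = [render_row(headers), render_row(["---"] * len(headers))]
    lines.extend(render_row(row) for row in rows)
    return "\n".join(lines)


def build_page(preamble: str, model_tables: dict[str, list[str]]) -> str:
    """Preamble first, then a heading per model followed by its tables."""
    parts = [preamble.rstrip()]
    for model_name, tables in model_tables.items():
        parts.append(f"## {model_name}")
        parts.extend(tables)
    return "\n\n".join(parts) + "\n"
```

In the management command, `preamble` would be read from the preamble file and the returned string written to the API project root, where the recipe picks it up.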
Verify freshness using CI
Following the existing precedent of API recipes being used in the API tests of the
CI + CD workflow
(`django-checks`),
we will invoke the `generate-docs` recipe with the `media-props` argument and check for a zero return status.
Workstreams
The entire plan can be implemented in one stream. Many components of the plan are shared with the catalog media properties documentation, but code reuse between the two projects will be limited as they follow different idioms.
The project assumes that the implementation of the catalog documentation project will be completed before this project begins, and includes some reorganisation of the output of that project. If that is not the case, the reorganisation, described in section “Generate documentation page” above, can be incorporated into the ongoing work on that project.
Dependencies
We will be using Django’s own introspection capabilities. No additional libraries other than the already installed Django packages will be needed.
The generated documentation will be added as a page to the documentation site without needing any new packages.
Alternatives
The approach for the catalog uses SQL files. Unlike that, we will be using Django’s introspection capabilities which are quite powerful and allow for document generation from the models themselves. With Django’s migration system, we can trust that the output of the introspection will be synced to the state of the database.
The approach for the catalog uses a media_props.md file for additional notes.
Unlike that, we will be documenting notes inside the model docstrings because this
keeps the documentation close to the code and allows it to be updated alongside
the code.
Although we have used utility scripts for the catalog documentation, for the API we will use a management command, as that is more idiomatic and gives us access to the Django ORM models.
Future improvements
This implementation plan does not document the schema for Django’s JSONField
(jsonb) in much detail. This is because of several complications regarding
JSON data.
- JSON fields do not impose or enforce a schema. We can set expectations about the shape of the data coming out of the field, but no such expectations currently exist, and nothing would enforce them at the DB layer or in Python.
- JSON can contain different representations of emptiness, such as a field being `null` or being absent from the JSON. We need to define a standard and stick to it, or convert all representations into a standard one in Python.
- Introspecting serializers to get the JSON field documentation is not accurate because serializers describe our expectations about the data and not the reality in the database. This defeats our ability to prevent errors that occur when the DB does not contain what the serializer expects.
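One way to convert the representations of emptiness into a standard version in Python is to pick an explicit `null` as the standard and fill in absent keys, as in this hypothetical helper. The list of expected keys is an assumption supplied by the caller, since no schema currently exists to derive it from.

```python
from __future__ import annotations

from typing import Any


def normalize_empty(data: dict[str, Any] | None, expected_keys: list[str]) -> dict[str, Any]:
    """Represent every "empty" value the same way: an explicit ``None``.

    A JSON ``null``, an absent key and a missing object all become ``None``,
    so readers only have to handle one form of emptiness. Unknown keys are
    kept as they are, and the input is not mutated.
    """
    normalized = dict(data or {})
    for key in expected_keys:
        normalized.setdefault(key, None)
    return normalized
```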
For now, it is possible to document the shape of the JSON fields manually in the additional notes. In future iterations, we can use techniques such as the following to manage the data going into the JSON fields during the data refresh:
- validation in `clean`/`save`
- validation in a subclass of `JSONField`
- `CHECK` constraints in the DB
- pre-(insert/update) triggers in PostgreSQL
These techniques can be used to ensure consistency and to document the shape of these fields. Since that scope goes beyond “documenting” media properties, we can consider it as a separate project or, by redefining the scope of this one, we can consider it as a separate implementation plan.