2023-07-20 Implementation Plan: Additional Search Views#

Author: @obulat

Reviewers#

Expected Outcomes#

API endpoints return all media with the selected tag, from the selected source or by the selected creator, sorted by date added to Openverse.

Frontend allows to browse media items by a selected creator, source, or with a selected tag.

The single result pages link to these collection views; the external links are also updated to clearly show that they are external.

Step-by-step plan#

  1. Update the Elasticsearch index to enable exact matching of the tag, source and creator fields (both the query analyzer and the index analyzer). This will require reindexing.

  2. Add API endpoints for exact matching of the tag, source and creator fields.

  3. Create the new components: VCollectionHeader, VCollectionLink and VTag.

  4. Update the store and utils used to construct the API query to allow for searching by tag, creator or source, in addition to the current search by title/description/tags combination.

  5. Add a switchable “additional_search_views” feature flag.

  6. Create a page for tag / creator /source collections. The page should handle fetching and updating the search store state.

  7. Update the single result pages: tags area, the “creator” and “source” area under the main media item.

  8. Add the Analytics event VISIT_SOURCE_LINK and update where VISIT_CREATOR_LINK is sent.

  9. Cleanup after the feature flag is removed:

    • Remove conditional rendering on the single result pages.

    • Remove the additional_search_views feature flag and VMediaTag component.

Step details#

1. Search controller updates#

Currently, when filtering the search results, the API matches some query parameters in a fuzzy way: an item matches the query if the field value contains the query string as a separate word. When indexing the items, we “analyze” them, which means that we split the field values by whitespace and stem them. We also do the same to the query string. This means that the query string “bike” will match the field value “bikes”, “biking”, or “bike shop”.

For these pages, however, we need an exact match.

One alternative implementation considered when writing this plan was to use the database instead of the Elasticsearch to get the results. This would make it easy to get the exact matches. However, there are some problems with using the database rather than ES to access anything:

  • The database does not cache queries in the same way that ES does. Repeated queries will not necessarily be as efficient as from ES.

  • The database does not score documents at all, so the order will different dramatically to the way that ES would order the documents. That’s an issue with respect to popularity data today already, but will become even more of an issue if we start to score documents based on other metrics as theorised by our search relevancy discussions.

  • creator is not indexed in the API database, so a query against it will be very slow.

To enable exact matching, we don’t need any changes in Elasticsearch index because we already have the .keyword fields for creator, source and tags. We just need to use them in the query. This will allow for exact matching of the values (e.g. bike will not match bikes or biking), and will probably make the search more performant since the fields and the query won’t need to be analyzed and/or stemmed.

The search controller’s search method should be refactored to be smaller and allow for more flexibility when creating the search query. The current implementation of query building consists of 3 steps.

We first apply the filters: if the query string has any parameters other than q, we use them for exact matches that must be in the search results, or must be excluded from the search results (if the parameter starts with exclude_).

Then, if q parameter is present, we apply the q parameter, which is a full-text search within tags, title and description fields. This is a fuzzy search, which means that the query string is stemmed and analyzed, and the field values are stemmed and analyzed, and the documents are scored based on the relevance of the match. If q is not present, but one of the creator/source/tags parameter is present, we search within those fields for fuzzy matches.

Finally, we apply the ranking and sorting parameters, and “highlight” the fields that were matched in the results.

The new search controller should allow for using different filters for the first step and to not use the full-text search. We should also create a new serializer for collection search requests. It should include the common parameters for list requests, such as page and page_size, and the parameters for the exact matches: tag, creator and source.

2. New API endpoints#

The new routes should use path parameters instead of query parameters for the tag, creator and source values. This will make the URLs more readable, easier to share, will be easier to cache or perform cache invalidation required by #1969. The path parameters should be URL encoded to preserve special characters and spaces.

Instead of using query strings, we can describe the resource via the path: /<media type>/source/<source>/creator/<creator> is very clean, easy to read and understand, and very easy to manage the cache for because it is a static path. The source page can use the same route by leaving off the creator. This removes the need to manage specific query params as well and would allow us to add querying within these routes more easily in the future behind the regular q parameter if we wanted.

For the tag route, the singular tag rather than plural tags should be used for legibility since we are presenting a single tag.

The new views should use the same pagination and dead link cleanup as the search views.

3. New and updated components#

Extract the VAudioCollection component#

Currently, it is not possible to reuse the audio collection from the audio search result page because it is a part of the audio.vue page. We should extract the part that shows the loading skeleton, the column of VAudioTrack rows and the Load more section into VAudioCollection component. This component will be reused in the audio search page and on the Additional search views.

Add a VCollectionHeader component#

The header should have an icon (tag, creator or source) and the name of the tag/creator/source. For source and the creator, there should be an external link button if it’s available (not all creators have urls).

The header should also display the number of results, “251 audio files with the selected tag”, “604 images provided by this source”, “37 images by this creator in Hirshhorn Museum and Sculpture Garden”.

Note: There are sources that only have works by one creator. In this case, we should probably still have two separate pages for the source and the creator, but we might want to add a note that the creator is the only one associated with this source.

Figma links: creator desktop and mobile, source desktop and mobile, tag desktop and mobile.

4. Nuxt store and API request changes#

We can reuse the search store as is for these pages. Currently, the frontend can perform searches by source parameter. If searchBy value is set, then the q parameter is replaced with the <searchBy>=<searchTerm> query parameter.

For this project, we should add new values to the searchBy filter.

The API request URL is constructed from the search store state in the prepare-search-query-params method. We will need to update this method to use the searchBy filter value to construct the API request path as described in the “API Changes” section.

Update the searchBy filter#

The searchBy filter will be used to determine the shape of the API request. While currently these parameters will be mutually exclusive (we can only search by one of them), we might want to allow searching by multiple parameters in the future.

For other filters, we only use toggle method to update the value. However, for searchBy, we need to be able to check one of the searchBy parameters, and uncheck the others. To enable that, we should add a new search store method.

If searchBy is set to tag, creator or source, then the media store should create search path instead of the search query. So, instead of calling prepareSearchQuery to create the query parameters, it should call prepareSearchPath to create the path.

const searchPathOrQuery = searchBy
  ? prepareSearchPath(searchParams, mediaType)
  : prepareSearchQuery(searchParams, mediaType)

const prepareSearchPath = (
  searchParams: Record<string, string>,
  mediaType: SupportedMediaType
) => {
  let path
  if (searchBy === "tag") {
    path = `${mediaType}/tag/${searchTerm}`
  } else {
    path = `${mediaType}/source/${searchParams[`${mediaType}Provider`]}`
    if (searchBy === "creator") {
      path += `/creator/${searchTerm}`
    }
  }
  return path
}

5. Add the additional_search_views feature flag#

The flag should be switchable, and off by default.

6. Create a page for tag / creator /source collections.#

Nuxt allows creating nested dynamic routes like /pages/_collection/_mediaType/_term.

We should add the following pages:

  • /pages/_mediaType/tag/_tag

  • /pages/_mediaType/source/_source

  • /pages/_mediaType/source/_source/creator/_creator (this page might not be needed as it might be handled by the source page)

To make sure that the mediaType, source and creator parameters are valid, this page should use the validate method to make sure that and show an error page if necessary.

function validate({ params, $pinia }): boolean {
  const { collection, mediaType, term } = params
  // Check that collection is one of ["tag", "creator" or "source"],
  // and mediaType is one of `supportedMediaTypes`.
  // Check that `term` is correctly escaped.
  // If the params are not valid, return `false` to show the error page.
  return isValid ? true : false
}

This page should also update the state (searchType, searchTerm and searchBy and provider filters) in the search store and handle fetching using mediaStore’s fetchMedia method in the useFetch hook.

Since it is not possible to change the path or query parameters from this page client-side, fetching can be much simpler than on the current search page (that has to watch for changes in the route and fetch if necessary).

This page should use VCollectionHeader and the image grid or the audio collection.

7. Update the single result pages#

All of these changes should be conditional on whether the additional_search_views feature flag is enabled.

The Figma links for new designs:

Update the tags area on the single result page#

The tags area should be collapsible to make long lists of tags collapsible: https://github.com/WordPress/openverse/issues/2589

8. Additional analytics events#

Some existing events will already track the new views events. The views can be tracked as page views, so no separate event is necessary. The only way to access the pages is directly or via links on the single results, which will all be captured by standard page visits. Clicking on the items will be tracked as SELECT_SEARCH_RESULT events. These events can be narrowed by pathname (/search or /tag\*, for example) to determine where the event occurred.

Two analytics events should be added or updated:

  • The clicks on external creator link in the VCollectionHeader should be tracked as VISIT_CREATOR_LINK events.

  • We should also a special event for visiting source VISIT_SOURCE_LINK, similar to VISIT_CREATOR_LINK.

9. Cleanup after the feature flag is enabled in production#

After the feature flag is enabled in production, we should remove the conditional rendering on the single result pages and remove the additional_search_views feature flag and (old) VMediaTag component.

Tests#

We should add visual-regression tests for the new views. To minimize flakiness due to slow loading of the images, we should probably use the { filter: brightness(0%); } trick for the images on the page.

The search store tests should be updated to reflect the changes to the filters.

Dependencies#

Infrastructure#

These views potentially might cause more load on our infrastructure due to increase in scraping activity.

Tools & packages#

No new tools or packages are necessary.

Other projects or work#

Not applicable.

Design#

Figma designs in the dev mode

Parallelizable streams#

The API changes can be done independently of the frontend changes, although they should be finished before the final testing of the frontend changes.

Adding the new components (step 3), Nuxt store update (step 4) and the additional_search_views feature flag (step 5) can be done in parallel, and are not dependent on anything.

The work on the single result pages (step 7) can be done in parallel with the work on the collection pages (step 6), but should follow the previous steps.

Blockers#

The main blocker could be the maintainer capacity.

Accessibility#

We should make sure that the search titles are accessible, and the pages clearly indicate the change of context.

Rollback#

To roll back the changes, we would need to set the feature flag to OFF.

Risks#

The biggest risk I see is that this project might be seen as an “invitation” to scraping. Hopefully, frontend rate limiting and the work on providing the dataset would minimize such risks.

Prior art#