SUPERSEDED 2023-07-20 Implementation Plan: Additional Search Views
Author: @obulat
Reviewers
[x] @zackkrida
[x] @sarayourfriend
Warning
This is the original version. For the current version, see the updated plan.
Project links
Expected Outcomes
API endpoints return all media with the selected tag, from the selected source or by the selected creator, sorted by date added to Openverse.
The frontend allows users to browse media items by a selected creator, source, or tag.
The single result pages link to these collection views; the external links are also updated to clearly show that they are external.
Step-by-step plan
1. Update the Elasticsearch index to enable exact matching of the tag, source and creator fields (both the query analyzer and the index analyzer). This will require reindexing.
2. Add API endpoints for exact matching of the tag, source and creator fields.
3. Create the new components: VCollectionHeader, VCollectionLink and VTag.
4. Update the store and utils used to construct the API query to allow searching by tag, creator or source, in addition to the current search by the title/description/tags combination.
5. Add a switchable “additional_search_views” feature flag.
6. Create a page for tag/creator/source collections. The page should handle fetching and updating the search store state.
7. Update the single result pages: the tags area, and the “creator” and “source” area under the main media item.
8. Add the analytics event VISIT_SOURCE_LINK and update where VISIT_CREATOR_LINK is sent.
9. Cleanup after the feature flag is enabled in production:
   Remove conditional rendering on the single result pages.
   Remove the additional_search_views feature flag and VMediaTag component.
Step details
1. Search controller updates
Currently, when filtering the search results, the API matches some query parameters in a fuzzy way: an item matches the query if the field value contains the query string as a separate word. When indexing the items, we “analyze” them, which means that we split the field values by whitespace and stem them. We also do the same to the query string. This means that the query string “bike” will match the field value “bikes”, “biking”, or “bike shop”.
For these pages, however, we need an exact match.
One alternative considered when writing this plan was to use the database instead of Elasticsearch (ES) to get the results. This would make it easy to get exact matches. However, there are some problems with using the database rather than ES for this:
The database does not cache queries in the same way that ES does, so repeated queries will not necessarily be as efficient as with ES.
The database does not score documents at all, so the order will differ dramatically from the way ES would order the documents. That is already an issue with respect to popularity data today, but it will become even more of an issue if we start to score documents based on other metrics, as theorised in our search relevancy discussions.
creator is not indexed in the API database, so a query against it would be very slow.
To enable exact matching, we don’t need any changes in the Elasticsearch index
because we already have the .keyword fields for creator, source and
tags. We just need to use them in the query. This will allow for exact
matching of the values (e.g. bike will not match bikes or biking), and
will probably make the search more performant, since neither the fields nor the
query will need to be analyzed or stemmed.
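The difference between the fuzzy and exact clauses can be sketched as Elasticsearch query DSL objects built in TypeScript. This is a minimal sketch, not the API’s code: fuzzyClause and exactClause are hypothetical helpers, and the field names (creator, source, tags.name) and their .keyword subfields are assumptions about the index mapping.

```typescript
// Hypothetical helpers illustrating fuzzy vs exact matching in the ES query DSL.
type QueryClause = Record<string, Record<string, unknown>>

// Fuzzy: `match` runs the query string through the field's analyzer,
// so (with a stemming analyzer) "bike" can also match "bikes" or "biking".
function fuzzyClause(field: string, value: string): QueryClause {
  return { match: { [field]: value } }
}

// Exact: `term` on the `.keyword` subfield compares the raw, unanalyzed value,
// so "bike" only matches documents whose value is exactly "bike".
function exactClause(field: string, value: string): QueryClause {
  return { term: { [`${field}.keyword`]: value } }
}

// Example: all items with the tag "bike" from the source "flickr",
// expressed as non-scoring filters.
const tagQuery = {
  bool: {
    filter: [exactClause("tags.name", "bike"), exactClause("source", "flickr")],
  },
}

console.log(JSON.stringify(tagQuery))
```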
The search controller’s search method should be refactored to be smaller and
allow for more flexibility when creating the search query. The current
implementation of query building consists of 3 steps.
We first apply the filters: if the query string has any parameters other
than q, we use them for exact matches that must be in the search results, or
must be excluded from the search results (if the parameter starts with
exclude_).
Then, if the q parameter is present, we apply it: q is a
full-text search within the tags, title and description fields. This is a
fuzzy search: both the query string and the field values are analyzed and
stemmed, and the documents are scored based on the relevance of the match.
If q is not present but one of the creator/source/tags parameters is present,
we search within those fields for fuzzy matches.
Finally, we apply the ranking and sorting parameters, and “highlight” the fields that were matched in the results.
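The three steps above can be sketched in TypeScript as follows. This is a simplified illustration, not the actual (Python) search controller: the parameter names, the non-filter parameter list and the use of simple_query_string for q are assumptions, and step 3 is left out.

```typescript
// Hypothetical sketch of the current three-step query building.
// Parameter and field names are simplified assumptions.
type Params = Record<string, string>
type Clause = Record<string, unknown>

const FULL_TEXT_FIELDS = ["tags.name", "title", "description"]
const FIELD_SEARCH_PARAMS = ["creator", "source", "tags"]
const NON_FILTER_PARAMS = ["q", "page", "page_size", ...FIELD_SEARCH_PARAMS]

function buildSearchQuery(params: Params): Clause {
  const filter: Clause[] = []
  const mustNot: Clause[] = []
  const must: Clause[] = []

  // Step 1: filters. `exclude_<name>` values must not match;
  // other filter values must match exactly.
  for (const [name, value] of Object.entries(params)) {
    if (NON_FILTER_PARAMS.includes(name)) continue
    if (name.startsWith("exclude_")) {
      const field = name.slice("exclude_".length)
      mustNot.push({ terms: { [field]: value.split(",") } })
    } else {
      filter.push({ terms: { [name]: value.split(",") } })
    }
  }

  // Step 2: full-text search with `q`; otherwise, fuzzy matching
  // within the creator/source/tags fields.
  if (params.q) {
    must.push({ simple_query_string: { query: params.q, fields: FULL_TEXT_FIELDS } })
  } else {
    for (const name of FIELD_SEARCH_PARAMS) {
      if (params[name]) must.push({ match: { [name]: params[name] } })
    }
  }

  // Step 3 (ranking, sorting and highlighting) is omitted from this sketch.
  return { bool: { filter, must_not: mustNot, must } }
}
```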
The new search controller should allow using different filters in the first
step and skipping the full-text search. We should also create a new serializer
for collection search requests. It should include the common parameters for
list requests, such as page and page_size, and the parameters for
exact matches: tag, creator and source.
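A collection query could then consist only of exact-match filters, with no full-text clause, sorted by the date the item was added. The TypeScript below is a minimal sketch of that shape, not the serializer or controller itself: buildCollectionQuery is hypothetical, and the created_on sort field and the default page size of 20 are assumptions.

```typescript
// Hypothetical sketch: a collection query uses only exact-match filters,
// skips full-text search and sorts by the date the item was added.
interface CollectionParams {
  tag?: string
  source?: string
  creator?: string
  page?: number
  pageSize?: number
}

function buildCollectionQuery({ tag, source, creator, page = 1, pageSize = 20 }: CollectionParams) {
  const filter: Record<string, unknown>[] = []
  if (tag) filter.push({ term: { "tags.name.keyword": tag } })
  if (source) filter.push({ term: { "source.keyword": source } })
  // A creator collection is always scoped to a source (see the routes below).
  if (creator) {
    if (!source) throw new Error("`creator` requires `source`")
    filter.push({ term: { "creator.keyword": creator } })
  }
  if (filter.length === 0) throw new Error("One of `tag` or `source` is required")
  return {
    query: { bool: { filter } },
    // Assumed name of the "date added to Openverse" field.
    sort: [{ created_on: { order: "desc" } }],
    from: (page - 1) * pageSize,
    size: pageSize,
  }
}
```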
2. New API endpoints
The new routes should use path parameters instead of query parameters for the
tag, creator and source values. This will make the URLs more readable and easier
to share, and will make it easier to cache them and to perform the cache
invalidation required by #1969. The path parameters should be URL-encoded to
preserve special characters and spaces.
Instead of using query strings, we can describe the resource via the path:
/<media type>/source/<source>/creator/<creator> is very clean, easy to read
and understand, and very easy to manage the cache for because it is a static
path. The source page can use the same route by leaving off the creator. This
also removes the need to manage specific query parameters, and would make it
easier to add querying within these routes in the future with the regular q
parameter, if we want to.
For the tag route, the singular tag rather than plural tags should be used
for legibility since we are presenting a single tag.
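Building and parsing these paths can be sketched as below. The Collection type and both functions are hypothetical illustrations of the route shapes described above. Splitting the path on “/” before decoding keeps an encoded value as a single segment.

```typescript
// Hypothetical sketch of building and parsing collection paths.
type Collection =
  | { kind: "tag"; mediaType: string; tag: string }
  | { kind: "source"; mediaType: string; source: string }
  | { kind: "creator"; mediaType: string; source: string; creator: string }

// Each value is URL-encoded so that spaces and other special
// characters are preserved in a single path segment.
function collectionPath(c: Collection): string {
  const e = encodeURIComponent
  switch (c.kind) {
    case "tag":
      return `/${c.mediaType}/tag/${e(c.tag)}`
    case "source":
      return `/${c.mediaType}/source/${e(c.source)}`
    case "creator":
      return `/${c.mediaType}/source/${e(c.source)}/creator/${e(c.creator)}`
  }
}

// Split first, then decode each segment.
function parseCollectionPath(path: string): Collection | null {
  const parts = path.split("/").filter(Boolean).map(decodeURIComponent)
  const [mediaType, kind, value, sub, creator] = parts
  if (parts.length === 3 && kind === "tag") return { kind: "tag", mediaType, tag: value }
  if (parts.length === 3 && kind === "source") return { kind: "source", mediaType, source: value }
  if (parts.length === 5 && kind === "source" && sub === "creator") {
    return { kind: "creator", mediaType, source: value, creator }
  }
  return null
}
```

For example, a creator named “Jane Doe & Co.” on Flickr images maps to /image/source/flickr/creator/Jane%20Doe%20%26%20Co., and parsing that path returns the same values.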
The new views should use the same pagination and dead link cleanup as the search views.
3. New and updated components
Extract the VAudioCollection component
Currently, it is not possible to reuse the audio collection from the audio
search result page because it is a part of the audio.vue page. We should
extract the part that shows the loading skeleton, the column of VAudioTrack
rows and the Load more section into a VAudioCollection component. This component
will be reused on the audio search page and in the additional search views.
Add a VCollectionHeader component
The header should have an icon (tag, creator or source) and the name of the tag/creator/source. For the source and the creator, there should be an external link button if a URL is available (not all creators have URLs).
The header should also display the number of results, “251 audio files with the selected tag”, “604 images provided by this source”, “37 images by this creator in Hirshhorn Museum and Sculpture Garden”.
Note: There are sources that only have works by one creator. In this case, we should probably still have two separate pages for the source and the creator, but we might want to add a note that the creator is the only one associated with this source.
Figma links: creator desktop and mobile, source desktop and mobile, tag desktop and mobile.
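The choice of results label can be sketched as a small function that mirrors the example strings above. This is only an illustration: in the app these strings would come from the localization files, and collectionResultsLabel and its simple singular/plural rule are hypothetical.

```typescript
// Hypothetical sketch: pick the header's results label from the collection type.
type MediaType = "image" | "audio"
type CollectionKind = "tag" | "source" | "creator"

// [singular, plural] nouns for each media type.
const NOUNS: Record<MediaType, [string, string]> = {
  image: ["image", "images"],
  audio: ["audio file", "audio files"],
}

function collectionResultsLabel(
  count: number,
  mediaType: MediaType,
  kind: CollectionKind,
  sourceName?: string
): string {
  const noun = NOUNS[mediaType][count === 1 ? 0 : 1]
  switch (kind) {
    case "tag":
      return `${count} ${noun} with the selected tag`
    case "source":
      return `${count} ${noun} provided by this source`
    case "creator":
      return sourceName
        ? `${count} ${noun} by this creator in ${sourceName}`
        : `${count} ${noun} by this creator`
  }
}
```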
Add VCollectionLink component
This component should be a VButton with as="VLink", should have an icon, and
should accept a localized link to the creator or source page.
Figma link: creator and source buttons
Update links in the “information” section
The creator links in the Information section of the image and audio single result pages should have an “external link” icon.
The audio creator link should also be updated to match the image creator link. It
should be a conditional component: VLink if creator_url is truthy and
span if creator_url is falsy.
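The conditional rendering can be sketched as a helper that picks the component and its props, which a template could pass to a dynamic component. creatorLinkProps and CreatorLinkProps are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch of the conditional creator link:
// render a VLink only when there is a URL to link to.
interface CreatorLinkProps {
  component: "VLink" | "span"
  href?: string
}

function creatorLinkProps(creatorUrl?: string | null): CreatorLinkProps {
  // An empty string, null or undefined URL all fall back to a plain span.
  return creatorUrl ? { component: "VLink", href: creatorUrl } : { component: "span" }
}
```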
Currently, foreign_landing_url is linked to the “source” on the image page and to the “provider” on the audio page.
The audio page should be updated to match the image page: the
foreign_landing_url link should be added to the “source”, not the provider.
4. Nuxt store and API request changes
We can reuse the search store as is for these pages. Currently, the frontend can
perform searches by the source parameter. If the searchBy value is set, the q
parameter is replaced with the <searchBy>=<searchTerm> query parameter.
For this project, we should add new values to the searchBy filter.
The API request URL is constructed from the search store state in the
prepare-search-query-params method.
We will need to update this method to use the searchBy filter value to
construct the API request path, as described in the “New API endpoints” section.
Update the searchBy filter
The searchBy filter will be used to determine the shape of the API request.
While currently these parameters will be mutually exclusive (we can only search
by one of them), we might want to allow searching by multiple parameters in the
future.
For other filters, we only use the toggle method to update the value. However, for
searchBy, we need to be able to check one of the searchBy options and
uncheck the others. To enable that, we should add a new search store method.
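A minimal sketch of such a method is shown below. It assumes that filter items have the shape { code, name, checked }; setSearchBy is a hypothetical name, and a real store action would update the store state rather than return a new array.

```typescript
// Hypothetical sketch of a store method that checks one `searchBy`
// option and unchecks all the others.
interface FilterItem {
  code: string
  name: string
  checked: boolean
}

function setSearchBy(searchBy: FilterItem[], code: string | null): FilterItem[] {
  // Passing `null` unchecks every option, which restores the regular `q` search.
  return searchBy.map((item) => ({ ...item, checked: item.code === code }))
}
```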
If searchBy is set to tag, creator or source, the media store
should create a search path instead of the search query. So, instead of calling
prepareSearchQuery to create the query parameters, it should call
prepareSearchPath to create the path.
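```typescript
// Minimal sketch. `searchBy` and `searchTerm` are not defined in the
// original snippet; here they are assumed to be read from `searchParams`,
// alongside the `${mediaType}Provider` filter value.
// `SupportedMediaType` and `prepareSearchQuery` come from the existing frontend code.
const prepareSearchPath = (
  searchParams: Record<string, string>,
  mediaType: SupportedMediaType
): string => {
  const { searchBy, searchTerm } = searchParams
  // Path segments are URL-encoded to preserve special characters and spaces.
  if (searchBy === "tag") {
    return `${mediaType}/tag/${encodeURIComponent(searchTerm)}`
  }
  const source = encodeURIComponent(searchParams[`${mediaType}Provider`])
  let path = `${mediaType}/source/${source}`
  if (searchBy === "creator") {
    path += `/creator/${encodeURIComponent(searchTerm)}`
  }
  return path
}

// Use the path-based endpoints only when `searchBy` is set.
const searchPathOrQuery = searchParams.searchBy
  ? prepareSearchPath(searchParams, mediaType)
  : prepareSearchQuery(searchParams, mediaType)
```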
5. Add the additional_search_views feature flag
The flag should be switchable, and off by default.
6. Create a page for tag, creator and source collections
Nuxt allows creating nested dynamic routes like
/pages/_collection/_mediaType/_term.
We should add the following pages:
/pages/_mediaType/tag/_tag
/pages/_mediaType/source/_source
/pages/_mediaType/source/_source/creator/_creator (this page might not be needed as it might be handled by the source page)
To make sure that the mediaType, source and creator parameters are valid,
this page should use the validate method to check them and show an error page
if necessary.
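```typescript
// `supportedMediaTypes` is the existing list of media types in the frontend.
function validate({ params }: { params: Record<string, string> }): boolean {
  const { collection, mediaType, term } = params
  // `collection` must be one of "tag", "creator" or "source".
  const isValidCollection = ["tag", "creator", "source"].includes(collection)
  // `mediaType` must be one of `supportedMediaTypes`.
  const isValidMediaType = supportedMediaTypes.includes(mediaType)
  // `term` must be a non-empty string; the check that it is
  // correctly escaped would also go here.
  const isValidTerm = typeof term === "string" && term.trim() !== ""
  // Returning `false` makes Nuxt show the error page.
  return isValidCollection && isValidMediaType && isValidTerm
}
```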
This page should also update the state (searchType, searchTerm and
searchBy and provider filters) in the search store and handle fetching
using mediaStore’s fetchMedia method in the useFetch hook.
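Deriving that state from the route parameters can be sketched as a pure function, which the page could call before fetchMedia. collectionSearchState and the two interfaces are hypothetical, and using the source name as the searchTerm for source pages is an assumption.

```typescript
// Hypothetical sketch: derive the search store state from the page's route params.
type SearchBy = "tag" | "creator" | "source"

interface CollectionRouteParams {
  mediaType: string
  tag?: string
  source?: string
  creator?: string
}

interface CollectionSearchState {
  searchType: string
  searchTerm: string
  searchBy: SearchBy
  provider?: string
}

function collectionSearchState(params: CollectionRouteParams): CollectionSearchState {
  const { mediaType, tag, source, creator } = params
  if (tag) return { searchType: mediaType, searchTerm: tag, searchBy: "tag" }
  // Creator pages are nested under a source, so both values are set.
  if (source && creator) {
    return { searchType: mediaType, searchTerm: creator, searchBy: "creator", provider: source }
  }
  if (source) {
    return { searchType: mediaType, searchTerm: source, searchBy: "source", provider: source }
  }
  throw new Error("Expected a tag or a source in the route params")
}
```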
Since it is not possible to change the path or query parameters from this page client-side, fetching can be much simpler than on the current search page (that has to watch for changes in the route and fetch if necessary).
This page should use VCollectionHeader and the image grid or the audio
collection.
7. Update the single result pages¶
All of these changes should be conditional on whether the
additional_search_views feature flag is enabled.
The Figma links for new designs:
Update the VCollectionLink area on the single result page
The content info line under the main item on the single result page should be replaced with a section that has two buttons: one linking to the creator page and one linking to the source page. This section should be horizontally scrollable on mobile and should implement scroll snapping (example: https://play.tailwindcss.com/AbfA33Za50).
Add the information popover next to source and provider links
An information popover explaining the difference between the source and the provider should be added next to the source and provider links.
8. Additional analytics events
Some existing events will already capture activity on the new views. The views can be
tracked as page views, so no separate event is necessary. The only way to access
the pages is directly or via links on the single result pages, which will all be
captured by standard page visits. Clicks on the items will be tracked as
SELECT_SEARCH_RESULT events. These events can be narrowed by pathname
(/search or /tag*, for example) to determine where the event occurred.
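Narrowing by pathname could look like the sketch below. viewFromPathname is a hypothetical helper that assumes the route shapes listed in step 6 and paths without a locale prefix.

```typescript
// Hypothetical helper for narrowing SELECT_SEARCH_RESULT events by pathname.
// Assumes paths without a locale prefix, e.g. "/image/tag/bike".
type ResultView = "search" | "tag" | "source" | "creator" | "other"

function viewFromPathname(pathname: string): ResultView {
  const parts = pathname.split("/").filter(Boolean)
  if (parts[0] === "search") return "search"
  if (parts[1] === "tag") return "tag"
  if (parts[1] === "source") return parts[3] === "creator" ? "creator" : "source"
  return "other"
}
```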
Two analytics events should be added or updated:
Clicks on the external creator link in the VCollectionHeader should be tracked as VISIT_CREATOR_LINK events.
We should also add a special event for visiting the source, VISIT_SOURCE_LINK, similar to VISIT_CREATOR_LINK.
9. Cleanup after the feature flag is enabled in production
After the feature flag is enabled in production, we should remove the
conditional rendering on the single result pages and remove the
additional_search_views feature flag and (old) VMediaTag component.
Tests
We should add visual-regression tests for the new views. To minimize flakiness
due to slow loading of the images, we should probably use the
{ filter: brightness(0%); } trick
for the images on the page.
The search store tests should be updated to reflect the changes to the filters.
Dependencies
Infrastructure
These views might increase the load on our infrastructure due to an increase in scraping activity.
Tools & packages
No new tools or packages are necessary.
Other projects or work
Not applicable.
Design
Parallelizable streams
The API changes can be done independently of the frontend changes, although they should be finished before the final testing of the frontend changes.
Adding the new components (step 3), Nuxt store update (step 4) and the
additional_search_views feature flag (step 5) can be done in parallel, and are
not dependent on anything.
The work on the single result pages (step 7) can be done in parallel with the work on the collection pages (step 6), but should follow the previous steps.
Blockers
The main blocker could be the maintainer capacity.
Accessibility
We should make sure that the search titles are accessible, and the pages clearly indicate the change of context.
Rollback
To roll back the changes, we would need to set the feature flag to OFF.
Risks
The biggest risk I see is that this project might be seen as an “invitation” to scraping. Hopefully, frontend rate limiting and the work on providing the dataset will minimize such risks.