Custom Document Transformer on Solr

Amr Ellafy
Jul 12, 2021

Solr is a great platform for data storage, not only for search. With Custom Document Transformers you can modify your result documents, or integrate external data into them, to enrich your data.

Custom Document Transformers are a powerful feature in Solr. Document Transformers allow you to modify the result set of a query, either by modifying a specific field in each result document or by adding new field(s).

In my case, I stored an image id such as xyz.jpg in the index. I wanted to avoid storing an absolute image path so that I could later change the image/file server. This allows me to switch between FTP, AWS S3, or Azure Blob Storage, or even mix them, without having to re-index the whole collection.

To do so, I wrote my own Document Transformer using TransformerFactory. The transformer created by the custom TransformerFactory rewrites the image id into an absolute image URL.
I used Cantaloupe to serve images. Cantaloupe is an image server that supports resizing and pre-processing images. Think of it as a slightly more advanced FTP/blob storage made for images :) nothing more.
It's also possible to send viewport parameters along with the Solr query, and Cantaloupe will return a custom image on the fly. (Think of search results shown in a mobile app vs. a desktop browser.)
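As an illustration of the viewport idea, here is a minimal Java sketch of how a stored id could be turned into a resized image URL. It assumes Cantaloupe's IIIF Image API 2 endpoint (served under /iiif/2/) and the IIIF URL pattern {identifier}/{region}/{size}/{rotation}/{quality}.{format}. The host images.example.com and the idea of passing a width along with the Solr query are assumptions for the example, not part of the original setup.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Sketch: builds a IIIF Image API 2.x URL of the form
 * {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
 * for an image id stored in Solr. Assumes a Cantaloupe IIIF 2 endpoint
 * such as https://images.example.com/iiif/2 (hypothetical host).
 */
final class IiifUrls {
    private IiifUrls() {}

    /**
     * @param base    IIIF endpoint, with or without a trailing slash
     * @param imageId identifier stored in the index, e.g. "xyz.jpg"
     * @param width   desired width in pixels, or null for the original size
     */
    static String imageUrl(String base, String imageId, Integer width) {
        if (width != null && width <= 0) {
            throw new IllegalArgumentException("width must be positive: " + width);
        }
        String root = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        // The identifier is a single path segment, so '/' and spaces must be percent-encoded.
        // URLEncoder uses form encoding (space -> '+'), so turn '+' back into '%20'.
        String id = URLEncoder.encode(imageId, StandardCharsets.UTF_8).replace("+", "%20");
        // "full" keeps the original size; "w," scales to w pixels wide, preserving aspect ratio.
        String size = (width == null) ? "full" : width + ",";
        // Region "full", rotation 0, quality "default", JPEG output.
        return root + "/" + id + "/full/" + size + "/0/default.jpg";
    }
}
```

For example, imageUrl("https://images.example.com/iiif/2", "xyz.jpg", 400) gives https://images.example.com/iiif/2/xyz.jpg/full/400,/0/default.jpg, a 400-pixel-wide version of the image, while the index keeps only xyz.jpg.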

Here are the steps to get you started:

1. Start a Maven project with the required Solr dependencies

2. Implement the TransformerFactory class

The TransformerFactory has two methods to override:
- init: this is where you will probably read configuration values; we cover the configuration later in this article.
- create: responsible for creating the actual transformer (a DocTransformer). In this method, you can access the query parameters through the SolrParams and SolrQueryRequest arguments passed to it.
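To make the two responsibilities concrete, here is a minimal sketch in plain Java. In a real plugin you extend org.apache.solr.response.transform.TransformerFactory and override init(NamedList args) and create(String field, SolrParams params, SolrQueryRequest req). To keep the sketch runnable without Solr on the classpath, Map<String, Object> stands in for NamedList and Map<String, String> for SolrParams. The base init argument comes from this article; the class names and the image_id default for f are hypothetical.

```java
import java.util.Map;
import java.util.Objects;

/**
 * Plain-Java sketch of a TransformerFactory's two responsibilities.
 * Map<String, Object> stands in for Solr's NamedList (init args) and
 * Map<String, String> for SolrParams (local params such as f=image_id).
 */
class ImageTransformerFactorySketch {
    private String base;

    /** Mirrors TransformerFactory.init(NamedList): called once when the core loads. */
    void init(Map<String, Object> args) {
        Object value = args.get("base");
        if (value == null) {
            throw new IllegalArgumentException("missing init arg 'base' in solrconfig.xml");
        }
        String s = value.toString();
        // Drop a trailing slash so later concatenation yields exactly one '/'.
        base = s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    /**
     * Mirrors TransformerFactory.create(String field, SolrParams, SolrQueryRequest):
     * called per request. 'field' is the output name from fl (e.g. image_url).
     */
    TransformerSettings create(String field, Map<String, String> params) {
        // Hypothetical default: read image_id when the query omits f=...
        String source = params.getOrDefault("f", "image_id");
        return new TransformerSettings(field, source, base);
    }

    /** What the factory hands over to the transformer it creates. */
    static final class TransformerSettings {
        final String outputField;
        final String sourceField;
        final String base;

        TransformerSettings(String outputField, String sourceField, String base) {
            this.outputField = Objects.requireNonNull(outputField);
            this.sourceField = Objects.requireNonNull(sourceField);
            this.base = Objects.requireNonNull(base, "init() must run before create()");
        }
    }
}
```

In the real factory, the lookups become args.get("base") on the NamedList and params.get("f", "image_id") on SolrParams, and create returns your DocTransformer subclass instead of a settings object.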

3. Implement the DocTransformer class

This is where the transformation happens. Through the transformer, you can add, delete, or modify fields in each result document.

Here you have to implement two abstract methods:
- getName: the name of the transformer. A static value will work.
- transform: the transformation takes place here! The method receives the result document as a SolrDocument instance, on which your code can call setField(), addField(), or remove().
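Here is a minimal sketch of the transformer logic, again in plain Java so it runs without Solr. Map<String, Object> stands in for SolrDocument (which itself implements Map), and the comments name the Solr method each piece corresponds to. In Solr 8.x the abstract method is transform(SolrDocument doc, int docid); older releases used a different signature, so check the one your version expects. The plain base + "/" + id URL layout is an assumption; use whatever layout your image server expects.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Plain-Java sketch of a DocTransformer that turns a stored image id into an
 * absolute URL. Map<String, Object> stands in for SolrDocument.
 */
class ImageUrlTransformerSketch {
    private final String outputField; // e.g. "image_url" (the fl alias)
    private final String sourceField; // e.g. "image_id" (the f=... parameter)
    private final String base;        // e.g. "https://images.example.com" (init arg)

    ImageUrlTransformerSketch(String outputField, String sourceField, String base) {
        this.outputField = outputField;
        this.sourceField = sourceField;
        this.base = base;
    }

    /** Mirrors DocTransformer.getName(): a static value is enough. */
    String getName() {
        return "image";
    }

    /** Mirrors DocTransformer.transform(SolrDocument doc, int docid). */
    void transform(Map<String, Object> doc) {
        // Assumes image_id is single-valued; a multi-valued field would hold a collection.
        Object id = doc.get(sourceField);
        if (id == null) {
            return; // no image for this document: leave it unchanged
        }
        // Percent-encode the id as a path segment (form-encoded '+' becomes '%20').
        String encoded = URLEncoder.encode(id.toString(), StandardCharsets.UTF_8).replace("+", "%20");
        // Equivalent to doc.setField(outputField, url) on a SolrDocument.
        doc.put(outputField, base + "/" + encoded);
    }
}
```

Because only the id is stored, switching from FTP to S3 or Azure Blob Storage means changing base, not re-indexing.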

4. Building the jar

Once you are done with testing, you can deploy your jar to Solr. The jar goes into the lib folder of your core, under solr/server/solr/your_collection. The lib folder does not exist by default and has to be created.
Here is a Maven plugin that will make your life easier by deploying the jar directly to the lib folder:

5. Configuring Solr to use the TransformerFactory

The last step is to tell Solr about your custom TransformerFactory.
The code snippet below is added to solrconfig.xml. The configuration values are:
- name: the identifier; this is the part of the fl query that invokes the transformer.
- class: the fully qualified class name of the custom transformer factory.
- parameters: given as child nodes. Parameters can be a string (str), int, or double.

The parameters can be read in the init method, mentioned above, as follows:
String base = args.get("base").toString();
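The one-liner above throws a NullPointerException when base is missing from solrconfig.xml. A slightly more defensive sketch follows. Map<String, Object> stands in for the NamedList that init receives, where <str>, <int>, and <double> children arrive as String, Integer, and Double values. The InitArgs helper and the defaultWidth/scale parameter names are hypothetical.

```java
import java.util.Map;

/**
 * Sketch of reading typed init args. In solrconfig.xml, <str>, <int> and <double>
 * children arrive as String, Integer and Double; Map stands in for NamedList here.
 */
final class InitArgs {
    private InitArgs() {}

    /** Required string arg: fail fast with a clear message instead of an NPE. */
    static String str(Map<String, Object> args, String name) {
        Object v = args.get(name);
        if (v == null) {
            throw new IllegalArgumentException("missing init arg '" + name + "'");
        }
        return v.toString();
    }

    /** Optional int arg: accepts an <int> value or a numeric <str>. */
    static int intOr(Map<String, Object> args, String name, int fallback) {
        Object v = args.get(name);
        if (v == null) {
            return fallback;
        }
        return (v instanceof Number) ? ((Number) v).intValue() : Integer.parseInt(v.toString().trim());
    }

    /** Optional double arg: accepts a <double> value or a numeric <str>. */
    static double doubleOr(Map<String, Object> args, String name, double fallback) {
        Object v = args.get(name);
        if (v == null) {
            return fallback;
        }
        return (v instanceof Number) ? ((Number) v).doubleValue() : Double.parseDouble(v.toString().trim());
    }
}
```

A configuration mistake then surfaces as a readable error when the core loads, rather than as a NullPointerException.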

Putting it all together

Now we should be ready to call the custom transformer:
fl=image_id,image_url:[image f=image_id]

Explanation, in order:
- image_id: an existing stored field in the index.
- image_url: a field that does not exist in the index; the transformer creates it.
- [image f=image_id]: the transformer syntax. The transformer name is image (the name registered in solrconfig.xml), and f is a parameter passed to the transformer.
All parameters are accessible through SolrParams. They should be read in the create method and passed on to the underlying transformer.
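To make the mapping explicit, here is a toy Java parser that splits one fl entry into the three parts described above. It is only an illustration: Solr parses the bracketed part with its own local-params parser, which handles more cases than this regex. The FlEntry class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy decomposition of one fl entry such as "image_url:[image f=image_id]". */
final class FlEntry {
    // Optional "alias:", then "[name", then zero or more " key=value", then "]".
    private static final Pattern ENTRY = Pattern.compile(
        "(?:(\\w+):)?\\[(\\w+)((?:\\s+\\w+=(?:'[^']*'|[^\\s\\]']+))*)\\s*\\]");
    // key=value, where value is either 'single-quoted' or a bare token.
    private static final Pattern PARAM = Pattern.compile("(\\w+)=(?:'([^']*)'|([^\\s\\]']+))");

    final String alias;               // output field name (null if absent)
    final String transformer;         // must match name="..." in solrconfig.xml
    final Map<String, String> params; // what create() can read through SolrParams

    private FlEntry(String alias, String transformer, Map<String, String> params) {
        this.alias = alias;
        this.transformer = transformer;
        this.params = params;
    }

    static FlEntry parse(String entry) {
        Matcher m = ENTRY.matcher(entry.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a transformer entry: " + entry);
        }
        Map<String, String> params = new LinkedHashMap<>();
        Matcher p = PARAM.matcher(m.group(3));
        while (p.find()) {
            // Group 2 holds a quoted value, group 3 a bare one.
            params.put(p.group(1), p.group(2) != null ? p.group(2) : p.group(3));
        }
        return new FlEntry(m.group(1), m.group(2), params);
    }
}
```

Parsing image_url:[image f=image_id] yields the alias image_url (the name Solr hands to create() as its field argument), the transformer name image, and the parameters {f=image_id}.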
