descriptive image

Fun with Solr Queries

Apache Solr is behind many of our systems that provide a way to search and browse via a web application (such as the Duke Digital Repository, parts of our Bento search application, and the not yet public next generation TRLN Discovery catalog). It’s a tool for indexing data and provides a powerful query API. In this post I will document a few Solr querying techniques that might be interesting or useful. In some cases I won’t be able to provide live links to queries because we restrict direct access to Solr. However, many of these Solr querying techniques can be used directly in an application’s search box. In those cases, I will include live links to example queries in the Duke Digital Repository.

Find a list of items from their identifiers.

With this query you can specify exactly what items you want to appear in a search result from a list of identifiers.

Query

id:"duke:448098" OR id:"duke:282429" OR id:"duke:142581"
Try it in the Duke Digital Repository

Find all records that have a value (any value) in a specific field.

This query will find all the items in the repository that have a value in the product field. (As with most of these queries, you must know the field name in Solr.)

Query

product_tesim:*
Try it in the Duke Digital Repository

Find all the items in the repository that are missing a field value.

You can find all items in the repository that don’t have any date metadata. Inquiring minds want to know.

Query

-date_tesim:[* TO *]
Try it in the Duke Digital Repository

Find items using a begins-with (left-anchored) query.

I want to see all items that have a subject term that begins with “Soviet Union.” The example is a left-anchored query and will exactly match fields that begin with “Soviet Union.” (Note, the field must not be tokenized for this to work as expected.)

Query

subject_facet_sim:/Soviet Union.*/
Try it in the Duke Digital Repository

Find items with an ends-with (right-anchored) query.

Again, this will only work as expected with an untokenized field.

Query

subject_facet_sim:/.*20th century/
Try it in the Duke Digital Repository

Some of you might have noticed that these queries look a lot like regular expressions. And you’re right! Read more about Solr’s support for regular expression queries.

The following examples require direct access to Solr, which is restricted to authorized users and applications. Instead of providing live links, I’ll show the basic syntax, a complete example query using http://localhost:8983/solr/core/* as the sample URL for a Solr index, and a sample response from Solr.

Count instances of values in a field.

I want to know how many items in the repository have a workflow state of published and how many are unpublished. To do that I can write a facet query that will count instances of each value in the specified field. (This is another query that will only work as expected with an untokenized field.)

Query

http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=workflow_state_ssi&facet.mincount=1&fl=id

Solr Response (truncated)


...
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="workflow_state_ssi">
<int name="published">484075</int>
<int name="unpublished">2228</int>
</lst>
</lst>
</lst>
...

Collapse multiple records into one result based on a shared field value.

This one is somewhat advanced and likely only useful in particular circumstance. But if you had multiple records that were slight variants of each other, and wanted to collapse each variant down to a single result you can do that with a collapse query — as long as the records you want to collapse share a value.

Query

http://localhost:8983/solr/core/select?q=*:*&fq={!collapse%20field=oclc_number%20nullPolicy=expand%20max=termfreq(institution_f,duke)}

  • !collapse instructs Solr to use the Collapsing Query Parser.
  • field=oclc_number instructs Solr to collapse records that share the same value in the oclc_number field.
  • nullPolicy=expand instructs Solr to return any document without a matching OCLC as part of the result set. If this is excluded then records that don’t share an oclc_number with another record will be excluded from the results.
  • max=termfreq(institution,duke) instructs Solr to select as the representative record when collapsing multiple records the one that has the value “duke” in institution field.

CSV response writer (or JSON, Ruby, etc.)

Solr has a number of tricks up its sleeve when it comes to returning results. By default it will return results as XML. You can also specify JSON, or Ruby. You specify a response writer by adding the wt parameter to the URL (wt=json or wt=ruby, etc.).

Solr will also return results as a CSV file, which can then be opened in an Excel spreadsheet — a useful feature for working with metadata.

Query

http://localhost:8983/solr/core/select?q=sun&wt=csv&fl=id,title_tesim

Solr Response

id,title_tesim
duke:194006,Sun Bowl...Sun City...
duke:194002,Sun Bowl...Sun City...
duke:194009,Sun Bowl...Sun City.
duke:194019,Sun Bowl...Sun City.
duke:194037,"Sun City\, Sun Bowl"
duke:194036,"Sun City\, Sun Bowl"
duke:194030,Sun City
duke:194073,Sun City
duke:335601,Sun Control
duke:355105,Proved! Fast starts at 30° below zero!

This is just a small sample of useful ways you can query Solr.