Apache Solr is behind many of our systems that provide a way to search and browse via a web application (such as the Duke Digital Repository, parts of our Bento search application, and the not-yet-public, next-generation TRLN Discovery catalog). Solr is a tool for indexing data, and it provides a powerful query API. In this post I will document a few Solr querying techniques that might be interesting or useful. In some cases I won’t be able to provide live links to queries because we restrict direct access to Solr. However, many of these Solr querying techniques can be used directly in an application’s search box. In those cases, I will include live links to example queries in the Duke Digital Repository.
Find a list of items from their identifiers.
With this query you can specify exactly what items you want to appear in a search result from a list of identifiers.
Query
id:"duke:448098" OR id:"duke:282429" OR id:"duke:142581"
Try it in the Duke Digital Repository
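When the list of identifiers comes from a spreadsheet or a script, typing the OR clauses by hand gets tedious. Here is a minimal Python sketch that builds the same query string. The ids_query helper is a hypothetical name of my own, not part of Solr or the repository application, and the escaping it does only covers backslashes and double quotes inside the quoted value.

```python
def ids_query(identifiers, field="id"):
    """Build a Solr query string that matches any of the given identifiers."""
    clauses = []
    for identifier in identifiers:
        # Escape backslashes and double quotes so the value stays inside its quotes.
        escaped = identifier.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"')
    return " OR ".join(clauses)


query = ids_query(["duke:448098", "duke:282429", "duke:142581"])
print(query)
# id:"duke:448098" OR id:"duke:282429" OR id:"duke:142581"
```

The resulting string can be pasted into a search box or sent as the q parameter.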
Find all records that have a value (any value) in a specific field.
This query will find all the items in the repository that have a value in the product field. (As with most of these queries, you must know the field name in Solr.)
Query
product_tesim:*
Try it in the Duke Digital Repository
Find all the items in the repository that are missing a field value.
You can find all items in the repository that don’t have any date metadata. Inquiring minds want to know.
Query
-date_tesim:[* TO *]
Try it in the Duke Digital Repository
Find items using a begins-with (left-anchored) query.
I want to see all items that have a subject term that begins with “Soviet Union.” The example is a left-anchored query and will exactly match fields that begin with “Soviet Union.” (Note, the field must not be tokenized for this to work as expected.)
Query
subject_facet_sim:/Soviet Union.*/
Try it in the Duke Digital Repository
Find items with an ends-with (right-anchored) query.
Again, this will only work as expected with an untokenized field.
Query
subject_facet_sim:/.*20th century/
Try it in the Duke Digital Repository
Some of you might have noticed that these queries look a lot like regular expressions. And you’re right! Read more about Solr’s support for regular expression queries.
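If the text you want to anchor on contains characters that mean something in a regular expression (a period in “U.S.”, for example), they need to be escaped first. The sketch below is one way to do that in Python. The helper names are hypothetical, and the set of characters treated as special is my reading of the Lucene regular expression syntax plus the forward slash that delimits the expression, so check it against the documentation for your Solr version.

```python
# Characters assumed to be special in Lucene regular expressions, plus "/",
# which marks the start and end of a regex in the query string.
REGEX_SPECIAL = set('.?+*|{}[]()"\\#@&<>~/')


def escape_regex(text):
    """Backslash-escape every character in REGEX_SPECIAL."""
    return "".join("\\" + ch if ch in REGEX_SPECIAL else ch for ch in text)


def begins_with(field, text):
    """Left-anchored query: the field value starts with text."""
    return f"{field}:/{escape_regex(text)}.*/"


def ends_with(field, text):
    """Right-anchored query: the field value ends with text."""
    return f"{field}:/.*{escape_regex(text)}/"


print(begins_with("subject_facet_sim", "Soviet Union"))
# subject_facet_sim:/Soviet Union.*/
print(ends_with("subject_facet_sim", "20th century"))
# subject_facet_sim:/.*20th century/
```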
The following examples require direct access to Solr, which is restricted to authorized users and applications. Instead of providing live links, I’ll show the basic syntax, a complete example query using http://localhost:8983/solr/core/* as the sample URL for a Solr index, and a sample response from Solr.
Count instances of values in a field.
I want to know how many items in the repository have a workflow state of published and how many are unpublished. To do that I can write a facet query that will count instances of each value in the specified field. (This is another query that will only work as expected with an untokenized field.)
Query
http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=workflow_state_ssi&facet.mincount=1&fl=id
Solr Response (truncated)
...
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="workflow_state_ssi">
<int name="published">484075</int>
<int name="unpublished">2228</int>
</lst>
</lst>
</lst>
...
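To use those counts in a script, you can build the request URL and pull the numbers out of the XML. The Python sketch below works on the sample response shown above. The facet_counts helper is a hypothetical name, and I have wrapped the truncated sample in a response root element so that it parses as a complete document. Fetching the URL is left out because it requires access to a Solr index.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Build the facet request; urlencode takes care of percent-encoding.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "workflow_state_ssi",
    "facet.mincount": 1,
    "fl": "id",
}
url = "http://localhost:8983/solr/core/select?" + urlencode(params)

# The truncated sample response, wrapped in a root element so it is well-formed.
SAMPLE = """<response>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="workflow_state_ssi">
<int name="published">484075</int>
<int name="unpublished">2228</int>
</lst>
</lst>
</lst>
</response>"""


def facet_counts(xml_text, field):
    """Return {value: count} for one facet field in a Solr XML response."""
    root = ET.fromstring(xml_text)
    path = f".//lst[@name='facet_fields']/lst[@name='{field}']/int"
    return {node.get("name"): int(node.text) for node in root.findall(path)}


print(facet_counts(SAMPLE, "workflow_state_ssi"))
# {'published': 484075, 'unpublished': 2228}
```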
Collapse multiple records into one result based on a shared field value.
This one is somewhat advanced and likely only useful in particular circumstances. But if you have multiple records that are slight variants of each other and want to collapse each set of variants down to a single result, you can do that with a collapse query, as long as the records you want to collapse share a value.
Query
http://localhost:8983/solr/core/select?q=*:*&fq={!collapse%20field=oclc_number%20nullPolicy=expand%20max=termfreq(institution_f,duke)}
!collapse instructs Solr to use the Collapsing Query Parser.
field=oclc_number instructs Solr to collapse records that share the same value in the oclc_number field.
nullPolicy=expand instructs Solr to return every document that has no value in the oclc_number field as its own result. If this is omitted, records without an oclc_number are dropped from the results.
max=termfreq(institution_f,duke) instructs Solr, when collapsing multiple records, to select as the representative record the one with the highest count of the term “duke” in the institution_f field, which in practice is the one that has the value “duke” in that field.
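The spaces, braces, and parentheses in the collapse filter are easy to get wrong when writing the URL by hand. This Python sketch lets the standard library handle the encoding and then decodes the URL again to confirm the filter survives the round trip. The field names are the ones from the example above, and the variable names are my own.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The collapse filter exactly as Solr should receive it, spaces and all.
collapse = (
    "{!collapse field=oclc_number nullPolicy=expand "
    "max=termfreq(institution_f,duke)}"
)

# urlencode percent-encodes the braces and parentheses and turns spaces into "+".
url = "http://localhost:8983/solr/core/select?" + urlencode(
    {"q": "*:*", "fq": collapse}
)
print(url)

# Decode the query string again to check that nothing was mangled.
decoded = parse_qs(urlsplit(url).query)
print(decoded["fq"][0] == collapse)
# True
```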
CSV response writer (or JSON, Ruby, etc.)
Solr has a number of tricks up its sleeve when it comes to returning results. By default it will return results as XML. You can also specify JSON or Ruby. You specify a response writer by adding the wt parameter to the URL (wt=json or wt=ruby, etc.).
Solr will also return results as a CSV file, which can then be opened in an Excel spreadsheet — a useful feature for working with metadata.
Query
http://localhost:8983/solr/core/select?q=sun&wt=csv&fl=id,title_tesim
Solr Response
id,title_tesim
duke:194006,Sun Bowl...Sun City...
duke:194002,Sun Bowl...Sun City...
duke:194009,Sun Bowl...Sun City.
duke:194019,Sun Bowl...Sun City.
duke:194037,"Sun City\, Sun Bowl"
duke:194036,"Sun City\, Sun Bowl"
duke:194030,Sun City
duke:194073,Sun City
duke:335601,Sun Control
duke:355105,Proved! Fast starts at 30° below zero!
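If you would rather process the CSV in a script than in Excel, Python’s csv module reads it directly. The sketch below uses a few rows from the sample response. One assumption to flag: I am reading the backslash before the comma in "Sun City\, Sun Bowl" as Solr escaping a comma inside a single value of a multi-valued field, so the hypothetical split_values helper splits only on commas that are not preceded by a backslash. It does not handle every escaping edge case.

```python
import csv
import io
import re

# A few rows from the sample response (raw string keeps the backslash literal).
SAMPLE = r"""id,title_tesim
duke:194006,Sun Bowl...Sun City...
duke:194037,"Sun City\, Sun Bowl"
duke:194030,Sun City
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))


def split_values(cell):
    """Split a multi-valued cell on unescaped commas, then unescape the rest."""
    parts = re.split(r"(?<!\\),", cell)
    return [part.replace("\\,", ",") for part in parts]


for row in rows:
    print(row["id"], split_values(row["title_tesim"]))
# duke:194006 ['Sun Bowl...Sun City...']
# duke:194037 ['Sun City, Sun Bowl']
# duke:194030 ['Sun City']
```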
This is just a small sample of useful ways you can query Solr.