Search
Learn how to search documents with CBElasticsearch
The SearchBuilder object offers an expressive syntax for crafting detailed searches with ranked results. To perform a simple search for matching documents, using Elasticsearch's automatic scoring, we would use the SearchBuilder like so:
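The sketch below shows, in Python, roughly what the raw Elasticsearch query DSL for a simple single-field search looks like. It is not the exact body CBElasticsearch generates, and the `name` field is a hypothetical example. A `match` query analyzes the input text and scores each hit, and that score is what drives the relevance ordering.

```python
import json

def simple_match_body(field: str, value: str) -> dict:
    """Build a minimal search body containing a single, scored `match` query."""
    return {"query": {"match": {field: value}}}

# "name" is a hypothetical field in a hypothetical book index
body = simple_match_body("name", "Elasticsearch")
print(json.dumps(body))  # {"query": {"match": {"name": "Elasticsearch"}}}
```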
By default, this search returns an array of Document objects (or an empty array if no results are found), sorted by descending match score.
To output the results of our search, we would use a loop, accessing the Document methods:
The "memento" is our structural representation of the document. We can also use the built-in method of the Document object:
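For comparison, a raw Elasticsearch search response nests each hit's stored document under `_source`, next to its `_id` and `_score`. The Python sketch below loops over a made-up sample response. Treating the memento as roughly equivalent to `_source` is an assumption made only for illustration.

```python
# A hand-written sample shaped like an Elasticsearch search response (values are made up)
response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "1", "_score": 2.3, "_source": {"name": "Sample Book A"}},
            {"_id": "2", "_score": 1.1, "_source": {"name": "Sample Book B"}},
        ],
    }
}

def hit_sources(resp: dict) -> list:
    """Return each hit's _source document, preserving the order Elasticsearch returned."""
    return [hit["_source"] for hit in resp["hits"]["hits"]]

for source in hit_sources(response):
    print(source["name"])
```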
Search matching
Exact matching
The term() method specifies an exact match that every document in the search results must satisfy. An example use case might be searching only for active documents:
Or a date:
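In raw Elasticsearch DSL, an exact match is written as a `term` query. A `term` query compares the value against the indexed term without analyzing it, which makes it best suited to keyword, boolean, numeric, and date fields. One way to express this is a `term` clause inside a `bool` filter, sketched below as a Python dict. The field names `isActive` and `publishDate` are hypothetical, and this page does not say whether term() places its clause in filter or must context.

```python
def exact_match_body(field, value):
    """Restrict results to documents whose `field` exactly equals `value`.

    The term clause sits in filter context here, so it does not affect scoring.
    """
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}

# Hypothetical field names, used for illustration only
active_only = exact_match_body("isActive", True)
one_day = exact_match_body("publishDate", "2024-01-15")
print(active_only)
print(one_day)
```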
Boosting individual matches
The match() method of the SearchBuilder also accepts a boost argument. When provided, documents that match the term will be ranked higher in the results:
In the above example, documents with a name field containing "Elasticsearch" would receive a higher score than those that contain the value only in the short or long description.
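To show what boosting looks like in raw query DSL, the Python sketch below builds a `bool` query whose `should` clauses search the same text across three fields, with a boost of 2 on the `name` clause. The boost value of 2 is an assumption. The field names follow the example above.

```python
def match_clause(field, text, boost=1.0):
    """A `match` clause; `boost` multiplies this clause's contribution to the score."""
    return {"match": {field: {"query": text, "boost": boost}}}

body = {
    "query": {
        "bool": {
            # Any clause may match; matches on `name` are weighted twice as heavily
            "should": [
                match_clause("name", "Elasticsearch", boost=2),
                match_clause("shortDescription", "Elasticsearch"),
                match_clause("description", "Elasticsearch"),
            ]
        }
    }
}
print(body)
```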
Wildcards
There are times when you want to match a portion of a keyword-mapped field in Elasticsearch. The wildcard() method allows you to do this. Let's say I wanted to match any documents with a name key containing Elastic. I could use the following method to match those documents:
This would match any documents with a name keyword field containing Elasticsearch or Elasticache. Note that wildcard queries are exceptionally slow compared to term, must, and should queries, because they may have to examine every indexed term of the field to find matches, especially when the pattern begins with a wildcard.
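A keyword field stores its whole value as a single term, so the pattern has to cover the entire value. That is why a "contains" search needs wildcards on both sides. The sketch below shows the raw `wildcard` query as a Python dict. It also includes a local approximation of the pattern semantics built on Python's `fnmatchcase`. The local helper is for illustration only; it is not how Elasticsearch evaluates the query.

```python
from fnmatch import fnmatchcase

# Raw DSL for the wildcard search; the leading * is what makes it a "contains" match
wildcard_body = {"query": {"wildcard": {"name": {"value": "*Elastic*"}}}}

def local_wildcard_match(pattern: str, term: str) -> bool:
    """Approximate Elasticsearch wildcard semantics locally.

    Only * (any run of characters) and ? (one character) are used in the pattern;
    fnmatchcase also understands [...] classes, which Elasticsearch does not.
    Matching is case-sensitive, like a keyword field queried without case_insensitive.
    """
    return fnmatchcase(term, pattern)

for value in ["Elasticsearch", "Elasticache", "OpenSearch", "elasticsearch"]:
    print(value, local_wildcard_match("*Elastic*", value))
```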
We can also boost matches and make the wildcard conditional within an existing query:
In the above query, we set the operator argument of the wildcard query to "should", so that a document matches on either the short description or the wildcard (a logical "or"). In addition, we boost the wildcard matches by a factor of 5 relative to the short description matches.
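Below is a rough raw-DSL equivalent, sketched in Python, with both clauses placed in a `bool` query's `should` array. The `shortDescription` match text is an assumption, because the original example is not reproduced here.

```python
body = {
    "query": {
        "bool": {
            "should": [
                {"match": {"shortDescription": "Elasticsearch"}},
                # boost multiplies this clause's score contribution by 5
                {"wildcard": {"name": {"value": "*Elastic*", "boost": 5}}},
            ],
            # At least one should clause must match (explicit here; it is also the
            # default when the bool query has no must or filter clauses)
            "minimum_should_match": 1,
        }
    }
}
print(body)
```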
Sorting Results
The sort() method allows you to specify custom sort options. To sort by author last name instead of score, use:
The results would then be ordered alphabetically by the author's last name rather than by score.
The sort() method also accepts a full sort config:
Calling .sort() multiple times will append the sort configurations, allowing you to fine-tune the sort order:
For more information on sorting search results, check out Elasticsearch: Sort search results.
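In raw Elasticsearch DSL, `sort` is an array, and documents are ordered by each entry in turn, so appending entries refines the order. The Python sketch below imitates that append behavior with a hypothetical `add_sort` helper. The field names `author.lastName.keyword` and `publishDate` are assumptions.

```python
def add_sort(body: dict, field: str, order: str = "asc") -> dict:
    """Append a sort clause. Earlier clauses take precedence; later ones break ties."""
    body.setdefault("sort", []).append({field: {"order": order}})
    return body

body = {"query": {"match": {"name": "Elasticsearch"}}}
add_sort(body, "author.lastName.keyword")  # hypothetical field: primary sort
add_sort(body, "publishDate", "desc")       # hypothetical field: tie-breaker
print(body["sort"])
```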
Paging Through Query Results
The size and from search options set the page size and starting offset, respectively, of the configured search:
The total number of matched documents is available from the SearchResult's getHitCount() method:
Be sure to read the Elasticsearch "Paginate Search Results" documentation, as paging too deeply can adversely affect CPU and memory usage.
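The arithmetic behind page-based navigation is simple enough to sketch. The hypothetical Python helpers below (not part of CBElasticsearch) convert a 1-based page number into `from` and `size` values and compute a page count from a hit count. They also enforce Elasticsearch's default `index.max_result_window` of 10,000, which caps `from + size` for regular searches.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window

def page_params(page: int, page_size: int) -> dict:
    """Translate a 1-based page number into Elasticsearch `from`/`size` values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the default max_result_window")
    return {"from": start, "size": page_size}

def page_count(hit_count: int, page_size: int) -> int:
    """Number of pages needed to display `hit_count` results (ceiling division)."""
    return -(-hit_count // page_size)

print(page_params(3, 25))   # {'from': 50, 'size': 25}
print(page_count(101, 25))  # 5
```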
Script Fields
SearchBuilder also supports Elasticsearch script fields, which allow you to evaluate field values at search time for each document hit:
This will result in an "interestCost" field in the fields property of the Document object:
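For reference, a script field in raw Elasticsearch DSL is a named Painless script under `script_fields`. Its result comes back per hit under `fields`, wrapped in an array. The Python sketch below assumes a hypothetical numeric `price` field and an `interestRate` parameter. The formula is illustrative and is not the original example's.

```python
body = {
    "query": {"match_all": {}},
    "script_fields": {
        "interestCost": {
            "script": {
                "lang": "painless",
                # Hypothetical: `price` field and `interestRate` param are illustrative only
                "source": "doc['price'].size() == 0 ? 0.0 : doc['price'].value * params.interestRate / 100",
                "params": {"interestRate": 5.5},
            }
        }
    },
}

# Made-up hit: script field values are returned as arrays under "fields"
sample_hit = {"_id": "1", "_source": {"price": 20.0}, "fields": {"interestCost": [1.1]}}
interest_cost = sample_hit["fields"]["interestCost"][0]
print(interest_cost)
```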
Runtime Fields
Elasticsearch also supports runtime fields, which are fields whose values are computed by a script at search time rather than being indexed. You can define them in the index mapping or at search time.
See Managing-Indices for more information on creating runtime fields.
Runtime fields can be fetched via the setFields() or addField() methods, and will appear in the Document object's fields struct. This example retrieves the "fuel_usage_in_mpg" runtime field as well as the indexed "make" and "model" fields:
Once you have a search response, you can use the .getFields() method to retrieve the specified fields from the search document:
To access document fields as well as the _source properties, use hit.getDocument( includeFields = true ):
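In raw Elasticsearch terms, this corresponds to the `fields` option of a search request. Each hit returns the requested values under `fields`, always wrapped in arrays. The Python sketch below uses a made-up sample hit. The merge at the end is only a hypothetical analog of getDocument( includeFields = true ), not its actual implementation.

```python
# Raw DSL: ask for a runtime field alongside two indexed fields
body = {"query": {"match_all": {}}, "fields": ["fuel_usage_in_mpg", "make", "model"]}

# Made-up hit shaped like an Elasticsearch response using the fields option
sample_hit = {
    "_source": {"make": "Example Motors", "model": "Roadster"},
    "fields": {"fuel_usage_in_mpg": [31.5], "make": ["Example Motors"], "model": ["Roadster"]},
}

def first_values(hit: dict) -> dict:
    """Flatten the per-hit `fields` map (every value is an array) to first values."""
    return {name: values[0] for name, values in hit.get("fields", {}).items() if values}

# Hypothetical analog of merging fields into the source document
merged = {**sample_hit["_source"], **first_values(sample_hit)}
print(merged)
```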
Define Runtime Fields At Search Time
Elasticsearch also allows you to define runtime fields at search time. Unlike script fields, these runtime fields can be used in aggregations, search queries, and so forth.
Using .addField() ensures the field is returned with the document upon query completion:
We can then retrieve the result field via the getFields() method:
Or access it inlined with the document memento using hit.getDocument( includeFields = true ).
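In raw DSL, search-time runtime fields are declared under `runtime_mappings`, using a Painless script that calls `emit()`. The Python sketch below assumes hypothetical numeric `miles_driven` and `gallons_used` fields. It uses the same runtime field in a range query, an aggregation, and the `fields` list.

```python
body = {
    "runtime_mappings": {
        "fuel_usage_in_mpg": {
            "type": "double",
            "script": {
                # Hypothetical numeric source fields: miles_driven and gallons_used
                "source": (
                    "if (doc['miles_driven'].size() > 0 && doc['gallons_used'].size() > 0 "
                    "&& doc['gallons_used'].value > 0) "
                    "{ emit(doc['miles_driven'].value / (double) doc['gallons_used'].value); }"
                )
            },
        }
    },
    # Unlike a script field, the runtime field can be queried and aggregated on
    "query": {"range": {"fuel_usage_in_mpg": {"gte": 30}}},
    "aggs": {"avg_mpg": {"avg": {"field": "fuel_usage_in_mpg"}}},
    "fields": ["fuel_usage_in_mpg"],
}
print(body["runtime_mappings"]["fuel_usage_in_mpg"]["type"])
```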
Advanced Query DSL
The SearchBuilder also supports the full Elasticsearch query DSL, giving you complete control over your search queries. There are several ways to provide raw query DSL to the SearchBuilder; one is during instantiation.
In the following example, we are looking for active records with "Elasticsearch" in the name, description, or shortDescription fields. We are also looking for a phrase match of "is awesome" and boosting the score of the matching document, if found.
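The original query is not reproduced here, so the Python dict below is only a sketch of one plausible query DSL for that description. The `isActive` field, the use of `description` for the phrase match, and the boost of 2 are all assumptions.

```python
body = {
    "query": {
        "bool": {
            # Exact, unscored restriction to active records (hypothetical field)
            "filter": [{"term": {"isActive": True}}],
            # Scored full-text match across three fields
            "must": [
                {
                    "multi_match": {
                        "query": "Elasticsearch",
                        "fields": ["name", "description", "shortDescription"],
                    }
                }
            ],
            # Optional phrase match that raises the score when present
            "should": [
                {"match_phrase": {"description": {"query": "is awesome", "boost": 2}}}
            ],
        }
    }
}
print(body)
```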
After instantiation, you can use the .param() and .bodyParam() methods to set query parameters and body parameters, respectively.
For more information on Elasticsearch query DSL, the Search in Depth Documentation is an excellent starting point.
Collapsing Results
The collapseToField() method allows you to collapse the results of the search to a specific field. For each value of the collapsed field, the returned data includes only the first matched (most relevant) document. When field collapsing is specified, an automatic aggregation is run, which provides a pagination total for the collapsed document count. When paginating collapsed results, use the SearchResult method getCollapsedCount() as your total record count rather than the usual getHitCount(), which returns the count of all documents matched by the query.
Let's say, for example, we want to find the most recent version of each book in our index matching the phrase "Elasticsearch". In this case, we can group on the title field (or, more precisely, title.keyword, a dynamically mapped keyword subfield in our index) to retrieve the most recent version of each book.
Get Collapsed Occurrences
collapseToField() also supports an includeOccurrences option. By passing includeOccurrences=true to collapseToField(), you can retrieve a map of all collapsed key values and their corresponding document counts by calling searchResult.getCollapsedOccurrences():
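If you need something similar from a raw Elasticsearch response, a `terms` aggregation on the collapse field returns buckets of `key` and `doc_count`, which map naturally to a key/count map. This Python sketch shows one way such a map can be built; it is an assumption and does not describe how getCollapsedOccurrences() is implemented.

```python
# Raw DSL: a terms aggregation on the collapse field counts documents per key
body = {
    "collapse": {"field": "title.keyword"},
    # size caps how many keys are returned (the terms aggregation default is 10)
    "aggs": {"occurrences": {"terms": {"field": "title.keyword", "size": 100}}},
}

def buckets_to_map(agg_result: dict) -> dict:
    """Turn a terms-aggregation result into a {key: doc_count} map."""
    return {bucket["key"]: bucket["doc_count"] for bucket in agg_result["buckets"]}

# Made-up aggregation result
sample_agg = {"buckets": [{"key": "Sample Book A", "doc_count": 3},
                          {"key": "Sample Book B", "doc_count": 1}]}
print(buckets_to_map(sample_agg))
```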
For more information on field collapsing, see the [Collapse Search Results Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/collapse-search-results.html).
Counting Documents
Sometimes you only need a count of matching documents, rather than the results of the query. In this case, you can call the count() method from the search builder (or from the client) to return only the number of matched documents, omitting the result set and metadata:
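Elasticsearch's `_count` API accepts only a `query` in its request body and responds with a `count` field, so paging and sorting options do not apply. The hypothetical Python helper below reduces a search body to what `_count` accepts. The field names are for illustration only.

```python
def to_count_body(search_body: dict) -> dict:
    """Reduce a search body to what the _count API accepts: just the query."""
    return {"query": search_body.get("query", {"match_all": {}})}

search_body = {
    "query": {"match": {"name": "Elasticsearch"}},
    "from": 50,
    "size": 25,
    "sort": [{"publishDate": {"order": "desc"}}],
}
count_body = to_count_body(search_body)
print(count_body)  # {'query': {'match': {'name': 'Elasticsearch'}}}
```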
Highlights
Elasticsearch can highlight the portion of a document that matched a query. This is useful for showing context on why certain search results were returned. You can add an Elasticsearch highlight struct to your SearchBuilder using the highlight() method. The struct should take the shape outlined in the Elasticsearch highlighting documentation.
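Below is a minimal highlight struct, sketched in Python following the shape described in the Elasticsearch highlighting documentation. It names the fields to highlight and, optionally, the tags used to wrap matches. The `description` field and the sample hit are made up for illustration.

```python
body = {
    "query": {"match": {"description": "Elasticsearch"}},
    "highlight": {
        # Custom tags; Elasticsearch wraps matches in <em>...</em> by default
        "pre_tags": ["<mark>"],
        "post_tags": ["</mark>"],
        "fields": {"description": {}},
    },
}

# Made-up hit: fragments come back per field under "highlight"
sample_hit = {
    "_source": {"description": "A quick tour of Elasticsearch for beginners"},
    "highlight": {"description": ["A quick tour of <mark>Elasticsearch</mark> for beginners"]},
}

def snippets(hit: dict, field: str) -> list:
    """Return highlight fragments for `field`, or an empty list if none were returned."""
    return hit.get("highlight", {}).get(field, [])

print(snippets(sample_hit, "description"))
```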
Terms Enum
On occasion, you may wish to show a set of indexed terms that begin with a partial string. This is similar to a terms aggregation, but filtered by the provided string and intended for autocompletion.
To retrieve this data, you can use the client's getTermsEnum() method:
For advanced lookups, you can use the second argument to pass a struct of custom options:
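For reference, the underlying Elasticsearch terms enum API accepts a request body naming the field to scan and the prefix string. The Python sketch below shows such a body with the documented `size` and `case_insensitive` options. How getTermsEnum() maps its own arguments onto this body is not shown here, and the field name is hypothetical.

```python
# Body for POST /<index>/_terms_enum (field name is hypothetical)
terms_enum_body = {
    "field": "title.keyword",
    "string": "Elast",          # prefix typed so far
    "size": 10,                 # maximum number of terms to return
    "case_insensitive": True,
}

# Made-up response; "complete" is false when the term set may be incomplete (e.g. a timeout)
sample_response = {
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "terms": ["Elastic Stack Basics", "Elasticsearch Quickstart"],
    "complete": True,
}

def suggestions(resp: dict, limit: int = 5) -> list:
    """Return up to `limit` autocomplete suggestions from a terms enum response."""
    return resp["terms"][:limit]

print(suggestions(sample_response))
```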
Term Vectors
The "Term Vectors" Elasticsearch API allows you to retrieve information and statistics for terms in a specific document field. This could be useful for finding the most common term in a book description, or retrieving all terms with a minimum word length from the book title.
Retrieving Term Vectors By Document ID
To retrieve term vectors for a known document ID, pass the index name, id, and an array or list of fields to pull from:
You can fine-tune the request using the options argument:
See the query parameters documentation for more configuration options.
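To illustrate the underlying Term Vectors API, the Python sketch below shows an options body that uses documented parameters, then picks the most common term from a made-up, trimmed response. The `description` field and the sample values are hypothetical.

```python
# Options for GET /<index>/_termvectors/<id>; "description" is a hypothetical field
options = {
    "fields": ["description"],
    "term_statistics": True,   # include total term frequency and doc frequency
    "field_statistics": True,  # include doc count, sum of doc freq, sum of total term freq
}

# Made-up, trimmed response
sample_response = {
    "_id": "1",
    "found": True,
    "term_vectors": {
        "description": {
            "terms": {
                "search": {"term_freq": 4},
                "elasticsearch": {"term_freq": 3},
                "fast": {"term_freq": 1},
            }
        }
    },
}

def most_common_term(resp: dict, field: str) -> str:
    """Return the term with the highest in-document frequency for `field`."""
    terms = resp["term_vectors"][field]["terms"]
    return max(terms, key=lambda term: terms[term]["term_freq"])

print(most_common_term(sample_response, "description"))  # search
```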
Retrieving Term Vectors By Payload
If you wish to analyze a payload (not an existing document), you can pass a "doc" payload in the options argument:
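In raw Elasticsearch terms, an artificial document goes in the `doc` key, and the documented `filter` option can drop short words. The Python below pairs that request body with a purely local approximation of `min_word_length`, applied to an assumed token list. The title text is hypothetical.

```python
# Artificial document analyzed on the fly (not stored in the index)
payload_request = {
    "doc": {"title": "A Practical Guide to Full Text Search"},
    "fields": ["title"],
    # Server-side filter: ignore terms shorter than 5 characters
    "filter": {"min_word_length": 5},
}

def local_min_word_length(terms: list, min_len: int) -> list:
    """Local stand-in for the min_word_length filter, for illustration only."""
    return [term for term in terms if len(term) >= min_len]

# Tokens as a standard analyzer would likely produce them (lowercased)
tokens = ["a", "practical", "guide", "to", "full", "text", "search"]
print(local_min_word_length(tokens, 5))  # ['practical', 'guide', 'search']
```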
SearchBuilder Term Vector Fetch
The SearchBuilder object also offers a getTermVectors() method for convenience:
SearchBuilder Function Reference
new([string index], [string type], [struct properties]) - Populates a new SearchBuilder object.
reset() - Clears the SearchBuilder and resets the DSL.
deleteAll() - Deletes all documents matching the currently built search query.
execute() - Executes the built search.
getDSL() - Returns a struct containing the assembled Elasticsearch query DSL.
match(string name, any value, [numeric boost], [struct options], [string matchType='any']) - Applies a match requirement to the search builder query.
multiMatch(array names, any value, [numeric boost], [type="best_fields"]) - Searches an array of fields with a given search value.
dateMatch(string name, string start, string end, [numeric boost]) - Adds a date range match.
mustMatch(string name, any value, [numeric boost]) - must query alias for match().
mustNotMatch(string name, any value, [numeric boost]) - must_not query alias for match().
shouldMatch(string name, any value, [numeric boost]) - should query alias for match().
sort(any sort, [any sortConfig]) - Applies a custom sort to the search query.
term(string name, any value, [numeric boost]) - Adds an exact value restriction (Elasticsearch: term) to the query.
aggregation(string name, struct options) - Adds an aggregation directive to the search parameters.
collapseToField(string field, struct options, boolean includeOccurrences = false) - Collapses the results to the single field and returns only the most relevant/ordered document matched on that field.