Documents
Learn how to create and manage Elasticsearch document records using CBElasticsearch
Documents are the searchable, serialized objects within your indexes. As noted above, documents may be assigned a type, allowing separation of schema while still maintaining searchability across all documents in the index. Within an index, each document is referenced by an _id value. This _id may be set manually ( document.setId() ) or, if not provided, will be auto-generated when the record is persisted. Note that if you use numeric primary keys for your _id value, they will be cast as strings on serialization.
Creating a Document
The Document model is the primary object for creating and working with Documents. Let's say, again, we were going to create a new document in our index. We would do so by first creating a Document object.
In addition to populating the document through the new() method, we could also populate the document schema using other methods:
or by individual setters:
If we want to manually assign the _id value, we would need to explicitly call setId( myCustomId ), or provide an _id key in the struct passed to the new() or populate() methods.
Retrieving documents
To retrieve an existing document, we must first know the _id value. We can retrieve it either through the Document object or by interfacing with the Client object directly. In either case, the result returned is a Document object if found, or null if not found.
Using the Document object's accessors:
Calling the get() method with explicit arguments:
Calling directly, using the same arguments, from the client:
Updating a Document
Once we've retrieved an existing document, we can simply update items through the Document instance and re-save them.
You can also pass Document objects to the Client's save() method:
Save Documents with an Index Refresh
The refresh parameter also accepts a wait_for option, which tells Elasticsearch to wait until the next index refresh before returning:
Updating individual document fields
The patch method of the Client allows a user to update select fields, bypassing the need for a fully retrieved document. This is similar to an UPDATE foo SET bar = 'xyz' WHERE id = :id query on a relational database. The method requires an index name, an identifier, and a struct containing the keys to be updated:
Nested keys can also be updated using dot-notation:
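To make the dot-notation semantics concrete, here is a minimal Python sketch that models what a dotted key addresses. It is not CBElasticsearch code and makes no claim about how patch() is implemented; apply_patch and the sample fields are hypothetical names used only for illustration.

```python
import copy

def apply_patch(source, changes):
    """Return a copy of `source` with each dotted key in `changes` applied.

    Hypothetical helper: models how a key such as "author.firstName"
    addresses a nested field. It is not the library's implementation.
    """
    result = copy.deepcopy(source)
    for dotted_key, value in changes.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate objects when they do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result

# Sample document; all field names and values are illustrative.
book = {"title": "Example Book", "author": {"firstName": "Ada", "lastName": "Lovelace"}}
patched = apply_patch(book, {"author.firstName": "Augusta", "isInPrint": True})
print(patched)
```

Only the addressed fields change; sibling keys such as author.lastName are left untouched, which is the main difference from re-saving a whole document.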
Processing Bulk Operations
Bulk Saving of Documents
Bulk inserts and updates can be performed by passing an array of Document objects to the Client's saveAll() method:
Update by Query
Let's say, for example, that you need to add a new key, with a default value, to every document in your index where the key does not already exist:
In the above case, we queried for documents lacking the isInPrint key and updated every matching document to use a default value of false.
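For reference, the request body that such an operation corresponds to on Elasticsearch's _update_by_query endpoint might look like the following Python sketch. It uses the standard query DSL and a Painless script; the index name bookshop is an assumption, and the exact payload CBElasticsearch generates may differ.

```python
import json

index_name = "bookshop"  # assumed index name, for illustration only

update_by_query_body = {
    # Match only documents where the isInPrint field does not exist.
    "query": {
        "bool": {
            "must_not": [{"exists": {"field": "isInPrint"}}]
        }
    },
    # Painless script that assigns the default value to each match.
    "script": {
        "lang": "painless",
        "source": "ctx._source.isInPrint = params.isInPrint",
        "params": {"isInPrint": False},
    },
}

endpoint = f"/{index_name}/_update_by_query"
print("POST", endpoint)
print(json.dumps(update_by_query_body, indent=2))
```

Passing the default through params rather than inlining it in the script keeps the script source constant across calls.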
Note that a Painless script containing newlines, tabs, or space indentation will throw a parsing error. To work around this limitation, use CBElasticsearch's Util.formatToPainless( string script ) method to remove newlines and indentation:
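The kind of normalization described here can be sketched in a few lines of Python. This is an illustration of the idea under stated assumptions, not the source of Util.formatToPainless(); flatten_script is a hypothetical name.

```python
def flatten_script(script):
    """Collapse a multi-line script onto a single line.

    Strips leading and trailing whitespace (spaces and tabs) from every
    line, drops blank lines, and joins the rest with single spaces.
    """
    lines = (line.strip() for line in script.splitlines())
    return " ".join(line for line in lines if line)

script = """
    if ( ctx._source.isInPrint == null ) {
        ctx._source.isInPrint = false;
    }
"""
flat = flatten_script(script)
print(flat)
```

A sketch like this assumes every statement ends in a semicolon and that the script contains no // line comments, since a comment would swallow everything joined after it.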
Deleting a Document
Deleting documents is similar to the process of saving. The Document object may be used to delete a single item.
Documents may also be deleted by passing a Document instance to the client:
Finally, documents may also be deleted by query, using the SearchBuilder ( more below ):
Parameters
The search builder also supports the addition of URL parameters, which may be used to transform or modify the behavior of bulk document actions. Comprehensive lists of these parameters may be found at the official Elasticsearch docs:
Of note are the throttling parameters, which are useful when dealing with large documents and/or indices. By default, Elasticsearch processes batch operations in groups of 1000 documents. Depending on the size of your documents and of the collection, it may be preferable to reduce the number of documents processed per batch:
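At the REST level, the batch size of by-query operations is controlled by the scroll_size URL parameter, while requests_per_second throttles the processing rate. The Python sketch below only shows the resulting request path; the index name bookshop and both values are assumptions chosen for illustration, and it does not show how the search builder attaches these parameters.

```python
from urllib.parse import urlencode

index_name = "bookshop"  # assumed index name, for illustration only

params = {
    "scroll_size": 100,          # documents per batch, instead of the default 1000
    "requests_per_second": 500,  # throttle rate; illustrative value
}

path = f"/{index_name}/_delete_by_query?{urlencode(params)}"
print(path)
```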
Asynchronous Bulk Operations