This feature is available in SDK version 4.12.0 and later. See restrictions for current limitations.
From version 4.13.0, DQL supports union and intersect scans for queries with
OR, IN, and AND operators. This allows the query optimizer to use multiple indexes simultaneously in a single query. For example, a query like WHERE status = ‘active’ OR priority = ‘high’ can leverage separate indexes on both status and priority fields, combining results through a union scan.Performance and Query Selectivity
Index performance is inversely related to query selectivity—the percentage of documents returned by your query. Highly selective queries (returning a small percentage of documents) see the most dramatic improvements. For example, a query that returns only 10% of a collection can be approximately 90% faster with an appropriate index, as the index allows skipping the other 90% of documents entirely. Less selective queries benefit less from indexes since most documents still need to be examined.Index Operations Overview
DQL supports the following index operations:| Operation | Description |
|---|---|
CREATE INDEX | Creates a simple on a specified field |
DROP INDEX | Removes an existing index |
USE INDEX | Force a query to use a specific index |
USE DIRECTIVES | Advanced control over index usage and query optimization |
SELECT from system:indexes | Lists all indexes in the database |
Creating Indexes
Simple Index
Create an index on a single field to improve query performance for that field:DQL
color field in the cars collection:
DQL
IF NOT EXISTS clause allows the statement to execute successfully even if an index with the same name already exists on the collection. This is useful for initialization scripts that may run multiple times.
The
IF NOT EXISTS clause only checks if an index with the same name exists - it does not verify the field definition. If an index with the same name already exists (even on a different field), the CREATE INDEX IF NOT EXISTS will succeed without creating a new index or modifying the existing one. To change an index definition, you must explicitly DROP the existing index first before creating the new one.Index Creation Examples
The
CREATE INDEX operation returns an empty result set (no items) upon successful completion. Successful execution without errors indicates that the index was created successfully - if there are any issues (such as duplicate index names or invalid syntax), an error will be thrown. To verify that your index was created successfully, you can query the system:indexes collection.Nested Fields
You can create indexes on nested fields by using dot notation. You can create indexes on registers nested within a map, as long as the value at the end of the path is a scalar.DQL
LIKE
Indexes can also accelerate LIKE queries when searching for prefix patterns:DQL
Indexes are particularly effective for LIKE patterns that start with a fixed prefix (e.g., ‘prefix%’). Patterns starting with wildcards (e.g., ‘%suffix’) cannot leverage indexes effectively.
Automatic Index Selection
By default, DQL automatically chooses the most appropriate index for your queries based on the WHERE clause conditions and ORDER BY expressions:DQL
When a query contains both WHERE and ORDER BY clauses, DQL considers both when selecting the optimal index. An index that satisfies both the filter condition and sort order will be preferred over one that only satisfies the filter.
Operating on Indexes
DQL supports additional options when creating indexes:Viewing Indexes
You can discover all indexes in your database by querying the specialsystem:indexes collection:
DQL
system:indexes collection follows this schema:
Dropping Indexes
Remove an index that is no longer needed:DQL
color_idx index from the cars collection:
DQL
IF EXISTS clause allows the statement to execute successfully even if the index doesn’t exist. This is useful for cleanup scripts that may run multiple times or when you’re unsure if an index exists.
Controlling Index Usage
DQL provides several ways to control index selection in your queries.USE INDEX Syntax
TheUSE INDEX clause is the simple way to force a query to use a specific index. It appears after the collection name in SELECT statements:
DQL
DQL
When to Use USE INDEX
There are scenarios where you might want to force a specific index rather than letting DQL auto-select.- Forcing a specific index when multiple could apply
DQL
- Preferring sort order optimization
DQL
- Working around index statistics
DQL
- Testing and performance comparison
DQL
Advanced: USE DIRECTIVES Syntax
For more advanced scenarios, DQL also supports theUSE DIRECTIVES syntax which allows fine-grained control over query optimization using JSON directives:
DQL
DQL
Comment Directives
Comment directives provide the same functionality using special comments:DQL
DQL
Directive Reference
Collection-specific directives always override global directives. Unrecognized directives are silently ignored, but the JSON must be valid or a parser error will occur.| Directive | Type | Default | Description |
|---|---|---|---|
#index | string, array, or null | auto-select | Specifies which index to use |
#prefer_order | boolean | false | Prefer indexes that optimize ORDER BY over WHERE |
#disable_index_scan | boolean | false | Disable all index scans (global only) |
#max_scans | int | 16 | The maximum number of scans to permit in a statement |
#intersects | int | 4 | The maximum number of intersect scans to permit in a statement branch |
#index directive accepts different value types.
| Value | Behavior |
|---|---|
"index_name" | Use the specified index if available |
["index1", "index2", ...] | Use specified indexes for intersection (v4.13+) |
null or "" | Force collection scan (no indexes) |
[] | Auto-select from available indexes |
Starting in v4.13, you can pass an array of index names to force the use of multiple indexes simultaneously through intersection scans. The indexes will be applied in the order specified for AND conditions. For example:
{"#index":["movies_title_idx", "movies_rated_idx", "movies_year_idx"]}.ix2 being used and not ix1.
#max_inlist and the remaining number of scans below #max_scans at the time the list is processed. #max_inlist should be used to limit the number of scans used for processing of IN-list filters when it is preferable to apply other filters.
Ditto Instance Initialization
When initializing your Ditto instance, you may want to create indexes that are essential for your application’s performance. Since indexes are stored in the database and persist across application restarts, you need to handle cases where indexes might already exist from previous runs.Initialization Patterns
There are three approaches for handling index creation during initialization:- Use IF NOT EXISTS clause - The simplest and most efficient approach (recommended)
- Check before creating - Query
system:indexesto see if the index exists - Create and catch errors - Attempt to create the index and handle duplicate errors gracefully
Initialization Examples
Best Practices for Initialization
- Use IF NOT EXISTS for simplicity - The simplest and most efficient approach for most use cases
- Use the check-first approach for critical indexes - When you need detailed logging and control
- Batch index creation - Create all required indexes during initialization rather than on-demand
- Log index status - Always log whether indexes were created or already existed for debugging
- Handle errors gracefully - Don’t let index creation failures prevent your app from starting unless the indexes are absolutely critical
- Consider startup performance - The IF NOT EXISTS approach is fastest as it requires no additional queries
Error Handling
When working with indexes, you may encounter errors if an index with the same name already exists on a collection. This section covers how to handle such scenarios.Duplicate Index Error
If you attempt to create an index with a name that already exists on the same collection, DQL will throw the following error:- Use IF NOT EXISTS clause - Prevents the error entirely (recommended)
- Check if the index exists first by querying
system:indexes - Drop the existing index before creating a new one (use
DROP INDEX IF EXISTSfor safety) - Use a different index name
- Catch and handle the error in your application code
Error Handling Examples
Best Practices for Error Handling
- Use idempotent index creation: Check if an index exists before attempting to create it
- Implement retry logic: For transient errors, implement appropriate retry mechanisms
- Log errors appropriately: Distinguish between expected errors (like duplicate indexes) and unexpected ones
- Consider using unique naming conventions: Include timestamps or version numbers in index names when appropriate
How it works
General Behavior
- When an index is added to a collection, that collection must be scanned and indexed, which may take time for large collections
- Index directives only suggest which index to consider. DQL still requires a valid reason (WHERE or ORDER BY clause) to actually use the index. An index won’t be used if the query doesn’t benefit from it.
Persistence
- SDK Indexes are retained when Ditto shuts down
- Index names are unique to a collection - creating an index with an existing name in the same collection will raise an error
- To replace an existing index, you must first drop it, then recreate it
Missing Fields
- SDK - Documents that don’t contain the indexed field will be included in index-optimized query results. (i.e. INCLUDE MISSING is implied).
- Ditto Server HTTP API - By default, documents that don’t contain the indexed field will not be included in index-optimized query results. (i.e. EXCLUDE MISSING)
Supported APIs
| API/Feature | Index Support |
|---|---|
execute SDK API | ✅ Supported |
registerObserver SDK API | ✅ Supported |
registerSubscription SDK API | ❌ Not Supported |
| Ditto Server HTTP API | ❌ Not Supported |
| SDKs with in-memory storage | ❌ Not Supported |
Best Practices
- Create indexes on frequently queried fields: Focus on fields that appear often in your WHERE clauses
- Monitor index usage: Use the
system:indexescollection to track your indexes - Consider index creation cost: Adding indexes to large collections requires scanning and indexing time
- Clean up unused indexes: Drop indexes that are no longer needed to save storage space and improve write performance
Roadmap
DQL indexing focuses on core functionality for optimal performance. The following features are not currently supported:- Composite indexes on multiple fields
- Partial indexes (e.g.,
CREATE INDEX ... 'WHERE status = 'active') - Functional indexes (e.g.,
UPPER(field1)) - Tombstone indexing
- Custom indexing on the Ditto Server
- Array & Object support
Future releases may expand indexing capabilities based on user feedback and performance requirements.
Restrictions
When an unsupported operation is detected, the query will be serviced by a collection scan.
All predicates are applied as a filter on the documents matched by an index scan. This re-application of the index predicate along with initial application of any other predicates ensures the correct results.
-
The following filters are not supported by index scans:
- Any non-trivial predicates like functional and case-insensitive operations
- e.g.
LOWER(name) = 'xxx'(with an index on(name))ILIKE 'abc%'
- Non-trivial value expressions that can be statically evaluated to a non-composite type (see below) are used
- e.g.
a = :param % 2as:param % 2can be evaluated directly
- e.g.
- e.g.
- Composite value filters, e.g. (with an index on
(a))a = [1,2,3]a = {'b':'c'}
- Filtering on a sub-field, e.g. (with only an index on
(a))a.b = 1will not use an index scan; you must index the full path you wish to filter on
- Filtering on an element when an indexed field is an
ARRAYtype- e.g.
a[2] = 's'with an index on(a)will use an index scan but all documents will be retrieved by it (i.e. no advantage over a collection scan)
- e.g.
- Filtering on a non-deterministic value
- e.g.
a > clock()(clock()is non-deterministic)
- e.g.
- Any non-trivial predicates like functional and case-insensitive operations
-
The choice of which predicate to use and therefore which index is selected is rule based
- Predicates are preferred in this order:
- Equality / IS NULL / IS MISSING
- Closed ranges (e.g.
a > 0 AND a < 10) with no preference for inclusive or exclusive boundariesLIKE(case sensitive) with a constant prefix can be supported by a closed range based on the prefix
- Less than (or less than or equal to)
- Greater than (or greater than or equal to)
- The order of the available index list (which is alphabetically sorted on index name)
- If an index can offer ordering along with filtering it is preferred over one that provides only filtering
- e.g.
ix1on(b)is preferred overix0on(a)(or an intersection ofix1andix0) fora = 1 AND b > 1 ORDER BY b - Statement ordering direction (ASC/DESC) must match the index direction
- The index key must be the leading sorting term
- Partial ordering will not override preferred filters
- e.g. an intersection of
ix0on(a)andix1on(b)is preferred fora = 1 AND b > 1 ORDER BY b,a
- e.g. an intersection of
- e.g.
As the planning is rule based there is no consideration for actual selectivity of different filters. If a particular filter is known to be more selective, directives can be used to direct planning. - Predicates are preferred in this order:
-
There is a limit to the number of index scans a statement will use.
- If possible the planner will apply all applicable index scans up to this limit and remaining predicates are applied as post-fetch filtering.
- All branches of a statement (i.e.
OR-ed terms) must be serviced by an index scan; if any aren’t then a collection scan is necessary and therefore no index scans will be used. - The total number of scans permitted is controlled by the
#max_scansdirective.
-
There is a limit to the number of intersections (
AND-ed terms) applied in any branch.- Terms not applied with an index scan are applied as post-fetch filters.
- The limit is controlled by the
#intersectsdirective.