INDEX

This feature is available in SDK version 4.12.0 and later. See restrictions for current limitations.

From version 4.13.0, DQL supports union and intersect scans for queries with OR, IN, and AND operators. This allows the query optimizer to use multiple indexes simultaneously in a single query. For example, a query like WHERE status = ‘active’ OR priority = ‘high’ can leverage separate indexes on both status and priority fields, combining results through a union scan.

DQL indexing allows you to create simple indexes on the Small Peer to accelerate searches, support joins, and enhance the overall efficiency of data retrieval.

Performance and Query Selectivity

Index performance is inversely related to query selectivity—the percentage of documents returned by your query. Highly selective queries (returning a small percentage of documents) see the most dramatic improvements. For example, a query that returns only 10% of a collection can be approximately 90% faster with an appropriate index, as the index allows skipping the other 90% of documents entirely. Less selective queries benefit less from indexes since most documents still need to be examined.

Index Operations Overview

DQL supports the following index operations:

Operation	Description
`CREATE INDEX`	Creates a simple on a specified field
`DROP INDEX`	Removes an existing index
`USE INDEX`	Force a query to use a specific index
`USE DIRECTIVES`	Advanced control over index usage and query optimization
`SELECT from system:indexes`	Lists all indexes in the database

Index operations can only be performed through the execute SDK operation. Using any of these commands through registerObserver or registerSubscription will fail with an “invalid query not supported” response.

Creating Indexes

Simple Index

Create an index on a single field to improve query performance for that field:

DQL

-- Standard syntax
CREATE INDEX index_name ON collection_name (field1)

-- Alternative syntax
CREATE INDEX collection_name.index_name (field1)

-- With IF NOT EXISTS clause (prevents errors on duplicate index names)
CREATE INDEX IF NOT EXISTS index_name ON collection_name (field1)

For example, create an index on the color field in the cars collection:

DQL

CREATE INDEX color_idx ON cars (color)
-- or
CREATE INDEX cars.color_idx (color)
-- or with IF NOT EXISTS to allow repeated execution
CREATE INDEX IF NOT EXISTS color_idx ON cars (color)

The IF NOT EXISTS clause allows the statement to execute successfully even if an index with the same name already exists on the collection. This is useful for initialization scripts that may run multiple times.

The IF NOT EXISTS clause only checks if an index with the same name exists - it does not verify the field definition. If an index with the same name already exists (even on a different field), the CREATE INDEX IF NOT EXISTS will succeed without creating a new index or modifying the existing one. To change an index definition, you must explicitly DROP the existing index first before creating the new one.

Index Creation Examples

The CREATE INDEX operation returns an empty result set (no items) upon successful completion. Successful execution without errors indicates that the index was created successfully - if there are any issues (such as duplicate index names or invalid syntax), an error will be thrown. To verify that your index was created successfully, you can query the system:indexes collection.

// Create a simple index
let result = await ditto.store.execute(
  query: "CREATE INDEX color_idx ON cars (color)")
// result.items will be empty - this is expected

// Create with IF NOT EXISTS - safe for repeated execution
let result2 = await ditto.store.execute(
  query: "CREATE INDEX IF NOT EXISTS color_idx ON cars (color)")
// No error even if index already exists

// Verify the index was created
let indexes = await ditto.store.execute(
  query: "SELECT * FROM system:indexes WHERE _id = :indexName",
  arguments: ["indexName": "cars.color_idx"])
// indexes.items will contain the index details

Nested Fields

You can create indexes on nested fields by using dot notation. You can create indexes on registers nested within a map, as long as the value at the end of the path is a scalar.

DQL

CREATE INDEX nested_idx ON cars (properties.engine.type)

LIKE

Indexes can also accelerate LIKE queries when searching for prefix patterns:

DQL

-- Create an index on the name field
CREATE INDEX name_idx ON users (name)

-- This query can use the name_idx for efficient prefix matching
SELECT * FROM users WHERE name LIKE 'John%'

Indexes are particularly effective for LIKE patterns that start with a fixed prefix (e.g., ‘prefix%’). Patterns starting with wildcards (e.g., ‘%suffix’) cannot leverage indexes effectively.

Automatic Index Selection

By default, DQL automatically chooses the most appropriate index for your queries based on the WHERE clause conditions and ORDER BY expressions:

DQL

SELECT * FROM cars WHERE color = 'blue'
-- Automatically uses color_idx if it exists

When a query contains both WHERE and ORDER BY clauses, DQL considers both when selecting the optimal index. An index that satisfies both the filter condition and sort order will be preferred over one that only satisfies the filter.

Operating on Indexes

DQL supports additional options when creating indexes:

Viewing Indexes

You can discover all indexes in your database by querying the special system:indexes collection:

DQL

SELECT * FROM system:indexes

Each document in the system:indexes collection follows this schema:

{
  "_id": "index_name",
  "collection": "collection_name",
  "fields": ["field1", "field2"]
}

Example

{
  "_id":"tasks.tasks_id",
  "collection":"tasks",
  "fields":["`_id`"]
}

// List all indexes
let result = await ditto.store.execute(
  query: "SELECT * FROM system:indexes")

// List indexes for a specific collection
let result = await ditto.store.execute(
  query: "SELECT * FROM system:indexes WHERE collection = :collectionName",
  arguments: ["collectionName": "cars"])

Dropping Indexes

Remove an index that is no longer needed:

DQL

-- Standard syntax
DROP INDEX index_name ON collection_name

-- Alternative syntax
DROP INDEX collection_name.index_name

-- With IF EXISTS clause (prevents errors if index doesn't exist)
DROP INDEX IF EXISTS index_name ON collection_name

For example, to drop the color_idx index from the cars collection:

DQL

DROP INDEX color_idx ON cars
-- or
DROP INDEX cars.color_idx
-- or with IF EXISTS to allow repeated execution
DROP INDEX IF EXISTS color_idx ON cars

The IF EXISTS clause allows the statement to execute successfully even if the index doesn’t exist. This is useful for cleanup scripts that may run multiple times or when you’re unsure if an index exists.

Without the IF EXISTS clause, attempting to drop a non-existent index will raise an error.

// Standard syntax
await ditto.store.execute(query: "DROP INDEX color_idx ON cars")

// Alternative syntax
await ditto.store.execute(query: "DROP INDEX cars.color_idx")

// With IF EXISTS - safe for repeated execution
await ditto.store.execute(query: "DROP INDEX IF EXISTS color_idx ON cars")
// No error even if index doesn't exist

Controlling Index Usage

DQL provides several ways to control index selection in your queries.

Using index directives on a sync subscription will result in an invalid query error.

USE INDEX Syntax

The USE INDEX clause is the simple way to force a query to use a specific index. It appears after the collection name in SELECT statements:

DQL

SELECT * FROM collection_name USE INDEX 'index_name' WHERE ...

DQL

-- Force use of a specific index
SELECT * FROM cars USE INDEX 'color_idx' WHERE color = 'blue'

This syntax is straightforward and recommended for most use cases where you need to specify which index to use for a query.

When to Use USE INDEX

There are scenarios where you might want to force a specific index rather than letting DQL auto-select.

Forcing a specific index when multiple could apply

DQL

-- You have indexes: customer_idx and total_idx
-- Force the customer index when you know it's more selective
SELECT * FROM orders USE INDEX 'customer_idx'
WHERE customer_id = 'abc123' AND total > 1000
-- Without forcing, DQL might choose total_idx if it estimates incorrectly

Preferring sort order optimization

DQL

-- You have indexes: date_idx and status_idx
-- Force the date index when sort performance is critical
SELECT * FROM events USE INDEX 'date_idx'
WHERE status = 'active' ORDER BY date DESC LIMIT 10

Working around index statistics

DQL

-- Force an index when you know the data distribution better than the optimizer
-- e.g., most cars are 'available' but you're querying a rare status
SELECT * FROM cars USE INDEX 'status_idx'
WHERE status = 'maintenance' -- Only 0.1% of records

Testing and performance comparison

DQL

-- Compare performance between different indexes
SELECT * FROM products USE INDEX 'category_idx' WHERE category = 'electronics'
-- vs
SELECT * FROM products USE INDEX 'category_brand_idx' WHERE category = 'electronics'

Advanced: USE DIRECTIVES Syntax

For more advanced scenarios, DQL also supports the USE DIRECTIVES syntax which allows fine-grained control over query optimization using JSON directives:

DQL

SELECT * FROM collection_name USE DIRECTIVES '<json-directives>' WHERE ...

DQL

-- Force use of a specific index (equivalent to USE INDEX)
SELECT * FROM cars USE DIRECTIVES '{"#index":"color_idx"}'
WHERE color = 'blue'

-- Force use of multiple indexes for intersection (v4.13+)
SELECT * FROM movies USE DIRECTIVES '{"#index":["movies_title_idx", "movies_rated_idx"]}'
WHERE title LIKE 'Mind%' AND (rated = 'PG' OR rated = 'G')

-- Force collection scan (skip all indexes)
SELECT * FROM cars USE DIRECTIVES '{"#index":null}'
WHERE color = 'blue'

-- Let DQL auto-select from available indexes
SELECT * FROM cars USE DIRECTIVES '{"#index":[]}'
WHERE color = 'blue'

-- Prefer ordering optimization
SELECT * FROM cars USE DIRECTIVES '{"#prefer_order":true}'
WHERE year > 2020 ORDER BY price DESC

// Using USE INDEX syntax
let result = await ditto.store.execute(
  query: "SELECT * FROM cars USE INDEX 'color_idx' WHERE color = :color",
  arguments: ["color": "blue"])

// Using USE DIRECTIVES syntax (advanced)
let result2 = await ditto.store.execute(
  query: "SELECT * FROM cars USE DIRECTIVES '{\"#index\":\"color_idx\"}' WHERE color = :color",
  arguments: ["color": "blue"])

// Using comment directive (advanced)
let result3 = await ditto.store.execute(
  query: "/*+ {\"cars\":{\"#index\":\"color_idx\"}} */ SELECT * FROM cars WHERE color = :color",
  arguments: ["color": "blue"])

Comment Directives

Comment directives provide the same functionality using special comments:

DQL

/*+ <json-directives> */ SELECT ...
-- or
--+ <json-directives>
SELECT ...

The JSON structure for comment directives supports both collection-specific and global settings:

DQL

-- Collection-specific directive
/*+ {"cars":{"#index":"color_idx"}} */
SELECT * FROM cars WHERE color = 'blue'

-- Global directive affecting all collections
/*+ {"#prefer_order":true} */
SELECT * FROM cars WHERE year > 2020 ORDER BY price DESC

-- Multiple directives
/*+ {
  "cars": {
    "#index": "color_idx",
    "#prefer_order": true
  },
  "#disable_index_scan": false
} */
SELECT * FROM cars WHERE color = 'blue' ORDER BY year

Directive Reference

Collection-specific directives always override global directives. Unrecognized directives are silently ignored, but the JSON must be valid or a parser error will occur.

Directive	Type	Default	Description
`#index`	string, array, or null	auto-select	Specifies which index to use
`#prefer_order`	boolean	false	Prefer indexes that optimize ORDER BY over WHERE
`#disable_index_scan`	boolean	false	Disable all index scans (global only)
`#max_scans`	int	16	The maximum number of scans to permit in a statement
`#intersects`	int	4	The maximum number of intersect scans to permit in a statement branch

The #index directive accepts different value types.

Value	Behavior
`"index_name"`	Use the specified index if available
`["index1", "index2", ...]`	Use specified indexes for intersection (v4.13+)
`null` or `""`	Force collection scan (no indexes)
`[]`	Auto-select from available indexes

Starting in v4.13, you can pass an array of index names to force the use of multiple indexes simultaneously through intersection scans. The indexes will be applied in the order specified for AND conditions. For example: {"#index":["movies_title_idx", "movies_rated_idx", "movies_year_idx"]}.

Given multiple directives, the last instance of any directive applies. For example, the following will result in ix2 being used and not ix1.

SELECT ... FROM cars
USE DIRECTIVES '{"#index":"ix1"}'
WHERE ... /*+{"cars":{"#index":"ix2"}}*/

Translation of IN-list filters to grouped OR terms is limited by the lower of #max_inlist and the remaining number of scans below #max_scans at the time the list is processed. #max_inlist should be used to limit the number of scans used for processing of IN-list filters when it is preferable to apply other filters.

Ditto Instance Initialization

When initializing your Ditto instance, you may want to create indexes that are essential for your application’s performance. Since indexes are stored in the database and persist across application restarts, you need to handle cases where indexes might already exist from previous runs.

Initialization Patterns

There are three approaches for handling index creation during initialization:

Use IF NOT EXISTS clause - The simplest and most efficient approach (recommended)
Check before creating - Query system:indexes to see if the index exists
Create and catch errors - Attempt to create the index and handle duplicate errors gracefully

Initialization Examples

// Approach 1: Use IF NOT EXISTS (recommended)
func initializeIndexes() async throws {
    let indexQueries = [
        "CREATE INDEX IF NOT EXISTS color_idx ON cars (color)",
        "CREATE INDEX IF NOT EXISTS name_idx ON users (name)",
        "CREATE INDEX IF NOT EXISTS date_idx ON orders (created_date)"
    ]

    for query in indexQueries {
        await ditto.store.execute(query: query)
        print("Index created or already exists")
    }
}

// Approach 2: Check if index exists before creating
func initializeIndexesWithCheck() async throws {
    let requiredIndexes = [
        ("cars", "color_idx", "color"),
        ("users", "name_idx", "name"),
        ("orders", "date_idx", "created_date")
    ]

    for (collection, indexName, field) in requiredIndexes {
        let existingIndex = await ditto.store.execute(
            query: "SELECT * FROM system:indexes WHERE collection = :collection AND _id = :indexId",
            arguments: ["collection": collection, "indexId": "\(collection).\(indexName)"])

        if existingIndex.items.isEmpty {
            await ditto.store.execute(
                query: "CREATE INDEX \(indexName) ON \(collection) (\(field))")
            print("Created index \(indexName) on \(collection)")
        } else {
            print("Index \(indexName) already exists on \(collection)")
        }
    }
}

// Approach 3: Create and handle errors
func initializeIndexesWithErrorHandling() async {
    let indexQueries = [
        "CREATE INDEX color_idx ON cars (color)",
        "CREATE INDEX name_idx ON users (name)",
        "CREATE INDEX date_idx ON orders (created_date)"
    ]

    for query in indexQueries {
        do {
            try await ditto.store.execute(query: query)
            print("Successfully created index")
        } catch {
            if error.localizedDescription.contains("already exists") {
                print("Index already exists, continuing...")
            } else {
                print("Failed to create index: \(error)")
                throw error
            }
        }
    }
}

Best Practices for Initialization

Use IF NOT EXISTS for simplicity - The simplest and most efficient approach for most use cases
Use the check-first approach for critical indexes - When you need detailed logging and control
Batch index creation - Create all required indexes during initialization rather than on-demand
Log index status - Always log whether indexes were created or already existed for debugging
Handle errors gracefully - Don’t let index creation failures prevent your app from starting unless the indexes are absolutely critical
Consider startup performance - The IF NOT EXISTS approach is fastest as it requires no additional queries

Error Handling

When working with indexes, you may encounter errors if an index with the same name already exists on a collection. This section covers how to handle such scenarios.

Duplicate Index Error

If you attempt to create an index with a name that already exists on the same collection, DQL will throw the following error:

Database error: an index with the given name already exists on the given collection

To handle this error, you have several options:

Use IF NOT EXISTS clause - Prevents the error entirely (recommended)
Check if the index exists first by querying system:indexes
Drop the existing index before creating a new one (use DROP INDEX IF EXISTS for safety)
Use a different index name
Catch and handle the error in your application code

Error Handling Examples

// Approach 1: Use IF NOT EXISTS (recommended)
await ditto.store.execute(query: "CREATE INDEX IF NOT EXISTS color_idx ON cars (color)")
// No error handling needed - succeeds whether index exists or not

// Approach 2: Check if index exists before creating
let existingIndexes = await ditto.store.execute(
  query: "SELECT * FROM system:indexes WHERE collection = :collection AND _id = :indexName",
  arguments: ["collection": "cars", "indexName": "cars.color_idx"])

if existingIndexes.items.isEmpty {
  // Safe to create the index
  await ditto.store.execute(query: "CREATE INDEX color_idx ON cars (color)")
}

// Approach 3: Handle the error
do {
  try await ditto.store.execute(query: "CREATE INDEX color_idx ON cars (color)")
} catch {
  if error.localizedDescription.contains("already exists") {
    print("Index already exists, skipping creation")
  } else {
    throw error
  }
}

Best Practices for Error Handling

Use idempotent index creation: Check if an index exists before attempting to create it
Implement retry logic: For transient errors, implement appropriate retry mechanisms
Log errors appropriately: Distinguish between expected errors (like duplicate indexes) and unexpected ones
Consider using unique naming conventions: Include timestamps or version numbers in index names when appropriate

How it works

General Behavior

When an index is added to a collection, that collection must be scanned and indexed, which may take time for large collections
Index directives only suggest which index to consider. DQL still requires a valid reason (WHERE or ORDER BY clause) to actually use the index. An index won’t be used if the query doesn’t benefit from it.

Persistence

SDK Indexes are retained when Ditto shuts down
Index names are unique to a collection - creating an index with an existing name in the same collection will raise an error
To replace an existing index, you must first drop it, then recreate it

Missing Fields

SDK - Documents that don’t contain the indexed field will be included in index-optimized query results. (i.e. INCLUDE MISSING is implied).
Ditto Server HTTP API - By default, documents that don’t contain the indexed field will not be included in index-optimized query results. (i.e. EXCLUDE MISSING)

Supported APIs

API/Feature	Index Support
`execute` SDK API	✅ Supported
`registerObserver` SDK API	✅ Supported
`registerSubscription` SDK API	❌ Not Supported
Ditto Server HTTP API	❌ Not Supported
SDKs with in-memory storage	❌ Not Supported

Best Practices

Create indexes on frequently queried fields: Focus on fields that appear often in your WHERE clauses
Monitor index usage: Use the system:indexes collection to track your indexes
Consider index creation cost: Adding indexes to large collections requires scanning and indexing time
Clean up unused indexes: Drop indexes that are no longer needed to save storage space and improve write performance

Roadmap

DQL indexing focuses on core functionality for optimal performance. The following features are not currently supported:

Composite indexes on multiple fields
Partial indexes (e.g., CREATE INDEX ... 'WHERE status = 'active')
Functional indexes (e.g., UPPER(field1))
Tombstone indexing
Custom indexing on the Ditto Server
Array & Object support

Future releases may expand indexing capabilities based on user feedback and performance requirements.

Restrictions

When an unsupported operation is detected, the query will be serviced by a collection scan.

All predicates are applied as a filter on the documents matched by an index scan. This re-application of the index predicate along with initial application of any other predicates ensures the correct results.

In addition to other restrictions, these restrictions apply to version 4.12 only. Starting in v4.13, union and intersect scans allow using multiple indexes simultaneously:

v4.12 only: At most only one index scan will be used in any given query. This means that:
1. Logical OR is not supported
  1. Any operation that resolves to a logical OR is not supported
    1. e.g. IN (a,b,c) which logically resolves to (a OR b OR c)
    2. e.g. Inequality, e.g. a != 0 which resolves to (a < 0 OR a > 0)
  2. This includes when a branch is always false and would never contribute any documents to the result
    1. e.g a = 1 OR false
2. Only one non-overlapping logical AND predicate with an appropriate supporting index is applied as an index scan
  1. Overlapping predicates are combined when possible, e.g. a > 0 AND a < 10, to form a single closed range and are thus considered a single predicate when it comes to index selection
  2. As only the overlapping range is used, something like (e.g.) a > 0 AND a = 1 is combined to be just a = 1
The following filters are not supported by index scans:
1. NOT (logical negation)
  1. This is true even when the logical result would be a supported operation
    1. e.g. Even though NOT a != 0 logically is a = 0, it won’t be supported by an index scan

The following filters are not supported by index scans:
1. Any non-trivial predicates like functional and case-insensitive operations
  1. e.g.
    1. LOWER(name) = 'xxx' (with an index on (name))
    2. ILIKE 'abc%'
  2. Non-trivial value expressions that can be statically evaluated to a non-composite type (see below) are used
    1. e.g. a = :param % 2 as :param % 2 can be evaluated directly
2. Composite value filters, e.g. (with an index on (a))
  1. a = [1,2,3]
  2. a = {'b':'c'}
3. Filtering on a sub-field, e.g. (with only an index on (a))
  1. a.b = 1 will not use an index scan; you must index the full path you wish to filter on
4. Filtering on an element when an indexed field is an ARRAY type
  1. e.g. a[2] = 's' with an index on (a) will use an index scan but all documents will be retrieved by it (i.e. no advantage over a collection scan)
5. Filtering on a non-deterministic value
  1. e.g. a > clock() (clock() is non-deterministic)
The choice of which predicate to use and therefore which index is selected is rule based
1. Predicates are preferred in this order:
  1. Equality / IS NULL / IS MISSING
  2. Closed ranges (e.g. a > 0 AND a < 10) with no preference for inclusive or exclusive boundaries
    1. LIKE (case sensitive) with a constant prefix can be supported by a closed range based on the prefix
  3. Less than (or less than or equal to)
  4. Greater than (or greater than or equal to)
  5. The order of the available index list (which is alphabetically sorted on index name)
2. If an index can offer ordering along with filtering it is preferred over one that provides only filtering
  1. e.g. ix1 on (b) is preferred over ix0 on (a) (or an intersection of ix1 and ix0) for a = 1 AND b > 1 ORDER BY b
  2. Statement ordering direction (ASC/DESC) must match the index direction
    1. The index key must be the leading sorting term
  3. Partial ordering will not override preferred filters
    1. e.g. an intersection of ix0 on (a) and ix1 on (b) is preferred for a = 1 AND b > 1 ORDER BY b,a
As the planning is rule based there is no consideration for actual selectivity of different filters. If a particular filter is known to be more selective, directives can be used to direct planning.

Use explain to show the chosen access plan.
There is a limit to the number of index scans a statement will use.
1. If possible the planner will apply all applicable index scans up to this limit and remaining predicates are applied as post-fetch filtering.
2. All branches of a statement (i.e. OR-ed terms) must be serviced by an index scan; if any aren’t then a collection scan is necessary and therefore no index scans will be used.
3. The total number of scans permitted is controlled by the #max_scans directive.
There is a limit to the number of intersections (AND-ed terms) applied in any branch.
1. Terms not applied with an index scan are applied as post-fetch filters.
2. The limit is controlled by the #intersects directive.

Statements

Types and Functions

Reference

Performance and Query Selectivity

Index Operations Overview

Creating Indexes

Simple Index

Index Creation Examples

Nested Fields

LIKE

Automatic Index Selection

Operating on Indexes

Viewing Indexes

Dropping Indexes

Controlling Index Usage

USE INDEX Syntax

When to Use USE INDEX

Advanced: USE DIRECTIVES Syntax

Comment Directives

Directive Reference

Ditto Instance Initialization

Initialization Patterns

Initialization Examples

Best Practices for Initialization

Error Handling

Duplicate Index Error

Error Handling Examples

Best Practices for Error Handling

How it works

General Behavior

Persistence

Missing Fields

Supported APIs

Best Practices

Roadmap

Restrictions

Statements

Types and Functions

Reference

​Performance and Query Selectivity

​Index Operations Overview

​Creating Indexes

​Simple Index

​Index Creation Examples

​Nested Fields

​LIKE

​Automatic Index Selection

​Operating on Indexes

​Viewing Indexes

​Dropping Indexes

​Controlling Index Usage

​USE INDEX Syntax

​When to Use USE INDEX

​Advanced: USE DIRECTIVES Syntax

​Comment Directives

​Directive Reference

​Ditto Instance Initialization

​Initialization Patterns

​Initialization Examples

​Best Practices for Initialization

​Error Handling

​Duplicate Index Error

​Error Handling Examples

​Best Practices for Error Handling

​How it works

​General Behavior

​Persistence

​Missing Fields

​Supported APIs

​Best Practices

​Roadmap

​Restrictions

Performance and Query Selectivity

Index Operations Overview

Creating Indexes

Simple Index

Index Creation Examples

Nested Fields

LIKE

Automatic Index Selection

Operating on Indexes

Viewing Indexes

Dropping Indexes

Controlling Index Usage

USE INDEX Syntax

When to Use USE INDEX

Advanced: USE DIRECTIVES Syntax

Comment Directives

Directive Reference

Ditto Instance Initialization

Initialization Patterns

Initialization Examples

Best Practices for Initialization

Error Handling

Duplicate Index Error

Error Handling Examples

Best Practices for Error Handling

How it works

General Behavior

Persistence

Missing Fields

Supported APIs

Best Practices

Roadmap

Restrictions