# Document Model

> Ditto stores data records as JSON-like documents. Internally, these documents are stored as CRDTs: a binary representation of JSON documents designed for automatic conflict resolution.

## Document Structure

Ditto documents are composed of field-and-value pairs and have the following structure:

```json theme={null}
{
   field1: value1,
   field2: value2,
   field3: value3,
   ...
   fieldN: valueN
}
```

The value of a field can be any of the JSON data types, including other documents, arrays, and arrays of documents. For example, the following document contains values of varying types:

```json theme={null}
{
  _id: "0016d749-9a9b-4ece-8794-7f3eb40bc82e",
  name: "John Doe",
  age: 30,
  email: "john.doe@example.com",
  address: {
    street: "123 Main St",
    city: "Anytown",
    state: "CA",
    zip: "12345"
  },
  is_active: true,
  created_at: "2021-01-01T00:00:00Z"
}
```

The above fields have the following data types:

* `_id`: `STRING`
* `name`: `STRING`
* `age`: `NUMBER`
* `email`: `STRING`
* `address`: `OBJECT`
* `is_active`: `BOOLEAN`
* `created_at`: `STRING` (an ISO 8601 timestamp; JSON has no native date type)

## Identifying Documents

In Ditto, each document stored in a collection requires a unique `_id` field that acts as a primary key. If an inserted document omits the `_id` field, Ditto automatically generates a unique identifier for the `_id` field.

Once set for a given document, the `_id` field cannot be changed. The same document contents inserted with a different `_id` will be treated as a new document within Ditto.

The `_id` can be any JSON data type.
Although Ditto generates a unique identifier by default, you will typically provide your own `_id` values, often as an object that acts as a composite key.

For example, the following document includes an `_id` field with a complex object:

```json theme={null}
{
  _id: {
    orderId: "0016d749-9a9b-4ece-8794-7f3eb40bc82e",
    locationId: "5da42ab5-d00b-4377-8524-43e43abf9e01"
  },
  ...
}
```

You can then query documents either by one of the `_id` object's individual components or by the entire `_id` object.

```sql DQL theme={null}
SELECT * 
FROM orders 
WHERE _id.locationId = '5da42ab5-d00b-4377-8524-43e43abf9e01'
```

```sql DQL theme={null}
SELECT * 
FROM orders 
WHERE _id = {'orderId': '0016d749-9a9b-4ece-8794-7f3eb40bc82e', 'locationId': '5da42ab5-d00b-4377-8524-43e43abf9e01'}
```

It is important to carefully consider the `_id` field when designing your data model, as this is used for authorization rules within Ditto. For more information, see [Authorization](/key-concepts/authentication-and-authorization).

## Document Fields

Documents are composed of fields, which are key-value pairs.

### Field Names

Similar to most document-oriented databases, you can only use `strings` to encode field names in documents.

For complete naming rules, see [IDs, Paths, Strings, and Keywords](/dql/ids-paths-strings-keywords).

### Field Values

Field values can be encoded using various *data types*, including scalar types, providing flexibility in representing a wide range of information.

<Warning>
  Avoid using `arrays` in Ditto.

  Arrays can produce merge conflicts when offline peers reconnect to the mesh and attempt to sync their updates, especially when multiple peers make concurrent updates to the same item within the `array`.

  Instead, use a JSON object within a `MAP`, which allows Ditto to automatically merge the contents of the `MAP` when offline peers reconnect to the mesh.
</Warning>

Each value of a field is stored as a specific CRDT type, for example a `MAP` or `REGISTER`.
You can read more about CRDTs in [Syncing Data](/key-concepts/syncing-data#crdts).

## Document Size

Each Ditto document has soft and hard size limits:

* A document larger than the **soft limit (256 KiB)** logs a `warn`-level message; the write succeeds.
* A document larger than the **hard limit (5 MiB)** logs an `error`-level message; the write currently succeeds. **Future versions will reject writes that exceed the hard limit and skip such documents during replication** — treat the error log as a deprecation signal and remediate now.
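
As a rough pre-write check, an application could compare an estimate of a document's size against the two limits above. The sketch below is illustrative only: `estimate_size_bytes` and `classify_size` are hypothetical helpers, not part of any Ditto SDK, and the compact JSON length is assumed to be a proxy for the size Ditto measures internally, which may differ.

```python
import json

SOFT_LIMIT_BYTES = 256 * 1024        # 256 KiB soft limit from the text above
HARD_LIMIT_BYTES = 5 * 1024 * 1024   # 5 MiB hard limit from the text above


def estimate_size_bytes(document: dict) -> int:
    """Approximate size as the length of the compact UTF-8 JSON encoding.

    Assumption: this is only a proxy. Ditto stores documents in a binary
    CRDT form, so the size it measures will not match this number exactly.
    """
    encoded = json.dumps(document, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def classify_size(size_bytes: int) -> str:
    """Return the log level the limits above would associate with this size."""
    if size_bytes > HARD_LIMIT_BYTES:
        return "error"
    if size_bytes > SOFT_LIMIT_BYTES:
        return "warn"
    return "ok"


doc = {"_id": "engineering", "notes": "x" * (300 * 1024)}
print(classify_size(estimate_size_bytes(doc)))  # warn
```

A check like this is most useful in tests or development builds, where a growing document can be caught before it starts producing warnings on devices.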

Document size does not drive steady-state sync — Ditto syncs only the fields that change between peers. It does affect:

* **Local storage and memory** on every device that holds the document.
* **Serialization and deserialization time** on read and write.
* **Initial replication and bulk catch-up**, where the full payload travels over the wire. On a Bluetooth Low Energy (BLE) link the practical ceiling is roughly 20 KB/sec, so a 256 KiB document takes about 13 seconds to replicate the first time. Subsequent edits sync only the changed fields and are far smaller.
* **CRDT merge cost**, which scales with document size rather than change size — a small edit to a large document still pays the cost of a large merge.
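
The initial-replication cost over BLE can be checked with back-of-the-envelope arithmetic. This sketch assumes "20 KB/sec" means 20,000 bytes per second and ignores protocol overhead, so the results are lower bounds rather than measurements; `first_sync_seconds` is a hypothetical helper.

```python
BLE_BYTES_PER_SEC = 20_000  # assumption: "roughly 20 KB/sec" taken as 20,000 bytes/s


def first_sync_seconds(size_bytes: int, link_bytes_per_sec: float = BLE_BYTES_PER_SEC) -> float:
    """Time to move the full payload once, ignoring protocol overhead."""
    return size_bytes / link_bytes_per_sec


soft_limit_seconds = first_sync_seconds(256 * 1024)        # document at the soft limit
hard_limit_seconds = first_sync_seconds(5 * 1024 * 1024)   # document at the hard limit

print(round(soft_limit_seconds, 1))  # 13.1
print(round(hard_limit_seconds))     # 262
```

Under these assumptions, a document at the hard limit would occupy a BLE link for more than four minutes on first sync.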

For storing large binary blobs such as images or video, use the [`ATTACHMENT`](/sdk/latest/crud/working-with-attachments) data type rather than embedding raw bytes. For the rationale behind the limits, monitoring guidance, remediation patterns, and runtime configuration, see [Document Size Limits](/best-practices/document-size-limits). For application-level modeling guidance, see [Data Modeling Tips](/best-practices/data-modeling#document-size).

## Relationships

The recommended default in Ditto is to **embed related data within a single document**, typically as a map keyed by ID. This gives you atomic single-document writes, full sync as a unit, and Ditto's add-wins map merge for concurrent edits to different sub-entities. Use a foreign-key relationship across collections only when sub-entities need independent permission scopes, are accessed independently at scale, or would push the parent past the [document size limits](#document-size). See [Data Modeling Tips](/best-practices/data-modeling#modeling-relationships) and [Denormalized Documents](/best-practices/conflict-resolution-patterns#denormalized-documents-one-document-atomic-sync) for the full trade-off.

### Embedded Relationships

An *embedded relationship* keeps related sub-entities inside the parent document, typically as a map keyed by the sub-entity's ID. Each entry in the map merges independently, so concurrent edits to different sub-entities — or different fields within one sub-entity — resolve cleanly without custom logic.

For example, a `team` document with members embedded as a map keyed by member ID:

```json theme={null}
{
  "_id": "engineering",
  "name": "Engineering",
  "members": {
    "alice": {
      "role": "lead",
      "joined": "2024-01-15"
    },
    "bob": {
      "role": "engineer",
      "joined": "2024-03-22"
    }
  }
}
```

Two devices editing different members, or different fields on the same member, will merge automatically.
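
To make the behavior concrete, here is a toy model of two peers editing the `team` document while offline. It is a simplified illustration, not Ditto's merge algorithm: all function names are hypothetical, it handles only added or updated leaf fields (no removals), and it rejects edits to the same field instead of resolving them.

```python
import copy

_MISSING = object()  # sentinel for "field not present in base"


def flatten(doc, prefix=()):
    """Flatten nested dicts into {path_tuple: leaf_value}."""
    flat = {}
    for key, value in doc.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """Rebuild nested dicts from {path_tuple: leaf_value}."""
    doc = {}
    for path, value in flat.items():
        node = doc
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return doc


def field_changes(base_flat, edited):
    """Leaf fields that an edited copy added or updated relative to base."""
    return {
        path: value
        for path, value in flatten(edited).items()
        if base_flat.get(path, _MISSING) != value
    }


def merge_disjoint_edits(base, peer_a, peer_b):
    """Apply both peers' field-level changes to the shared base document."""
    base_flat = flatten(base)
    changes_a = field_changes(base_flat, peer_a)
    changes_b = field_changes(base_flat, peer_b)
    for path in changes_a.keys() & changes_b.keys():
        if changes_a[path] != changes_b[path]:
            raise ValueError(f"same field edited on both peers: {path}")
    return unflatten({**base_flat, **changes_a, **changes_b})


base = {
    "_id": "engineering",
    "name": "Engineering",
    "members": {
        "alice": {"role": "lead", "joined": "2024-01-15"},
        "bob": {"role": "engineer", "joined": "2024-03-22"},
    },
}

peer_a = copy.deepcopy(base)
peer_a["members"]["alice"]["role"] = "manager"       # edits one field on alice

peer_b = copy.deepcopy(base)
peer_b["members"]["alice"]["joined"] = "2024-01-16"  # edits a different field on alice
peer_b["members"]["carol"] = {"role": "designer", "joined": "2024-06-01"}  # adds a member

merged = merge_disjoint_edits(base, peer_a, peer_b)
print(merged["members"]["alice"])  # {'role': 'manager', 'joined': '2024-01-16'}
```

Because each member is a separate map entry and each field a separate leaf, neither peer's change overwrites the other's. The same edits made to elements of an array would not have stable keys to merge on.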

### Foreign-Key Relationships

To create a *foreign-key relationship*, store the `_id` of one document as a field within another. This splits related data across collections, which is useful when sub-entities need their own permission scope, are accessed independently of the parent, or would push the parent past the [document size limits](#document-size).

For example, if you have two collections — `cars` and `owners` — where each car has a corresponding owner, every document in `cars` includes a field containing the `_id` of a document in `owners`:

```json Car theme={null}
{
  "_id": "0016d749-9a9b-4ece-8794-7f3eb40bc82e",
  "owner_id": "5da42ab5-d00b-4377-8524-43e43abf9e01"
}
```

```json Owner theme={null}
{
  "_id": "5da42ab5-d00b-4377-8524-43e43abf9e01",
  "name": "John Doe"
}
```
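
With a foreign-key relationship, the application resolves the reference itself. The sketch below uses plain Python lists as stand-ins for the results of querying each collection; it makes no Ditto SDK calls, `index_by_id` and `resolve_owner` are hypothetical helpers, and it assumes string `_id` values (an object `_id` would first need converting to a hashable key).

```python
# Stand-ins for documents fetched from the `cars` and `owners` collections.
cars = [
    {
        "_id": "0016d749-9a9b-4ece-8794-7f3eb40bc82e",
        "owner_id": "5da42ab5-d00b-4377-8524-43e43abf9e01",
    }
]
owners = [
    {
        "_id": "5da42ab5-d00b-4377-8524-43e43abf9e01",
        "name": "John Doe",
    }
]


def index_by_id(documents):
    """Build a lookup table from `_id` to document (string IDs assumed)."""
    return {doc["_id"]: doc for doc in documents}


def resolve_owner(car, owners_by_id):
    """Follow the car's `owner_id` reference; None if the owner is not present."""
    return owners_by_id.get(car["owner_id"])


owners_by_id = index_by_id(owners)
owner = resolve_owner(cars[0], owners_by_id)
print(owner["name"])  # John Doe
```

Returning `None` for a missing owner matters because the two documents sync independently, so a car may arrive on a device before its owner does.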
