Troubleshooting
This guide covers the most commonly encountered issues using Ditto. Refer to any of the following to help you get back on track:
If you need technical support, submit a technical support request ticket. (See Contact Us)
To troubleshoot a peer-to-peer data sync issue, first determine the cause of the issue by obtaining and analyzing the database error and warning messages captured in the debug logs.
As demonstrated in the following snippet, obtain the database debug logs before Ditto is initialized so that any potential issues related to the file system access and authentication background tasks that run during initialization are tracked. You want to begin gathering logs before Ditto is initialized in order to capture any potential issues with those two background tasks.
For OnlinePlayground and onlineWithAuthentication identities, the first thing that Ditto needs to do is authenticate to the Big Peer.
Confirm that all devices share the same AppID by verifying that the app_id log line contains your AppID:
Once confirmed, verify that authentication was successful.
Once authentication is successful, client re-authentication does not occur until the device's local certificate expires, as indicated by the following:
If the appID or token are incorrect, the following displays in the debug logs:
See if the following fixes your issue:
- Go to your App.
- Copy the AppID and Playground Token and use it in your Ditto initialization.
If you see the above message, your device's locally cached certificate is invalid. The device needs to call ditto.auth.logout() and reconnect to the Internet to get a new certificate. Alternatively, you can clear the local cache by reinstalling the mobile application or clearing the local persistence directory.
If you see either of the above messages, this is most likely a problem with your authentication webhook. There is a debugging level log line that will provide more information about the authentication webhook if you need more information.
Send a POST request with a JSON stringified body to your server to ensure that the AWS-hosted Big Peer servers located in U.S. regions successfully send POST requests to your authentication server.
If you see that Ditto is unable to sync, and see the log lines below, it's possible that your ditto instance is being deallocated before it is able to synchronize with other peers. Ensure that the ditto instance is initialized and saved in a global variable that is long-lived for the duration of the application.
A small peer sync with the Big Peer using a WebSocket connection.
The debug logs will tell you each step to create a successful WebSocket connection to the big peer.
First, the peer will discover the big peer and you'll see a Discovered event.
If the WebSocket connection that links the Small Peer to the Big Peer is successful, the debug logs contain the ConnectionEstablished event:
If you see StreamFailed or ConnectionEnded or any errors related to WebSocket connection with the Big Peer, there is likely an error in the Big Peer subscription server. For troubleshooting help, contact Ditto support.
If your certificate chain is corrupted, you will see the following.
To fix this, please try to reset your certificate chain. On MacOS, open the Keychain Acess application and navigate to the settings, and click "Reset default keychains..."
You will then see that the underlying physical replication session has been started with phy started.
The pkSOME_BYTES identifer displays the public key to the Big Peer instance that a Small Peer WebSocket connection has temporary sync access.
Note that the following snippet is merely an example; the pkSOME_BYTES identifer is not a guaranteed static variable.
Once temporary remote access is authorized, the following debug log message appears to indicate that the Small Peer (client) WebSocket connection to the Big Peer (cloud server) is successful:
For the Small Peer to have an active subscription to read and write to other peers, it must first be authorized access by a Big Peer. If the Small Peer does not have permissions to access database replication, it cannot subscribe to its collections to read and write.
Given this, before you check the Small Peer's active read-write subscriptions, confirm that the Small Peer has permissions to access the database.
Local permissions refer to the current peer that is running on the device.
Once the peer is authorized, it will begin to print the active subscriptions. Ditto will print the active subscriptions every time the sync engine wakes up. This includes a local write to the database or a sync event from another peer writing to their local database.
You will want to verify that these permissions are correct and what you expect. A peer cannot subscribe to a collection if it does not have read permission.
The default Big Peer remote permission to access a connected Small Peer local database replication to read and write is as follows:
Now that default permissions are verified, confirm that the Big Peer is connected to a Small Peer and the local-to-global database replication task is running.
Ditto prints a list of all the queries that the peer is currently subscribed to.
By default, each peer includes some internal subscriptions, which are denoted using a double underscore (__) before the collection name; for example, the following default internal __presence subscription:
You will also see a list of remote subscriptions. The big peer subscribes to everything, so you will see the following line which references the big peer:
If there are any application-level subscriptions, they will be listed by collection. For instance, if a Small Peer that subscribes to all tasks that are not deleted, the following appears:
If a subscription changes, the following appears:
If you have problems with subscriptions or permssions, you can try the operations with global read and write permission to verify that sync succeeds in this case. You can do this by using an OnlinePlayground identity, which defaults to global read and write permissions.
Check the following
- Do your permissions match your subscriptions? If you do not have permission to subscribe to data, it will not sync to the device.
- Are you subscribing to what you expect to see?
- Do you have more subscriptions than you expect to have?
- Do your live queries match the subscriptions?
Synchronization seems slow, or comes to a halt over time
- Ensure that you are only creating a fixed number of live queries, with a smaller amount of data.
- Evict irrelevant data. You can evict all irrelevant once per day
- Turn off verbose logging. Verbose logging can slow down replication considerably, especially with thousands of documents. Hence, it could look like that sync is stalling, but that can be indistinguishable from the logging mechanism slowing down ditto because it is trying to write too many lines.
- Look at the size of your ditto directory. Is it very large? Large databases will be slower. Try to query less data.
ParseError can be printed in the debug logs when there is a problem with your query. For example, if you create a subscription in Swift that is the empty string, you will see a ParseError in the logs.
If a Write is made to the local database, the following debugging messages appear:
Once a Write is made to the local database, Ditto re-prints the active subscriptions and permissions to access the debug log, as demonstrated in the previous snippet.
Next, the Small Peer creates the "update file." Annotated in the debug logs as follows, the update file provides the status of the data update and, using the pkA and pkB identifiers, indicates the two peers involved in the data exchange.
Once the local peer finishes creating the update file, the following appears to indicate that the update file is complete and the local peer is ready to send the new update to the remote peer. Note that at this time the local Write has yet to be synchronized across all connected peers.
If the local peer has received the update, all connected peers are synchronized, and the local database replication process is complete, the following appears:
OnlinePlayground applications must connect to the Big Peer first before going offline. Read more about online playground.
A proxy may either either block connections or cause errors in the log by substituting its own TLS certificate: invalid certificate: UnknownIssuer. If you see this log message you will either need to get Ditto unblocked or add the CA certificate to the Small Peer's trusted certificate store.
Verify that you can reach the following endpoints. You should see the output exactly as written below:
If this test passes, next check to see if WebSockets are blocked on your machine. Some corporate networks, firewalls, or proxies block the HTTP upgrade packet that tells the WebSocket server to keep the connection alive. Check with your IT administrator to see if your computer is configured to block WebSocket connections.
Did you follow the tutorial? The tutorial is your best guide for implementing this correctly. See the code on GitHub for a complete working example.
Common mistakes
- authenticationExpiringSoon and authenticationRequired both need to be implemented according to the sample code.
- Since callback objects are only invoked when Ditto initializes and the client authentication certificate expires, do not create subscriptions inside callbacks.
- Keep a strong reference to the AuthClient for the duration of the Ditto object, otherwise the auth handler will become garbage collected at an inappropriate time.
Verify that your webhook provider name is correctly copied in the Ditto portal.
The provider name given to the Ditto Client must match a provider name in the Portal (e.g., my-auth).
Is your AuthClient becoming garbage collected?
Ensure that you keep a reference to the AuthClient in scope for the duration that Ditto is also active. You should attach the dittoAuth variable to the object so it does not get garbage collected.
For example:
If you are unable to see connections over particular transports, first ensure you are following the SDK Setup guide for your language.
After you verify that your app is set up correctly, if connection issues continue follow these debugging steps.
Bluetooth
- Turn Use Location on
- Turn Bluetooth Scanning on
- Are permissions set correctly? See
- Go to your OS-level permissions for Bluetooth and clear the app permissions for your application.
- Delete the app, install it again, and open it. Does it ask for Bluetooth permissions?
Apple Wireless Direct Link, P2P-Wifi, Wifi Aware
- Go to your OS-level permissions and clear the app permissions for your application.
- Delete the app, install it again, and open it. Does it ask for location permissions?
- If you are using a custom
Local Area Network (LAN)
- Are permissions set correctly? See
- Are both devices connected to the same WiFi network?
- Check your router settings and see the
- If your device has VPN enabled, then peers can fail to communicate. Ensure that your VPN supports UDP multicasting.
App is using too much memory, why?
Use profiling tools for your platform to better understand where the memory leak may be occurring.
A common issue we see in reactive apps is a failure to dispose of resources as conditions change. Your app could create a large accumulation of publishers that infinitely grow. Every liveQuery and subscription in ditto must be explicitly stopped using the stop or cancel API. See snycing data for more information.
Debugging Blocked Transactions
This section only discusses blocked transactions on native platforms (e.g. iOS, Android, Windows, Linux). Ditto in web browsers operates sufficiently differently and isn’t covered here.
Blocked write transactions will automatically retry until they succeed. A blocked write transaction will never crash. Howewever, blocked write transactions are a common cause for poor database performance. Long running blocks are generally bad since they mean that nothing else can be writing to the database during this time. This could manifest itself as one of many problems:
- Unresponsive UI: an interaction might try to update a document, but is blocked by an existing write transaction
- Slow sync: new updates cannot be integrated into the store, since they’re blocked by another write transaction
A blocked write transaction can hint that something is wrong with the application code, or at a deeper level in Ditto. This page contains some tips & tricks to help understand the situation.
The process of unblocking is automatic and you don’t need to write any code to handle this. However, you can drastically reduce the chance of blocking transactions by making sure a device is only syncing the data it really needs.
When using OnlineWithAuthentication,
Send a POST request with a JSON stringified body to your server to ensure that the AWS-hosted Big Peer servers located in U.S. regions successfully send POST requests to your authentication server.
- authenticationExpiringSoon
- Since callback objects are only invoked when Ditto initializes and the client authentication certificate expires, do not create subscriptions inside callbacks.
- Keep a strong reference to the AuthClient for the duration of the Ditto object, otherwise the auth handler will become garbage collected at an inappropriate time.
Verify that your webhook provider name is correctly copied in the Ditto portal.
The provider name given to the Ditto Client must match a provider name in the Portal (e.g., my-auth).
Ensure that you keep a reference to the AuthClient in scope for the duration that Ditto is also active. You should attach the dittoAuth variable to the object so it does not get garbage collected.
For example:
- Turn Use Location on
- Turn Bluetooth Scanning on
- Are permissions set correctly? From SDK Setup Guides, refer to the installation article for your language)
- Go to your OS-level permissions for Bluetooth and clear the app permissions for your application.
- Delete the app, install it again, and open it. Does it ask for Bluetooth permissions?
- Android only: are you calling ditto.refreshPermissions()?
- Are permissions set correctly? (From SDK Setup Guides, refer to the installation article for your language)
- Go to your OS-level permissions and clear the app permissions for your application.
- Delete the app, install it again, and open it. Does it ask for location permissions?
- If you are using a custom TransportConfig, make sure you have enabled all peer-to-peer transports using enableAllPeerToPeer().
- Are permissions set correctly? From SDK Setup Guides, refer to the installation article for your language)
- Are both devices connected to the same WiFi network?
- If your device has VPN enabled, then peers can fail to communicate. Ensure that your VPN supports UDP multicasting.
- Ensure that you are only creating a fixed number of live queries, with a smaller amount of data. Do not use findAll() because you can sync more data than the small peer can reasonably handle.
- Turn off verbose logging. Verbose logging can slow down replication considerably, especially with thousands of documents. Hence, it could look like that sync is stalling, but that can be indistinguishable from the logging mechanism slowing down ditto because it is trying to write too many lines.
- Look at the size of your ditto directory. Is it very large? Large databases will be slower. Try to query less data.
Use profiling tools for your platform to better understand where the memory leak may be occurring.
A common issue we see in reactive apps is a failure to dispose of resources as conditions change. Your app could create a large accumulation of publishers that infinitely grow. Every liveQuery and subscription in ditto must be explicitly stopped using the stop or cancel API. See snycing data for more information.
This section only discusses blocked transactions on native platforms (e.g. iOS, Android, Windows, Linux). Ditto in web browsers operates sufficiently differently and isn’t covered here.
Blocked write transactions will automatically retry until they succeed. A blocked write transaction will never crash. Howewever, blocked write transactions are a common cause for poor database performance. Long running blocks are generally bad since they mean that nothing else can be writing to the database during this time. This could manifest itself as one of many problems:
- Unresponsive UI: an interaction might try to update a document, but is blocked by an existing write transaction
- Slow sync: new updates cannot be integrated into the store, since they’re blocked by another write transaction
A blocked write transaction can hint that something is wrong with the application code, or at a deeper level in Ditto. This page contains some tips & tricks to help understand the situation.
The process of unblocking is automatic and you don’t need to write any code to handle this. However, you can drastically reduce the chance of blocking transactions by making sure a device is only syncing the data it really needs.
What is a “blocked” transaction?
At any given time, there can be only one write transaction. Any subsequent attempts to open another write transaction will become blocked until the existing write transaction finishes.
Read vs. Write Transactions
Read transactions operate differently to write transactions.
Read transactions cannot be blocked and can run in parallel with write transactions. Read transactions don’t block each other, don’t block write transactions, and aren’t blocked by write transactions.
If a write transaction is blocked, Ditto will log a message with increasing severity every 10s.
Time (t) a transaction has been blocked | Log Level |
---|---|
t ≤ 30s | DEBUG |
31s < t ≤ 120s | WARN |
120s < t | ERROR |
To see these logs in the database, it’s important to have a minimum log level set. Transactions that are blocked for over 2 minutes should always be visible in the logs.
If INFO level is used, then INFO, WARN, and ERROR messages will all be included in the logs. This means any write transactions blocking for more than 30s should always be visible in the logs.
Reading the Logs
If a write transaction is blocked, the device logs will look something like the following. In this example we can see a write transaction was blocked for a total of 150s (or 2.5 minutes).
As time progressed, Ditto complained more and more loudly (starting with DEBUG logs before eventually logging at ERROR level). Eventually the existing transaction finished and blocked transaction was was able to proceed.
The write transaction which was blocked was for a Ditto internal component. This is identified by “originator=Internal”.
The existing, long-running write transaction which was causing the block was a user call in the public SDK. This is identified by “blocked_by=User”. So a user-level write transaction is blocking some internal workload. This is not necessarily a problem, as the internal system will catch up eventually.
Originators
The three values you’ll see for originator and blocked_by are:
Originator / Blocked By | Description |
---|---|
“User" | Any user-level API which modifies data. |
“Internal" | An internal job (presence data, DB maintenance, etc.). |
“Replication" | Updating the store with data received via replication. |
Causes of Blocked Transactions
This section presents a few examples blocked transaction scenarios, how they would appear in logs, and what they might mean.
User blocks User
An application might block its own write transactions by performing multiple writes at the same time in different places. If one is slow (perhaps it does too much work, or perhaps it reaches out to external APIs, etc.) then the other write transactions will block until it finishes.
In logs, this scenario will look like the following:
LOG_LEVEL: Waiting for write transaction (elapsed: XXs), originator=User blocked_by=User
Note that “User” is blocking “User”. This could be temporary, but it could also be a deadlock which is much worse. See below.
You cannot start a write transaction within a write transaction. The result will be a deadlock which never resolves itself. This might manifest itself in the logs as:
Note that User blocking User doesn’t necessarily mean a deadlock. It might merely be a long running write transaction and this situation might be expected depending on the task. However, if the transaction never unblocks and the log messages at ERROR level continue forever - you have a strong indication that there’s a deadlock and should investigate the application code.
Replication blocks User
From the application code, it might not be obvious what the problem is. By looking in the logs, you might get a hint. For instance:
Here we can see that replication is blocking our user’s write transaction. This might happen if we’ve just received a large amount of sync data from another peer. Most commonly, happens during initial sync (either with the Big Peer, or joining a mesh for the first time, or re-joining a mesh after an extended period away).
This scenario is something to be aware of, but will resolve itself automatically once the sync is complete. The user transaction will eventually unblock and continue when replication has finished updating the store.
Replication blocks Replication
If you see the following in logs, it’s an indication that one replication session with a remote peer is blocking another:
This can happen when the device has received large amounts of data from multiple peers simultaneously and must update the database with the changes it received from each. Ditto can only update the database with sync data from remote peers one at a time, so the other updates must wait their turn.
This scenario is most likely to happen during a large initial sync - either with the Big Peer at the same time as with nearby devices, or just with multiple nearby devices when joining a mesh for the first time (or re-joining after an extended period away).
This scenario is something to be aware of, but will resolve itself automatically once the sync is complete. You can drastically reduce the chance of this type of blocking transaction by making sure each device is only syncing the data it really needs.
User or Replication blocks Internal
A long running write transaction might block an internal job. This scenario will be the least obvious to spot and it might not be obvious that it’s happening at all. It’s also the least likely to cause actual problems.
In logs, you might see the following:
or:
That is, something is blocking Internal.
If you observe slow and unreliable Ditto Bus connections, an inaccurate presence viewer, or slow and unreliable multi-hop data replication, a Write transaction may be preventing the internal mesh presence component to persist data in regular 30-second intervals.
Once the blocking Write transaction is complete, the observed issue should resolve itself automatically.
If you continue to have problems, please contact us via Human Support.