> ## Documentation Index > Fetch the complete documentation index at: https://docs.ditto.live/llms.txt > Use this file to discover all available pages before exploring further. # SELECT > The `SELECT` operation, once executed, retrieves documents from a collection and uses clauses like `WHERE` to specify conditions for filtering the documents to return. DQL now supports projections and aggregates in addition to `SELECT *` operations. You can select specific fields, perform calculations, and use aggregate functions like `COUNT`, `SUM`, `AVG`, `MIN`, and `MAX`. ## SELECT Statements The following syntax outlines the basic structure and optional clauses you can use within your `SELECT` statements: ```sql DQL theme={null} SELECT [DISTINCT] projection FROM your_collection_name [WHERE condition] [GROUP BY groupby_expression_1, groupby_expression_2, ...] [HAVING condition] [ORDER BY orderby_expression_1, orderby_expression_2, ... [ASC|DESC]] [LIMIT limit_value] [OFFSET number_of_documents_to_skip] ``` SELECT Syntax Diagram

Where `projection` can be: * `*` - Returns all fields from the documents * `field1, field2, ...` - Returns specific fields * `expression AS alias` - Returns calculated values with custom names * Aggregate functions like `COUNT(*)`, `SUM(field)`, etc. For instance, retrieve all documents in the `cars` collection `WHERE` the `color` property is set to the value 'blue' : ```sql DQL theme={null} SELECT * FROM cars WHERE color = 'blue' ``` ## Clauses for Filtering The following table provides an overview of the different clauses you can use to define specific conditions and calculations within your DQL `SELECT` statements to provide more granular control over your queries: | **Clause** | **Description** | | ---------- | ---------------------------------------------------------------------------------------------------------------------- | | FROM | The required clause specifying the collection containing the documents for retrieval. (See [FROM](/dql/select#from)) | | WHERE | Applies filtering conditions to restrict the documents included in the result set. (See [WHERE](/dql/select#where)) | | GROUP BY | Groups documents based on one or more expressions for aggregate calculations. (See [GROUP BY](/dql/select#group-by)) | | HAVING | Filters grouped results based on aggregate conditions. (See [HAVING](/dql/select#having)) | | ORDER BY | Specifies the sorting order of the result set based on one or more expressions. (See [ORDER BY](/dql/select#order-by)) | | LIMIT | Restricts the number of documents included in the result set. (See [LIMIT](/dql/select#limit)) | | OFFSET | Skips a specific number of documents before returning the result set. (See [OFFSET](/dql/select#offset)) | ### FROM Required in each `SELECT` statement you write in DQL, the `FROM` element identifies the collection for document retrieval. ```sql DQL theme={null} SELECT * FROM your_collection_name ``` FROM Clause Syntax Diagram

For example, a `SELECT` statement querying documents from the `cars` collection: ```sql DQL theme={null} SELECT * FROM cars ``` #### USE IDS The optional `USE IDS` clause allows you to efficiently retrieve specific documents by their IDs without performing a collection scan. This is more performant than using `WHERE _id IN (...)` filters. **Syntax:** ```sql DQL theme={null} -- Inline IDs SELECT * FROM collection_name USE IDS (id1, id2, id3) -- Using a parameter array SELECT * FROM collection_name USE IDS LIST :id_array ``` USE IDS Syntax Diagram

**Examples:** ```sql DQL theme={null} -- Retrieve specific documents by ID SELECT * FROM cars USE IDS ('123', '456', '789') -- Using a parameter array (recommended for dynamic queries) SELECT * FROM cars USE IDS LIST :car_ids ``` ```swift Swift theme={null} let carIds = ["123", "456", "789"] let results = await ditto.store.execute( query: "SELECT * FROM cars USE IDS LIST :car_ids", arguments: ["car_ids": carIds] ) ``` ```kotlin Kotlin theme={null} val carIds = listOf("123", "456", "789") val results = ditto.store.execute( "SELECT * FROM cars USE IDS LIST :car_ids", mapOf("car_ids" to carIds) ) ``` ```javascript JS theme={null} const carIds = ["123", "456", "789"]; const results = await ditto.store.execute( "SELECT * FROM cars USE IDS LIST :car_ids", { car_ids: carIds } ); ``` ```java Java theme={null} List carIds = Arrays.asList("123", "456", "789"); ditto.store.execute( "SELECT * FROM cars USE IDS LIST :car_ids", Collections.singletonMap("car_ids", carIds) ); ``` ```csharp C# theme={null} var carIds = new List { "123", "456", "789" }; await ditto.Store.ExecuteAsync( "SELECT * FROM cars USE IDS LIST :car_ids", new Dictionary { { "car_ids", carIds } } ); ``` ```cpp C++ theme={null} std::vector carIds = {"123", "456", "789"}; std::map> args; args["car_ids"] = carIds; ditto.get_store().execute( "SELECT * FROM cars USE IDS LIST :car_ids", args ).get(); ``` ```rust Rust theme={null} let car_ids = vec!["123", "456", "789"]; ditto.store().execute_v2(( "SELECT * FROM cars USE IDS LIST :car_ids", serde_json::json!({ "car_ids": car_ids }) )).await?; ``` ```dart Dart theme={null} final carIds = ["123", "456", "789"]; await ditto.store.execute( "SELECT * FROM cars USE IDS LIST :car_ids", arguments: {"car_ids": carIds} ); ``` **Performance Tip**: Where possible, simple equality filters on `_id` (like `WHERE _id = '123'`) are automatically optimized internally similar to USE IDS. However, for explicit control and guaranteed performance, USE IDS is recommended. See also the `#auto_use_ids` directive in [Directives](/dql/directives). #### Collection Aliasing You can assign an alias to a collection in the FROM clause to create shorter, more readable queries. The `AS` keyword is optional: ```sql DQL theme={null} -- With AS keyword SELECT * FROM cars AS c WHERE c.color = 'blue' -- Without AS keyword SELECT * FROM cars c WHERE c.color = 'blue' ``` Collection aliases are particularly useful in: * **Complex queries**: Shortening long collection names for readability * **Qualified field references**: Explicitly referencing fields from a specific collection * **Query directives**: Specifying collection-level directives (see [Directives](/dql/directives)) ```sql DQL theme={null} -- Using collection alias with qualified field references SELECT c.make, c.model, c.year FROM cars c WHERE c.color = 'blue' ORDER BY c.year DESC -- Using collection alias with directives /*+ {"c": {"#index": "ix_color"}} */ SELECT * FROM cars c WHERE c.color = 'blue' ``` Collection aliases follow the same identifier rules as projection aliases and can also use backticks for special characters or reserved words. ### WHERE The `WHERE` clause filters data based on either an expression or a set of conditions that narrow the result set returned to you: ```sql DQL theme={null} SELECT * FROM your_collection_name WHERE [condition] ``` WHERE Clause Syntax Diagram

For example, here is a basic `SELECT` statement querying documents from the `cars` collection based on a given address: ```sql DQL theme={null} SELECT * FROM cars WHERE location.address = '123 Main St, San Francisco, CA 98105' ``` To demonstrate a more complex query, here is a `SELECT` statement that queries using multiple expressions and logical operators to further refine the criteria for document retrieval: ```sql DQL theme={null} SELECT * FROM cars WHERE color = 'blue' AND features.trim = 'Standard' OR features.mileage > 10000 ``` ### ORDER BY With the `ORDER BY` clause, if you'd like, you can integrate calculations or expressions in your `SELECT` statement. Then sort the resulting documents to return in either ascending (`ASC`) or descending (`DESC`) alphabetical order: ```sql DQL theme={null} SELECT * FROM your_collection_name ORDER BY expression_1, expression_2, ... [ASC|DESC] ``` ORDER BY Clause Syntax Diagram

For example, here is a simple `SELECT` statement that uses the `ORDER BY` clause to query and sort documents from the `cars` collection in descending (`DESC`) alphabetical order based on the field value set for the `color` property: ```sql DQL theme={null} SELECT * FROM cars ORDER BY color DESC ``` In this syntax: * `your_collection_name` is the name of the collection from which you want to retrieve the data. * `expression_1, expression_2, ...` are the expressions evaluated to sort the result. Expressions are resolved in order. * `[ASC|DESC]` is an optional parameter that specifies the sort order. If omitted, the default sort order is ascending (`ASC`). To sort in descending order, you can specify `DESC`. **Example** In this example, the result set from the query will be sorted in descending order based on the values in the field: ```sql DQL theme={null} -- Sort by a given field_name SELECT * FROM your_collection_name ORDER BY field_name DESC ``` For instance, here `"blue"` cars return first and other cars sort by the natural order in the collection: ```sql DQL theme={null} SELECT * FROM cars ORDER BY color = 'blue' ``` #### **Sort Order by Object Type** In DQL, the hierarchy for comparing and sorting objects varies based on the following criteria: * If ascending (`ASC`) order operations: * `boolean` * `number` * `binary` * `string` * `array` * `object` * `null` * `missing` * If descending (`DESC`) order operations, sorting order is reversed: * `missing` * `null` * `object` * `array` * `string` * `binary` * `number` * `boolean` * If evaluating values, `true` results are prioritized and ordered first followed by `false` results. #### Expressing Sort Order Unless explicitly defined as `DESC` in your query, Ditto defaults to sorting in ascending (`ASC`). So, if you want to sort in ascending order, you do *not* have to express that in your query. ### LIMIT The `LIMIT` clause is used to restrict the number of documents returned by a query, allowing you to specify a maximum limit on the number of documents to be included in the result set: ```sql DQL theme={null} SELECT * FROM your_collection_name LIMIT limit_value ``` In this syntax: * `your_collection_name` is the name of the collection from which you want to retrieve the data. * `limit_value` is the maximum number of documents you want to include in the result set. For example, only return the first 10 documents from the `your_collection_name` collection: ```sql DQL theme={null} SELECT * FROM your_collection_name LIMIT 10 ``` ### OFFSET The `OFFSET` clause is used to specify the number of records to skip before starting to return documents from the query result: ```sql DQL theme={null} SELECT * FROM your_collection_name OFFSET number_of_items_to_skip ``` In this syntax: * `your_collection_name` is the name of the collection from which you want to retrieve the data. * `number_of_items_to_skip` is the number of items before returning the result set. Using `OFFSET` with `LIMIT` is a common way to utilize `OFFSET`; for example: ```sql DQL theme={null} SELECT * FROM your_collection_name LIMIT 10 OFFSET 10 ``` ## Projections Projections allow you to specify exactly which fields or calculated values to return from your query, rather than returning entire documents. This reduces data transfer and processing overhead. Projection Syntax Diagram

### Basic Field Selection Select specific fields from documents: ```sql DQL theme={null} SELECT make, model, year FROM cars ``` ### Excluding Fields To exclude specific fields from a projection, use the `MISSING` keyword. This can be combined with `*` to select all fields except those explicitly excluded: ```sql DQL theme={null} -- Given a document: { "_id": "abc", "make": "Toyota", "model": "Camry", "color": "blue", "vin": "1234" } -- Exclude "vin" and "color", keeping everything else: SELECT cars.*, MISSING vin, MISSING color FROM cars -- Result: { "_id": "abc", "make": "Toyota", "model": "Camry" } ``` ### Aliasing Use aliases to rename fields in your results: ```sql DQL theme={null} SELECT make AS manufacturer, model AS car_model FROM cars ``` Default aliases are assigned if not specified: * Simple field references use the field name as the alias * Expressions get aliases like `($1)`, `($2)`, etc. Aliases bind to the element immediately prior in the statement. `SELECT COUNT(*) AS car_count FROM cars` names the output field, but `SELECT COUNT(*) FROM cars AS car_count` aliases the `cars` collection — `COUNT(*)` falls back to its default alias. #### Alias Constraints Aliases must follow these rules: * **Uniqueness**: Each alias in a SELECT list must be unique. Duplicate aliases will raise an error. * **Identifier Rules**: Aliases must be valid identifiers following field naming conventions. * **Special Characters**: Use backticks (grave accents) to quote aliases containing special characters or reserved words. ```sql DQL theme={null} -- Using backticks for special characters SELECT price AS `final-price`, model AS `car-model` FROM cars -- Using backticks for reserved words SELECT status AS `order`, type AS `select` FROM orders ``` #### Alias Scope Projection aliases defined in the SELECT list can be referenced by the **ORDER BY** clause to sort by an aliased expression: ```sql DQL theme={null} SELECT make, price * 0.9 AS sale_price FROM cars ORDER BY sale_price DESC ``` `GROUP BY` and `HAVING` do **not** accept projection aliases. They must reference the underlying source expression or aggregate function directly: use `GROUP BY color` (the source field) rather than `GROUP BY c` (a projection alias), and `HAVING COUNT(*) > 5` (the aggregate) rather than `HAVING car_count > 5` (the alias). ### Calculated Fields Create new fields using expressions: ```sql DQL theme={null} SELECT make, model, price * 0.9 AS discounted_price FROM cars ``` ### DISTINCT Results Remove duplicate rows from your results: ```sql DQL theme={null} SELECT DISTINCT color FROM cars ``` **DISTINCT Performance Considerations:** `DISTINCT` has performance implications you should understand: **Memory Buffering:** * All projections must be buffered in memory to enforce uniqueness * This increases memory requirements proportional to the number of unique results * Memory usage grows with the result set size **Redundant Usage:** * When the `_id` field is included (e.g., `SELECT DISTINCT *`), DISTINCT is redundant * Each document already has a unique `_id`, making DISTINCT unnecessary * This adds overhead without benefit - avoid in these cases **Best Practices:** * Only use DISTINCT when you truly need unique combinations of projected fields * Avoid DISTINCT when projecting complete documents (`SELECT DISTINCT *`) * Consider if your application logic can handle duplicates instead * For large result sets, evaluate if DISTINCT is worth the memory cost ## Aggregate Functions Aggregate functions perform calculations across multiple documents and return a single result. DQL supports the following aggregate functions: For a quick reference table of all aggregate functions and their syntax, see [Operator Expressions - Aggregate Functions](/dql/operator-expressions#aggregate-functions). ### COUNT Count documents or non-null values: ```sql DQL theme={null} -- Count all documents SELECT COUNT(*) FROM cars -- Count non-null values in a field SELECT COUNT(color) FROM cars -- Count distinct values SELECT COUNT(DISTINCT color) FROM cars ``` ### SUM Calculate the sum of numeric values: ```sql DQL theme={null} SELECT SUM(price) AS total_value FROM cars -- Sum distinct values only SELECT SUM(DISTINCT price) FROM cars ``` ### AVG Calculate the average of numeric values: ```sql DQL theme={null} SELECT AVG(mileage) AS average_mileage FROM cars ``` ### MIN and MAX Find minimum and maximum values: ```sql DQL theme={null} SELECT MIN(year) AS oldest_year, MAX(year) AS newest_year FROM cars ``` ### MID Calculate the midpoint between minimum and maximum values: ```sql DQL theme={null} SELECT MID(price) AS midpoint_price FROM cars -- With DISTINCT SELECT MID(DISTINCT price) FROM cars ``` The `MID` function returns the midpoint value between `MIN(expr)` and `MAX(expr)`. Non-numeric values are silently ignored. ### MEDIAN Calculate the median (middle) value: ```sql DQL theme={null} SELECT MEDIAN(price) AS median_price FROM cars -- With DISTINCT SELECT MEDIAN(DISTINCT mileage) FROM cars ``` The `MEDIAN` function returns the middle value when all values are sorted. Non-numeric values are silently ignored. ### Combining Aggregates Use multiple aggregate functions in a single query: ```sql DQL theme={null} SELECT COUNT(*) AS total_cars, AVG(price) AS avg_price, MIN(price) AS lowest_price, MAX(price) AS highest_price FROM cars ``` ### GROUP BY Group results and calculate aggregates for each group: ```sql DQL theme={null} SELECT color, COUNT(*) AS car_count, AVG(price) AS avg_price FROM cars GROUP BY color ``` GROUP BY Clause Syntax Diagram

When using aggregates with `GROUP BY`, non-aggregate projections must be part of the `GROUP BY` clause: ```sql DQL theme={null} -- This works: 'make' is in GROUP BY SELECT make, COUNT(*) AS model_count FROM cars GROUP BY make -- This would fail: 'model' is not in GROUP BY -- SELECT make, model, COUNT(*) FROM cars GROUP BY make ``` ### HAVING Filter grouped results based on aggregate values: ```sql DQL theme={null} SELECT color, COUNT(*) AS car_count FROM cars GROUP BY color HAVING COUNT(*) > 5 ``` HAVING Clause Syntax Diagram

Aggregate functions form a "dam" in the execution pipeline - all documents must be processed before results can be returned. This differs from non-aggregate queries which can stream results. ### Aggregate Function Behavior Understanding how aggregate functions work internally helps optimize query performance: **Memory Requirements:** * Aggregates accumulate results per group, so memory usage depends on the number of groups * A small number of groups requires little memory regardless of document count * A large number of groups increases memory requirements proportionally * `DISTINCT` aggregates must buffer all distinct values in memory, which can significantly increase memory usage with many unique values **MISSING Values:** * If no documents qualify for an aggregate in a group, the result is `MISSING` (absent from results) * To always include a result, use conditional functions: `SUM(IFMISSING(field, 0))` * `MIN` and `MAX` silently ignore `MISSING` values * `COUNT` does not count `NULL`, `MISSING`, or `FALSE` values unless explicitly handled **Type Handling:** * `SUM`, `AVG`, `MID`, and `MEDIAN` silently ignore non-numeric values * `MIN` and `MAX` compare values using Ditto's standard type ordering rules * `COUNT(*)` is equivalent to `COUNT(true)` with no additional overhead **Performance Tips:** * Use `COUNT(*)` instead of `COUNT(field)` when counting all documents * Avoid `DISTINCT` with aggregates unless necessary due to memory overhead * Minimize the number of groups in `GROUP BY` for better memory efficiency * Use `HAVING` to filter groups rather than filtering after aggregation ### Advanced Aggregate Examples **Handling MISSING Values:** ```sql DQL theme={null} -- Always return 0 instead of MISSING when no documents match SELECT IFMISSING(SUM(price), 0) AS total_sales FROM orders WHERE status = 'completed' -- Use default value for missing fields during aggregation SELECT SUM(IFMISSING(price, 0)) AS total_with_defaults FROM products -- Count all non-missing, non-null values regardless of truthiness SELECT COUNT(NOT ISMISSINGORNULL(rating)) AS rated_items FROM reviews ``` **Conditional Aggregation with CASE:** ```sql DQL theme={null} -- Sum order values by status within a single query SELECT COUNT(*) AS total_orders, SUM(CASE WHEN status = 'completed' THEN final_price ELSE 0 END) AS completed_value, SUM(CASE WHEN status = 'pending' THEN price * discount ELSE 0 END) AS pending_value, SUM(CASE WHEN status = 'cancelled' THEN price ELSE 0 END) AS cancelled_value FROM orders -- Calculate weighted averages SELECT category, AVG(CASE WHEN priority = 'high' THEN price * 1.5 ELSE price END) AS weighted_avg_price FROM products GROUP BY category -- Sum with conditional multipliers SELECT SUM(CASE WHEN quantity > 100 THEN price * quantity * 0.9 WHEN quantity > 50 THEN price * quantity * 0.95 ELSE price * quantity END) AS total_revenue FROM order_items ``` **Complex Business Logic:** ```sql DQL theme={null} -- Calculate profit margin by category SELECT category, (SUM(sale_price) - SUM(cost_price)) / SUM(sale_price) * 100 AS profit_margin_pct, COUNT(*) AS items_sold FROM sales WHERE sale_date >= '2024-01-01' GROUP BY category HAVING (SUM(sale_price) - SUM(cost_price)) > 1000 -- Customer lifetime value with fallback for missing data SELECT customer_id, COUNT(*) AS total_orders, SUM(IFMISSING(order_total, 0)) AS lifetime_value, AVG(IFMISSING(order_total, 0)) AS avg_order_value, MAX(order_date) AS last_order_date FROM orders GROUP BY customer_id HAVING SUM(IFMISSING(order_total, 0)) > 500 -- Inventory health metrics SELECT warehouse_id, COUNT(*) AS total_items, SUM(CASE WHEN stock_level < reorder_point THEN 1 ELSE 0 END) AS items_needing_reorder, SUM(CASE WHEN stock_level = 0 THEN 1 ELSE 0 END) AS out_of_stock_items, AVG(IFMISSING(stock_level, 0)) AS avg_stock_level FROM inventory GROUP BY warehouse_id ``` **Filtering Within Aggregates:** ```sql DQL theme={null} -- Count only items matching specific conditions SELECT department, COUNT(*) AS total_employees, COUNT(CASE WHEN salary > 100000 THEN 1 END) AS high_earners, AVG(CASE WHEN years_experience > 5 THEN salary END) AS avg_senior_salary FROM employees GROUP BY department -- Calculate percentages within groups SELECT region, COUNT(*) AS total_sales, SUM(CASE WHEN product_type = 'premium' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS premium_pct FROM sales GROUP BY region ``` **Handling NULL and MISSING Gracefully:** ```sql DQL theme={null} -- Coalesce with multiple fallbacks SELECT product_id, COUNT(*) AS review_count, AVG(COALESCE(rating, average_category_rating, 3.0)) AS effective_rating FROM reviews GROUP BY product_id -- Conditional counting with null handling SELECT status, COUNT(*) AS total, COUNT(completed_date) AS completed_with_date, COUNT(*) - COUNT(completed_date) AS missing_dates FROM tasks GROUP BY status -- Sum with NULL protection SELECT order_id, SUM(IFNULL(item_price * quantity, 0)) AS order_total, COUNT(*) AS item_count, SUM(IFNULL(discount_amount, 0)) AS total_discount FROM order_items GROUP BY order_id ``` **Multi-level Aggregations:** ```sql DQL theme={null} -- Nested calculations with aggregates SELECT category, COUNT(*) AS product_count, SUM(price * inventory_count) AS total_inventory_value, SUM(price * inventory_count) / COUNT(*) AS avg_value_per_product, MAX(price) - MIN(price) AS price_range FROM products GROUP BY category HAVING COUNT(*) >= 5 -- Time-based aggregations with conditionals SELECT DATE_FORMAT(created_at,'YYYY-MM-DD') AS order_date, COUNT(*) AS daily_orders, SUM(IFMISSING(total, 0)) AS daily_revenue, AVG(CASE WHEN total > 0 THEN total END) AS avg_order_value_excluding_zero FROM orders WHERE created_at >= DATE_CAST('2024-01-01','YYYY-MM-DD') GROUP BY DATE_FORMAT(created_at,'YYYY-MM-DD') ORDER BY order_date DESC ``` **Working with Arrays and Complex Fields:** ```sql DQL theme={null} -- Aggregate with array transformations SELECT seller_id, COUNT(*) AS total_products, AVG(array_length(tags)) AS avg_tags_per_product, SUM(CASE WHEN array_length(tags) > 5 THEN 1 ELSE 0 END) AS well_tagged_products FROM products GROUP BY seller_id -- Conditional aggregation on nested fields SELECT category, COUNT(*) AS total_items, AVG(IFMISSING(details.weight, 0)) AS avg_weight, SUM(CASE WHEN details.fragile = true THEN 1 ELSE 0 END) AS fragile_count FROM inventory GROUP BY category ```