Nick Langa't — Product Engineer & Solutions Architect

Introduction

The Django ORM makes simple queries easy. Order.objects.filter(status="paid") is readable, fast to write, and hard to get wrong. But most real applications eventually hit a wall: the queries get complex, the Python loop counting and grouping starts to show up in profiler traces, and the database — which was designed for exactly this kind of work — sits mostly idle.

The ORM has first-class support for aggregation, annotation, window functions, and correlated subqueries. Most Django developers never get past filter() and values(). This article closes that gap with patterns that replace Python-side computation with efficient SQL, using a simple e-commerce model as a running example.

orders/models.py

from django.db import models

class Customer(models.Model):
    name = models.CharField(max_length=255)
    email = models.EmailField(unique=True)
    created_at = models.DateTimeField(auto_now_add=True)

class Product(models.Model):
    name = models.CharField(max_length=255)
    price = models.DecimalField(max_digits=10, decimal_places=2)
    stock = models.PositiveIntegerField(default=0)

class Order(models.Model):
    class Status(models.TextChoices):
        PENDING  = "pending"
        PAID     = "paid"
        SHIPPED  = "shipped"
        RETURNED = "returned"

    customer   = models.ForeignKey(Customer, on_delete=models.CASCADE, related_name="orders")
    status     = models.CharField(max_length=20, choices=Status.choices, default=Status.PENDING)
    created_at = models.DateTimeField(auto_now_add=True)

class OrderItem(models.Model):
    order    = models.ForeignKey(Order, on_delete=models.CASCADE, related_name="items")
    product  = models.ForeignKey(Product, on_delete=models.PROTECT, related_name="order_items")
    quantity = models.PositiveIntegerField()
    price    = models.DecimalField(max_digits=10, decimal_places=2)  # price at time of purchase

aggregate()

aggregate() computes a single summary value across the entire queryset. It terminates the queryset — it returns a dictionary, not another queryset.

python

from django.db.models import Count, Sum, Avg, Max, Min

# Total revenue from paid orders
result = OrderItem.objects.filter(
    order__status="paid"
).aggregate(
    total_revenue=Sum(F("quantity") * F("price"))
)
# {"total_revenue": Decimal("48291.50")}

# Order statistics in one query
stats = Order.objects.aggregate(
    total_orders=Count("id"),
    avg_value=Avg("items__price"),
    max_value=Max("items__price"),
)

The common mistake is running aggregate in a Python loop — once per row, once per category. If you find yourself doing that, you want annotate() instead, or a values().annotate() group-by.

python

# Bad — N queries, one per customer
for customer in Customer.objects.all():
    total = customer.orders.aggregate(total=Count("id"))["total"]
    print(customer.name, total)

# Good — 1 query
customers = Customer.objects.annotate(order_count=Count("orders"))
for customer in customers:
    print(customer.name, customer.order_count)

annotate()

annotate() adds a computed column to each row in the queryset. The result is still a queryset — you can filter on it, order by it, and chain further operations.

python

from django.db.models import Count, Sum, F, ExpressionWrapper, DecimalField

# Annotate each order with its total value
orders = Order.objects.annotate(
    total_value=Sum(
        ExpressionWrapper(
            F("items__quantity") * F("items__price"),
            output_field=DecimalField()
        )
    )
)

# Now filter and order on the annotation — all in SQL
big_orders = orders.filter(total_value__gt=500).order_by("-total_value")

Group-by queries use values() followed by annotate(). The values() call tells Django what to group on:

python

# Revenue per product
revenue_by_product = (
    OrderItem.objects.filter(order__status="paid")
    .values("product__name")
    .annotate(
        total_qty=Sum("quantity"),
        total_revenue=Sum(ExpressionWrapper(
            F("quantity") * F("price"),
            output_field=DecimalField()
        ))
    )
    .order_by("-total_revenue")
)

# Orders per month
from django.db.models.functions import TruncMonth

monthly = (
    Order.objects.filter(status="paid")
    .annotate(month=TruncMonth("created_at"))
    .values("month")
    .annotate(order_count=Count("id"))
    .order_by("month")
)

Note

Order of values() and annotate() matters

values().annotate() is a GROUP BY. annotate().values() is something different — it annotates each row first, then selects fields. The first form is what you want for aggregation per category. If your query is producing more rows than expected, check the order.

F() expressions

F() references a model field in a database expression without pulling the value into Python. This is important for two reasons: it avoids race conditions in updates, and it lets you do field arithmetic entirely in SQL.

python

from django.db.models import F

# Bad — read-modify-write in Python, race condition possible
product = Product.objects.get(pk=product_id)
product.stock -= quantity
product.save()

# Good — atomic decrement in SQL
Product.objects.filter(pk=product_id).update(stock=F("stock") - quantity)

Use F() in filters to compare two fields on the same row:

python

# Products where the sale price is lower than the original price
discounted = Product.objects.filter(sale_price__lt=F("price"))

# Orders where items were sold above their current product price
marked_up = OrderItem.objects.filter(price__gt=F("product__price"))

Combine F() with ExpressionWrapper when the result type isn't obvious to Django:

python

from django.db.models import ExpressionWrapper, DecimalField

# Annotate each item with its line total
items = OrderItem.objects.annotate(
    line_total=ExpressionWrapper(
        F("quantity") * F("price"),
        output_field=DecimalField(max_digits=12, decimal_places=2)
    )
)

Q objects

Q() objects let you build complex filter conditions with | (OR), & (AND), and ~ (NOT). You can't do OR with chained .filter() calls — those always AND.

python

from django.db.models import Q

# Orders that are either returned OR pending for more than 7 days
from django.utils import timezone
from datetime import timedelta

stale_cutoff = timezone.now() - timedelta(days=7)

flagged_orders = Order.objects.filter(
    Q(status="returned")
    | Q(status="pending", created_at__lt=stale_cutoff)
)

# Products that are low on stock OR have never been ordered
no_activity = Product.objects.filter(
    Q(stock__lt=10) | ~Q(order_items__isnull=False)
).distinct()

Q objects compose cleanly with Python logic, which is useful when filters are built dynamically:

python

def search_orders(status=None, customer_email=None, min_value=None):
    q = Q()

    if status:
        q &= Q(status=status)
    if customer_email:
        q &= Q(customer__email__icontains=customer_email)
    if min_value is not None:
        q &= Q(total_value__gte=min_value)

    return Order.objects.annotate(
        total_value=Sum(ExpressionWrapper(
            F("items__quantity") * F("items__price"),
            output_field=DecimalField()
        ))
    ).filter(q)

Subquery and OuterRef

Subquery lets you embed a correlated subquery as an annotation or filter. OuterRef references a field from the outer queryset inside the subquery — it's the Django equivalent of a SQL correlated reference.

A common use case: annotate each customer with their most recent order date without a JOIN that duplicates rows:

python

from django.db.models import OuterRef, Subquery

# Latest order date per customer — one subquery, no duplication
latest_order = Order.objects.filter(
    customer=OuterRef("pk")
).order_by("-created_at").values("created_at")[:1]

customers = Customer.objects.annotate(
    last_order_at=Subquery(latest_order)
)

Get the most expensive item in each order:

python

most_expensive_item = OrderItem.objects.filter(
    order=OuterRef("pk")
).order_by("-price").values("product__name")[:1]

orders = Order.objects.annotate(
    top_item=Subquery(most_expensive_item)
)

Use Exists() — a specialised, optimised form of Subquery — for boolean presence checks:

python

from django.db.models import Exists

# Customers who have at least one paid order
high_value_customers = Customer.objects.filter(
    Exists(
        Order.objects.filter(
            customer=OuterRef("pk"),
            status="paid"
        )
    )
)

✦ Tip

Exists() is faster than Count() > 0

Exists() generates a SQL EXISTS clause and short-circuits on the first match. Count() scans all matching rows. If you only need to know whether rows exist — not how many — always use Exists().

Prefetch with queryset=

Most developers know select_related (JOIN for FK/OneToOne) and prefetch_related (separate query for M2M/reverse FK). Fewer know the most powerful form: Prefetch with a custom queryset. It lets you filter, annotate, and order the prefetched data, and control what attribute it's stored under.

python

from django.db.models import Prefetch

# For each customer, prefetch only their paid orders,
# and on each order, prefetch items with their product names
paid_orders = Order.objects.filter(
    status="paid"
).select_related("customer").prefetch_related(
    Prefetch("items", queryset=OrderItem.objects.select_related("product"))
)

customers = Customer.objects.prefetch_related(
    Prefetch(
        "orders",
        queryset=paid_orders,
        to_attr="paid_orders"  # store under a custom attribute
    )
)

for customer in customers:
    for order in customer.paid_orders:  # no extra queries
        print(order.id, [item.product.name for item in order.items.all()])

Without Prefetch(queryset=...), prefetch_related fetches all related objects. With it, you can apply any filter or annotation to the prefetch queryset, and the result is stored on the parent object so no additional queries fire in the template or serializer.

⚠ Gotcha

Don't filter prefetched querysets after the fact

Once you access order.items.all() on a prefetched queryset, Django uses the cache. But if you call order.items.filter(quantity__gt=1), Django hits the database again — the cache is bypassed because it's a new queryset. Apply all filters in the Prefetch(queryset=...) or in Python after fetching.

values(), only(), defer()

Model instances carry overhead: Python object creation, field conversion, and the full object graph. When you only need a subset of fields — for a JSON response, a CSV export, or a background aggregation — there are three tools to reduce that overhead.

values() returns dictionaries instead of model instances. It's the right choice when you're serialising to JSON or feeding into a non-ORM path:

python

# Returns list of dicts — no model instantiation
orders = Order.objects.filter(status="paid").values(
    "id", "created_at", "customer__email"
)

only() returns model instances but defers all fields not listed. Use it when you need model methods or want to selectively access fields without pulling the whole row:

python

# Full model instances, but only id and status fetched initially
orders = Order.objects.only("id", "status")

# Accessing a deferred field fires an extra query per row — use carefully
for order in orders:
    print(order.status)         # no extra query
    print(order.created_at)     # extra query per row — avoid

defer() is the inverse: fetch everything except the listed fields. Useful when a model has a large text or JSON field you don't need most of the time:

python

# Fetch everything except the large metadata blob
orders = Order.objects.defer("metadata")

Use values() for read-only serialisation paths. Use only() when you need model instances but want to skip expensive columns. Use defer() when one field is disproportionately large. In all cases, verify the actual SQL with str(queryset.query) or connection.queries.

Summary

The Django ORM can express most of the SQL you'd write by hand. The goal isn't to avoid SQL — it's to let the database do what it's good at instead of pulling rows into Python and computing there. The patterns that pay off most:

Use aggregate() for single summary values across a queryset.
Use annotate() to add computed columns — they're filterable, orderable, and stay in SQL.
Use values().annotate() for GROUP BY queries. Remember: order matters.
Use F() for field references in expressions and for atomic in-database updates.
Use Q() for OR and NOT conditions, and to compose filters dynamically.
Use Subquery + OuterRef for correlated subqueries; use Exists() for presence checks.
Use Prefetch(queryset=...) when you need to filter or annotate prefetched relations.
Use values() for serialisation paths where you don't need model instances.
Always verify generated SQL with str(queryset.query) or django-debug-toolbar — the ORM can surprise you.