Window functions are a powerful feature in databases that allow you to perform calculations across a set of rows related to the current row. They are commonly used in SQL queries to analyze and aggregate data within a specific window or range.

Syntax

The basic syntax for using window functions is as follows:

SELECT
  column1,
  column2,
  window_function(column3) OVER (PARTITION BY partition_column ORDER BY order_column ROWS BETWEEN N PRECEDING AND M FOLLOWING) AS result_column
FROM
  your_table;
  • window_function: The specific window function to be applied (e.g., SUM, AVG, ROW_NUMBER, etc.).
  • column1, column2, …: The columns you want to include in the result set.
  • PARTITION BY partition_column: Optional clause that divides the result set into partitions to which the window function is applied independently. It is used to group rows based on one or more columns, and the window function is then applied separately to each partition.
  • ORDER BY order_column: Defines the order in which the rows are processed by the window function. It specifies how the data is sorted within each partition.
  • ROWS BETWEEN N PRECEDING AND M FOLLOWING: Specifies the window frame, indicating the range of rows used by the window function. It defines the relative position of the current row within the partition.

Sample Data

Consider the following sample data for better illustration:

orders table:

customer_id order_date order_amount
1 2023-01-01 100
1 2023-02-01 150
2 2023-01-15 200
2 2023-02-10 120

Common Window Functions

1. ROW_NUMBER()

Assigns a unique integer to each row within a partition, based on the specified ordering.

SELECT
  customer_id,
  order_date,
  order_amount,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS row_num
FROM
  orders;

Result:

customer_id order_date order_amount row_num
1 2023-01-01 100 1
1 2023-02-01 150 2
2 2023-01-15 200 1
2 2023-02-10 120 2

2. SUM()

Calculates the sum of a column within a specified window.

SELECT
  customer_id,
  order_date,
  order_amount,
  SUM(order_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
FROM
  orders;

Result:

customer_id order_date order_amount running_total
1 2023-01-01 100 100
1 2023-02-01 150 250
2 2023-01-15 200 200
2 2023-02-10 120 320

3. AVG()

Computes the average of a column over a specified window.

SELECT
  customer_id,
  order_date,
  order_amount,
  AVG(order_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS avg_amount
FROM
  orders;

Result:

customer_id order_date order_amount avg_amount
1 2023-01-01 100 100
1 2023-02-01 150 125
2 2023-01-15 200 200
2 2023-02-10 120 160

4. LEAD() and LAG()

Accesses data from subsequent or preceding rows within the partition.

SELECT
  customer_id,
  order_date,
  order_amount,
  LEAD(order_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_amount,
  LAG(order_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_amount
FROM
  orders;

Result:

customer_id order_date order_amount next_order_amount prev_order_amount
1 2023-01-01 100 150 NULL
1 2023-02-01 150 NULL 100
2 2023-01-15 200 120 NULL
2 2023-02-10 120 NULL 200

Window functions are a versatile tool in SQL, providing a way to perform complex analyses over defined windows or partitions. They enhance the capabilities of queries, allowing for efficient and concise data manipulation and analysis. The OVER clause, along with PARTITION BY and ORDER BY, enables fine-grained control over the window within which the window function operates, making it a powerful tool for data analysis.