GROUP BY and PARTITION BY are used in SQL to organize data, but they serve different purposes and are used in other contexts. Here’s a detailed comparison to help you understand the differences.
GROUP BY
Purpose: Used to group rows that have the same values in specified columns into summary rows, often used with aggregate functions like COUNT()
, SUM()
, AVG()
, etc.
- Context: Works with aggregate functions to summarize data by groups.
- Output: Returns one row per group.
Example:
If you want to calculate the total salary per department:
SELECT deptno, SUM(sal) AS total_salary
FROM emp_copy
GROUP BY deptno;
Explanation: Group all rows by deptno
and calculate each department's total salary.
PARTITION BY
- Purpose: Used within window functions to divide the result set into partitions (subgroups) to perform calculations across these partitions.
- Context: Used inside window functions like
ROW_NUMBER()
,RANK()
,SUM()
, etc., but does not reduce the number of rows; instead, it allows you to calculate values over partitions of the data. - Output: Keeps all rows in the result set but performs the calculation within each partition.
Example:
To rank salaries within each department:
SELECT sal, deptno,
ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rank
FROM emp_copy;
Explanation: Partitions the data by deptno
, and then within each partition, it orders the rows by sal
in descending order and assigns a rank.
Key Differences:
GROUP BY
: Aggregates data, reducing the number of rows.PARTITION BY
: Partitions data without reducing the number of rows, allowing window functions to operate within those partitions.
References
Originally published at http://srinimf.com on August 29, 2024.