Using the GROUP BY Clause in Databricks

The GROUP BY clause in Databricks SQL groups rows by one or more expressions and computes an aggregate for each group using functions such as SUM, AVG, and COUNT. It also supports multi-level aggregation through the GROUPING SETS, CUBE, and ROLLUP clauses.

Syntax and Parameters

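The basic syntax takes one of two forms:

      GROUP BY group_expression [, ...] [ WITH ROLLUP | WITH CUBE ]
      GROUP BY { group_expression | { ROLLUP | CUBE | GROUPING SETS } ( grouping_set [, ...] ) } [, ...]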
    

group_expression can be a column name, the position of a column in the SELECT list, or an expression. For example:

      GROUP BY a, b
      GROUP BY a + b
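
The column-position form is not shown above, so here is a minimal sketch of it. It borrows the sales_table name and columns from the example later in this article; treat them as illustrative. The positions refer to the SELECT list, so 1 is year and 2 is product.

      -- Group by the first and second items in the SELECT list
      -- (equivalent to GROUP BY year, product)
      SELECT year, product, SUM(sales) AS total_sales
      FROM sales_table
      GROUP BY 1, 2

Positions are compact but fragile: reordering the SELECT list silently changes the grouping, so column names are usually the safer choice in long-lived queries.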
    

GROUPING SETS groups by several sets of columns in a single query. The result is equivalent to a UNION ALL of separate GROUP BY queries, one per set, with NULL returned for any column that is not part of the set being aggregated.

      GROUP BY GROUPING SETS ((warehouse), (product))
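
To make the UNION ALL equivalence concrete, the sketch below writes the same aggregation both ways. The table inventory and its quantity column are hypothetical names introduced only for illustration.

      -- Hypothetical table: inventory(warehouse, product, quantity)

      -- One query with GROUPING SETS
      SELECT warehouse, product, SUM(quantity) AS total_quantity
      FROM inventory
      GROUP BY GROUPING SETS ((warehouse), (product));

      -- The same result written as two GROUP BY queries
      SELECT warehouse, NULL AS product, SUM(quantity) AS total_quantity
      FROM inventory
      GROUP BY warehouse
      UNION ALL
      SELECT NULL AS warehouse, product, SUM(quantity) AS total_quantity
      FROM inventory
      GROUP BY product;

The GROUPING SETS form is shorter and states the intent in one place, which is the main reason to prefer it over the hand-written union.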
    

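ROLLUP and CUBE are shorthand for common GROUPING SETS patterns. ROLLUP produces hierarchical subtotals: one grouping for each leading prefix of the column list, down to a grand total. CUBE produces a grouping for every combination of the listed columns.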

      GROUP BY warehouse, product WITH ROLLUP
      GROUP BY warehouse, product WITH CUBE
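
As a sketch of what the shorthand expands to, the queries below pair each form with its explicit GROUPING SETS equivalent. The table inventory and its quantity column are hypothetical and used only for illustration; the empty set () is the grand total.

      -- Hypothetical table: inventory(warehouse, product, quantity)

      -- ROLLUP: (warehouse, product), then (warehouse), then the grand total
      SELECT warehouse, product, SUM(quantity) AS total_quantity
      FROM inventory
      GROUP BY warehouse, product WITH ROLLUP;

      SELECT warehouse, product, SUM(quantity) AS total_quantity
      FROM inventory
      GROUP BY GROUPING SETS ((warehouse, product), (warehouse), ());

      -- CUBE: every combination of warehouse and product, plus the grand total
      SELECT warehouse, product, SUM(quantity) AS total_quantity
      FROM inventory
      GROUP BY warehouse, product WITH CUBE;

      SELECT warehouse, product, SUM(quantity) AS total_quantity
      FROM inventory
      GROUP BY GROUPING SETS ((warehouse, product), (warehouse), (product), ());

With two columns, ROLLUP yields three grouping sets and CUBE yields four. The difference is the (product) set, which ROLLUP omits because product is not a leading prefix of the column list.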
    

Example Usage

Here’s an example query that uses GROUP BY with SUM and AVG aggregate functions:

      SELECT year, product, SUM(sales) AS total_sales, AVG(sales) AS average_sales
      FROM sales_table
      GROUP BY year, product
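
To try the query without an existing table, the sketch below builds a small temporary view first. The rows are made-up sample data, not real results, and the ORDER BY is added only to make the output order predictable.

      -- Made-up sample rows for illustration only
      CREATE OR REPLACE TEMPORARY VIEW sales_table AS
      SELECT * FROM VALUES
        (2023, 'widget', 100.0),
        (2023, 'widget', 300.0),
        (2023, 'gadget', 50.0),
        (2024, 'widget', 200.0)
      AS t(year, product, sales);

      SELECT year, product, SUM(sales) AS total_sales, AVG(sales) AS average_sales
      FROM sales_table
      GROUP BY year, product
      ORDER BY year, product;

      -- Three groups come back:
      --   2023, gadget: total 50,  average 50
      --   2023, widget: total 400, average 200
      --   2024, widget: total 200, average 200

The two 2023 widget rows collapse into a single group, which is why that row's total and average differ while the single-row groups report the same value for both.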
    

Bottom Line

The GROUP BY clause in Databricks SQL handles both simple and multi-level aggregation. With GROUPING SETS, ROLLUP, and CUBE, you can compute totals across several dimensions in a single query instead of combining separate GROUP BY queries by hand.

