Running SQL Queries in Databricks Notebooks

To run SQL queries in a Databricks notebook, you can use SQL cells directly or execute SQL queries using Python with the `spark.sql()` function. Here’s how you can do it:

  1. Create a New Notebook: Start by creating a new notebook in your Databricks workspace. Click on New in the sidebar and select Notebook to open a blank notebook.
  2. Query a Table: To query a table using SQL, use a SQL cell. If the notebook's default language is Python, set the cell's language to SQL from the cell language selector (or prefix the cell with the `%sql` magic command). For example, to query the `samples.nyctaxi.trips` table, use the following SQL command:
    SELECT * FROM samples.nyctaxi.trips
  3. Run the Query: Press Shift+Enter to run the SQL query in the cell. The results will appear below the cell.
  4. Using Python: Alternatively, you can execute SQL queries using Python by leveraging the `spark.sql()` function. Here’s an example:
    df = spark.sql("SELECT * FROM samples.nyctaxi.trips")

    Then, display the results using:

    display(df)


Bottom Line: Running SQL queries in Databricks notebooks is straightforward and flexible: you can use SQL cells directly or execute queries via Python with `spark.sql()`. This flexibility makes Databricks a convenient tool for interactive data analysis.

