You can run a Python script in Databricks in several ways:
Using a Databricks Notebook
- Open Databricks Workspace: Log into your Databricks account and navigate to your workspace.
- Create or Open a Notebook:
  - Click on “New” and select “Notebook.”
  - Choose Python as the language for the notebook.
- Attach to a Cluster:
  - Ensure that your notebook is attached to a running cluster. You can select a cluster from the dropdown menu at the top of the notebook.
- Write or Paste Python Code:
  - In a cell within the notebook, write or paste your Python script.
- Run the Script:
  - Click on the “Run” button (or press Shift + Enter) to execute the code in the cell.
- Import External Libraries:
  - If your script requires external libraries, you can install them using %pip install in a notebook cell before importing them.
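For the library step, a notebook cell would normally begin with a %pip install line. That line is notebook magic rather than Python, so it appears only as a comment here. The rest is a minimal sketch of a follow-up cell that checks whether a package can be imported before the script relies on it. The helper name `is_installed` and both package names are illustrative assumptions, not part of Databricks.

```python
# Cell 1 (notebook magic, not plain Python):
# %pip install <package-name>

# Cell 2: confirm the package is importable before relying on it.
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; the second name is a deliberately fake package.
for name in ("json", "surely_not_a_real_package_xyz"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package shows as missing, the usual fix is to run the install cell again on the attached cluster and then re-run the check.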
Using Databricks Jobs
- Create a Job:
  - Navigate to the “Jobs” section in Databricks.
- Configure the Job:
  - Click on “Create Job” and configure it by specifying the notebook or a Python file as the task.
- Schedule and Automate:
  - You can set up a schedule for your job to automate the execution of the Python script.
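The same job can also be described as data rather than clicked together in the UI. The sketch below assembles a job definition in the shape accepted by the Jobs API (`POST /api/2.1/jobs/create`), with one Python-file task and a cron schedule. The function name `build_job_settings`, the job name, the file path, and the cluster ID are placeholders chosen for illustration; nothing is sent to a workspace.

```python
import json


def build_job_settings(name, python_file, cluster_id, cron, timezone="UTC"):
    """Assemble a job definition with one Python-file task and a schedule."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "run_script",
                # Placeholder ID of an existing cluster to run the task on.
                "existing_cluster_id": cluster_id,
                "spark_python_task": {"python_file": python_file},
            }
        ],
        "schedule": {
            # Quartz cron syntax: seconds, minutes, hours, day, month, weekday.
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
        },
    }


settings = build_job_settings(
    name="nightly-report",
    python_file="/Workspace/Users/someone@example.com/report.py",
    cluster_id="0000-000000-example",
    cron="0 0 2 * * ?",  # every day at 02:00 in the chosen timezone
)
print(json.dumps(settings, indent=2))
```

Keeping the definition in a function like this makes it easy to review the schedule and task settings in version control before creating the job.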
Using Databricks CLI or SDK
- Databricks CLI: You can use the Databricks CLI to upload Python scripts to your workspace and trigger runs of them on a cluster from the command line.
- Databricks SDK for Python: This lets you automate workspace tasks, such as creating jobs and triggering runs of Python code, from your local development environment.
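Both tools talk to the Databricks REST API, so one way to see what a script run looks like from a local machine is to build the request by hand. The sketch below uses only the Python standard library rather than the CLI or SDK, and targets the one-time run endpoint (`POST /api/2.1/jobs/runs/submit`). The host, token, cluster ID, and file path are placeholders, and `build_submit_request` is a hypothetical helper. The request is built but deliberately not sent.

```python
import json
import urllib.request


def build_submit_request(host, token, cluster_id, python_file):
    """Build (but do not send) a one-time run request for a Python file."""
    payload = {
        "run_name": "ad-hoc-script-run",
        "tasks": [
            {
                "task_key": "run_script",
                "existing_cluster_id": cluster_id,
                "spark_python_task": {"python_file": python_file},
            }
        ],
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders for illustration only.
request = build_submit_request(
    host="https://example.cloud.databricks.com",
    token="<personal-access-token>",
    cluster_id="0000-000000-example",
    python_file="dbfs:/scripts/report.py",
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

In practice the CLI or SDK is usually the more convenient choice, since they handle authentication and response parsing for you.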
Using Visual Studio Code
- Install Databricks Extension: Ensure you have the Databricks extension for Visual Studio Code installed.
- Open Python File: Open the Python file you want to run on a Databricks cluster.
- Upload and Run:
  - Right-click the file in the Explorer view and select “Upload and Run File on Databricks.”
  - Alternatively, use the “Run on Databricks” icon in the file editor.
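A file that is run as a whole, rather than cell by cell, benefits from a clear entry point. The following is a minimal sketch of such a standalone script. The `--limit` option and the `squares` function are invented for illustration, and the sketch assumes any parameters reach the script as command-line arguments.

```python
import argparse
import sys


def parse_args(argv):
    """Parse known options and ignore any extra arguments a runner may add."""
    parser = argparse.ArgumentParser(description="Example standalone script")
    parser.add_argument("--limit", type=int, default=5)
    args, _unknown = parser.parse_known_args(argv)
    return args


def squares(limit):
    """Return the squares of 1..limit as a list."""
    return [n * n for n in range(1, limit + 1)]


def main(argv=None):
    args = parse_args(sys.argv[1:] if argv is None else argv)
    for value in squares(args.limit):
        print(value)


if __name__ == "__main__":
    main()
```

Keeping the logic in functions such as `squares` means the same file can be imported into a notebook for interactive testing as well as run directly.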
Together, these methods cover interactive work in notebooks, scheduled runs through jobs, and runs started from a local development environment with the CLI, the SDK, or Visual Studio Code, all executing on Databricks clusters for data analysis and processing.