To import a CSV file in Databricks, you can follow these steps:
- Navigate to the Data tab in your Databricks workspace.
- Click on “Upload File” and select your CSV file from your local machine.
- Databricks will automatically detect the file format and suggest a table name.
Alternatively, you can read the CSV into a DataFrame programmatically with PySpark:

```python
# Define the path to your CSV file
file_path = "dbfs:/FileStore/your_folder/your_file.csv"

# Read the CSV file into a DataFrame, treating the first row as the
# header and letting Spark infer the column types
df = spark.read.csv(file_path, header=True, inferSchema=True)

# Display the first few rows of the DataFrame
df.show()
```
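If you want the result registered as a table, as the UI upload flow produces, here is a minimal follow-on sketch; the table name `my_csv_table` is a placeholder:

```python
# Persist the DataFrame as a managed table so it can be queried with SQL.
# "my_csv_table" is a placeholder table name.
df.write.mode("overwrite").saveAsTable("my_csv_table")

# Quick sanity check on the new table.
spark.sql("SELECT COUNT(*) AS row_count FROM my_csv_table").show()
```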
In SQL, you can query the file directly from cloud storage with the `read_files` function:

```sql
SELECT * FROM read_files(
  'abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>/<file>.csv',
  format => 'csv',
  header => true
)
```
A few additional tips:

- For large files, consider uploading to cloud storage (like Azure Blob Storage) first, then importing into Databricks.
- Use options like `header`, `inferSchema`, and `delimiter` to properly parse your CSV file (see the sketch after this list).
- Always verify the imported data by displaying a few rows or running basic queries.
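As a sketch combining the first two tips: the container and path below are placeholders carried over from the SQL example, and the option values are assumptions you should adjust to match your file:

```python
# Path in Azure Blob Storage (ADLS Gen2); the bracketed parts are placeholders.
file_path = (
    "abfss://<container-name>@<storage-account-name>"
    ".dfs.core.windows.net/<path>/<file>.csv"
)

# Spell out the parsing options explicitly.
df = (
    spark.read
    .option("header", True)       # first row holds column names
    .option("inferSchema", True)  # let Spark guess types (costs an extra pass)
    .option("delimiter", ",")     # change for semicolon- or tab-separated files
    .csv(file_path)
)

# Verify the import before relying on it.
df.printSchema()
df.show(5)
```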
Remember, Databricks supports files up to 2GB for direct upload through the UI. For larger files, use cloud storage or programmatic methods.