Reading a CSV File from Local in Databricks

Databricks cannot read a file that exists only on your local machine with Spark’s standard CSV reader, because Spark requires the file to be accessible to every worker in the cluster. The workaround is to upload the local CSV file to a distributed file system such as the Databricks File System (DBFS), or to cloud storage like AWS S3 or Azure Blob Storage, and then read it from there. Here’s how you can do it:

Step 1: Upload the CSV File to DBFS

You can upload your CSV file to DBFS using the Databricks CLI or the Databricks UI. Here’s how to do it using the CLI:

      databricks fs cp /path/to/local/file.csv dbfs:/path/to/upload/
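
Before moving on, it can help to confirm that the file actually landed in DBFS. A minimal check, assuming you are working inside a Databricks notebook (where the dbutils utility is available) and reusing the illustrative path from above:

      # List the upload directory to confirm the file is visible in DBFS
      files = dbutils.fs.ls("dbfs:/path/to/upload/")
      for f in files:
          print(f.name, f.size)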
    

Step 2: Read the CSV File in Databricks

After uploading the file, you can read it with Spark’s DataFrame reader in Databricks:

      from pyspark.sql import SparkSession

      # Get the active SparkSession (in a Databricks notebook it is already available as `spark`)
      spark = SparkSession.builder.getOrCreate()

      # Read the CSV file from DBFS
      df = (
          spark.read.format("csv")
          .option("header", "true")
          .option("delimiter", ",")
          .load("dbfs:/path/to/upload/file.csv")
      )

      # Display the first rows of the DataFrame
      df.show()
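
By default, every column is read as a string. If you want Spark to infer column types, you can add the inferSchema option, and for quick inspection of a small result you can pull a limited slice back to the driver as pandas. A short sketch under those assumptions, reusing the same illustrative path:

      # Re-read the file, letting Spark infer column types instead of keeping strings
      df_typed = (
          spark.read.format("csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("dbfs:/path/to/upload/file.csv")
      )

      # Inspect the inferred schema and the row count
      df_typed.printSchema()
      print(df_typed.count())

      # Pull a small sample back to the driver as a pandas DataFrame
      sample_pdf = df_typed.limit(100).toPandas()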
    

Bottom Line

Reading a CSV file from a local machine in Databricks requires uploading the file to a distributed file system like DBFS first. After uploading, you can use Spark’s read.format("csv") method to read the file. This approach ensures that the file is accessible to all Spark workers in the cluster.


👉 Hop on a short call to discover how Fog Solutions helps navigate your sea of data and lights a clear path to grow your business.