Providing Paths in Databricks to Load a File
When working with Databricks, specifying the correct file path is essential for loading files. From a notebook you typically address two file systems: the local file system of the driver node and the Databricks File System (DBFS). The path syntax depends on which file system you are targeting and on the kind of code doing the reading.
Default File Systems and Prefixes
Command/Code | Default Location | Prefix to access DBFS | Prefix to access Local File System |
---|---|---|---|
%fs | DBFS Root | Optional | file:/ |
dbutils.fs | DBFS Root | Optional | file:/ |
spark.read/write | DBFS Root | Optional | file:/ |
Spark SQL | DBFS Root | Optional | file:/ |
Python local file APIs (e.g., open, os) | Local File System | /dbfs | None |
%sh | Local File System | /dbfs | None |
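The prefix rules in the table can be encoded as a small lookup. The sketch below is a hypothetical helper (`qualify_path` and the API labels such as `"spark"` or `"python"` are illustrative names, not a Databricks API) that returns the path string each kind of code expects in order to reach a given file system.

```python
# Hypothetical helper encoding the prefix table above; not part of any Databricks API.

# Code whose default file system is the DBFS root.
DBFS_DEFAULT_APIS = {"%fs", "dbutils.fs", "spark", "spark_sql"}
# Code whose default file system is the driver's local disk.
LOCAL_DEFAULT_APIS = {"python", "%sh"}


def qualify_path(api: str, target_fs: str, path: str) -> str:
    """Return `path` with the prefix `api` needs to reach `target_fs`.

    `path` is an absolute path inside the target file system, e.g.
    "/mnt/test_folder" for DBFS or "/tmp/data.csv" for the local disk.
    `target_fs` must be "dbfs" or "local".
    """
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {path!r}")
    if target_fs not in ("dbfs", "local"):
        raise ValueError(f"unknown file system {target_fs!r}")

    if api in DBFS_DEFAULT_APIS:
        # DBFS is the default, so "dbfs:" is optional (added here for clarity);
        # the local disk must be named explicitly with "file:".
        return ("dbfs:" if target_fs == "dbfs" else "file:") + path
    if api in LOCAL_DEFAULT_APIS:
        # The local disk is the default; DBFS is reached through the /dbfs mount.
        return "/dbfs" + path if target_fs == "dbfs" else path
    raise ValueError(f"unknown API {api!r}")
```

For example, `qualify_path("spark", "dbfs", "/mnt/test_folder")` gives `"dbfs:/mnt/test_folder"`, while `qualify_path("python", "dbfs", "/mnt/test_folder")` gives `"/dbfs/mnt/test_folder"`: the same DBFS directory, spelled for two different defaults.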
Examples of Loading Files
Using Spark: To load a file from DBFS using Spark, you can use the following command:
spark.read.parquet("dbfs:/mnt/test_folder/test_folder1/file.parquet")
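Using dbutils: To list the files in a DBFS directory from Python, use dbutils.fs.ls. Because dbutils.fs defaults to DBFS, the path takes no /dbfs prefix (the dbfs: scheme is optional):
dbutils.fs.ls("/mnt/test_folder/test_folder1/")
Plain Python file APIs such as open() default to the local file system, so they reach the same directory as /dbfs/mnt/test_folder/test_folder1/.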
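Because plain Python file APIs default to the driver's local file system, a DBFS path written for Spark or dbutils.fs has to be rewritten with the /dbfs prefix before it can be opened. A minimal sketch, assuming the DBFS mount is available at /dbfs on the driver; `to_fuse_path` and `read_dbfs_bytes` are hypothetical helper names, and the `fuse_root` parameter exists only so the code can be tried outside Databricks.

```python
def to_fuse_path(uri: str, fuse_root: str = "/dbfs") -> str:
    """Translate a DBFS path as written for Spark/dbutils into a local path.

    Accepts "dbfs:/mnt/x" or "/mnt/x" (dbutils.fs treats a bare absolute
    path as DBFS) and returns e.g. "/dbfs/mnt/x".
    """
    if uri.startswith("dbfs:"):
        uri = uri[len("dbfs:"):]  # drop the optional scheme
    if not uri.startswith("/"):
        raise ValueError(f"expected an absolute DBFS path, got {uri!r}")
    # Concatenate rather than os.path.join: join would discard the root
    # because the second argument is absolute.
    return fuse_root.rstrip("/") + uri


def read_dbfs_bytes(uri: str, fuse_root: str = "/dbfs") -> bytes:
    """Read a DBFS file with ordinary Python file I/O through the /dbfs mount."""
    with open(to_fuse_path(uri, fuse_root), "rb") as f:
        return f.read()
```

With the defaults, `to_fuse_path("dbfs:/mnt/test_folder/test_folder1/file.parquet")` returns `"/dbfs/mnt/test_folder/test_folder1/file.parquet"`, a path that local-file libraries such as pandas can also accept.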
Frequently Asked Questions
- Q: What is the difference between “dbfs:/” and “/dbfs”?
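A: Both refer to DBFS. “dbfs:/” is a URI scheme understood by Spark, dbutils.fs, and %fs; since DBFS is already their default, the prefix is optional there. “/dbfs” is the mount point that exposes DBFS on the driver's local file system, so it is used by code that defaults to the local file system, such as Python file APIs and %sh.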
- Q: How do I list files in the Local File System using Python?
A: You can list files in the Local File System using Python with the os module:
import os; print(os.listdir("/"))
- Q: Can I use shell commands to access DBFS?
A: Yes. Shell commands in a %sh cell run against the driver's local file system, so prefix the DBFS path with “/dbfs”. For example:
%sh
ls /dbfs/mnt/test_folder
- Q: How do I create a directory in DBFS using dbutils?
A: Use the dbutils.fs.mkdirs command, which also creates any missing parent directories. Because dbutils.fs defaults to DBFS, the path takes no /dbfs prefix. For example:
dbutils.fs.mkdirs("/mnt/test_folder")
- Q: Can I display HTML content in a Databricks notebook?
A: Yes, you can display HTML content in a Databricks notebook using the displayHTML function.
- Q: How do I link to other notebooks or folders in a Databricks markdown cell?
A: You can link to other notebooks or folders by using the standard markdown syntax for links:
[Link Text](url)
- Q: Does Databricks support all markdown syntax?
A: Most markdown syntax is supported in Databricks notebooks, but some features like emoji shortcodes are not supported.
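The FAQ answers above use dbutils to create DBFS directories and os to list local ones. Since plain Python reaches DBFS through the /dbfs mount, the standard library can do both against DBFS as well. A minimal sketch, assuming the /dbfs mount is available and writable on your cluster; `ensure_dir` and `list_files_recursive` are hypothetical helper names.

```python
import os


def ensure_dir(path: str) -> None:
    """Create `path` and any missing parents; do nothing if it already exists.

    With a mount path such as "/dbfs/mnt/test_folder" this targets the same
    directory as dbutils.fs.mkdirs("/mnt/test_folder").
    """
    os.makedirs(path, exist_ok=True)


def list_files_recursive(root: str) -> list[str]:
    """Return the paths of all files under `root`, relative to `root`, sorted."""
    found = []
    # os.walk visits every nested directory in one pass.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)
```

On Databricks you would call, for example, `ensure_dir("/dbfs/mnt/test_folder")` followed by `list_files_recursive("/dbfs/mnt/test_folder")`.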
Bottom Line
Providing paths in Databricks requires understanding the default file systems and prefixes for different types of code. By using the appropriate prefixes and commands, you can efficiently load files from both DBFS and the Local File System.