Thoughts from the First Ever Microsoft Fabric Community Conference
In May 2023, Microsoft Fabric made its official debut as a comprehensive end-to-end data ecosystem encompassing data ingestion, storage, reporting, analysis, and AI. Positioned as a single SaaS offering, Fabric aims to cover the full range of data needs, from analytics platforms to AI integration.
Initially, I questioned its viability because it had significant gaps compared to its more established competitors. However, Microsoft is quickly closing those gaps and has clarified the direction of the Fabric ecosystem through the Fabric Roadmap. To further solidify its commitment to Fabric, Microsoft invited the world to Las Vegas to hear more about its plans.
It had been a while since I last visited Las Vegas, so I was looking forward to this trip and the chance to attend the first-ever Microsoft Fabric Community Conference (aka FABCON). It was a busy three days filled with 130 data-packed sessions, talks from the Microsoft product team, and opportunities to connect with Microsoft and fellow data enthusiasts from around the globe.
Amid the flurry of activity, Microsoft unveiled several impactful updates and previews, demonstrating their commitment to building a world-class data ecosystem. Here are some noteworthy features announced during the conference that are now in public preview or slated for release soon.
Task Flows (Coming Soon)
If you’re looking for a way to visualize your data project from start to finish, task flows may be the answer. They could be useful for understanding the workflow involved in data analysis or data manipulation. During the demonstration, Microsoft showed the ability to select from different data project templates (e.g., medallion architecture, lambda architecture), which could jumpstart your development efforts.
Data Mirroring (Public Preview)
This newly announced feature allows you to replicate data from various sources into Fabric’s OneLake. At announcement, supported sources included Azure SQL Database, Azure Cosmos DB, and Snowflake, and Microsoft indicated that other sources are under consideration.
This feature offers several significant benefits:
- It simplifies data ingestion. You can mirror an entire database or just the specific tables you need, which requires minimal effort and speeds up getting data into OneLake.
- It provides near real-time access to data. Once configured, mirroring continuously replicates data into OneLake. This can give users more current insights while reducing the reporting load on source databases.
- It can unlock new opportunities. Mirroring can give data scientists and data analysts near real-time access to data that was not previously available for analysis.
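To make "continuously replicates" a little more concrete, here is a toy Python sketch of the general idea behind mirroring selected tables: a replica applies a stream of insert, update, and delete events coming from a source. This is not Fabric’s implementation or API; `ChangeEvent` and `ToyMirror` are hypothetical names, and a production replication system also has to deal with initial snapshots, event ordering, schema changes, and failures, none of which this sketch attempts.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Set


# Hypothetical change event, standing in for one entry in a source database's change feed.
@dataclass(frozen=True)
class ChangeEvent:
    table: str
    op: str  # "insert", "update", or "delete"
    key: Any
    row: Optional[Dict[str, Any]] = None


class ToyMirror:
    """Toy replica that applies change events for a chosen set of tables only."""

    def __init__(self, tables: Set[str]) -> None:
        # Only these tables are mirrored; events for any other table are ignored.
        self.tables = tables
        self.replica: Dict[str, Dict[Any, Dict[str, Any]]] = {t: {} for t in tables}

    def apply(self, event: ChangeEvent) -> None:
        if event.table not in self.tables:
            return
        target = self.replica[event.table]
        if event.op in ("insert", "update"):
            # Upsert: store a copy of the latest version of the row under its key.
            target[event.key] = dict(event.row or {})
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")


# Usage: mirror only the "orders" table while events for other tables stream past.
mirror = ToyMirror({"orders"})
events = [
    ChangeEvent("orders", "insert", 1, {"id": 1, "status": "new"}),
    ChangeEvent("customers", "insert", 7, {"id": 7}),  # not mirrored
    ChangeEvent("orders", "update", 1, {"id": 1, "status": "shipped"}),
    ChangeEvent("orders", "insert", 2, {"id": 2, "status": "new"}),
    ChangeEvent("orders", "delete", 2),
]
for e in events:
    mirror.apply(e)
print(mirror.replica)  # {'orders': {1: {'id': 1, 'status': 'shipped'}}}
```

Mirroring only the tables you select, as in the first bullet above, corresponds here to the `tables` filter passed to the constructor.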
Mirroring data into Fabric’s OneLake has obvious implications, and the cost of storage and compute is one of them. To offset this, Microsoft is providing free compute and storage based on your purchased capacity size.
OneLake Shortcuts (Public Preview/Coming Soon)
Microsoft Fabric shortcuts let you access data stored in various locations without physically moving it. They act as references, or pointers, to the actual data source. Shortcuts were already available for Amazon S3 and ADLS Gen2, and you can now create shortcuts to Google Cloud Storage (GCS) and cloud-based S3-compatible sources in preview.
Microsoft also announced that on-premises S3-compatible sources are coming soon, including Pure Storage, NetApp, Dell ECS, and many others. You might wonder about query performance when so many pointers reference data in other cloud environments. Microsoft is addressing this with caching: as files are read through an external shortcut, they are stored in a cache in your Fabric workspace.
The next time you request that data, it is read from the cache instead of going back to the external source, which should significantly improve performance and reduce potential egress charges. Shortcut caching is currently in public preview and available for GCS, S3, and S3-compatible shortcuts.
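The caching behavior described above follows the familiar read-through cache pattern. The Python sketch below illustrates that pattern under this assumption; it is not Fabric’s implementation. `CachedShortcut` and `fake_fetch` are hypothetical names, and the fetch function simply stands in for reading a file from an external bucket. A real cache would also need rules for eviction and for detecting stale files, which this sketch leaves out.

```python
from typing import Callable, Dict


class CachedShortcut:
    """Toy read-through cache in front of an external storage location.

    Conceptual only: `fetch` stands in for a (hypothetical) call that reads
    a file from the external source, such as an S3 or GCS bucket.
    """

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}
        self.remote_reads = 0  # how many times we went to the external source

    def read(self, path: str) -> bytes:
        # Serve from the local cache if this file has been read before.
        if path in self._cache:
            return self._cache[path]
        # Otherwise read from the external source and keep a local copy.
        data = self._fetch(path)
        self.remote_reads += 1
        self._cache[path] = data
        return data


# Usage: a stand-in for the external source that just fabricates file contents.
def fake_fetch(path: str) -> bytes:
    return f"contents of {path}".encode()


shortcut = CachedShortcut(fake_fetch)
shortcut.read("sales/2024/01.parquet")  # first read goes to the external source
shortcut.read("sales/2024/01.parquet")  # second read is served from the cache
print(shortcut.remote_reads)  # 1
```

Every cache hit in this sketch is one read that never leaves the workspace, which is where the performance and egress savings described above would come from.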
Data Sharing (Coming Soon)
Sharing data externally is a valuable capability that has long been available from competitors, and it is coming soon to Fabric. Data sharing in Fabric will allow external business partners to collaborate on data and data artifacts securely. The data does not need to be moved; you simply share it in place using new capabilities being added to shortcuts. One caveat: the external party must use Fabric in its own tenant to access the shared data.
On-Premises Data Gateway (Public Preview)
Ingesting data from on-premises sources is a typical pattern when building out a lakehouse. However, outside of Dataflow Gen2, there has been no way to securely ingest data from on-premises sources. Microsoft has addressed this by announcing the preview of on-premises connectivity for data pipelines. This enhancement broadens the range of data integrations Fabric can handle and offers the reassurance that data is transferred securely from on-premises sources into Fabric.
CI/CD Integration (Public Preview/Coming Soon)
Proper CI/CD processes are critical for any software development team seeking to reduce risk, improve quality, and shorten time to market. This is where Git comes into play. Git integration in Fabric enables developers to back up and version their work, revert to prior versions, and collaborate with other developers or work in isolation using branches.
Currently, Fabric supports Git integration with Azure DevOps for certain items: Power BI reports, Power BI semantic models, notebooks, and lakehouses, all in preview. During the conference, Microsoft announced that it is extending this integration with Azure DevOps to bring data pipelines and data warehouses under source control and CI/CD management.
Microsoft also announced plans to add Spark environments and Spark job definitions, as well as support for GitHub and GitHub Enterprise, within the next few months. Eventually, Microsoft plans to bring every item in the Fabric environment under Git management. For code deployment, Fabric currently supports CI/CD workflows through deployment pipelines, which are native to Fabric. Microsoft announced that these pipelines will also be exposed through standard REST APIs, allowing customers to manage their Fabric deployments with virtually any CI/CD tool they prefer.
Copilot in Fabric (Coming Soon)
Copilot in Fabric has been available in preview since November 2023. Microsoft also announced an upcoming Q&A feature for Copilot that will let you interact with your data using natural language. Without any configuration, you will be able to select a Fabric data source, ask a question, and receive an answer along with the query used to generate it. For data analysts, data scientists, and data engineers, this would be a convenient way to begin exploratory analysis of the data available to them with minimal effort. To take advantage of current and future Copilot features, you will need a paid Fabric capacity of F64 or above (or P1 or above in Power BI Premium).