On May 23, 2023 (at the Microsoft Build conference), Microsoft Fabric was announced in Public Preview: a new unified data and analytics platform that brings together and enhances Microsoft's existing suite of data products. We've been lucky enough to be part of the private preview for the past six months, and we're here to share our thoughts.
So what is Microsoft Fabric?
Microsoft Fabric can be thought of as the third generation of Microsoft data platforms. The first generation (e.g. HDInsight, SQL Data Warehouse) consisted of somewhat isolated, traditional data products; the second generation was Azure Synapse Analytics, which integrated those platforms at the UX level but still felt a bit disjointed at the data level; and now we have Microsoft Fabric, which builds on Synapse's vision of "unification" with a particular focus on enabling deep interoperability at the data level.
To do so, a large investment has been made in standardizing the platform's foundations so that compute engines can integrate and interoperate seamlessly. These compute engines are then surfaced through persona-based "experiences" aimed at typical data roles: data engineers, data scientists, citizen analysts, and so on. The user interface is very similar to Power BI, which should hopefully be familiar to many of you reading this blog.
The other big change is that Microsoft Fabric is SaaS instead of PaaS. It provides a more coherent vision of how a modern data platform should take shape, and enables an even bigger shift toward data democratization (i.e., "more self-service") for everyone. It also introduces a standardized capacity-based business model, where users don't need to manage billing for different compute engines separately. (For those of you who cringe at that last sentence, don't worry: the entry price is nowhere near that of Power BI Premium capacities.)
How does Microsoft Fabric compare to Azure Synapse Analytics?
If you are familiar with Azure Synapse Analytics, you can see Fabric as the next generation of Synapse ("Synapse Gen3", if you will) with a little added flourish. Synapse experiences have largely been lifted and shifted, so the UX within Fabric's constituent elements (e.g. notebooks/pipelines) should feel relatively familiar. Compared to Azure Synapse Analytics, Microsoft Fabric keeps some features, improves some, adds some, and removes some. Check out Barry's blog "Synapse vs Fabric: A Side by Side Comparison" for a detailed comparison. And of course, we can't forget that Power BI has essentially become part of Microsoft Fabric, whereas in Synapse it was just an integration point.
Microsoft recognizes that a large number of Azure Synapse Analytics customers will have a sizable existing data estate in Synapse and will need a way to migrate. We understand that a lot of thought has gone into making this migration exercise as easy as possible, so that every element of Synapse has some sort of recommended migration strategy: some direct technical solutions, some process-based. And I think it's fair to assume that Azure Synapse isn't going anywhere anytime soon, so if moving to Fabric isn't on your radar yet, you don't have to worry. That said, I think it's also fair to assume that Microsoft Fabric will get most of the attention and investment at Microsoft, so I wouldn't expect much more innovation on the Synapse side unless it also benefits Fabric.
How does Microsoft Fabric compare to Databricks/Snowflake?
We're not comparing apples to apples yet. Databricks and Snowflake have a somewhat narrower and deeper focus. Databricks is still widely viewed by customers as the go-to cross-cloud platform for data science and machine learning workloads, though its marketing suggests it is now "the home of the Lakehouse", and it has more recently branched out into a full-fledged data warehousing offering. Snowflake is perhaps best known as an innovative SaaS cloud data warehouse that has recently started releasing features (like Snowpark) that appeal to data engineering personas.
From the start, Fabric has taken a much broader view, innovating across the entire stack, from low-level data and compute details to the "knowledge worker" experiences you've probably seen in Microsoft demos. However, we wouldn't expect any of these vendors to sit still; there is a general convergence toward this "unified analytics platform" across the industry. Keep an eye out for more blogs in this space!
More specifically, what does Microsoft Fabric allow me to do?
Microsoft Fabric includes a set of capabilities spanning descriptive, diagnostic, predictive, and prescriptive analytics, from batch to streaming workloads. More specifically, it's divided into several high-level "experiences" tailored to a variety of personas:
- Data Factory (data integration)
- Based on Azure Data Factory (and Azure Synapse Analytics pipelines). Note: the feature set is currently not as mature as Azure Data Factory's
- Synapse Data Engineering
- Mainly consists of "Lakehouses", Notebooks, and Spark Job Definitions
- Synapse Data Warehouse
- An evolution of Synapse Dedicated SQL Pools
- Synapse Data Science
- A full suite of data science capabilities, including model creation and deployment, all within Fabric. Think of it as SynapseML features and Azure Machine Learning rolled into one
- Synapse Real-Time Analytics
- An evolution of Synapse Data Explorer
- Also features "Eventstreams", very similar to the Azure Stream Analytics no-code editor
- Power BI (business intelligence)
- The same Power BI we all know and love, with some new tight integrations with other Fabric artifacts
- Data Activator
- A new event-driven stream processing tool that tightly integrates with Power BI and Event Hubs. Described as a "digital nervous system".
Please note the continued use of the term "Synapse".
It's also worth saying hello to "OneLake", a single logical data lake provisioned with each Fabric tenant that stores/overlays all of its Fabric-related data. It doesn't show up in the experience switcher in the Fabric UI, but it's there, trust me. Read the accompanying "What is OneLake?" blog for more information.
What's good (and bad) about Microsoft Fabric?
Let's remember that Fabric is still in public preview, so not all features have been released, things are still buggy, and UX/UI elements are still being worked on. But here are some of the main points:
🎉 Everything (and I mean everything) under one roof
- Products spanning data integration, data engineering, data warehousing, and real-time processing, all integrated. "But Synapse has this!" I hear you. I really do. But just because a new product follows this principle doesn't make it a bad thing.
- Since it's SaaS, from a data perspective everything is stored in OneLake. As the name suggests, there is only one of these, so you don't have to provision and manage separate storage accounts for each of your workloads. It's your "OneDrive for data."
- Like Power BI, where all artifacts can be found in the Power BI service, Fabric is presented in a Power BI-like user interface. Many data people of all stripes already spend a fair amount of time in this UI, which will speed the transition to "higher in the stack" self-service analytics for people like data analysts and citizen analysts. And data scientists/engineers already familiar with Power BI/Azure Synapse/Azure Databricks will find the experience close enough to what they know that they can get up and running very quickly.
The underlying theme here is "democratization": Fabric equips people of all stripes with the tools and data they need to succeed. At least theoretically.
🎉 Tight integration between experiences, made possible by the adoption of OSS technology
In the section above, we looked at all the different experiences that Fabric offers. Fundamentally, all these experiences operate on data stored in OneLake. Combined with the fact that all data tables are stored in the Delta Lake open table format, this allows Fabric to take advantage of standardization and create tight integration between the different compute engines within Fabric. Standardizing on a single format has also meant that optimizations can be made that benefit multiple engines at once.
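To make this concrete, here's an illustrative sketch of what that cross-engine integration looks like in practice (the Lakehouse and table names are made up for this example): a Delta table written into a Lakehouse by a Spark notebook is immediately queryable from the warehouse-style T-SQL endpoint via cross-database naming, with no copy or import step.

```sql
-- Hypothetical names: "SalesLakehouse" and "dim_customer" are placeholders.
-- The table was written by a Spark notebook as Delta; the SQL endpoint
-- reads the same files in OneLake directly.
SELECT TOP (10)
    customer_id,
    region
FROM SalesLakehouse.dbo.dim_customer
ORDER BY customer_id;
```

The point is that there is no "export from Spark, import into SQL" pipeline in between: both engines read the same Delta files in OneLake.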
🎉 (true) Things are better/faster
The Fabric team has not only focused on high-level features and a new user experience, but has also put a lot of effort into improving existing services. For example, spinning up a Spark session in Fabric (whether via a Notebook or a Spark Job Definition) now takes ~30 seconds instead of ~3 minutes, which is a huge improvement. Separately, "Direct Lake" is an amazing new feature that makes querying Delta tables in your data lake from a Power BI report feel like you're using an import-mode dataset, but without having to duplicate and manually refresh the data. This is largely thanks to a new "V-Order" optimization, automatically applied to Delta tables written by Fabric, which introduces additional compression for Delta data files while remaining compliant with the Delta Lake specification.
Other features we're excited about are the Power BI developer workflow enhancements, where the legendary "desktop hardening" begins to come to fruition. And to be fair, this isn't the only "DevOps" feature that's good news: there's end-to-end git integration spread across Fabric, with deployment pipelines for all artifacts on the near-term roadmap. Git integration is now available. On the Spark development front, there's a VS Code extension that lets us connect to the Fabric Spark runtime remotely to facilitate local development. We'll be posting more content on this feature in the coming weeks.
😩 Some features of Synapse have not made it
When creating a new product, there will of course be trade-offs in the functionality it offers. However, we are particularly saddened that some products/features (or their equivalents) did not make it across:
- Synapse SQL Serverless (pay-per-query). For those of you who follow our blog, it won't come as a surprise that this was one of the first features we noticed was missing. SQL Serverless allowed querying raw CSV/Parquet/JSON files or existing Delta tables in the lake. Not only was it a great (cheap) replacement for a SQL engine as a core part of a data platform, but it enabled really easy ad-hoc analysis and quick previews of the data sitting in your data lake. Fabric currently has no substitute for this functionality: ad-hoc analysis of files in OneLake is not possible using T-SQL, and data preview in OneLake is only possible for certain file types (Parquet is somehow not one of them). To perform ad-hoc analysis of files in Fabric, you must use Spark notebooks, or first load the corresponding data into managed Delta tables.
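For readers who haven't used it, this is the kind of ad-hoc query Synapse SQL Serverless made trivial and that Fabric currently has no T-SQL equivalent for. The storage account and path below are made-up placeholders:

```sql
-- Pay-per-query T-SQL over raw files in the lake, no loading step required.
-- 'mydatalake' and the 'raw/sales' path are hypothetical examples.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;
```

One short query like this, billed only on the data scanned, was often all you needed to sanity-check a dataset before building anything on top of it.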
- Mapping Data Flows. This ETL GUI built into Azure Data Factory/Azure Synapse is extremely popular with our customers. It's easy to use and easy to update, allowing people to take advantage of Spark without having to understand its ins and outs. Time had been spent designing core elements of data pipelines using Mapping Data Flows and generating IP (sometimes in the form of flowlets), and people had learned the expression syntax to write more complex expressions. Fabric has not inherited the Mapping Data Flows feature, instead prioritizing Dataflows Gen2. It is our understanding that there will be a migration path for organizations using Mapping Data Flows, but it remains to be seen how seamless that experience will be. What is no secret, however, is that Dataflows Gen2 is the future of low-code ELT pipelines in Fabric, so it is worth investing in.
- Spark .NET. Sorry, C# developers: .NET for Apache Spark didn't make it into the Fabric Spark runtime, although you could create C# notebooks and custom C# Spark jobs in Azure Synapse. .NET is great for high-performance computing, and Spark is a perfect alternative to Azure Batch, so we're sorry this isn't part of the Fabric release.
😩 The UX is still in flux
The UX/UI is likely to evolve during the public preview period, so we won't spend a lot of time reviewing it. But since Fabric has introduced a number of "new" artifacts, including organizational abstractions like experiences, workspaces, and domains, we've got a bit lost in the UI on a number of occasions. We now have vertical tabs instead of the horizontal tabs we had in Synapse, which I find harder to navigate quickly. To be honest, I wasn't a fan of the Power BI UI update in 2020, and Fabric builds on those same UI components, so perhaps it's no real surprise that I'm having a hard time navigating!
😩 Capacity-based business model
At endjin, we're big proponents of the pay-per-query (consumption) business model. Whether it was Azure Data Lake Analytics (now deprecated) or Synapse SQL Serverless, the technology didn't matter. This business model opens doors in several ways:
- Organizations/teams that wanted to dip their toes in and experiment with new technology had an easy way to do extensive testing without having to commit to a fixed level of spend. This, in our opinion, was the USP of Synapse SQL Serverless: you pay only for what you query.
- Once live, data pipeline costs could be attributed at a granular level to the specific teams that owned specific workloads. Being able to accurately price individual pipelines was invaluable for chargeback scenarios, but also simply for understanding the relative cost of running different workloads.
Not only that, but we've implemented various data platforms using these kinds of technologies at their core, and they've been cost-effective and efficient.
Fabric loses this notion of consumption-based pricing, and by doing so, we believe it will lose a number of potential customers. It unifies all compute into a single capacity-based business model that uses Capacity Units (CUs) as its central unit. It's still unclear how much each compute engine will contribute to CU consumption, how much friction there will be in pausing/resuming capacities when needed, and how autoscaling will work (and what it will cost). Fabric capacities start at a much lower price than Power BI Premium capacities, but will it still be too much for newcomers? And will the smaller capacities be "production ready"?
Microsoft Fabric: Final Thoughts for Public Preview
The future is certainly bright for all things data at Microsoft. The vision is sound and the feature set is very interesting, although there is certainly a long way to go.
Organizations should use this public preview period to run some PoV/PoC exercises on sample workloads. We certainly wouldn't recommend starting a full, formal migration to Fabric just yet; at this point, we think it would be a waste of time while the platform lacks stability. Start thinking about building internal expertise by nominating "Enablement Champion(s)", so that when Fabric reaches GA there is an understanding of how it can be used effectively and positively across the broader business.
Stay tuned for much more endjin Fabric content over the next few weeks!