This article is based on interesting information just released at the Microsoft Build conference on May 23, 2023.
What we have today
When Synapse Analytics was created, the technical sessions inspired me with some comparisons and explanations, which I reproduced in my own technical sessions and writings.
Synapse was created based on a request from many Microsoft customers. They asked to be able to use a single tool for the entire data intelligence platform: ingest data, store, process, query, apply data science, and generate reports.
Synapse is a real Swiss army knife: we can do the ingestion with the help of Synapse Data Factory; query and process data using different methods, such as the Serverless SQL Pool or the Dedicated SQL Pool; and apply data science using the Spark Pool and additional machine learning frameworks. Finally, Synapse is also linked to Power BI, which allows us to use some shortcuts to create visualizations.
This unique set of features was always great, much better than the isolated tools we had before. However, compared to Microsoft Fabric, we can notice the missing points in Synapse:
- The integration of the different tools was limited. It was the best at the time, but compared to Microsoft Fabric, the integration was limited.
- We still have to choose between different infrastructure resources, such as the serverless SQL pool and the dedicated SQL pool, instead of using a single engine over all the data.
- We still have infrastructure decisions to make, especially the size of the dedicated SQL Pool. Decisions were often based primarily on guesswork.
- It does not fully isolate storage and processing. When you use a dedicated SQL pool, processing and storage are tied together.
Synapse was considered so advanced that few noticed these problems, and even then, not all of them. Microsoft Fabric, the new product announced during BUILD, exposes these limitations and more.
What is Microsoft Fabric?
Like Synapse, Microsoft Fabric brings together all the services needed for a data intelligence environment, highly integrated and built in a way that requires much less technical effort to implement.
The following image illustrates the services included in Microsoft Fabric.
In the following sections, I will introduce these new concepts.
Data Intelligence as Software as a Service (SaaS)
Microsoft Fabric arrives, breaks standards, and sets new ones. In the cloud environment, we are used to classifying services as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and SaaS. Synapse is classified as PaaS, while Microsoft Fabric is officially classified as SaaS. The following diagram shows the general areas covered by each hosting management level.
Without a doubt, the level of managed services offered by Microsoft Fabric is well above Synapse. Many tasks in Synapse require careful configuration, while Microsoft Fabric offers a kind of automatic configuration that just works out of the box.
Usually, when we think of a SaaS service, we think of an end-user application, such as Office 365 or many other applications where the user simply uses them. It's a concept that doesn't often match the software used to ingest, transform, model, and generate intelligent results from data.
That's Microsoft Fabric, software that breaks down the barriers of what we know about cloud software and services.
Microsoft Fabric does not live within the Azure environment, but rather within the Power BI Portal. This leads to a very different environment from what we have in Synapse.
But the new environment is also unlike the Power BI Portal we know. The environment is designed around different experiences: you choose an experience based on the type of task you want to perform, and the environment adapts to the common tasks related to that experience.
The following experiences are available:
- Power BI: The typical Power BI environment and tools
- Data Factory: Using this experience, you can create and manage dataflows and data pipelines, just like in Data Factory.
- Data Activator: This is a brand-new feature that allows you to create triggers on top of your visuals in Power BI.
- Data Engineering: This experience involves several tasks. It's responsible for creating and managing lakehouses, but it's also what allows you to create notebooks and orchestrate them with pipelines.
- Data Science: With this experience, you can apply Azure ML techniques to your data.
- Data Warehouse: This experience allows you to model your data as in a SQL database and use SQL over your data. It's hard to compare this to anything else: we can create many star models over our data lake, and these models will be reused by our Power BI datasets, making it easy to have a central model for all our reports.
- Real-Time Analytics: This persona is somewhat comparable to Power BI streaming dataflows, allowing you to ingest data in real time.
Switching between ingestion (Data Factory), processing (Data Engineering), modeling and SQL (Data Warehouse), and more is just a matter of choosing the right experience for the job over the same data sets.
Changing personas is a way to focus the environment on the kind of activities you want to perform. Object creation itself still happens inside a Power BI workspace.
In addition, the main new objects, the lakehouse and the data warehouse, have their own ways of moving work between one and the other.
Microsoft Fabric and OneLake
OneLake is the main pillar of Microsoft Fabric. It provides a data lake as a service, allowing us to build our data lake without all the hassle of provisioning it first. It is the central data storage for all Microsoft Fabric data and is ready for the tenant when the first Microsoft Fabric artifact is created.
The OneLake name also pairs nicely with OneLake's shortcut feature: we can create shortcuts to remotely located files and access them directly as if they were in our own lake.
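Conceptually, a shortcut behaves like a symbolic link: a path inside our lake resolves to data that physically lives somewhere else. The sketch below is a minimal Python model of that idea, not the real OneLake API; the class, method names, and paths are all invented for illustration.

```python
# Illustrative model of the shortcut concept: a local lake path that
# transparently resolves to a remote location. This is NOT the OneLake API;
# all names and paths here are hypothetical.

class LakeWithShortcuts:
    def __init__(self):
        self.shortcuts = {}  # local path prefix -> remote location

    def add_shortcut(self, local_prefix, remote_prefix):
        """Register a shortcut, like linking a folder of a remote lake."""
        self.shortcuts[local_prefix] = remote_prefix

    def resolve(self, path):
        """Return the physical location a logical lake path points to."""
        for local, remote in self.shortcuts.items():
            if path.startswith(local):
                return remote + path[len(local):]
        return path  # no shortcut matched: the data lives in our own lake

lake = LakeWithShortcuts()
lake.add_shortcut("/sales/partner", "abfss://partnerlake/raw")

# A path under the shortcut is served from the remote lake...
print(lake.resolve("/sales/partner/2023/orders.parquet"))
# ...while any other path stays local.
print(lake.resolve("/sales/internal/orders.parquet"))
```

The consumer of the data never needs to know which paths are shortcuts; that transparency is what makes cross-lake sharing feel like a single lake.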
The following image illustrates how OneLake relates to the other features of Microsoft Fabric.
OneLake, Lakehouse and Workspaces
The lakehouse is one of the central objects we can create within OneLake. We create the lakehouse with the Data Engineering persona, and the lakehouse is contained within a workspace, the same Power BI workspace we already know.
Once we've created a lakehouse, we can use Data Factory to load data into the Files area or the Tables area.
The Files area is the unmanaged area of the lake and accepts any file type. This is where we place the RAW files for further processing. The Tables area, on the other hand, contains data only in Delta format.
The lakehouse optimizes the Tables area with a special structure capable of making a regular delta table up to 10 times faster while still respecting the full Delta format.
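To make the split between the two areas concrete, here is a tiny Python sketch of the rule described above: Files accepts anything, while Tables accepts only data in Delta format. This is my own simplification for explanation, not Fabric's actual validation logic.

```python
# Illustrative model of the two lakehouse areas described above:
# "Files" is unmanaged (any file type), "Tables" accepts only Delta tables.
# A hypothetical helper, not Fabric's implementation.

def place_item(area, item_format):
    if area == "Files":
        return True                    # unmanaged area: RAW files of any type
    if area == "Tables":
        return item_format == "delta"  # managed area: Delta format only
    raise ValueError(f"unknown lakehouse area: {area}")

print(place_item("Files", "csv"))     # a raw CSV lands in Files
print(place_item("Tables", "delta"))  # a Delta table lands in Tables
print(place_item("Tables", "csv"))    # a CSV does not belong in Tables
```

In practice, the typical flow is exactly this: land raw files in Files, then transform them into Delta tables so they surface in Tables.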
However, the lakehouse is not the largest data structure we have. This position is reserved for OneLake: an automatically provisioned, invisible storage layer that contains all the data for data warehouses, lakehouses, datasets, and more.
In this way, we can build an enterprise architecture using workspaces to host departmental lakes. Data can be shared between multiple departments using lakehouse shortcuts. This ensures domain ownership of the data and cross-domain relationships at the same time.
This is just the starting point for an enterprise architecture: OneLake ensures unified control and management of data. Data lineage, data protection, certification, catalog integration, and more are unified features provided by OneLake for all the lakehouses built in a company.
All these features are inherited from the Power BI environment, giving the company an enterprise governance environment.
OneLake and processing isolation
When you use Synapse, the dedicated Synapse pool stores and processes the data. This is a scenario where storage and processing are linked.
In OneLake, storage and processing are independent. The same data in OneLake can be processed by many different methods, ensuring the independence of storage and processing.
Let's discuss the different methods available to process data in OneLake.
Every workspace enabled for Microsoft Fabric has a feature called the Live Pool. The Live Pool allows running notebooks without any prior Spark cluster configuration.
Once the first block of code is executed in a notebook, the live Spark pool deploys in seconds and gets the job done.
We can process OneLake data using Spark notebooks with the advantage of live pools.
Data Factory objects, such as pipelines and dataflows, inside the Power BI environment are the beginning of a unification of ETL tools: we had Data Factory pipelines and dataflows on one side, and Power BI dataflows on the other.
These two are now united and work together under Microsoft Fabric, with an additional advantage: Gen2 dataflows.
Gen2 dataflows are a step up from the Power BI dataflows (also called wrangling dataflows) we're used to. One of the most interesting features, in my opinion, is the ability to define the destination of a transformation, which we could never do in Power BI dataflows.
Microsoft Fabric provides two different methods to access the data using SQL, as if the data were in a regular database.
One of the methods is to use the Lakehouse object. This object provides us with an SQL endpoint that allows us to model the tables and query the data using SQL.
The second method uses a data store object, which provides a complete SQL processing environment on top of the data in OneLake.
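The practical difference between the two methods is easy to feel with an analogy: the SQL endpoint behaves like a read-only connection to the data, while the warehouse behaves like a full read-write database. The sketch below simulates that contrast with SQLite, which has nothing to do with Fabric's engine; it only illustrates the queries-versus-writes (DQL-versus-DML) distinction.

```python
# Analogy only: simulate a read-write "warehouse" and a read-only
# "SQL endpoint" over the same data, using SQLite instead of Fabric.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lake.db")

warehouse = sqlite3.connect(path)                     # full read-write access
warehouse.execute("CREATE TABLE sales (amount INT)")  # DDL works
warehouse.execute("INSERT INTO sales VALUES (100)")   # DML works
warehouse.commit()

# The "endpoint" sees the same data but opens it read-only.
endpoint = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
total = endpoint.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # queries (DQL) work fine

try:
    endpoint.execute("INSERT INTO sales VALUES (200)")  # writes are rejected
except sqlite3.OperationalError as e:
    print("write rejected:", e)
```

Both connections point at the same stored data, which mirrors the Fabric idea: one copy of the data in OneLake, different levels of SQL capability on top of it.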
The following table highlights the differences between the Lakehouse SQL endpoint and the Data Warehouse. Both share the same foundation: the SQL MPP engine (Polaris), Vertipaq optimization for tables, and the open data format (Delta). Some of these differences are available in the documentation; some are my personal conclusions.

| Microsoft Fabric offering | Data Warehouse | SQL Endpoint for Lakehouse |
| --- | --- | --- |
| Primary capability | Full data warehouse with T-SQL transaction support | Read-only, system-generated SQL endpoint for T-SQL querying and serving; supports only queries and views on top of Lakehouse delta tables |
| Recommended use case | Full support for DQL, DML and DDL T-SQL; full transaction support | Full DQL, no DML, limited DDL T-SQL support, such as SQL views and TVFs |
| Data loading | SQL, pipelines, dataflows | Spark, pipelines, dataflows, shortcuts |
| Delta table support | Reads and writes delta tables | Reads delta tables |
Microsoft Fabric is deeply connected with the Power BI environment. At many points in our work, be it from a lakehouse or a warehouse, we may need to start creating a Power BI report.
The best part is the access method: Power BI has a new method to access OneLake, called Direct Lake.
Direct Lake is a new connection method between Power BI datasets and OneLake.
When we use DirectQuery, each update requires a new query to the source, which slows down the connection. On the other hand, when we use Import mode, the data is stored in memory and performance is better, but when the data is updated, a dataset refresh is required; updates are not immediately visible in datasets and reports.
The Direct Lake connection mixes the best of both scenarios: it has the performance of Import mode, keeping data in memory, and the real-time data updates of DirectQuery.
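The trade-off between the three modes can be modeled in a few lines of Python. The classes below are invented for illustration only: Import serves stale data until a refresh, DirectQuery always hits the source, and a Direct-Lake-style reader serves fresh data from the lake without a dataset refresh. Real Power BI semantics are of course far richer.

```python
# Toy model of the three connection modes (illustrative, invented names).
# A dict stands in for the delta files in OneLake.
source = {"sales": 100}

class ImportMode:
    def __init__(self):
        self.cache = dict(source)  # data copied into memory at refresh time
    def refresh(self):
        self.cache = dict(source)
    def read(self, table):
        return self.cache[table]   # fast, but stale until refresh()

class DirectQueryMode:
    def read(self, table):
        return source[table]       # always fresh, but queries the source each time

class DirectLakeStyle:
    def read(self, table):
        return source[table]       # reads the lake data directly: fresh, and
                                   # with in-memory speed in the real engine

imp, dq, dl = ImportMode(), DirectQueryMode(), DirectLakeStyle()
source["sales"] = 250              # the data changes in the lake

print(imp.read("sales"))  # 100 -> stale until the dataset is refreshed
print(dq.read("sales"))   # 250 -> fresh
print(dl.read("sales"))   # 250 -> fresh, no dataset refresh needed
imp.refresh()
print(imp.read("sales"))  # 250 after an explicit refresh
```

In this toy model, DirectQuery and Direct Lake return the same values; the difference the article describes is in cost: DirectQuery pays a round trip to the source on every read, while Direct Lake serves the same fresh data at in-memory speed.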
What about Azure Synapse and Data Factory?
Customers using Data Factory and the Synapse Dedicated Pool can also expect easy ways to migrate to Microsoft Fabric. Microsoft is focused on making the transition as smooth as possible.
Data Factory users even benefit from Gen2 dataflows, which are not supported by Azure Data Factory. Therefore, you benefit from developing dataflows and pipelines in Microsoft Fabric, and you will want an easy migration path from one to the other.
Microsoft Fabric seems to be the beginning of a new era. In an era of Open AI/ChatGPT and co-pilots, we have an extremely powerful tool that makes complex data solutions available to all companies, and we can expect a co-pilot for Microsoft Fabric as well.