Better Together: the Lakehouse and Warehouse - Microsoft Fabric (2023)


Applies to: SQL Endpoint and Warehouse in Microsoft Fabric

This article explains the data warehousing experience with the SQL endpoint of the Lakehouse, and scenarios for using the Lakehouse in data warehousing.

Important

Microsoft Fabric is currently in PREVIEW. This information relates to a preliminary product that may change significantly prior to release. Microsoft makes no warranties, express or implied, with respect to the information provided in this document.

What is a Lakehouse SQL endpoint?

In Fabric, when you create a lakehouse, an SQL endpoint is created automatically.

SQL Endpoint allows you to query data in Lakehouse using the T-SQL language and the TDS protocol. Each Lakehouse has one SQL endpoint, and each workspace can have more than one Lakehouse. The number of SQL endpoints in a workspace matches the number of Lakehouse items.

  • The SQL endpoint is automatically generated for each Lakehouse and exposes the Lakehouse Delta tables as SQL tables that can be queried using the T-SQL language.
  • Each delta table in a Lakehouse is represented as a table. The data must be in delta format.
  • A default Power BI dataset is created for each SQL endpoint and follows the naming convention of the Lakehouse objects.
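
For example, once the SQL endpoint exists, a Lakehouse Delta table can be queried with plain T-SQL over TDS. This is a minimal sketch; the table and column names (dbo.orders, order_id, customer_id, order_total) are hypothetical:

    -- Connect to the SQL endpoint with any TDS client (SSMS, sqlcmd, ADO.NET)
    -- and query the Delta table as an ordinary SQL table.
    SELECT TOP 10
        order_id,
        customer_id,
        order_total
    FROM dbo.orders
    ORDER BY order_total DESC;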

There is no need to create an SQL endpoint in Microsoft Fabric, and users cannot create one in a workspace. An SQL endpoint is automatically created for each Lakehouse: to get an SQL endpoint, create a lakehouse, and an SQL endpoint for that Lakehouse is created automatically.

Automatic metadata discovery

A continuous process reads the Delta logs and file folders and ensures that the SQL metadata for the tables, such as statistics, is always up to date. No user action is required, and there is no need to import or copy data, or configure infrastructure. For more information, see Automatically generated schema in SQL Endpoint.

Scenarios that the Lakehouse enables for data warehousing

In Fabric, we offer a warehouse.

The Lakehouse, with its SQL Endpoint powered by the Warehouse, can simplify the traditional decision tree of batch, streaming, or lambda architecture patterns. Together with a warehouse, the lakehouse enables many additive analytics scenarios. This section explores how to use a lakehouse together with a warehouse for a best-of-breed analytics strategy.

Analyze with the gold layer of your Fabric Lakehouse

One of the well-known strategies for organizing lake data is a medallion architecture, where files are organized into raw (bronze), consolidated (silver), and refined (gold) layers. An SQL Endpoint can be used to analyze data in the gold layer of the medallion architecture if the files are stored in Delta Lake format, even if they are stored outside of Microsoft Fabric OneLake.

You can use OneLake shortcuts to reference gold folders in external Azure Data Lake Storage accounts managed by Synapse Spark engines or Azure Databricks.

Warehouses can also be added as subject-oriented or domain-oriented solutions for specific topics that may have custom analytics requirements.

If you choose to keep your data in Fabric, it will always be open and accessible through APIs, the Delta format, and of course T-SQL.

Query your Lakehouse delta tables and other OneLake Data Hub items as a service

There are use cases where an analyst, data scientist, or data engineer may need to query data in a data lake. In Fabric, this end-to-end experience is fully SaaSified.

OneLake is a single, unified, logical data lake for the entire organization. OneLake is OneDrive for data. OneLake can contain multiple workspaces, for example, for your organizational divisions. Every Fabric item makes its data accessible through OneLake.

Data in Microsoft Fabric Lakehouse is physically stored on OneLake with the following folder structure:

  • The /Files folder contains raw and unconsolidated (bronze) files that should be processed by data engineers before they are analyzed. The files can be in various formats, such as CSV, Parquet, different types of images, and so on.
  • The /Tables folder contains refined and consolidated (gold) data that is ready for business analysis. The consolidated data is in Delta Lake format.

An SQL endpoint can read data in the /Tables folder in OneLake. Analysis is as simple as querying the SQL endpoint of the Lakehouse. Together with the Warehouse, you also get cross-database queries and the ability to seamlessly switch from read-only queries to building additional business logic on top of your OneLake data with Synapse Data Warehouse.
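
As a sketch of a cross-database query, assuming a Lakehouse named SalesLakehouse and a Warehouse named SalesWarehouse in the same workspace (both names, and the tables, are hypothetical), three-part names can join data across the two items:

    -- Hypothetical cross-database join between a Lakehouse Delta table
    -- (read through the SQL endpoint) and a Warehouse dimension table.
    SELECT r.region_name,
           SUM(o.order_total) AS total_sales
    FROM SalesLakehouse.dbo.orders AS o
    JOIN SalesWarehouse.dbo.dim_region AS r
        ON o.region_id = r.region_id
    GROUP BY r.region_name;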

Data engineering with Spark and serving with SQL

Data-driven businesses need to keep their back-end systems and analytics in sync in near real-time with customer-facing applications. The impact of transactions must be accurately reflected through end-to-end processes, related applications, and online transaction processing (OLTP) systems.

In Fabric, you can take advantage of Spark Streaming or Data Engineering to curate your data. You can use the Lakehouse SQL Endpoint to validate data quality and for existing T-SQL processes. This can be done in a medallion architecture or within multiple layers of your Lakehouse, serving bronze, silver, gold, or staging, curated, and refined data. You can customize the folders and tables created through Spark to meet your data engineering and business requirements. When you're ready, you can leverage one warehouse to serve all your downstream business intelligence applications and other analytics use cases without copying data, by using views or refining data with CREATE TABLE AS SELECT (CTAS) statements, stored procedures, and other DML/DDL commands.
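
As a minimal sketch of this serving pattern, a view can layer business logic over a curated table, and a CTAS statement run in the Warehouse can materialize a refined table; all object names here are hypothetical:

    -- A view adds business logic over the gold data without copying it.
    CREATE VIEW dbo.vw_daily_sales AS
    SELECT CAST(order_date AS date) AS sales_date,
           SUM(order_total)         AS daily_total
    FROM dbo.orders
    GROUP BY CAST(order_date AS date);
    GO
    -- CTAS, run in the Warehouse, materializes a refined table.
    CREATE TABLE dbo.sales_summary AS
    SELECT region_id,
           SUM(order_total) AS total_sales
    FROM SalesLakehouse.dbo.orders
    GROUP BY region_id;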

Integration with the gold layer of your Open Lakehouse

An SQL endpoint is not limited to analyzing data in the Fabric Lakehouse only. An SQL endpoint lets you analyze lake data in any lakehouse, built with Synapse Spark, Azure Databricks, or any other lake-centric data engineering engine. The data can be stored in Azure Data Lake Storage or Amazon S3.

This tight two-way integration with Fabric Lakehouse is always available through any engine with open APIs, the Delta format, and of course T-SQL.

Data virtualization of external data lakes with shortcuts

You can use OneLake shortcuts to reference gold folders in external Azure Data Lake Storage accounts managed by Synapse Spark engines or Azure Databricks, as well as any Delta tables stored in Amazon S3.

Any folder referenced by a shortcut can be analyzed from an SQL endpoint, and an SQL table is created for the referenced data set. The SQL table can be used to expose data in externally managed data lakes and enable analytics on it.

This shortcut acts as a virtual warehouse that can be leveraged from a warehouse for additional downstream analysis requirements, or queried directly.

Use the following steps to analyze data in external data lake storage accounts:

  1. Create a shortcut that references a folder in an Azure Data Lake Storage or Amazon S3 account. After you enter the connection details and credentials, a shortcut appears in the Lakehouse.
  2. Switch to the SQL endpoint of the Lakehouse and find an SQL table whose name matches the name of the shortcut. This SQL table references the folder in ADLS/S3.
  3. Query the SQL table that references the data in ADLS/S3. The table can be used like any other table in the SQL Endpoint. You can join tables that reference data in different storage accounts, as sketched below.
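
As an illustrative sketch, two shortcut-backed tables, hypothetically named adls_sales (a shortcut to an ADLS folder) and s3_customers (a shortcut to an S3 folder), can be joined like ordinary SQL tables:

    -- Hypothetical tables created from shortcuts to ADLS and S3 folders.
    SELECT c.customer_name,
           SUM(s.order_total) AS total_spend
    FROM dbo.adls_sales AS s
    JOIN dbo.s3_customers AS c
        ON s.customer_id = c.customer_id
    GROUP BY c.customer_name;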

Note

If the SQL table does not appear immediately in the SQL Endpoint, you may need to wait a few minutes. The SQL table that references data in the external storage account is created with a delay.

Analyze archived or historical data in a data lake

Data partitioning is a well-known technique for optimizing data access in data lakes. Partitioned data sets are stored in hierarchical folder structures in the format year=/month=/day=, where year, month, and day are the partition columns. This allows you to store logically separated historical data in a form that compute engines can read as needed, with efficient filtering, rather than reading the entire folder and all the folders and files it contains.

Partitioned data allows faster access if queries filter data using predicates that compare the partition columns to a value.

An SQL Endpoint can read this type of data with no configuration needed. For example, you can use any application to archive data into a data lake, including SQL Server 2022 or Azure SQL Managed Instance. When you partition data and store it in a lake for archival purposes with external tables, an SQL endpoint can read the partitioned Delta Lake tables as SQL tables and let your organization analyze them. This lowers the total cost of ownership, reduces data duplication, and lights up big data, AI, and other analytics scenarios.
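
As a sketch, assuming a hypothetical archive table dbo.sales_archive partitioned by year, month, and day, a query that filters on the partition columns only touches the matching folders:

    -- Predicates on the partition columns let the engine skip
    -- folders outside the requested date range.
    SELECT COUNT(*) AS archived_rows
    FROM dbo.sales_archive
    WHERE [year] = 2022
      AND [month] = 12
      AND [day] BETWEEN 1 AND 15;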

Data virtualization of Fabric data with shortcuts

Within Fabric, workspaces allow you to segregate data based on complex business, geographic, or regulatory requirements.

An SQL Endpoint allows you to leave data in place and continue to analyze it in the Warehouse or Lakehouse, even in other Microsoft Fabric workspaces, through seamless virtualization. Each Microsoft Fabric Lakehouse stores data in OneLake.

Shortcuts allow you to query folders at any location in OneLake.

Each Microsoft Fabric Warehouse stores tabular data in OneLake. When a table is added, the data in the table is exposed as a Delta Lake dataset in OneLake. Shortcuts allow you to reference the folders in any OneLake location where the Warehouse tables are exposed.

Sharing and querying across workspaces

While workspaces allow you to segregate data based on complex business, geographic, or regulatory requirements, it is sometimes necessary to facilitate sharing across these boundaries for specific analytics needs.

A Lakehouse SQL Endpoint can enable easy data sharing between departments and users, where a user can bring their own capacity and storage. Workspaces organize departments, business units, or analytical domains. Using shortcuts, users can find data in any warehouse or lakehouse. Users can instantly perform their own customized analysis on the same shared data. In addition to helping with departmental chargebacks and usage allocation, this is also a zero-copy version of the data.

The SQL Endpoint makes it easy to query and share any table. Workspace roles and security roles add controls that can be layered to meet additional business requirements.

Use the following steps to enable data analytics in all workspaces:

  1. Create a OneLake shortcut that references a table or folder in a workspace that you have access to.
  2. Select a Lakehouse or Warehouse that contains a Delta Lake table or folder you want to analyze. After selecting a table/folder, a shortcut will appear in Lakehouse.
  3. Switch to the SQL endpoint of the Lakehouse and find the SQL table whose name matches the name of the shortcut. This SQL table references the folder in the other workspace.
  4. Query the SQL table that references data in another workspace. The table can be used like any other table in the SQL Endpoint. You can join tables that reference data in different workspaces, as sketched below.
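
As a sketch, assuming a shortcut hypothetically named finance_orders that points at a table in another workspace, the cross-workspace data can be joined with local tables in place:

    -- finance_orders is a hypothetical shortcut to a table in another
    -- workspace; the query reads the shared data without copying it.
    SELECT d.department_name,
           SUM(f.amount) AS total_amount
    FROM dbo.finance_orders AS f
    JOIN dbo.departments AS d
        ON f.department_id = d.department_id
    GROUP BY d.department_name;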

Note

If the SQL table does not appear immediately in the SQL Endpoint, you may need to wait a few minutes. The SQL table that references data in another workspace is created with a delay.

Analyze partitioned data

Data partitioning is a well-known technique for optimizing data access in data lakes. Partitioned data sets are stored in hierarchical folder structures in the format year=/month=/day=, where year, month, and day are the partition columns. Partitioned data sets allow faster data access if queries filter data using predicates that compare the partition columns to a value.

An SQL endpoint can represent partitioned Delta Lake data sets as SQL tables and allow you to analyze them.

Next steps

  • What is a lakehouse?
  • Create a lakehouse with OneLake
  • Understand default Power BI datasets
  • Load data into the lakehouse
  • How to copy data using the Copy activity in Data pipeline
  • Tutorial: Move data into the Lakehouse via the Copy Wizard
  • Connectivity
  • The Lakehouse and Warehouse
  • Query the Warehouse