Important
Direct Lake is currently in preview. This information relates to a prerelease product that may be substantially modified before it's released. Microsoft makes no warranties, expressed or implied, with respect to the information provided here. Before testing in your environment, be sure to read Known issues and limitations later in this article.
Direct Lake mode is a groundbreaking new dataset capability for analyzing very large data volumes in Power BI. Direct Lake is based on loading parquet-formatted files directly from a data lake, without having to query a lakehouse endpoint and without having to import or duplicate data into a Power BI dataset. Direct Lake is a fast path to load the data from the lake straight into the Power BI engine, ready for analysis. The following diagram shows how classic import and DirectQuery modes compare with the new Direct Lake mode.
Note
Data Warehouse doesn't currently support Direct Lake during preview.
In DirectQuery mode, the Power BI engine queries the data at the source, which can be slow, but avoids having to copy the data. Any change to the data source is immediately reflected in the query results.
With import mode, on the other hand, performance can be better because the data is cached and optimized for business intelligence queries without having to query the data source for each DAX query that a report submits. However, the Power BI engine must first copy the data into the dataset during refresh. Any changes at the source are picked up only with the next refresh of the dataset.
Direct Lake mode eliminates the import requirement by loading the data directly from OneLake. Unlike DirectQuery, there is no translation to other query languages or query execution on other database systems, which yields performance comparable to import mode. Because there's no explicit import process, changes at the data source can be picked up as they occur, combining the advantages of both DirectQuery and import modes while avoiding their disadvantages. Direct Lake mode can be the ideal choice for analyzing very large datasets and datasets with frequent updates at the data source.
Prerequisites
Direct Lake is only supported on Power BI Premium P and Microsoft Fabric F SKUs. It is not supported on Power BI Pro, Premium Per User, or Power BI Embedded A/EM SKUs.
Lakehouse
Before you can use Direct Lake, you must provision a lakehouse with one or more delta tables in a workspace hosted on a supported Power BI or Microsoft Fabric capacity. The lakehouse is required because it provides the storage location for your parquet-formatted files in OneLake. The lakehouse also provides an entry point to launch the web modeling experience to create a Direct Lake dataset.
For information about provisioning a lakehouse, creating a delta table in the lakehouse, and creating a dataset for the lakehouse, see Create a lakehouse later in this article.
SQL endpoint
As part of lakehouse provisioning, an SQL endpoint for SQL querying and a default dataset for reporting are created and updated with any tables added to the lakehouse. While Direct Lake mode doesn't query the SQL endpoint when loading the data directly from OneLake, it's required when a Direct Lake dataset must seamlessly fall back to DirectQuery mode, such as when the data source uses specific features like views or advanced security that can't be read through Direct Lake.
Known issues and limitations
The following are known issues and limitations during preview:
Direct Lake size limits are likely to change during preview. More definitive limits will be determined and described in this article at GA (General Availability). If the limits are reached, queries fall back to DirectQuery mode. The limits are based on the number of rows per table that a DAX query uses, and the row count varies by SKU size. To determine whether queries fall back to DirectQuery mode, see Analyze query processing for Direct Lake datasets.
You must use the web modeling experience integrated with the lakehouse to create Direct Lake datasets. Creating Direct Lake datasets by using Power BI Desktop or XMLA-based automation tools isn't yet supported.
When creating a Direct Lake dataset in a query scale-out (QSO)-enabled workspace, manually sync the dataset by using the following PowerShell commands with the Power BI admin cmdlets installed (replace WorkspaceId and DatasetId with the GUIDs of your workspace and dataset):
```powershell
Login-PowerBI
Invoke-PowerBIRestMethod -Url 'groups/WorkspaceId/datasets/DatasetId/sync' -Method Post | ConvertFrom-Json | Format-List
```
Calculated columns and calculated tables are not yet supported.
Some data types may not be supported.
Only single sign-on (SSO) is supported.
Embedded scenarios that rely on service principals aren't yet supported. Direct Lake models use single sign-on (SSO).
The dataset UI might display a warning icon on a table even though the table has no problems. This will be addressed in a future update.
The initial default/auto-generated dataset might not be in Direct Lake mode if there's only one table in the lakehouse. To make the dataset use Direct Lake mode, make a small change to the table in the lakehouse, such as renaming it (as shown in the sketch after this list). The name change should cause the dataset to switch to Direct Lake mode.
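For example, the rename can be done from a Fabric notebook with Spark SQL. This is a minimal, hypothetical sketch that assumes a delta table named holidays, like the one created later in this article:

```python
# Hypothetical sketch: rename an existing delta table in the lakehouse so the
# auto-generated default dataset picks up the change and switches to Direct Lake mode.
spark.sql("ALTER TABLE holidays RENAME TO holidays_renamed")
```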
Create a lakehouse
Complete the following steps to create a lakehouse, delta table, and dataset in a Microsoft Fabric or Power BI workspace.
Create a lakehouse
In your Microsoft Fabric or Power BI workspace, select New > Show all, and then in Data engineering, select the Lakehouse tile.
In the New lakehouse dialog box, enter a name, and then select Create. The name must contain only alphanumeric characters and underscores (a quick way to check a name is sketched after these steps).
Confirm that the new Lakehouse is created and opened.
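If you want to validate a lakehouse name programmatically before creating it, the documented rule (alphanumeric characters and underscores only) translates to a simple check. A minimal, hypothetical Python sketch; the helper function is illustrative and not part of any Fabric API:

```python
import re

# Hypothetical helper: check a proposed lakehouse name against the documented
# naming rule (alphanumeric characters and underscores only).
def is_valid_lakehouse_name(name: str) -> bool:
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None

print(is_valid_lakehouse_name("SalesLakehouse"))   # True
print(is_valid_lakehouse_name("sales-lakehouse"))  # False (hyphen not allowed)
```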
Create a delta table in the lakehouse
After you create a new lakehouse, you must create at least one delta table so Direct Lake can access some data. Direct Lake can read parquet-formatted files, but for best performance it's better to compress the data by using the VORDER compression method. VORDER compresses the data by using the Power BI engine's native compression algorithm. This way, the engine can load the data into memory as quickly as possible.
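In a Fabric notebook, VORDER is controlled through a Spark session setting that you apply before writing delta tables. A minimal sketch using the same setting that appears in the complete example later in this section:

```python
# Enable VORDER compression for parquet data written by this Spark session.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
```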
There are several options for loading data into a lakehouse, including data pipelines and scripts. The following steps use PySpark to add a delta table to the lakehouse based on an Azure Open Dataset.
To add a delta table to the lakehouse
In the newly created lakehouse, select Open notebook, and then select New notebook.
Copy and paste the following code snippet into the first code cell to give SPARK access to the open dataset, and then press Shift + Enter to run the code.
```python
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "holidaydatacontainer"
blob_relative_path = "Processed"
blob_sas_token = r""

# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
```
Verify that the code outputs a remote blob path.
Copy and paste the following code into the next cell, and then press Shift + Enter.
```python
# Read Parquet file into a DataFrame.
df = spark.read.parquet(wasbs_path)
print(df.printSchema())
```
Verify that the code outputs the DataFrame schema.
Copy and paste the following lines into the next cell, and then press Shift + Enter. The first statement enables the VORDER compression method, and the next statement saves the DataFrame as a delta table in the lakehouse.
```python
# Save as delta table
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
df.write.format("delta").saveAsTable("holidays")
```
Verify that all SPARK jobs have completed. Expand the SPARK job listing to see more details.
To confirm the table was created, in the upper-left area next to Tables, select the ellipsis (…), select Refresh, and then expand the Tables node.
Using the same method described above, or other supported methods, add more delta tables for the data that you want to analyze.
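As a sketch of what that might look like, the following hypothetical example repeats the pattern above for a second table; the parquet source path and table name are placeholders, not part of the walkthrough:

```python
# Hypothetical sketch: add another delta table with VORDER enabled, following
# the same pattern as the holidays table above.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Placeholder source path; replace with the parquet files for your own data.
df2 = spark.read.parquet("<path-to-your-parquet-files>")
df2.write.format("delta").saveAsTable("your_table_name")

# Confirm the tables now exist in the lakehouse.
spark.sql("SHOW TABLES").show()
```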
To create a basic Direct Lake dataset for your lakehouse
In your lakehouse, select New dataset, and then in the New dataset dialog box, select the tables to include in the dataset.
Select Confirm to generate the Direct Lake dataset. The dataset is automatically saved in the workspace based on the name of your lakehouse, and then opened.
Select Open data model to open the web modeling experience, where you can add table relationships and DAX measures.
Once you're done adding relationships and DAX measures, you can create reports, build a composite model, and query the dataset through XMLA endpoints just like any other dataset. During preview, XMLA write operations aren't yet supported.
Analyze query processing
To determine whether the DAX queries that a report visual sends to the data source perform best by using Direct Lake mode or by falling back to DirectQuery mode, you can use Performance Analyzer in Power BI Desktop to analyze the queries. For more information, see Analyze query processing for Direct Lake datasets.