I am very excited to publish this blog post and can't wait to start using Microsoft Fabric and all the different components.
The goal of my blog post is to see how you might use Microsoft Fabric from the point of view of a Power BI developer.
Microsoft Fabric is a new all-in-one platform that includes Power BI. It is a single place where you can now meet all of your data requirements without the overhead of having to configure and manage multiple components, and the various components work together to give a smooth experience.
As you can see below, this is how Power BI Direct Lake mode works: you don't need to import the data into Power BI, because it reads the data directly from OneLake.
In this blog post I'll show you how I made it all work.
My approach is that of a Power BI user who is comfortable using a no-code or low-code approach to getting things done.
In my example below, data is being inserted into an Azure SQL database every minute.
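To give a rough idea of what that feed looks like, below is a minimal Python sketch of a loop that inserts a row once a minute. The server, credentials, table and column names are placeholders for illustration only and are not my actual setup.

```python
# Minimal sketch: insert a row into an Azure SQL table once a minute.
# Server, credentials, table and column names below are placeholders.
import time
from datetime import datetime

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=yourserver.database.windows.net;"
    "DATABASE=yourdatabase;"
    "UID=youruser;PWD=yourpassword;"
    "Encrypt=yes;"
)

while True:
    with conn.cursor() as cursor:
        # Hypothetical columns; the real table structure is not shown in this post.
        cursor.execute(
            "INSERT INTO dbo.tb_TridentData (EventTime, Reading) VALUES (?, ?)",
            datetime.utcnow(),
            42,
        )
        conn.commit()
    time.sleep(60)  # wait one minute before the next insert
```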
Then I'll show you how I use the new features in Fabric to keep the whole process flowing.
I will use the following components:
Lakehouse
- This is where all the data will be stored.
Dataflow Gen2
- This is the new version of dataflows, which uses Power Query Online.
Direct Lake dataset
- This is the Power BI dataset that reads the data directly from the Lakehouse without having to import it.
Power BI report
- This is where I create my Power BI reports.
Data pipeline
- The data pipeline is what I use to refresh or load my data.
Power Automate
- This sits outside of Fabric, but is still used to refresh the Power BI dataset.
The first step in the process for me is to create a new Lakehouse.
Lakehouse is where all the files and tables will be stored.
When I think of the Lakehouse, I think of Files as the source files that are simply landed in their original form from the source system.
The Tables in the Lakehouse, on the other hand, are the Delta/Parquet files that have been processed through a Dataflow Gen2 or a Spark notebook.
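As a small illustration of the Spark notebook route (I use Dataflow Gen2 in this post), here is a sketch of what writing a Lakehouse table from a notebook might look like. It assumes a Fabric notebook with a default Lakehouse attached, and the file and table names are made up for the example.

```python
# Sketch: load a raw file from the Lakehouse Files area and save it as a table.
# Assumes a Fabric notebook with a default Lakehouse attached; names are examples.
df = spark.read.csv("Files/raw/source_data.csv", header=True, inferSchema=True)

(
    df.write
      .format("delta")        # Lakehouse tables are stored in Delta/Parquet format
      .mode("overwrite")
      .saveAsTable("tb_ExampleTable")
)
```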
I navigated to my app workspace, which is not specially configured as a Microsoft Fabric workspace.
In the bottom left corner I clicked on the Power BI icon and selected Data Engineering.
Then at the top I clicked on Lakehouse.
I then entered my Lakehouse name, LH_FourMoo, and clicked Create.
Once it was set up, I was able to see my Lakehouse as shown below.
Now that I had created my Lakehouse, the next step was to create my Dataflow Gen2, which would get the data from my Azure SQL database and load it directly into a Lakehouse table.
I clicked on New Dataflow Gen2.
You will notice that this is very similar to the existing dataflows.
I then clicked Import from SQL Server.
I then entered all the required details for the server and the database, and clicked Sign in to authenticate. Once I was signed in, I clicked Next.
I then browsed to my table "tb_TridentData", selected it, and clicked Create.
I then added a new column called "Blog Post" in my Dataflow Gen2, as shown below.
Before proceeding to load data into the Lakehouse, all columns must have a defined data type.
If they don't, as is currently the case for my new column called "Blog Post", and I tried to load the data into the Lakehouse, I would get the result shown below, where a column with the Any data type cannot be loaded into the Lakehouse.
I went to my column called "Blog Post" and set the data type to Text, as shown below.
Another thing to note is that column names in tables loaded into the Lakehouse cannot contain spaces, so I also had to change the column name to Blog_Post. If you left the space in, you would see the same screen as above, where the column would not be part of the table loaded into the Lakehouse.
The next step was to add a data destination, which is where I want the data to go.
In my example I want to load this data into my Lakehouse named "LH_FourMoo". To do that, on the ribbon I clicked Home, then Add data destination, and selected Lakehouse as shown below.
It should then detect that this will be a Lakehouse connection, as shown below. I then clicked Next.
Next, I needed to select the Lakehouse location where I wanted to create my new Lakehouse table. As shown below, I selected the Lakehouse I created earlier.
I also chose to create a new table and gave it the table name "tb_TridentData".
Then I clicked Next.
On the "Select target options" screen, I then selected the update method to replace and in the column mapping I could see that all my columns would be mapped.
I then clicked Save Settings, which then saves all my data destination details.
Now I could see the details of the data destination at the bottom right of my Dataflow Gen2.
I then clicked Publish to publish my Dataflow Gen2.
Currently, by default, it saves the dataflow with a default name, as shown below.
Once I clicked Publish, I also had to wait for the publishing of the dataflow to complete.
NOTE: This is the first version of the dataflow; no data has been loaded yet.
To change the name of the dataflow, I clicked on the three dots next to it and then selected "Properties".
I then renamed it to "DF - Azure SQL Data" and clicked Save.
I then clicked the Refresh now button, which loaded the data into my Lakehouse table.
I could then see that my Dataflow Gen2 had completed successfully.
To confirm this, I went into my Lakehouse "LH_FourMoo".
There I could see my table in the Lakehouse under Tables, as shown below.
I could also confirm that it is in the required Delta/Parquet format used to create tables in the Lakehouse (note the little triangle on the table icon, as well as the _delta_log folder in the file hierarchy behind the table). If it were just Parquet, we would not be able to take advantage of Direct Lake mode. I viewed the files behind the table by clicking the three dots next to "tb_TridentData" and selecting "View table files".
I was then able to see all the files behind my table.
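If you prefer checking this from a notebook instead of the "View table files" option, a sketch like the one below should show the same information. It assumes a Fabric notebook with LH_FourMoo attached as the default Lakehouse; mssparkutils is the built-in file system helper.

```python
# Show the Delta metadata for the table (format, location, number of files, etc.).
spark.sql("DESCRIBE DETAIL tb_TridentData").show(truncate=False)

# List the parquet files and the _delta_log folder behind the table.
for item in mssparkutils.fs.ls("Tables/tb_TridentData"):
    print(item.name)
```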
One other thing to note: there is a great new feature where, on the left-hand side, I can see my app workspace, and below that I can also see the Lakehouse I have open.
If you were to open the LH_FourMoo SQL endpoint, you would see that it also opens on the left-hand side.
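As a side note, the SQL endpoint can also be queried from outside Fabric with any SQL client. Below is a small Python sketch using pyodbc; the server value is a placeholder you would replace with the connection string copied from the SQL endpoint, and it assumes Azure Active Directory authentication.

```python
# Sketch: query the Lakehouse SQL endpoint from outside Fabric.
# The SERVER value is a placeholder; copy the real one from the SQL endpoint's
# "SQL connection string" in the workspace.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<sql-endpoint-connection-string>;"
    "DATABASE=LH_FourMoo;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

# Read a few rows from the table that Dataflow Gen2 loaded.
for row in conn.execute("SELECT TOP 10 * FROM dbo.tb_TridentData"):
    print(row)
```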
I had now created my Dataflow Gen2, published it, and loaded the data into the Lakehouse in the required Delta/Parquet format.
Now for the most amazing part: creating my Direct Lake dataset, where I don't have to import the data into Power BI but can create my dataset directly from the tables in the Lakehouse.
To do this, I first went to my Lakehouse "LH_FourMoo".
Once in the Lakehouse, I clicked on New Power BI dataset.
I then selected my table "tb_TridentData" and clicked Continue.
I could then see that my dataset was being created.
Once it was created, I was able to see the dataset.
The first thing I wanted to do was rename the default dataset name "LH_FourMoo (1)" at the top left.
I clicked on the name and updated it as shown below.
To save the new name, I clicked the name at the top again.
The next step for me was to create a Power BI report.
From my dataset, I clicked Create a report and from the drop-down menu I chose "Start from scratch".
Since this dataset is not complex, I created a single visual with the details. This will show me whether the data is updating once I have put everything together. Here is what the Power BI report looks like.
I then saved my report as shown below.
The final piece was to create a data pipeline that would refresh the data via my Dataflow Gen2 and then refresh the Power BI dataset as well.
Although I am aware that the Power BI dataset uses Direct Lake, I still need to refresh the dataset, which performs a metadata refresh so that I can see the refreshed data directly from the parquet files.
Before completing my data pipeline, I had already created a Power Automate flow that refreshes the Power BI dataset, starting with the "When a HTTP request is received" trigger and then refreshing the Power BI dataset.
Below is an overview of what I created in Power Automate. I will not go into detail on how this is done, as I feel it is beyond the scope of this blog post.
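For context, the core of that flow is simply a call to the Power BI REST API's dataset refresh endpoint. The Python sketch below shows roughly the same call the flow makes; the workspace ID, dataset ID and access token are placeholders, and in my case the Power Automate Power BI connector handles the authentication for me.

```python
# Sketch: trigger a Power BI dataset refresh via the REST API.
# Workspace ID, dataset ID and the access token are placeholders; a real script
# would acquire the token with MSAL or a service principal.
import requests

workspace_id = "<workspace-id>"
dataset_id = "<dataset-id>"
access_token = "<azure-ad-access-token>"

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}"
    f"/datasets/{dataset_id}/refreshes"
)

response = requests.post(url, headers={"Authorization": f"Bearer {access_token}"})
response.raise_for_status()  # a 202 Accepted response means the refresh was queued
```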
On the bottom left, I made sure I had selected "Data Engineering".
Then I clicked on Data pipeline.
When the new pipeline window appeared, I entered the name "DP - Azure SQL" and clicked Create.
I then clicked Add Pipeline Activity.
Then I chose Dataflow.
On the General tab, I renamed "Dataflow 1" to "DF - Azure SQL" and left all the other settings at their defaults.
I then clicked on Settings, clicked on the drop down menu and selected "DF - Azure SQL Data" as shown below.
I then clicked Activities on the ribbon and selected "Web".
In the General settings I named it "PA - Update URL" and left the other settings at their defaults.
I then clicked Settings and pasted in the HTTPS URL of my Power Automate flow that I had shown earlier, selected the "GET" method, and left the defaults.
My last step was to drag the "On success" connector from the dataflow activity to the web activity, which means the pipeline will first refresh the dataflow (bringing in the fresh data) and then refresh the Power BI dataset (via the Power Automate flow).
I then clicked Save to save my data pipeline.
The last step was to run the pipeline to make sure everything worked as expected, so I clicked Run.
I was then able to see the details of my pipeline in the Output window as it ran.
When it completed, I was able to confirm this in the Output window, as shown below.
When I then went to my Power BI report, I could see that it had been updated with the latest details.
In this blog post, I've demonstrated step by step how to use Microsoft Fabric to create a Direct Lake Power BI dataset and a Power BI report.
I'm really excited about Microsoft Fabric and what it has to offer.
Please let me know if there's any other content you'd like me to blog about and I'll see what I can do.