Power BI performance optimization
- Karthik Jamalpur
- Jul 12, 2024
- 6 min read
Updated: Sep 14, 2024

Power BI optimization techniques:
When working with an Import model, the Tabular developer has the freedom to optimize and make design choices: because the data is stored in the Tabular engine, any modification to the model is feasible.
With DirectQuery, however, the data remains in the SQL database. Tabular retrieves data from SQL and performs computations on it, but any change to the data model requires a corresponding modification to the data structure in SQL. This means you either need complete access to the SQL database, a good understanding of its architecture, and the ability to make changes, or you need assistance from a database administrator.
Optimizing DirectQuery involves making various decisions in both the Tabular data model and the SQL database. Be prepared to make or request multiple changes in the source SQL database, as the speed of a DAX query in DirectQuery is directly influenced by the performance of the underlying SQL database.
Lastly, it's important to note that DirectQuery is inherently slower than VertiPaq. Achieving the exceptional performance of an Import model with DirectQuery is nearly impossible. The choice between DirectQuery and Import should be driven by valid reasons, such as the need for real-time reporting or handling a large model that cannot fit in RAM.
If performance is a top priority, it is advisable to choose an Import model as the first option. Opting for DirectQuery means accepting a compromise in terms of performance, and there should be a compelling reason to make that compromise.
Import model:
Using VertiPaq Analyzer, you can identify the components that consume the most memory and drive the overall size of the model.
Optimizing the model involves reducing its size and implementing best practices when designing semantic models.
Optimization techniques for semantic models in Import mode:
Adhere to star schema principles.
Avoid recreating an OLTP model; instead, build an OLAP model in which each dimension connects to the fact table through a single relationship.
Import Only What You Need:
Remove unnecessary columns to optimize the model: eliminate columns that are not used in reports or in the model structure. This practice is known as vertical filtering; the most optimized column is the one that is never imported.
Filter out unnecessary rows to minimize the number of rows loaded into the data tables. Employ horizontal filtering by loading rowsets filtered by entity or by time. Ideally, apply this filtering in the native query at the data source (SQL, MDX), as in the sketch below.
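As a rough illustration, a single native source query can apply both kinds of filtering at once. The table and column names below are hypothetical, and the date window is just an example; adjust both to what your reports actually need:

    -- Vertical filtering: select only the columns the model uses.
    -- Horizontal filtering: push the row filter into the source query.
    SELECT
        order_id,
        customer_id,
        order_date,
        amount                                            -- unused columns are simply omitted
    FROM sales
    WHERE order_date >= DATEADD(year, -3, CURRENT_DATE);  -- e.g., keep only the last 3 years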
Disable Load of Queries:
Disable the load of queries that act as staging tables and are not used for reporting purposes. This prevents them from unnecessarily consuming memory in the data model.
Push ETL and Transformation Logic Upstream:
Perform ETL tasks as early as possible in the data flow, avoiding ETL operations in DAX. Apply transformations directly in the source and load the result set into Power BI. This reduces the resources required for data import and improves query performance.
Consider using views or materialized views to build complex logic, perform calculations, or filter specific portions of data from data sources instead of loading tables in Power BI and performing transformations.
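For instance, a view can carry the transformation logic at the source so that Power BI imports an already-shaped result set. This is only a sketch with made-up table and column names:

    -- The computation and the join happen in the database, not in DAX or Power Query.
    CREATE OR REPLACE VIEW vw_sales_enriched AS
    SELECT
        s.order_id,
        s.order_date,
        s.quantity * s.unit_price AS line_amount,   -- computed upstream, not as a calculated column
        c.customer_segment                          -- dimension attribute denormalized at the source
    FROM sales s
    JOIN customer_dim c
      ON c.customer_id = s.customer_id;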
Minimize Usage of Calculated Columns and Calculated Tables:
Use calculated columns and calculated tables sparingly and only when necessary, with a clear justification for their creation. Calculated columns compress less efficiently than imported columns, and calculated tables add extra memory and processing overhead at refresh time.
Avoid Importing High Cardinality Columns:
Columns with high cardinality, such as timestamps, should be avoided unless necessary. If possible, remove the time component or split the timestamp into separate date and time columns. The same recommendation applies to columns with many decimal places.
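A sketch of the split, assuming a hypothetical sales table with an order_ts timestamp column; the separate date and time columns each have far lower cardinality than the combined timestamp:

    SELECT
        order_id,
        CAST(order_ts AS DATE) AS order_date,   -- low-cardinality date part
        CAST(order_ts AS TIME) AS order_time    -- keep only if the time part is actually reported on
    FROM sales;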
Do Not Import Primary Keys or Indexes in Fact Tables:
Exclude primary keys and indexes from the fact tables to save space, as they are usually not required.
Do Not Import Header/Details Patterns:
Denormalize header information into the details table and create a correct star schema without relationships between fact tables.
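One way to do this at the source is a view that copies the header attributes onto each detail row (hypothetical names):

    -- Header attributes are repeated on every detail row, so only one fact table is imported.
    CREATE OR REPLACE VIEW vw_order_lines AS
    SELECT
        d.order_line_id,
        d.product_id,
        d.quantity,
        d.line_amount,
        h.order_date,       -- denormalized from the header
        h.customer_id,
        h.sales_channel
    FROM order_detail d
    JOIN order_header h
      ON h.order_id = d.order_id;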
Avoid Many-to-Many Relationships:
Use many-to-many relationships only when absolutely necessary and with strong justifications.
Do Not Create Junk Dimensions in Power BI:
Instead of creating junk dimensions, promote the attributes to a mini-dimension.
Disable Auto Date/Time:
Disabling auto date/time helps reduce the model size and improves performance.
Shorten Strings in Imported Columns:
If feasible, reduce the length of strings to minimize dictionary size. For example, replace GUIDs with integers in the database; integer keys can be processed with VALUE encoding, eliminating the need for a dictionary.
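A minimal sketch of the GUID-to-integer substitution, assuming a hypothetical customer_guid column. The mapping table is built once, and the integer surrogate key is what gets imported:

    -- Build an integer surrogate key for each distinct GUID.
    CREATE TABLE customer_key_map AS
    SELECT
        customer_guid,
        ROW_NUMBER() OVER (ORDER BY customer_guid) AS customer_key  -- small integer, VALUE-encodable
    FROM (SELECT DISTINCT customer_guid FROM sales);

    -- Import the integer key instead of the GUID.
    SELECT
        m.customer_key,
        s.order_date,
        s.amount
    FROM sales s
    JOIN customer_key_map m
      ON m.customer_guid = s.customer_guid;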
Enable the Large Semantic Models Option for Large Models:
Enabling the large semantic models option increases the segment size from 1 million rows to 8 million rows, enhancing compression rates.
Group by and Summarize:
Pre-summarize data whenever possible in fact tables to effectively reduce the model size.
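For example, if reports never go below daily product totals, the source query can aggregate to that grain before import (hypothetical names):

    -- Pre-summarize to the grain the reports actually need.
    SELECT
        order_date,
        product_id,
        SUM(quantity) AS total_quantity,
        SUM(amount)   AS total_amount
    FROM sales
    GROUP BY order_date, product_id;   -- far fewer rows than the transaction-level fact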
Use Dual Mode:
Select the appropriate storage mode for each table based on requirements; dimension tables set to Dual mode can act as either Import or DirectQuery, matching the mode of the fact tables they are queried alongside.
SharePoint and OneDrive:
Utilize SharePoint and OneDrive for storing files, as they offer seamless integration with Power BI. Storing CSV and Excel files in SharePoint enables online report refreshing without the need for a gateway installation.
CSV vs. Excel:
Prefer CSV files over the default *.xlsx Excel format as Power BI sources; CSV is more efficient to read and faster to process.
IsAvailableInMDX Property for Analysis Services Tabular:
Set the IsAvailableInMDX property for columns inside a dataset table to optimize performance. This property determines whether attribute hierarchies are built for a column; setting it to False for columns not used as attributes (for example, columns referenced only inside measures) reduces the model size and decreases processing times.
Strategies for optimizing SQL queries:
Use EXISTS instead of IN with a subquery for improved query performance (see the sketch after this list).
Use JOIN clauses for many-to-one relationships.
Avoid blind searches: do not start a LIKE pattern with a wildcard, since a leading wildcard prevents index use.
Ensure proper indexing for quick database access.
Minimize loops and row-by-row processing (such as cursors) in favor of set-based operations.
Avoid correlated subqueries, which are re-evaluated for every row of the outer query.
Avoid wrapping indexed columns in functions inside predicates, as this prevents the optimizer from using the index.
Use temporary tables and materialized views to handle bulk data and improve data refresh speed.
Use the WHERE clause instead of HAVING for filters that do not depend on an aggregate (see the sketch after this list).
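Two of these strategies side by side, as a sketch with hypothetical tables:

    -- EXISTS instead of IN with a subquery: the engine can stop at the first match.
    SELECT c.customer_id, c.customer_name
    FROM customer_dim c
    WHERE EXISTS (
        SELECT 1 FROM sales s
        WHERE s.customer_id = c.customer_id
    );

    -- WHERE instead of HAVING: rows are filtered before aggregation, not after.
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales
    WHERE order_date >= '2024-01-01'     -- row-level filter belongs in WHERE
    GROUP BY customer_id
    HAVING SUM(amount) > 1000;           -- HAVING only for conditions on aggregates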
By implementing these optimization techniques, you can enhance the performance and efficiency of your Power BI models and SQL queries.
DirectQuery optimization:
There are multiple steps you can take in Snowflake to optimize the performance of SQL queries generated by Power BI:
Snowflake-Specific Optimizations
Dedicated Warehouse for Power BI: Allocate a dedicated warehouse for Power BI queries and size it correctly. Multi-cluster auto-scaling can help manage varying concurrency, but a maximum cluster count that is too low can leave queries queued, and an undersized warehouse slows individual queries.
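A sketch of such a warehouse definition; the name and sizing values are placeholders to adapt to your workload (multi-cluster settings require Snowflake Enterprise edition or higher):

    CREATE WAREHOUSE IF NOT EXISTS pbi_wh
      WAREHOUSE_SIZE    = 'MEDIUM'   -- drives single-query performance
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 3          -- scale out for concurrency spikes
      AUTO_SUSPEND      = 60         -- seconds of inactivity before suspending
      AUTO_RESUME       = TRUE;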
Materialized Views: Utilize materialized views to enhance query performance. Snowflake's query optimizer can leverage these views even when they are not explicitly referenced in the SQL queries generated by Power BI.
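For example, with a hypothetical sales table and a daily grain:

    -- The optimizer can rewrite matching aggregate queries against this view automatically.
    CREATE MATERIALIZED VIEW mv_sales_by_day AS
    SELECT order_date, SUM(amount) AS total_amount
    FROM sales
    GROUP BY order_date;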
Search Optimization Service: Improve performance for high-cardinality dimension columns using Snowflake Search Optimization Service.
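Enabling it is a one-line change per table; the table and column here are placeholders for a dimension with a high-cardinality attribute:

    -- Speeds up highly selective equality lookups on customer_code.
    ALTER TABLE customer_dim ADD SEARCH OPTIMIZATION ON EQUALITY(customer_code);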
Clustering Key: For large fact tables, it is essential to set a clustering key. Include columns used in joins or in the GROUP BY or ORDER BY clauses of SQL queries. Automatic clustering simplifies finding the optimal configuration.
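For a hypothetical sales fact table that is usually joined and grouped by date and customer:

    -- Cluster the fact table on the columns most queries join, group, or sort by.
    ALTER TABLE sales CLUSTER BY (order_date, customer_id);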
Non-Nullable Columns: Set columns as non-nullable if they will never contain null values. This can simplify the SQL generated by Power BI in some cases.
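For example, on a hypothetical fact table:

    -- Declares that customer_id never contains NULLs.
    ALTER TABLE sales ALTER COLUMN customer_id SET NOT NULL;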
Referential Integrity: Ensure referential integrity between the columns used to join dimension tables and fact tables. Turning on the Assume referential integrity property on the corresponding Power BI relationships then allows inner joins instead of left outer joins.
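In Snowflake, foreign key constraints are informational rather than enforced, but declaring them documents the relationship that the Power BI setting relies on (hypothetical names again):

    ALTER TABLE sales ADD CONSTRAINT fk_sales_customer
      FOREIGN KEY (customer_id) REFERENCES customer_dim (customer_id);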
Result Caching: Enable result caching since Power BI often runs the same queries.
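Result caching in Snowflake is governed by the USE_CACHED_RESULT parameter (enabled by default), which can be set at the account or session level:

    ALTER ACCOUNT SET USE_CACHED_RESULT = TRUE;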
Power BI-Specific Optimizations
Limited Access Identity: When connecting Power BI to Snowflake, use an identity that only has access to the necessary tables or views. This involves specifying a Snowflake role and database in the advanced connection options. This reduces the scope of metadata retrieval queries, enhancing performance.
SSO Enablement: Enable Single Sign-On (SSO) on the Power BI-Snowflake connection to ensure that the user identity is used for SQL queries, respecting any security rules defined in Snowflake.
Location Considerations: Minimize network latency and bandwidth issues by keeping your Power BI tenant or Premium capacity close to your Snowflake data. For Snowflake on Azure, store the data in the same Azure region as your Power BI tenant's default region. Power BI Premium capacity also allows multi-geo configurations.
Avoid Ad-Hoc SQL: Avoid using ad-hoc SQL queries as sources for tables in your dataset. This can lead to performance issues during development and make maintenance more challenging. Instead, use predefined tables or views.
Dynamic M Parameters: Use the dynamic M parameters feature for more control over SQL generation; it lets you pass slicer and filter selections back to Snowflake. Prefer Snowflake tabular UDFs over raw SQL SELECT statements for this purpose (see the sketch below).
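A sketch of such a tabular UDF; Power BI can bind a dynamic M parameter to the region_name argument (all names are hypothetical):

    CREATE OR REPLACE FUNCTION sales_for_region(region_name VARCHAR)
    RETURNS TABLE (order_date DATE, amount NUMBER)
    AS
    $$
        SELECT order_date, amount
        FROM sales
        WHERE region = region_name
    $$;

    -- Called with the parameter value the report passes in:
    SELECT * FROM TABLE(sales_for_region('EMEA'));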
Maximum Connections Per Data Source: Adjust the Maximum connections per data source property to improve performance, especially for reports with many visuals or slow queries. The default is 10, but it can be increased to 30 in Power BI Premium.
On-Premises Data Gateway: Utilize an On-Premises Data Gateway to enhance performance by enabling increased parallelism. Gateways adjust settings based on CPU capacity, with additional manual adjustments available.
VNet Data Gateway: For VNet data gateway users, high availability clusters can distribute the load across multiple gateways, improving performance with many concurrent users. Each VNet gateway supports six concurrent queries.
Power BI Premium Workload Settings: Adjust workload settings on Power BI Premium capacities, such as the Max intermediate rowset count, to manage performance. Remodeling data or rewriting DAX expressions can often help avoid hitting these limits.
Aggregations Feature: Use Power BI aggregations feature to add pre-aggregated data tables to your dataset, improving query performance. Both Import and DirectQuery mode aggregations are supported, with automatic aggregations available when SSO is not enabled.
Import Mode Advantage: Import mode generally offers better performance than DirectQuery mode. Mixing storage modes, such as using Dual mode for dimension tables, can optimize performance. Power BI hybrid tables feature allows combining Import mode and DirectQuery for different date ranges.