Troubleshooting automatic aggregations to optimize DirectQuery performance
With the click of a mouse button, dataset creators of any skill level can improve the query performance of their DirectQuery datasets! In the Power BI portal, display the settings for a DirectQuery dataset, expand the Scheduled refresh and performance optimization section, and toggle the Automatic aggregations training option to On, as in the following screenshot, and don’t forget to configure a data refresh schedule to update your aggregations on a regular basis. That’s basically all it takes. For more information about how automatic aggregations can help to improve the performance of your report visualizations, see the Automatic Aggregations Overview in the product documentation.
Sometimes, however, when you enable automatic aggregations (auto aggs), you end up with no aggregations and no corresponding performance gains. In these troubled situations, it can be helpful to have more control over the auto aggs process than the user interface provides. Granular control lets you fine-tune auto aggs. You can exclude specific tables by setting a minimum size limit below which you don’t want to generate auto aggs. Or maybe you want to test auto-aggs training with different configuration settings and verify that Power BI indeed added auto aggs to your dataset. The Tabular Object Model (TOM) and the Tabular Model Scripting Language (TMSL) give you the advanced options to cover these scenarios, as explained in this article.
If you are a BI developer and want to follow the .NET Core console application used in this article in your own environment, refer to the attached TOM sample code for details. Make sure you use the latest Analysis Services client libraries and host your dataset on a Premium or Fabric capacity with XMLA Read/Write enabled. If you prefer to work with TMSL instead, keep your SQL Server Management Studio (SSMS) installation at the latest version. And if you are not a BI developer, you still might find the following troubleshooting guidance useful. So, let’s quickly review how automatic aggregations work before diving deeper into various scenarios and code snippets.
For auto aggs to work, you must accomplish the following three essential tasks:
- Generate DAX queries. You need to use reports or other means to query your dataset because the process of generating automatic aggregations relies on a query log that tracks the queries sent to the dataset over seven days. As the diagram below illustrates, the query log serves as input for auto aggs training. So, as a prerequisite to generating auto aggs, make sure you seed the query log!
- Perform automatic aggregations training. Auto-aggs training is the process that generates the system-managed aggs tables in the dataset. This process analyzes the query log as well as the cardinality and other characteristics of the source table columns. This process typically runs at dataset refresh time during the first refresh of the day, but you can also trigger auto-aggs training manually and programmatically.
- Run a data refresh. Keep in mind that auto-aggs training only generates the aggs tables and adds them to the dataset, but it does not fill these tables with data. Loading the aggs data is the job of data refresh, so do not forget to refresh your dataset and configure scheduled refresh to keep the aggs data updated. Only then will you see a query perf improvement for eligible DirectQuery tables.
With the basics covered, let’s now dive into a common troubleshooting scenario: You enabled auto aggs, but you don’t notice any perf gains. In this situation, a good first step is to check that you actually have aggs tables in your dataset. Perhaps Power BI didn’t generate any because there just wasn’t any meaningful input data in the query log, or perhaps the dataset hasn’t been refreshed yet so training never ran, or perhaps training didn’t finish within the time limit of 1 hour, which means training hasn’t completed yet and will resume the next day. Auto aggs training can take a long time depending on how fast the data source can process the cardinality queries and other queries that Power BI generates during the training phase. Whatever the reason, if there aren’t any aggs tables in the dataset, your DAX queries can’t leverage aggregations, of course.
Aggs tables are relatively easy to spot in SSMS. Connect to your workspace, expand your DirectQuery dataset, and look for tables with GUIDs as their names. The following screenshot shows one such aggs table named b83b1a0c-5712-4f35-8d73-d1db0a0c6c33. After scripting out this table, you can see that it is a hidden system-managed table with an inferred partition source.
In managed code using TOM, it’s equally uncomplicated to check for aggs tables. See the following code snippet. As mentioned earlier, the TOM sample code is attached to this article.
/// Check if there are any auto aggs tables, /// i.e. system-managed tables with an inferred partition source. var model = database.Model; if (model.Tables.Where(t => t.SystemManaged && t.Partitions.All(x => x.SourceType == PartitionSourceType.Inferred)).Any()) { Console.WriteLine("This dataset has auto aggs tables!"); } else { Console.WriteLine("This dataset does not have any auto aggs tables!"); }
Let’s continue our troubleshooting journey and assume the dataset has no auto-aggs tables yet. A logical next question could be if auto aggs are enabled at all. Now, you might be tempted to look for auto-aggs settings in the dataset. For example, you could check the AutomaticAggregationOptions property of the Model object. Yet, note that Power BI does not store its auto-aggs settings in the model metadata. The following screenshot illustrates this aspect. The dataset owner enabled auto aggs in Power BI, but there are no AutomaticAggregationOptions in the model. The auto-aggs settings that Power BI uses are only available on the dataset settings page.
So, what’s the purpose of the AutomaticAggregationOptions property if Power BI doesn’t use it? Its purpose is to store a default auto-aggs configuration that you can apply programmatically to auto-aggs training by using the ApplyAutomaticAggregations method. Power BI doesn’t use this property, but custom solutions can.
Instead of using a default auto-aggs configuration, you can also provide an explicit AutomaticAggregationOptions object as an input parameter to the ApplyAutomaticAggregations method. This is the method Power BI uses to submit its own configuration settings. And this is the method that lets you easily and quickly trigger auto-aggs training with different configuration settings as well. The following snippet code illustrates this approach. Refer to the AutomaticAggregationOptions Class in the API documentation for details about supported properties, such as QueryCoverage, DetailTableMinRows, and AggregationTableSizeLimit.
/// Perform auto aggs training using a one-off configuration. /// model.ApplyAutomaticAggregations( new AutomaticAggregationOptions { QueryCoverage = 0.5 }); database.Update(Microsoft.AnalysisServices.UpdateOptions.ExpandFull);
Calling the above ApplyAutomaticAggregations method is all it takes to trigger auto-aggs training. Perhaps training never ran before. Yet, don’t forget the database Update call to persist any generated aggs tables in the dataset.
Still, even a successful completion of the ApplyAutomaticAggregations method is no guarantee that auto-aggs training indeed generated aggs tables. As mentioned several times already, the query log might be empty, so auto-aggs training only “successfully” finds out that there are no aggs to generate. Or auto-aggs training took longer than 1 hour. In this case, you can perform training repeatedly until it finishes. Training is incremental and resumes where the previous cycle left off. Just train, train, train, and when aggs training is finally done, perform a single data refresh operation.
However, how do you decide that aggs training is finished? For this, you must capture the training status raised in an AutoAggsTraining – Progress Report End trace event. This is perhaps most easily accomplished by using SQL Profiler. Make sure you include at least the Progress Report End event class in the trace. The following trace includes an AutoAggsTraining – Progress Report End event with a few important pieces of information:
- aggregationTableCount is 0, so no aggs tables were generated.
- queryShapes.eligible is 0, which means in this particular case that auto-aggs training did not find any useful queries in the query log.
- queryShapes.discarded is empty because the query log was empty to begin with, but if there were discarded query shapes, the discarded property would include one or more of the following counters:
Counter | Comments |
DroppedByUser | Query shapes dropped because the queried table(s) no longer exist in the dataset. |
ImportMode | Query shapes dropped because the queried table(s) are in import mode, which auto aggs do not cover. |
LimitedRelationship | Query shapes dropped because of a limited relationship between the queried table(s). |
RelationshipMismatch | Query shapes dropped because of a relationship mismatch between the queried table(s). |
CalculatedColumn | Query shapes dropped because the queried table(s) have calculated columns, which is not supported. |
DualRootTable | Query shapes dropped because the queried table(s) are in dual mode, which auto aggs do not cover. |
TableExcludedByUser | Query shapes dropped because the queried table(s) are explicitly excluded by the user. |
UnsupportedMDataSource | Query shapes dropped because the queried table(s) use unsupported M data sources. |
UnsupportedNativeDataSource | Query shapes dropped because the queried table(s) use unsupported native data source. |
SSOEnabled | Query shapes dropped because the queried table(s) use an SSO-enabled data source, which is not supported. |
CardinalityEstimationFailure | Query shapes dropped because the cardinality estimation of the queried table(s) failed. |
LargeSizeAggs | Query shapes dropped because the aggregation size of the queried table(s) is too large. |
TableTooSmall | Query shapes dropped because the queried table(s) are too small to create aggregations. |
LeftUncovered | Query shapes dropped because the queried table(s) are left uncovered because of user-provided training targets. |
Unknown | Query shapes dropped because of unknown failures. |
Of course, you can also create a trace programmatically, as the attached TOM sample code demonstrates and the following screenshot illustrates. This time, 77 eligible query shapes existed. Power BI reports had queried the dataset. And aggregationTableCount is 1, so auto-aggs training was able to create one aggs table for this dataset. Success!
Congratulations! At this point, we have mastered the first two essential auto aggs tasks. We generated DAX queries and performed auto-aggs training, which added a system-managed aggs table to the dataset. Yet, the aggs table does not yet contain any data, as the DAX query result in the following screenshot reveals. To load the data, we must run data refresh using any of the available options (manual, scheduled, or programmatic).
The next screenshot shows this article’s console app performing a programmatic refresh through the following TOM code. The sample code then submits the above DAX query for all system-managed aggs tables to count their rows. In production datasets, there would likely be many aggs tables and any aggs table with actual data can help to improve the performance of your DirectQuery dataset.
/// Refresh the model to import data into the aggs tables. /// model.RequestRefresh(RefreshType.Full); model.SaveChanges();
And this concludes this little excursion into auto-aggs processing details. We hope the explanations provide you with sufficient details to master the most common auto-aggs troubleshooting situations. The sample code might also serve as a starting point for a custom solution to add fine-tuned auto aggs to your enterprise BI DirectQuery models. Don’t hesitate to download the attached TOM sample code and try out different AutomaticAggregationOptions settings. For example, you could set the AggregationTableMaxRows parameter or the DetailTableMinRows parameter and see for yourself how different limits affect the generation of your aggs tables. And as always, please provide us with feedback as you leverage automatic aggregations to boost the performance of your DirectQuery datasets. We would love to hear from you!