The scarce resource of Dataverse storage capacity

Is this price right when people are afraid to use your service?

Citizen developers who discover Microsoft Power Platform tools usually start building apps on top of SharePoint lists. Because it’s “free”. That freedom comes in two forms:

  1. Makers are far more likely to have the rights to create new SharePoint sites and lists, compared to new Power Platform environments.

  2. Users can run the app with just their Office 365 / Microsoft 365 license, without requiring additional Power Apps premium licenses.

For many reasons, this isn’t optimal for either Microsoft or the IT department. Power Platform management and governance capabilities are designed to leverage Microsoft Dataverse. To truly scale low-code solutions into everyday enterprise use and more critical apps, it’s imperative to unlock the broad use of Dataverse as the underlying data source/target. Even if it means bending the truth and calling it “the enterprise data platform for Copilot”, as the Power Platform product marketing team has recently chosen to do. But we all need marketing in one form or another to get our ideas across, so that’s all right.

Once the customers are finally sold on the idea of purchasing premium licenses for their Power Apps users, are we all good? Well, there’s one detail that might not have been highlighted during the contract negotiation phase. The growing need for Dataverse storage space.

The small print on storage cost

There are public pricing pages on Microsoft’s product websites for most things related to Power Platform. Even if the list prices as such may not be applicable to enterprise customers, the relative cost of each platform element can be quite useful for your calculations (certainly much better than the dreaded “call us” buttons on many SaaS product pages).

The one pricing component that has been absent from the websites as well as the official Licensing Guide PDFs is the Dataverse capacity add-ons. Relying on public web resources alone, one had to be smart enough to navigate to the MS Learn pages for the Power Platform licensing FAQ to locate this piece of information. I personally had to include it in my blog post about Power Platform price points, just so that it was quick for me to find as a reference to share with customers.

Recently, with the August 2024 update of the Power Platform Licensing Guide, the pricing for Dataverse capacity add-ons was finally included in the official PDF:

This is a welcome step in the right direction. There’s nothing new about the information, yet it helps to make it apparent to everyone that the cost impact of Dataverse database capacity can be significant. Storing just 1 GB worth of data in Microsoft Dataverse will cost you $40/month, or $480/year.

Everything counts in large amounts

“I’ve never had to purchase those storage capacity add-ons before, and I’ve been using Dynamics 365 CRM / Power Apps just fine.” Good for you! That means you’ve been able to survive with the combination of the tenant’s default capacity (10 GB after first premium license purchase) plus the per-license accrued capacity that grows as the number of licenses grows.

Perhaps you’ve primarily had one central system of record like CRM running on your Power Platform resources. Maybe you haven’t yet realized that your citizen developers are building everything in the Default environment (and with SharePoint lists). Once you reach a point where a formal Power Platform environment strategy is put in place, the number of different environments needed for segregating apps of different criticality, use case and target audience will quickly start to grow. Remember: solution metadata and test data will consume the same storage capacity pool of your tenant.

If you’re not actively monitoring the consumption of Dataverse storage capacity, you can run into nasty surprises. Since there are no features in the platform for allocating storage capacity to specific environments, running over your storage entitlement because one runaway environment suddenly expands can disable admin operations for all environments. If you needed to create a copy of your most business-critical Dynamics 365 environment for testing, backup or restore purposes, a citizen developer environment with a recurring Dataflow pumping gigabytes into it on a daily basis could cause collateral damage to the business.

Let’s look at an example of a non-D365 organization adopting low-code business applications and purchasing Power Apps Premium licenses for their 1,000 users. Since not all use of flows is within app context (maybe they need RPA), they also bought Power Automate Premium for 300 users. This would cost them $24.5k per month (list price). Thanks to the tenant base capacity plus additional capacity accrued from Premium licenses, they have 335 GB available for all their Power Platform environments.
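
To make the arithmetic explicit, here is a quick back-of-the-envelope sketch in Python. The $20 and $15 per-user list prices are my assumptions based on Microsoft’s public pricing pages, and the 10 GB tenant base plus 250 MB per-license accrual are the entitlement figures referenced elsewhere in this post - treat it as a sketch, not an official calculator.

```python
# Back-of-the-envelope math for the 1,000-user example above.
# Assumptions: $20/user/month (Power Apps Premium) and $15/user/month (Power Automate
# Premium) list prices, 10 GB tenant base capacity, 250 MB accrued per premium license.

POWER_APPS_USERS = 1_000
POWER_AUTOMATE_USERS = 300

license_cost = POWER_APPS_USERS * 20 + POWER_AUTOMATE_USERS * 15     # $/month
capacity_gb = 10 + (POWER_APPS_USERS + POWER_AUTOMATE_USERS) * 0.25  # GB

print(f"License cost:   ${license_cost:,}/month")                         # $24,500/month
print(f"Total capacity: {capacity_gb:.0f} GB")                            # 335 GB
print(f"Per app user:   {capacity_gb * 1000 / POWER_APPS_USERS:.0f} MB")  # 335 MB
print(f"10 GB environments that fit: {capacity_gb // 10:.0f}")            # 33
```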

It can sound like a lot at first. Still, it’s only 335 megabytes per user. I regularly see PowerPoint presentations bigger than that. With 1k users there are bound to be org units and geographies that may need their own dedicated Power Platform environments for production, test and development (one set per solution if going the managed solutions route). If these averaged 10 GB in size, we would have capacity for around 30 environments in total. Is that going to be enough to allow platform adoption growth in the long run?

Purchasing 150 GB of add-on database capacity would cost $6k/month, or $72k/year. If we wanted to go enterprise scale and reach the Tier 2 discounted price of $30 per GB for Dataverse add-on database capacity, that would mean a minimum 1 TB purchase, which would already make the storage capacity bill higher than what is paid for the Premium users’ licenses. Even then, I don’t think 1 TB feels like “big data” in the year 2024. Some people have that amount of storage on their iPhones.
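
To put those add-on numbers side by side, here is another small sketch using the $40/GB (Tier 1) and $30/GB (Tier 2, 1 TB minimum) list prices quoted above; the exact tier thresholds are assumptions worth verifying against the current licensing guide.

```python
# Dataverse database capacity add-on cost at the list prices quoted in this post.
# Tier thresholds are assumptions - always verify against the current licensing guide.

def addon_cost(extra_gb: float, price_per_gb_month: float) -> tuple[float, float]:
    """Return (monthly, yearly) add-on cost in US dollars."""
    monthly = extra_gb * price_per_gb_month
    return monthly, monthly * 12

print(addon_cost(150, 40))    # (6000, 72000): the 150 GB Tier 1 example above
print(addon_cost(1024, 30))   # (30720, 368640): a minimum 1 TB Tier 2 purchase
# $30,720/month for the Tier 2 minimum already exceeds the $24,500/month user license bill.
```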

The expectations of both business process owners and IT professionals about how much data you can throw at a modern system without needing to worry about it aren’t always in line with the reality of Microsoft business apps. This means Power Platform solution architects need to play the cost optimization game while balancing the business requirements against the licensing implications. MVP Thomas Sandsør recently wrote an excellent blog post on this topic: Solution Designing With Limits: The Solution Architect’s Challenges in Dynamics/Power Apps.

Solutions from Microsoft

Since Dataverse storage is often a scarce resource in tenants, it makes sense to regularly spend a few human hours on cleaning up their “data estate”. MS has been maintaining a very useful Free up storage space page on their documentation site ever since the XRM days. They recently updated a detail on the page, saying that the results from your cleaning operations may take up to 3 days to be reflected on the capacity reports:

Yes, data deletion can be resource intensive - just like the inserts/updates to Dataverse. If you ever need to run bulk delete jobs for a big Dataverse table, keep in mind that you’ll only be able to delete around 10 million rows per day - even if it’s a simple table with hardly any relationships. I’ve heard that with help from MS you could get to around 20 million rows. Another rumor says you can’t even drop a table if it’s beyond 100 million rows in size. Any way you look at it, you should expect recovering from a data overload to take days/weeks if you don’t notice the situation early enough.
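
To illustrate what that throughput means for a cleanup project, here is a rough estimate. The 2 KB average row size is a purely hypothetical assumption; real Dataverse tables vary a lot depending on columns, audit history and attachments.

```python
# Rough cleanup timeline based on the anecdotal throughput figures above
# (~10M deleted rows/day, reportedly ~20M with help from MS). The 2 KB average
# row size is a made-up assumption just to turn gigabytes into row counts.

AVG_ROW_KB = 2
ROWS_PER_GB = 1_000_000 / AVG_ROW_KB   # ~500k rows per (decimal) GB

def days_to_delete(gigabytes: float, rows_per_day: float = 10_000_000) -> float:
    return gigabytes * ROWS_PER_GB / rows_per_day

print(f"{days_to_delete(50):.1f} days")                            # 2.5 days for 50 GB
print(f"{days_to_delete(500):.1f} days")                           # 25.0 days for 500 GB
print(f"{days_to_delete(500, rows_per_day=20_000_000):.1f} days")  # 12.5 days with MS help
```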

In 2023 Microsoft launched a promising new feature: Dataverse long term data retention. Available for Managed Environments, it allows specifying a data retention policy that will automatically move records that meet a certain criterion (“older than X years and related record Y has field Z set to value 123”) into the long term store. In theory it remains within Dataverse boundaries, while in practice MS pushes it into a data lake managed by them and makes it available in read-only format for compliance and audit type of requirements.

Business application data lifecycle (source: MS Learn)

How much can a customer expect to save once they offload their business records from the transactional premium data storage (Dataverse) to the flat data lake?

Every GB moved from Dataverse database to Dataverse long term retention, consumes, on average, 50% less database capacity. This is because the data is compressed in Dataverse long term retention.

That’s… not a lot. Sure, with all the ongoing schema sync and other functionality built into the service, it’s understandable that the cost is more than just dumping a CSV of the tables into some cloud storage. But still. With data on the long term retention side, you can’t easily view it, retrieve it, or restore it. It is really only useful for compliance scenarios where you actively need to dig up the history.
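
To put a rough number on the savings, here is one more hedged sketch: the 50% figure comes from the docs quote above, the $40/GB Tier 1 list price from earlier in this post, and the 200 GB archive is just an illustrative volume.

```python
# Monthly database capacity cost avoided by moving data into long term retention,
# assuming the ~50% compression quoted in the docs and the $40/GB Tier 1 list price.

def monthly_savings(moved_gb: float, compression: float = 0.5,
                    price_per_gb_month: float = 40.0) -> float:
    freed_gb = moved_gb * compression   # capacity no longer billed as database storage
    return freed_gb * price_per_gb_month

# Moving a 200 GB archive saves ~$4,000/month, while the retained ~100 GB still
# consumes database capacity worth ~$4,000/month.
print(monthly_savings(200))   # 4000.0
```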

Fine, there are other tools for analytics scenarios to cover the whole database, right? Yes indeed. There used to be a convenient Data Export Service (DES) available for customers to keep a replica of their Dataverse databases in Azure SQL. Great for reporting purposes, and more. Until November 2021, when MS decided to deprecate DES.

The alternative solution MS initially proposed was Export to Data Lake, later renamed to Azure Synapse Link. More recently, with the introduction of Microsoft Fabric, there is now also the option of not physically replicating the data anywhere outside of Dataverse. Tools like Power BI can query the same data from a OneLake replica that MS makes available for those who purchase Fabric capacity. There’s one catch: you also need to pay for an increase in Dataverse storage:

How much will this cost? Luckily the “compression rate” here is expected to be more efficient than with the long term retention feature. MS docs on Fabric Link present the following estimate:

For example, if you sync 500 GB of data from Dynamics 365, Dataverse storage could increase by around 100 GB (assuming five to eight times data compression).

At the Tier 2 list price, that 100 GB is $36k per year in database storage add-on licenses. Perhaps no big deal in larger organizations. Still, as mentioned earlier, a few gigabytes shouldn’t feel like “a lot of data” to normal folks these days. Therefore, it’s important to build awareness of the licensing implications of using Dataverse capacity for such scenarios - to avoid nasty surprises later that would erode customer trust in the platform.

Who owns the storage space anyway?

Trust is the keyword to success in business and in life. Trustworthy cloud computing requires service providers to respect the resources that customers have rented from the providers’ data centers through their SaaS subscription contracts. But are the Dynamics 365 and Power Apps subscriptions real SaaS products? Not entirely. Despite the per-seat licenses, some aspects are more akin to a PaaS model where you pay for the tools and capacity.

The big issue many Power Platform professionals have raised is that Microsoft themselves are happy to keep consuming more & more of the Dataverse storage capacity for their purposes. The capacity that the customer is charged for directly. The size of internal tables keeps growing while new features get rolled out that rely on Dataverse database to host them.

Just like SharePoint appears “free” to citizen developers, the Dataverse capacity in the customer tenant is technically “free” for the MS engineering team to consume. Customers have little control over, or visibility into, how big a share MS will capture from the available storage quota. Which understandably feels really unfair. A while ago, one Power Platform MVP’s statement inspired me to generate the following pic to illustrate this sentiment:

Why is it more dangerous when MS does the capacity consumption, rather than the developers building on top of the platform? It’s about impact and control.

A single rogue citizen developer may be able to fill up their Power Platform environment with Dataflows and cause a sudden spike there. What Microsoft can do, on the other hand, is design a new feature (or neglect the maintenance of an old one) that ends up burning storage capacity in every single environment in the customer tenant.

How can you control what components MS pushes into your environments? That’s the neat part: you don’t. Hidden solutions will get deployed and updated in your Power Platform environments daily. New tables, web resources and other solution components will often get deployed behind the scenes before new features like the latest Copilot capabilities get announced. As a customer, you’re not even expected to have a look at what the 1st party solutions are doing. Just ensure your tenant has sufficient capacity for them.

Power Platform keeps growing as it accumulates more features over the years. This is a good thing - one prime reason you’re building business apps on top of a dedicated platform rather than just rolling your own software hosted somewhere. In the past 5 years, how many new features have been introduced inside solutions that consume the Dataverse capacity in environments? Quite a few.

In the same time period, how much have the default storage capacity and the per-seat accrual from Premium licenses been increased? 0%. They are still at the same 10 GB and 250 MB levels as 5 years ago. You could think of this as inflation that eats away at the amount of resources you get in exchange for your money. While that is a common economic phenomenon, the usual trend and assumption is that the cost of storage decreases every year. That’s not happening here.

“It’s just a database”

This brings us to the final, particularly important aspect of Microsoft Dataverse. Ultimately, customers are not paying just for storage. Yeah, the name of the license may be “Dataverse Database Capacity add-on”. That’s not an accurate way to think about the resources, though. Because if you just thought of it as a relational database like SQL, the price tag for Dataverse would hardly make sense.

There’s a major difference between just buying storage space for Azure SQL and expanding the storage capacity available in your Power Platform environments. On Azure, that $0.25/GB storage price is just one piece of the cloud services puzzle that an enterprise-ready business application requires. Dataverse, on the other hand, is a bundled, managed package of a wealth of Azure services designed specifically to best serve your critical business systems, like enterprise CRM.

You can reference the Why choose Microsoft Dataverse docs pages to dig deeper into the reasons why we Power Platform folks coming from the model-driven apps side (Dynamics 365 CE/CRM/XRM) love it so much. It truly is a beautiful foundation for managing business processes and data, one that I’ve been using professionally for almost two decades now. Once you realize the many ways Dataverse can reduce both effort and complexity in building and managing common capabilities that would otherwise need custom code or Azure service subscriptions, you’ll want to choose Dataverse as the place for your business data almost every time.

If only we lived in a world where that choice was always possible and obvious.
