Why LLMs currently suck when it comes to DAX code

In my recent post about the Power BI MCP I wrote that there is no real-world training material for enterprise usage, such as DAX code bases. In this post I would like to go into more detail, because I have the feeling that a lot of people heavily underestimate this fact.

I don't consider myself an AI expert, but I know enough to have an opinion on the output of these models.


Background:

Training data for large language models (LLMs) such as the ones behind ChatGPT, Grok or Claude is gathered from a variety of sources. In most cases it is:

  • publicly available information from the Internet
  • bought datasets from third party companies
  • data from users of the service

Because model quality strongly depends on the training data, the output reflects the quality of that data. If you ask a chatbot a question and it answers with gibberish that does not make sense, it is because either:

  • no sufficient training data was available and the model simply answers with the most probable tokens, or
  • the training data itself did not have sufficient quality.

The first generations of chatbots such as Llama or Mistral tended not to give high-quality answers to specific questions because they were essentially proofs of concept, where innovators could show that the mathematical ideas describing these possibilities actually work.


Why this is a problem:

The data companies generate internally is their competitive advantage. They learn from their own and their competitors' mistakes and use the available data to gain insights into complex correlations.

So what happens now is a double-edged sword:

On the one hand, LLMs are used to push more code more frequently. It is faster to let AI write a base and then adapt it to your needs than to start from zero. Imagine you want to write a query against your single point of truth, such as SQL Server. Consider that a lot of people working with Power BI have done exactly the same thing, just with different credentials. There is a good chance that someone has posted correctly written code on GitHub for the AI model to be trained on.

For basic tasks that every company does the same way, there is a lot of training data on the internet. It is comparably easy to find plenty of high-quality material for training. This is literally what "standards" are for.
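To illustrate: a year-to-date measure is exactly the kind of "standard" pattern that appears in countless public tutorials, docs and forum answers, so LLMs reproduce it reliably. The table and column names below are generic placeholders, not from any specific model:

```dax
-- A textbook year-to-date measure. This exact pattern is all over
-- public tutorials and documentation, so LLMs handle it well.
-- 'Sales'[Amount] and 'Date'[Date] are placeholder names.
Sales YTD =
CALCULATE (
    SUM ( Sales[Amount] ),
    DATESYTD ( 'Date'[Date] )
)
```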

But at a certain point in the query, no chatbot can give you really good answers without either being trained on your company's data or having sufficient context/RAG capabilities, e.g. a vector database attached. This means that when the data generated by that custom-coded SAP module (written 10 years ago by one senior ABAP developer who has since retired, and which the current developers would never touch) needs to be refined or made sense of via queries, current models will have a hard time knowing what you want them to do.
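A sketch of what such company-specific logic might look like. Every table name, column name and magic code below is invented for illustration; the real ones live only in that one module and in the heads of people who no longer work there:

```dax
-- Hypothetical measure over data exported from a custom ABAP module.
-- Column names like ZZ_MOVEMENT_TYPE and codes like "Z41" are
-- invented placeholders; no public training data explains what
-- they mean, so an LLM can only guess at this logic.
Net Custom Movements =
CALCULATE (
    SUM ( StockMovements[Quantity] ),
    StockMovements[ZZ_MOVEMENT_TYPE] IN { "Z41", "Z42" }
)
    - CALCULATE (
        SUM ( StockMovements[Quantity] ),
        StockMovements[ZZ_MOVEMENT_TYPE] = "Z99"
    )
```

The syntax is trivial; the hard part is knowing that "Z41" and "Z42" are inbound movements and "Z99" is a reversal, and that knowledge exists nowhere on the public internet.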

For certain tasks, no proper training data exists on the internet because it is locked away in company databases.

Now people will say: "Yeah, but you know that an LLM will only provide good answers that match your case if you give it enough context, right?" Yes, I know, but it does not need a lot of context for the currently easier tasks. If there were training data for more complex tasks, you would not need as much context as you do right now. Context and additional knowledge are a sledgehammer used to hit the nail that hides the fact that AI models are still very stupid when it comes to specific tasks.

It is unthinkable that a company would voluntarily give away its competitive advantage. So it is unlikely that LLMs from proprietary services will ever reach the same knowledge as a model trained on company-owned data.

Maybe the consultant job is not as unsafe as I thought it would be...


Conclusion:

The most valuable data, and this is why innovators often speak of data as if it were the new "gold", is the company data that represents the company's competitive advantage. It is the data you produce when you, for example, book stock movements in the company's ERP.

Only companies that are lean in their processes will survive in the long term, and the generated data is the documentation of how that competitive advantage was created. Take SAP's acquisition of Signavio in 2021 as an example. Signavio created a tool to mine processes in SAP and show how users actually use SAP. This enables companies to make their processes transparent without a consultant sitting next to their employees 24/7. SAP realized the potential the tool had and simply bought the technology, including the company.

If you as an employee use any free version of a chatbot during your working hours for company tasks, you are giving that competitive advantage away for free. That is how the free tiers of chatbots make money: they take the information you give them as free, high-quality training data and let you use their base model for free in return.

If you are an employer and your employees are using free versions of chatbots, you are allowing them to give away your competitive advantage.

The only solution is to either:

  • subscribe to a service of your choice with a contract that excludes training on your data, or
  • create in-house solutions that your employees can use.

If you have the knowledge and money, maybe do both.

Either way, people will use chatbots; you can't stop it. You cannot control the outgoing flow of competitive advantage if you do not own the process.


Some last words:

We may have to differentiate in the future. Maybe AI models will never have "all world knowledge" or be free of errors. We should consider applying some kind of Pareto principle here, where we accept that 80% of the tasks completed by AI are good enough to work with. And if those 80% are mostly the same because we all write similar DAX code, we should rather ask ourselves why we unnecessarily burn so much time on repetitive tasks.

At least for the next 5-10 years. If this blog survives until then, we'll speak again.


Updates:

27.01.2026 - YouTuber Maxim Anatsko ran a benchmark across 70 different LLMs to check their ability to write DAX code for measures. The results largely reflect my personal opinion, but we have to keep in mind that his measures are not all openly disclosed and that the models have most likely been trained on the dataset, because it belongs to Microsoft and is free to use/scrapeable.

Maxim claims to use the Contoso Dataset

https://web.archive.org/web/20260127080433/https://www.reddit.com/r/PowerBI/comments/1qnnwwa/i_benchmarked_70_llms_on_dax/


Here are some details about training sources from OpenAI and Anthropic.

Liked this article? Hit me up with a private message on LinkedIn to discuss it or leave feedback.