Knowledge Graphs and Large Language Models (LLMs) Together [part 2]

LLMs are increasingly present in our daily lives: they answer questions, generate text, summarize information and much more. But despite their remarkable ability to handle natural language, these models have limitations: they can “make up” facts, confuse concepts or lack access to up-to-date or reliable knowledge. This is where knowledge graphs come in. These structures organize information in a precise and relational way, allowing LLMs to draw on well-organized and verifiable data. We explore how knowledge graphs can become a key ally in improving the accuracy, transparency and reliability of language models, helping them “really know” what they are talking about.

Reading time: 7 minutes

In the article “Knowledge Graphs and Large Language Models (LLMs) Together [part 1]” we explored how LLMs can be used to generate and curate knowledge graphs (KGs). However, the interaction between the two goes further than that. KGs are also used in many contexts to enhance the responses of LLMs.

There are several reasons to use a KG to power and govern GenAI pipelines and applications. According to Gartner, “Through 2025, at least 30 percent of GenAI projects will be abandoned after proof of concept (POC) due to poor data quality, inadequate risk controls, increased costs, or unclear business value.” KGs can help improve data quality, mitigate risks and reduce costs.

Data governance, access control, and regulatory compliance

Only authorized people and applications should have access to certain data and for certain purposes. Typically, companies want certain types of people (or applications) to be able to chat with certain types of data, in a well-governed way. How do you know which data should go into which GenAI pipeline? How can you make sure that personal information doesn’t end up in the digital assistant you want all your employees to chat with? The answer is data governance. Some additional points:

Policies and regulations can change, especially when it comes to AI. Even if your AI apps are compliant now, they may not be compliant in the future. A good foundation of data governance allows your company to adapt to these changing regulations.

Sometimes the correct answer to a question is “I don’t know,” “you don’t have access to the information you need to answer this question,” or “it is illegal or unethical for me to answer this question.” The quality of answers is not only a matter of truth or accuracy, but also of regulatory compliance.
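As a minimal sketch of the question “which data should go into which GenAI pipeline?”, governance metadata can be held as triples and queried before any data is loaded. Everything here is an assumption made for illustration: the dataset names, the `hasClassification` and `mayUse` predicates, and the `employee_assistant` pipeline are hypothetical and do not come from any particular product.

```python
# Hypothetical governance metadata stored as (subject, predicate, object) triples.
GOVERNANCE_TRIPLES = [
    ("hr_records", "hasClassification", "personal"),
    ("product_docs", "hasClassification", "public"),
    ("sales_reports", "hasClassification", "internal"),
    ("employee_assistant", "mayUse", "public"),
    ("employee_assistant", "mayUse", "internal"),
]


def objects(subject, predicate, triples=GOVERNANCE_TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def datasets_for_pipeline(pipeline, triples=GOVERNANCE_TRIPLES):
    """List the datasets whose classification the pipeline is allowed to use."""
    allowed = objects(pipeline, "mayUse", triples)
    return sorted(
        s for s, p, o in triples if p == "hasClassification" and o in allowed
    )


# Personal data (hr_records) is excluded from the company-wide assistant.
print(datasets_for_pipeline("employee_assistant"))  # ['product_docs', 'sales_reports']
```

Because the policy lives in the graph rather than in the application code, a change in regulation would, under these assumptions, mean editing a few triples rather than rebuilding the pipeline.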

Key players implementing or enabling data governance through KGs (in alphabetical order): Semantic KG companies such as Cambridge Semantics, data.world, PoolParty, metaphacts, and TopQuadrant, but also data catalogs such as Alation, Collibra, and Informatica (and many others).

Accuracy and understanding of context

KGs can also help improve overall data quality: if your documents are full of contradictory or false statements, don’t be surprised when your chatbot tells you inconsistent and false things. If your data is poorly structured, storing it in one place will not help. This is how the promise of data lakes became the scourge of data swamps. Similarly, if your data is poorly structured, vectorization will not solve the problem; it will only create a new one: a vectorized data swamp. If the data is well structured, on the other hand, KGs can provide LLMs with additional relevant resources to generate more personalized and accurate recommendations.

There are several ways to use KGs to improve the accuracy of an LLM, but they generally fall under natural language querying (NLQ), the use of natural language to interact with databases. As far as I know, NLQ is currently implemented in three ways: RAG, prompt-to-query, and fine-tuning.

Retrieval-Augmented Generation (RAG)

RAG means supplementing a prompt with additional relevant information from outside the training data to generate a more accurate response. Although LLMs have been trained on big data, they have not been trained on your data. Take a cover letter as an example. I could ask an LLM to “write a cover letter for Steve Hedden for a product management position at TopQuadrant” and it would respond, but with hallucinations. A smarter way to do this would be for the model to take this request, retrieve Steve Hedden’s LinkedIn profile, retrieve the job description for the open position at TopQuadrant, and then write the cover letter. There are currently two main ways to do this retrieval: by vectorizing the graph or by turning the prompt into a graph query (prompt-to-query).

Vector-based retrieval: This retrieval method requires vectorizing the KG and storing it in a vector store. If you then vectorize the natural language request, you can find the vectors in the vector store that are most similar to it. Since these vectors correspond to entities in the graph, it is possible to return the most “relevant” entities in the graph for a given natural language request. This is the same process described earlier for the tagging functionality: in essence, we are “tagging” a prompt with the relevant tags from our KG.
Prompt-to-query retrieval: Alternatively, you can use an LLM to generate a SPARQL or Cypher query and use it to get the most relevant data from the graph. Note: You can also use the prompt-to-query method to query the database directly, without using the results to augment a prompt to an LLM. This would not be an application of RAG, since you are not “augmenting” anything. This method is explained in more detail below.
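Vector-based retrieval can be sketched in a few lines. This is an illustration under stated assumptions, not a production design: the entity identifiers and descriptions are hypothetical, the dictionary stands in for a real vector store, and `embed` is a toy word-count embedding where a real system would call an embedding model.

```python
import math
from collections import Counter

# Hypothetical KG entities, each with a short text description.
ENTITIES = {
    "ex:SteveHedden": "steve hedden product manager profile",
    "ex:TopQuadrant": "topquadrant company knowledge graph software",
    "ex:PMJob": "product management position job description topquadrant",
}


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Stand-in for a vector store: entity id -> precomputed vector.
VECTOR_STORE = {entity: embed(desc) for entity, desc in ENTITIES.items()}


def retrieve(prompt, k=2):
    """Return the k entity ids whose vectors are most similar to the prompt."""
    query = embed(prompt)
    ranked = sorted(
        VECTOR_STORE, key=lambda e: cosine(query, VECTOR_STORE[e]), reverse=True
    )
    return ranked[:k]


prompt = "cover letter for a product management position at topquadrant"
print(retrieve(prompt, k=1))  # ['ex:PMJob']
```

The returned identifiers are graph entities, so the next step in a RAG pipeline would be to look up their properties and neighbors in the KG and add them to the prompt.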

Some additional pros, cons and notes on RAG and the two retrieval methods:

RAG requires, by definition, a knowledge base. A knowledge graph is a knowledge base, and so KG advocates will be advocates of graph-powered RAG (sometimes called GraphRAG). But RAG can also be implemented without a knowledge graph.

RAG can augment a prompt with the most relevant data in your KG, selected not only on the content of the prompt but also on its metadata. For example, we can customize the response based on who asked the question, what they have access to, and additional demographic information about them.
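A minimal sketch of using prompt metadata, assuming each fact retrieved from the KG carries the set of roles allowed to see it. The facts, the role names and the `build_prompt` helper are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical facts retrieved from the KG, each tagged with the roles allowed to see it.
FACTS = [
    {"text": "The PM role reports to the VP of Product.", "roles": {"employee", "hr"}},
    {"text": "The PM role is in salary band C.", "roles": {"hr"}},
]


def build_prompt(question, user_roles, facts=FACTS):
    """Augment the question with only the facts the asking user may see."""
    visible = [f["text"] for f in facts if f["roles"] & set(user_roles)]
    if not visible:
        # Caller can answer "you don't have access to the information you need".
        return None
    context = "\n".join(f"- {text}" for text in visible)
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("Who does the PM report to?", ["employee"]))
```

The filtering happens before the LLM is called, so a fact the user may not see never enters the prompt. The assembled context is also what can be shown back to the user as the basis for the answer.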

As described earlier, one advantage of using the vector-based retrieval method is that if the KG has been incorporated into a vector database for entity labeling and resolution, the hard part has already been done. Finding the most relevant entities related to a prompt is no different than tagging an unstructured piece of text with the entities in a KG.

RAG provides some level of explainability in the answer. Users can see the additional data that was added to their prompt and, potentially, where the answer to their question lies in that data.

I mentioned earlier that AI is affecting the way we build KGs, while we are expected to build KGs that facilitate AI. The prompt-to-query approach is a perfect example of this. The schema of the KG affects how well an LLM can query it. If the purpose of the KG is to feed an AI application, then the “best” ontology is no longer a reflection of reality, but a reflection of how the AI sees reality.

In theory, more relevant information should reduce hallucinations, but that does not mean that RAG eliminates them. We are still using a language model to generate a response, so there is still plenty of room for uncertainty and hallucinations. Even with my resume and the job description, an LLM could still exaggerate my experience. With prompt-to-query retrieval, we are using the LLM to generate both the KG query and the response, so there are two places where hallucinations can occur.

Similarly, RAG offers some level of explainability, but not entirely. For example, if we used vector-based retrieval, the model can tell us which entities it included because they were the most relevant, but it cannot explain why they were the most relevant. If we used an automatically generated KG query, the automatically generated query would “explain” why certain data was returned from the graph, but the user would have to understand SPARQL or Cypher to fully understand why that data was returned.

These two approaches are not mutually exclusive, and many companies are pursuing both. For example, Neo4j offers tutorials on implementing RAG with vector-based retrieval and prompt-to-query generation. Anecdotally, I am writing these lines just after attending a conference focused on implementing KGs and LLMs in the life sciences, and many of the companies I saw presenting are using some combination of vector-based RAG and prompt-to-query.

The major players implementing or enabling RAG solutions (in alphabetical order) are: data.world, Microsoft, Neo4j, Ontotext, PoolParty, SciBite, Stardog, TopQuadrant (and many others).

Prompt-to-query

Use an LLM to translate a natural language question into a formal query (such as SPARQL or Cypher) for your KG. This is the same prompt-to-query retrieval approach described above, except that you do not send the data to an LLM after it has been retrieved. The idea is that by using the LLM to generate the query and not to interpret the data, hallucinations are reduced. However, as mentioned above, anything the LLM generates may contain hallucinations. The argument for this approach is that it is easier for the user to detect hallucinations in the automatically generated query than in an automatically generated answer. I am a bit skeptical of this point because, presumably, many users who use an LLM to generate a SPARQL query do not know SPARQL well enough to detect problems with it.
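One way to reduce reliance on the user reading SPARQL is to check a generated query automatically before running it. The sketch below is only an illustration of that idea: `generate_query` is a stand-in for an LLM call, the `ex:` predicates are a hypothetical schema, and the regular expressions are a rough approximation where a real system would use a proper SPARQL parser.

```python
import re

# Hypothetical schema: the only predicates that exist in our KG.
KNOWN_PREDICATES = {"ex:worksFor", "ex:hasTitle", "ex:appliedTo"}


def generate_query(question):
    """Stand-in for an LLM call that turns a question into SPARQL (assumption)."""
    return (
        "PREFIX ex: <http://example.org/> "
        "SELECT ?person WHERE { ?person ex:worksFor ex:TopQuadrant . }"
    )


def check_query(query):
    """Return a list of problems found in a generated query; empty means none found."""
    problems = []
    if not re.search(r"\bSELECT\b", query, re.IGNORECASE):
        problems.append("not a SELECT query")
    if re.search(r"\b(INSERT|DELETE|DROP|CLEAR)\b", query, re.IGNORECASE):
        problems.append("contains an update keyword")
    # Convention assumed here: predicates start with a lowercase letter.
    used = set(re.findall(r"\bex:[a-z]\w*", query))
    for predicate in sorted(used - KNOWN_PREDICATES):
        problems.append(f"unknown predicate: {predicate}")
    return problems


query = generate_query("Who works for TopQuadrant?")
print(check_query(query))  # []
```

A check like this catches a predicate the LLM invented, but it cannot tell whether a well-formed query actually answers the question that was asked, so it narrows the problem rather than solving it.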

Anyone who implements a RAG solution using prompt-to-query retrieval can also implement prompt-to-query on their own. These include: Neo4j, Ontotext, and Stardog.

KGs for fine-tuning LLMs

Use your KG to provide additional training to an off-the-shelf LLM. Instead of providing KG data as part of the prompt at query time (RAG), you can use your KG to train the LLM itself. The advantage is that you can keep all the data locally, without having to send prompts to OpenAI or others. The disadvantage is that the first L in LLM stands for large, so downloading and setting up one of these models takes a lot of resources. Also, although a model fine-tuned on company- or industry-specific data will be more accurate, fine-tuning will not completely eliminate hallucinations. Some additional thoughts on this point:

Once you use the graph to fine-tune the model, you also lose the ability to use the graph for access control.
There are LLMs that have already been developed for different sectors, such as MedLM for health care and SecLM for cybersecurity.
Depending on the use case, a fine-tuned LLM may not be necessary. For example, if the LLM is used primarily to summarize news articles, special training may not be needed.
Rather than fine-tuning the LLM on industry-specific information, some use LLMs fine-tuned for code generation (such as Code Llama) as part of their prompt-to-query solution.

Key players implementing or enabling solutions focused on the use of KGs to fine-tune LLMs: To my knowledge, Stardog’s Voicebox is the only solution that uses a KG to fine-tune an LLM for the client.
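To make “use your KG to train the LLM” more concrete, here is a minimal sketch of one possible first step: turning triples into prompt/completion pairs. This is an assumption about how such data could be prepared, not a description of how any vendor does it. The triples and question templates are fictional, and the exact record format expected depends on the fine-tuning tool.

```python
import json

# Fictional triples, for illustration only.
TRIPLES = [
    ("WidgetPro", "manufacturedBy", "Acme Corp"),
    ("WidgetPro", "compatibleWith", "GadgetHub"),
]

# Hypothetical question templates, one per predicate.
TEMPLATES = {
    "manufacturedBy": "Who manufactures {s}?",
    "compatibleWith": "What is {s} compatible with?",
}


def to_training_examples(triples):
    """Turn each triple with a known template into a prompt/completion pair."""
    examples = []
    for s, p, o in triples:
        template = TEMPLATES.get(p)
        if template is None:
            continue  # skip predicates we have no template for
        examples.append({"prompt": template.format(s=s), "completion": o})
    return examples


def to_jsonl(examples):
    """Serialize the examples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(e) for e in examples)


print(to_jsonl(to_training_examples(TRIPLES)))
```

Once facts are flattened into training text like this, the access rules attached to them in the graph no longer travel with them, which is the loss of access control noted above.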

A note on the different ways of integrating KGs and LLMs that I have listed here: These categories (RAG, prompt-to-query, and fine-tuning) are neither complete nor mutually exclusive. There are other ways to combine KGs and LLMs, and there will be more in the future. In addition, there is considerable overlap between these solutions and it is possible to combine them. For example, it is possible to run a hybrid RAG solution based on vectors and prompt-to-query on a fine-tuned model.

Efficiency and scalability

Building many separate apps that do not link together is inefficient; it is what Dave McComb calls a software wasteland. It does not matter that the apps are “powered by artificial intelligence.” Separate apps result in duplication of data and code and general redundancy. KGs provide a basis for eliminating these redundancies through a smooth flow of data across the enterprise.

Gartner claims that many GenAI projects will be abandoned because of increased costs, but I do not know whether a KG can significantly reduce those costs. I know of no studies or cost-benefit analyses that support this claim. Developing an LLM-powered chatbot for an enterprise is expensive, but so is developing a KG.
