Tell-HS-KL (RAG): How can AI explain your university?
Students and applicants need a wide range of information to find their way around university life.
What are the requirements for my degree programme? Who do I need to contact about my internship? Which study programme suits me?
The first source of information is a university’s website. But finding a specific piece of information
can be quite tedious, and the answer may be hidden on one of the many sub-pages like a needle in
a haystack, especially if the university is spread across different departments and locations.
Unfortunately, search functions often fall short of providing clear answers, returning a huge number
of pages that are irrelevant to the user's query. This can lead to problems and confusion for
students. To support students and applicants in finding what they are looking for, an AI chat tool is
being created. It is a student project in the Deep Learning course of the master's programme in
Computer Science at the University of Applied Sciences Kaiserslautern.
How does the AI know about the university?
The AI tool considered most suitable for this use case is a RAG. RAG stands for
Retrieval-Augmented Generation. This is a type of programme that obtains more precise
answers from a Large Language Model (LLM) by grounding its prompts in a prepared data
set. Large Language Models such as GPT (from ChatGPT), Llama3 or Mixtral can give users
high-quality, human-like responses on a wide range of general topics. In the case of a university
website, however, they can only scan a small number of provided web page links, which is sub-optimal for the
purpose of helping users with their queries. A RAG tool, on the other hand, stores the necessary
data, processes human queries and searches its own data to provide an answer.
In this project, the required data from the university website is forwarded by the programme to a
local LLM server based on Llama3. The user's question is converted into a useful format by the
programme and then sent to the AI. The AI analyses the request and filters the available data to
obtain the relevant text blocks needed to answer the question.
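A minimal sketch of this query step is shown below, assuming the local Llama3 server is reached through LangChain's Ollama integration; the model tag, the prompt wording and the example excerpt are assumptions for illustration, not the project's actual configuration.

```python
# Minimal sketch: the user's question and a few prepared text blocks are
# combined into a prompt and sent to a locally served Llama3 model.
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(model="llama3")  # local LLM server, e.g. served via Ollama

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the following excerpts from the "
    "university website.\n\nExcerpts:\n{context}\n\nQuestion: {question}"
)

def ask(question: str, context_chunks: list[str]) -> str:
    """Format the question together with the retrieved text blocks and query the LLM."""
    context = "\n\n".join(context_chunks)
    return (prompt | llm).invoke({"context": context, "question": question}).content

# Example call with a made-up excerpt standing in for the retrieved text blocks.
print(ask(
    "Who do I contact about my internship?",
    ["The internship office of department A is the first point of contact for students."],
))
```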
How is the system structured?
The system is divided into the following components:
- User view with Chat
- Data storage and processing (backend)
- LLM server
- Communication with the user view
- Communication with the LLM
- Testing system
The user view is designed like a chat, in which users can ask the AI their own questions or load the
provided examples. For answers to be sent to the user view, the questions asked must be
forwarded to the LLM via the backend server. Beforehand, relevant data is searched for in a
database and given to the AI, which formulates an answer based on it. Relevant Python and
JavaScript frameworks, such as LangChain and Vue, are used here.
Figure 1: Structure of the system providing a chat in the WebApp and receiving data from the RAG.
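To make the communication between the user view and the backend concrete, the following is a hedged sketch of what such a backend endpoint might look like with FastAPI; the route name, the request shape and the answer_question placeholder are hypothetical and only stand in for the retrieval and generation steps described above.

```python
# Hypothetical backend endpoint: the Vue chat posts a question as JSON,
# the server runs retrieval and generation, and returns the answer as JSON.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    question: str

class ChatResponse(BaseModel):
    answer: str

def answer_question(question: str) -> str:
    # Placeholder for the retrieval-and-generation step sketched earlier.
    return "..."

@app.post("/chat", response_model=ChatResponse)
def chat(request: ChatRequest) -> ChatResponse:
    return ChatResponse(answer=answer_question(request.question))
```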
Why is it necessary to store and process the data?
It is possible to obtain the university’s information via web search and pass it on to the LLM.
However, this requires a lot of resources to find the relevant data. By saving the data from the
university's website in advance, it can be structured by relevant information, stripped of duplicate
content and stored as text blocks (called chunks) in a vector store. In a vector store, data is placed in
a multidimensional space as multidimensional vectors.
We are using Qdrant as the vector store in this project.
With simple web searches, duplicate data such as headers and footers can lead to confusion and
unnecessarily large amounts of data. The text chunks in the vector store, in contrast, offer the
LLM a simplified search, since they are sorted semantically in this space according to their content.
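The effect of this semantic sorting can be illustrated with a small, self-contained example; the embedding model and the example chunks are arbitrary choices for this sketch and are not part of the project.

```python
# Illustration of semantic closeness: text chunks about similar topics receive
# similar vectors, so a question lands "near" the matching chunks.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # arbitrary embedding model

chunks = [  # invented example chunks
    "The Dean's Office of department A is responsible for examination matters.",
    "Students contact the Dean's Office of department B for internship approval.",
    "The library offers introductory courses on literature research.",
]
question = "Who in the Dean's Office handles my examination?"

scores = util.cos_sim(model.encode(question), model.encode(chunks))[0]
for chunk, score in zip(chunks, scores):
    print(f"{float(score):.2f}  {chunk}")
# Both Dean's Office chunks score higher than the library chunk, and also close
# to each other, which is exactly the ambiguity discussed in the filtering section below.
```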
The process of the data retrieval is shown below:
Figure 2: Process of the data retrieval from the university website.
- The web pages of the university website are scanned.
- The information of each page is divided into small text chunks based on the text blocks (div containers) of the page structure.
- These chunks are saved in a vector store together with extra information called metadata, such as the link to the page and its topic (sketched below).
- Additionally, we let the LLM generate example questions for the stored data, which feed an automatic testing approach covering a wide range of useful questions.
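A rough sketch of these ingestion steps is given below, assuming div-based chunking with BeautifulSoup and LangChain's Qdrant integration; the URLs, length thresholds, metadata keys and collection name are placeholders rather than the project's real configuration.

```python
# Sketch of the ingestion pipeline: scan pages, split them into div-based text
# chunks, attach metadata (link, topic) and store everything in Qdrant.
import requests
from bs4 import BeautifulSoup
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain_core.documents import Document

PAGES = {  # placeholder URLs and topics
    "https://www.hs-kl.de/example-study-programme": "Applied Computer Science",
    "https://www.hs-kl.de/example-campus-page": "Campus information",
}

documents = []
for url, topic in PAGES.items():
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    for div in soup.find_all("div"):  # nested divs are not deduplicated in this sketch
        text = div.get_text(" ", strip=True)
        if 50 < len(text) < 2000:  # skip empty and oversized blocks
            documents.append(
                Document(page_content=text, metadata={"link": url, "topic": topic})
            )

# Embed the chunks and write them into a Qdrant collection.
Qdrant.from_documents(
    documents,
    OllamaEmbeddings(model="llama3"),
    url="http://localhost:6333",
    collection_name="hs_kl_pages",  # placeholder collection name
)
```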
Why is the filtering special?
As mentioned above, a university website contains a huge amount of text, and an LLM tends to
hallucinate the more information it has to process. This leads to false or inaccurate answers.
Therefore, the amount of data sent to the LLM for processing should be reduced. Currently, we send
about 4-8 text chunks to the LLM. To obtain these text chunks, we introduced the vector store
earlier. The text chunks are stored semantically in the vector store, so by default the retrieval
searches the data based on the semantics of the user query. In our university context, however, all
the data has very similar semantics about studies and research. On top of that, the University of
Applied Sciences Kaiserslautern is divided into different departments and locations. For example, a
student might ask for the Dean’s Office representative for their programme. In the vector store,
several representatives for each department are stored “next” to each other, because they contain
similar information. Since the retrieval of useful text chunks works by similarity, chunks from any of
these departments could be returned. The LLM chatting with the user then has the task of generating
an answer based on the retrieved text chunks. If the LLM receives the wrong chunks, it has no choice
but to give no answer or an incorrect one.
To reduce the risk of forwarding the wrong chunks, we added a filtering system for the chunk
retrieval process from the vector store. The filtering process is displayed in the following figure:
Figure 3: Process of the filtered chunk retrieval from the vector store.
- The student asks their question, which is then sent to the server.
- The server selects a filter suited to the user's question. The filters are topics such as a study
programme, a department or staff. For this, it has several options:
2a. Let the LLM generate a suitable filter by means of a so-called SelfQueryRetriever.
2b. Use a predefined filter depending on the words used in the question. If a department is
mentioned in the question, the filter restricts the search to text chunks belonging to that
department (a sketch of this option follows after the list).
2c. As a final option: search without any filter.
- The selected filtering method is applied.
- The vector store is searched for useful text chunks using the selected filter.
- Found text chunks are forwarded to the LLM.
- The LLM provides an answer based on the given text chunks.
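Option 2b, the predefined keyword filter, could be expressed against Qdrant roughly as sketched below; the keyword-to-department mapping, the metadata key and the collection name are assumptions made for this sketch.

```python
# Sketch of option 2b: derive a metadata filter from keywords in the question
# and restrict the similarity search in Qdrant to matching chunks.
from langchain_community.embeddings import OllamaEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

# Hypothetical mapping from question keywords to stored metadata values.
DEPARTMENT_KEYWORDS = {
    "computer science": "Applied Computer Science",
    "business administration": "Business Administration",
}

def build_filter(question: str) -> Filter | None:
    for keyword, topic in DEPARTMENT_KEYWORDS.items():
        if keyword in question.lower():
            return Filter(must=[
                FieldCondition(key="metadata.topic", match=MatchValue(value=topic))
            ])
    return None  # option 2c: search without a filter

client = QdrantClient(url="http://localhost:6333")
embeddings = OllamaEmbeddings(model="llama3")

question = "Who is the Dean's Office representative for Computer Science?"
hits = client.search(
    collection_name="hs_kl_pages",            # placeholder collection name
    query_vector=embeddings.embed_query(question),
    query_filter=build_filter(question),      # None means unfiltered search
    limit=6,                                   # roughly the 4-8 chunks mentioned above
)
for hit in hits:
    print(hit.score, hit.payload.get("metadata", {}).get("link"))
```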
How can an LLM response be tested?
Automatically testing whether an unstructured text response makes sense can be quite difficult. The
whole point of this system is to give sensible answers about the university’s website. There is no
reference system that will give us a valid answer to an arbitrary question for comparison. So we test
questions about facts concerning the university, where the facts have to appear in the answer. These
test questions, which we call test cases, are either created manually or generated by the LLM when
the data is stored, as mentioned before. We send the test question to the LLM and check whether the
facts are present in the response.
Example of a test case (for understanding in English):
Question: Who is the programme director of the Applied Computer Science programme?
Answer: Prof. Dr. Bastian Beggel
We have two approaches for testing the question responses with these test cases:
- comparing the response with the test case programmatically with text search
- using another LLM to compare the question response with the predefined test case
For the first approach, the programme checks whether the predefined keywords are present in the
LLM response. This is simple to implement, but there is a problem: the LLM's answer is unstructured
and varies from run to run. So the keywords with the facts have to account for slight variations such
as time formats or different spellings.
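The programmatic check can be as simple as the following sketch; the test case structure and the normalisation step are illustrative assumptions.

```python
# Sketch of the programmatic test: check which of the expected keywords appear
# in the LLM's answer, ignoring case as a very simple form of normalisation.
def keyword_score(answer: str, keywords: list[str]) -> float:
    """Return the fraction of expected keywords found in the answer."""
    normalised = answer.lower()
    found = sum(1 for keyword in keywords if keyword.lower() in normalised)
    return found / len(keywords)

test_case = {
    "question": "Who is the programme director of the Applied Computer Science programme?",
    "keywords": ["Bastian Beggel"],
}
response = "The programme director is Prof. Dr. Bastian Beggel."
assert keyword_score(response, test_case["keywords"]) == 1.0
```

Variations such as different time formats or spellings would need additional keyword variants or normalisation rules, which is exactly the limitation mentioned above.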
The AI test with the LLM is more suitable for taking variations into account because it is able to
analyse the written text. The system sends a prompt to the LLM indicating the test question and
important answer keywords, followed by the given answer. In the prompt, the LLM is instructed to
compare the given answer with the test case and to state how many of the keywords are used in
the answer. As a result, the LLM is quite good at checking which answers contain the keywords, but
its scoring currently deviates somewhat from the expected result. For example, in some
cases the AI gives a score of only 80% even though all the keywords are mentioned.
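A hedged sketch of this LLM-based check, again using a locally served Llama3 model through LangChain; the judging prompt is an assumption and not the project's exact wording.

```python
# Sketch of the LLM-based test: a second LLM is asked how many of the expected
# keywords are covered by the generated answer.
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

judge = ChatOllama(model="llama3", temperature=0)
judge_prompt = ChatPromptTemplate.from_template(
    "You are grading an answer to the question: {question}\n"
    "Expected keywords: {keywords}\n"
    "Given answer: {answer}\n"
    "State how many of the {total} expected keywords appear in the given answer."
)

def judge_answer(question: str, keywords: list[str], answer: str) -> str:
    return (judge_prompt | judge).invoke({
        "question": question,
        "keywords": ", ".join(keywords),
        "answer": answer,
        "total": len(keywords),
    }).content

print(judge_answer(
    "Who is the programme director of the Applied Computer Science programme?",
    ["Bastian Beggel"],
    "The programme director is Prof. Dr. Bastian Beggel.",
))
```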
How useful are the answers given?
Currently, a wide range of basic information from the university website can be captured. The AI
answers roughly 70% of all questions that match the information provided. However, the greater the
number of pages stored, the less accurate the LLM is in answering questions. For this reason, an
initial limit of 20 pages has been agreed for precise answers, including pages on study programmes
and campus information.
Conclusion
In the course of the project, a RAG tool was created, consisting of a web application and a backend
data server required for communication with an LLM. Moreover, an automatic testing system was
added to quickly test how code changes affect the output of the system.
A small but effective portion of the university’s website data is stored in a structured form in a
vector store. The AI can answer questions about basic information on these pages. Capturing a wider
range of data from the university’s website requires more in-depth data processing and data
searching to achieve accurate results.