background

Connect SQL databases to AI chatbot | LLM Tutorials

Connect SQL databases to AI chatbot | LLM Tutorials
Sheldon Niu
Sheldon Niu

It's very common for an AI chatbot to interact with unstructured files like PDF, Word, but there are also common needs when you want AI to retrieve data from some structured data sources like MySQL, PostgreSQL, Snowflake, Clickhouse, MongoDB, or Microsoft SQL Server.

Enabling LLM to interact with databases involves a lot of untypical techniques which you don't need with PDF files. In this article, we will walk through all the tech problems you need to solve for databases, helping you to build a good database chatbot faster and better.

Problems related to database

Here's the main problems you need to address when interacting with databases for LLM agent:

  1. SQL Accuracy

You need to make sure the generated SQL is just right for the user's question. If wrong, then it will become very misleading for your customers.

  1. Security and authorization

Most of the time you cannot let AI run any SQL queries, like update data, or drop table.

And you may also want to "hide" some sensitive tables and columns which you do not want your users to see.

  1. Access control

It's very common to restrict one user to only view their own data. So you have to have fine-grained access control, defining the row-level filter to filter out the data one user should not see.

  1. Speed

Unlike PDF files, you run SQL queries to fetch data, and sometimes SQL queries may take too long for users to wait. You have to optimize your database to handle queries fast to ensure a good user experience.

SQL Accuracy

To ensure SQL accuracy, you have to inject the table schema into the context.

You have to retrieve and cache the schema, and retrieve the related tables according to the user's question using some embedding techniques.

But that's not always enough, you may also need some documentation to tell the AI what these schemas mean (schema itself may be hard to understand and has a lot of implicit conventions).

If you do not want to implement this yourself, you can use tools like AskYourDatabase, and all you need is to input your connection string, and you are ready to go. The tool will do all the work in the background with best practices.

Security and authorization

Here are the best practices to ensure AI does not accidentally modify or delete your data:

Always use the user with minimum access, like read-only access, restrict the read access to the system tables, etc.

And also remember to only whitelist your IP to the server that needs to connect to your database, and make sure your credentials are stored safely.

If you use AskYourDatabase, they store your credentials securely and encrypted in a secure vault, and all data transfers are made over the TLS protocol.

Moreover, AskYourDatabase will also sanitize the SQL to ensure it does not contain any harmful instructions, like DROP, UPDATE, etc.

It enables you to hide tables and columns which you do not want users to see; all blacklisted tables and columns will not be accessible by AI. Even if the AI generates SQL code to fetch the data from some hidden table, the SQL will not be allowed to run.

Access control

This is a very important feature if you want to ship a chatbot to your end user.

Implementing this yourself is really difficult and will take a lot of time.

With AskYourDatabase, you can define a row-level policy with a simple "SELECT *" statement, and just by referencing the context variables like userId, you are done.

User level access control for LLM Agent interacting with database
User level access control for LLM Agent interacting with database

You can also test it in debug panel, mock context variables:

Debug access control within one webpage
Debug access control within one webpage

Speed

You need to get answers as quickly as possible, and here are some techniques:

  1. Cache the schema; do not try to retrieve the schema every time the user asks a question.
  2. Use OLAP databases like Snowflake, Clickhouse, etc.
  3. Add necessary indexes to streamline the SQL, and make sure your schema is well designed.

Conclude

Implementing a production-ready AI chatbot that connects to a database is technically challenging and requires a lot of work.

Using platforms like AskYourDatabase will save you a lot of time and get the best results instantly.


More Blogs

Query SQL database in natural language

Query SQL database in natural language

The best approach depends on your team and whether you can invest time in building a SQL AI Agent. For technical teams who want more customized solutions, tools like Langchain may be the best choice. For non-tech teams, choosing tools like AskYourDatabase will likely be the lowest-cost option.

Sheldon Niu
Sheldon Niu
How AskYourDatabase has saved 40 hours per week for SmartLead.ai

How AskYourDatabase has saved 40 hours per week for SmartLead.ai

AskYourDatabase has helped the famous cold email SaaS company SmartLead.ai save 40 hours each week between the CS team and the engineering team. More importantly, the tool has taught the CS team the skills crucial for their growth and success.

Sheldon Niu
Sheldon Niu
;