Connect SQL databases to AI chatbot | LLM Tutorials
It's very common for an AI chatbot to work with unstructured files such as PDF and Word documents, but you will often also want the AI to retrieve data from structured sources such as MySQL, PostgreSQL, Snowflake, ClickHouse, MongoDB, or Microsoft SQL Server.
Enabling an LLM to interact with databases involves several techniques that you don't need for PDF files. In this article, we walk through the technical problems you need to solve for databases, so that you can build a good database chatbot faster.
Problems related to databases
Here are the main problems you need to address when an LLM agent interacts with a database:
- SQL Accuracy
You need to make sure the generated SQL actually answers the user's question. A wrong query returns misleading results to your customers.
- Security and authorization
Most of the time you cannot let the AI run arbitrary SQL, such as statements that update data or drop tables.
You may also want to hide sensitive tables and columns that your users should not see.
- Access control
It's very common to restrict each user to their own data. That requires fine-grained access control: a row-level filter that excludes the rows a given user should not see.
- Speed
Unlike reading PDF files, answering a question here means running SQL queries, and some queries take too long for users to wait. You have to optimize your database so that queries return fast enough for a good user experience.
SQL Accuracy
To improve SQL accuracy, you have to inject the table schema into the model's context.
Retrieve and cache the schema, then select the tables relevant to the user's question, for example with embedding-based similarity search.
That's not always enough. You may also need documentation that tells the AI what the schema means, because a schema on its own can be hard to understand and often carries implicit conventions.
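As a rough illustration of this idea, here is a minimal Python sketch that reads the schema from a SQLite database, picks the table most similar to the question, and builds a prompt from its DDL plus a short documentation note. Everything here is an assumption made for the example: the function names, the `docs` dictionary, and the prompt layout are hypothetical, and `embed` is a toy word-count stand-in for a real embedding model.

```python
import re
import sqlite3
from collections import Counter
from math import sqrt


def load_schema(conn: sqlite3.Connection) -> dict[str, str]:
    """Return {table_name: CREATE TABLE statement} for every user table."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    return {name: sql for name, sql in rows}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevant_tables(question, schema, docs, top_k=1):
    """Rank tables by similarity between the question and DDL + documentation."""
    q = embed(question)

    def score(table):
        return cosine(q, embed(schema[table] + " " + docs.get(table, "")))

    return sorted(schema, key=score, reverse=True)[:top_k]


def build_prompt(question, schema, docs, top_k=1):
    """Put only the selected tables (and their notes) into the model context."""
    parts = []
    for table in relevant_tables(question, schema, docs, top_k):
        parts.append(schema[table])
        if table in docs:
            parts.append(f"-- {docs[table]}")
    return "Schema:\n" + "\n".join(parts) + f"\n\nQuestion: {question}\nSQL:"
```

In a real system you would replace `embed` with calls to an embedding model and store the vectors, but the flow stays the same: load the schema, rank tables against the question, and send only the best matches to the LLM.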
If you do not want to implement this yourself, you can use a tool like AskYourDatabase: you enter your connection string and you are ready to go. The tool does this work in the background.
Security and authorization
Here are the best practices to ensure the AI does not accidentally modify or delete your data:
- Always connect with a database user that has minimal privileges: read-only access, no read access to system tables, and so on.
- Allow connections to your database only from the IP addresses of the servers that need them, and make sure your credentials are stored safely.
If you use AskYourDatabase, it stores your credentials encrypted in a secure vault, and all data transfers are made over TLS.
AskYourDatabase also sanitizes the SQL to ensure it does not contain harmful statements such as DROP or UPDATE.
It also lets you hide tables and columns that you do not want users to see; blacklisted tables and columns are not accessible to the AI. Even if the AI generates SQL that reads from a hidden table, the query is not allowed to run.
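If you build this guard yourself, one option is to let the database engine enforce it rather than parsing SQL strings. The sketch below is only an illustration using SQLite and Python's standard `sqlite3` module, whose `set_authorizer` hook is called for every action a statement wants to perform. The blacklist contents and the names `guard` and `run_generated_sql` are hypothetical; it does not describe how AskYourDatabase works internally.

```python
import sqlite3

# Hypothetical blacklist for this example.
HIDDEN_TABLES = {"secrets"}
HIDDEN_COLUMNS = {("users", "password_hash")}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow plain reads only; deny writes, DDL, and hidden tables/columns."""
    if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_FUNCTION):
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ:
        table, column = arg1, arg2  # for SQLITE_READ: table name, column name
        if table in HIDDEN_TABLES or (table, column) in HIDDEN_COLUMNS:
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK
    # INSERT, UPDATE, DELETE, DROP, ATTACH, PRAGMA, ... are all rejected.
    return sqlite3.SQLITE_DENY


def guard(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Install the authorizer; call this after your own setup is finished."""
    conn.set_authorizer(read_only_authorizer)
    return conn


def run_generated_sql(conn: sqlite3.Connection, sql: str):
    """Run model-generated SQL; raises sqlite3.DatabaseError if it is denied."""
    return conn.execute(sql).fetchall()
```

This is a second line of defense, not a replacement for a read-only database user. Other databases offer different mechanisms (grants, views, roles), so treat the authorizer as one SQLite-specific way to express "reads only, minus the hidden tables and columns".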
Access control
This is a very important feature if you want to ship a chatbot to your end users.
Implementing this yourself is difficult and time-consuming.
With AskYourDatabase, you define a row-level policy as a simple "SELECT *" statement that references context variables such as userId.
You can also test the policy in the debug panel by mocking the context variables.
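To make the idea of a "SELECT *" policy with context variables concrete, here is a minimal sketch of one way such a filter could be applied, using SQLite. It is not AskYourDatabase's implementation: the `ROW_POLICIES` format and the function names are hypothetical. Each policy becomes a common table expression that shadows the real table, so the generated query only sees the rows the policy returns, and `:userId` is bound as a named parameter from the request context.

```python
import sqlite3

# Hypothetical policy format: table name -> "SELECT *" statement that
# references context variables as named parameters (here :userId).
# The policy reads main.orders so that it refers to the real table,
# not to the CTE of the same name defined below.
ROW_POLICIES = {
    "orders": "SELECT * FROM main.orders WHERE user_id = :userId",
}


def apply_row_policies(generated_sql: str) -> str:
    """Prefix the query with one CTE per policy, shadowing each table.

    Assumes generated_sql is a plain SELECT without its own WITH clause.
    """
    ctes = ", ".join(
        f"{table} AS ({policy})" for table, policy in ROW_POLICIES.items()
    )
    return f"WITH {ctes} {generated_sql}"


def run_as_user(conn: sqlite3.Connection, generated_sql: str, context: dict):
    """Run a generated query with the policies applied and context bound."""
    return conn.execute(apply_row_policies(generated_sql), context).fetchall()
```

Calling `run_as_user(conn, "SELECT SUM(total) FROM orders", {"userId": 1})` with a mocked context is the same kind of check a debug panel gives you. Note the limits of this sketch: a generated query that writes `main.orders` explicitly would bypass the CTE, so a real implementation also has to reject schema-qualified names or rely on the database's own row-level security where it exists.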
Speed
Users expect answers as quickly as possible. Here are some techniques:
- Cache the schema; do not try to retrieve the schema every time the user asks a question.
- For analytical queries, use an OLAP database such as Snowflake or ClickHouse.
- Add the necessary indexes to speed up queries, and make sure your schema is well designed.
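The first technique, caching the schema, can be sketched in a few lines. The example below is a minimal illustration against SQLite; the `SchemaCache` class, its five-minute default, and the injectable `clock` parameter are assumptions made for the example, not part of any particular tool.

```python
import sqlite3
import time


class SchemaCache:
    """Keep the schema in memory and reload it only after ttl_seconds."""

    def __init__(self, conn, ttl_seconds=300.0, clock=time.monotonic):
        self._conn = conn
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._schema = None
        self._loaded_at = 0.0
        self.loads = 0  # how many times the database was actually queried

    def get(self) -> dict:
        """Return {table_name: DDL}, hitting the database only when stale."""
        now = self._clock()
        if self._schema is None or now - self._loaded_at >= self._ttl:
            rows = self._conn.execute(
                "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
            ).fetchall()
            self._schema = dict(rows)
            self._loaded_at = now
            self.loads += 1
        return self._schema
```

A time-based expiry is the simplest policy. If your schema changes rarely, you could instead refresh the cache only when a migration runs.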
Conclusion
Implementing a production-ready AI chatbot that connects to a database is technically challenging and requires a lot of work.
Using a platform like AskYourDatabase can save you a lot of that time.