Integrate AI with SQL databases
Structured data usually lives in SQL databases such as MySQL, PostgreSQL, SQL Server, or Oracle. If your AI agent is going to work with those databases, it needs supporting techniques to understand the schema and query it correctly.
Traditional RAG mostly handles unstructured data such as PDFs or websites; generating accurate SQL queries calls for quite different techniques.
In this article, we walk through the four most important techniques for successfully integrating AI with your SQL databases.
Understand your scenarios
If all you want is a question-answering chatbot that handles analytics questions by querying your databases with SQL, you don't have to develop it from scratch. Tools like AskYourDatabase offer an out-of-the-box solution: provide your database connection string, connect, and you are ready to go.
The chatbot can be used on the desktop for security-sensitive work, and it can also be embedded into any website as an AI chatbot that answers customers' questions.
If your scenario needs heavy customization and no existing product meets your needs, you may need to build this yourself.
Here are several techniques to consider if you want to build this from scratch:
Use GPT-4 Level Model
For production scenarios, use a GPT-4 level model. Good candidates include Mistral Large, Llama 3 70B, and Qwen 2 72B. A GPT-3.5 level model will produce more errors and a poor user experience, so use a high-quality model unless you are only testing against a very simple or empty database.
Retrieve Schema in the Right Way
Most production databases have hundreds of tables, so you cannot fit all of the schema information into the model's limited context window. You need a way to search for the tables relevant to the user's question so that only those are placed in the context. Tools like AskYourDatabase have implemented best practices for doing RAG around database schema.
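As a rough illustration of the idea, the sketch below ranks tables by how many words they share with the user's question and keeps only the top matches. The `SCHEMA` catalogue, the table names, and the `relevant_tables` helper are hypothetical, and the keyword-overlap score is a deliberately simple stand-in for the embedding-based search a production system would more likely use.

```python
import re

# Hypothetical schema catalogue: table name -> short description with column names.
SCHEMA = {
    "orders": "customer orders: id, customer_id, total, created_at",
    "customers": "customer accounts: id, name, email, country",
    "inventory": "warehouse stock: sku, quantity, location",
}


def tokenize(text):
    """Lower-case the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevant_tables(question, schema, top_k=2):
    """Return up to top_k table names that share the most words with the question."""
    question_words = tokenize(question)
    scored = []
    for name, description in schema.items():
        overlap = len(question_words & tokenize(name + " " + description))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


def schema_context(question, schema, top_k=2):
    """Build the schema snippet that would be placed in the model's prompt."""
    return "\n".join(
        f"{name}: {schema[name]}" for name in relevant_tables(question, schema, top_k)
    )
```

Calling `schema_context("What is the total of orders per customer country?", SCHEMA)` would include only the `orders` and `customers` entries, leaving `inventory` out of the prompt.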
Use Function Calling and Code Interpreter
Make sure the model you are using supports function calls and a code interpreter. Function calls are crucial for generating SQL queries to run, and you also need a code interpreter to analyze the data further. OpenAI has built-in code interpreter support. If you use your own open-source model, you can use E2B to achieve the same functionality. The code sandbox is important because an analytic chatbot often needs to perform further data analysis on the fetched data.
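A minimal sketch of the function-calling side might look like the following. It assumes an OpenAI-compatible chat API that accepts tool descriptions in JSON-schema form and returns tool-call arguments as a JSON string. The tool name `run_sql`, the `handle_tool_call` dispatcher, and the use of SQLite as a stand-in database are illustrative choices, not part of any particular product.

```python
import json
import sqlite3

# Tool description in the JSON-schema style used by OpenAI-compatible chat APIs.
# The name "run_sql" and its single "query" parameter are assumptions for this sketch.
RUN_SQL_TOOL = {
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Run a read-only SQL query and return the rows.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def run_sql(conn, query):
    """Execute the query and return the rows as a list of dicts."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description or []]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def handle_tool_call(conn, name, arguments_json):
    """Dispatch one tool call from the model and return a JSON string result."""
    if name != "run_sql":
        return json.dumps({"error": f"unknown tool {name}"})
    args = json.loads(arguments_json)
    try:
        return json.dumps({"rows": run_sql(conn, args["query"])})
    except sqlite3.Error as exc:
        # Returning the error lets the model see it and try a corrected query.
        return json.dumps({"error": str(exc)})
```

In a real loop, `RUN_SQL_TOOL` would be sent with the chat request, and the string returned by `handle_tool_call` would be passed back to the model as the tool result, or handed to the code sandbox for further analysis.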
Data Access Control Made Right
For customer-facing scenarios, it is crucial to ensure two things:
- Users can view only the data they are authorized to access.
- The generated SQL cannot fetch unauthorized data or perform unauthorized operations.
Implementing this requires extensive work, such as parsing and validating SQL queries, applying row-level policies, and more. AskYourDatabase has built-in support for fine-grained data access control, which can save you a lot of time.
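One way to picture the validation step is to let the database engine itself veto statements before they run. The sketch below uses SQLite's authorizer hook (exposed in Python as `Connection.set_authorizer`) as a stand-in. The allow-list, the table names, and the `try_query` helper are hypothetical, and other databases would need their own mechanisms, such as read-only roles and row-level security policies in PostgreSQL. It covers only the second point above (blocking unauthorized tables and operations); per-user row filtering would have to be layered on top.

```python
import sqlite3

ALLOWED_TABLES = {"orders"}  # hypothetical allow-list for the current user role


def authorizer(action, arg1, arg2, db_name, source):
    """Permit SELECT statements that read only allow-listed tables; deny the rest."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    # For SQLITE_READ, arg1 is the table name and arg2 is the column name.
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def make_demo_connection():
    """Create a small in-memory database, then lock it down with the authorizer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER, total REAL);
        CREATE TABLE users (id INTEGER, password_hash TEXT);
        INSERT INTO orders VALUES (1, 9.5);
        INSERT INTO users VALUES (1, 'secret');
        """
    )
    # Installed after setup so the CREATE and INSERT statements above are not blocked.
    conn.set_authorizer(authorizer)
    return conn


def try_query(conn, sql):
    """Return (True, rows) if the query is permitted, else (False, error message)."""
    try:
        return True, conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return False, str(exc)
```

With this in place, `try_query(conn, "SELECT id, total FROM orders")` succeeds, while a read from `users` or a `DELETE` is rejected before it touches any data. Because this simple callback also denies SQL function calls, aggregates such as `count(*)` would need to be allowed explicitly.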
Conclusion
Integrating AI with databases is a common feature most AI agents need, but implementing it correctly requires significant effort. Choosing tools like AskYourDatabase will save a lot of time compared to building this from scratch.