ChatGPT for SQL: How to Generate better SQL Queries with OpenAI
Introduction
You can leverage ChatGPT for various tasks like:
- Gaining Business Insights.
- Designing and Migrating Table Schemas
- Conducting Data Analysis
- Optimizing and Correcting SQL Queries
Although ChatGPT can write SQL, but there are some important best practices that can help ChatGPT write SQL with much higher efficiency and accuracy.
In this article, we'll delve into how to unleash the full potential of ChatGPT in SQL-related tasks.
Context Preparation
ChatGPT needs to understand some context to generate valid SQL queries. Typically, for SQL generation, you need three types of data:
- Table schemas related to your question.
- Background documentation about tables and columns.
- Your question.
Table schemas are the most important information; they contain what columns your database has and how tables are related through foreign keys.
However, table schemas are not always enough for AI to generate accurate results, because there is usually some implicit information that is hard to infer from the schema itself, and only the developers know it.
For example, if you have a column named "status" typed as VARCHAR, and it has three values: "Active", "Inactive", "Pending", it's impossible for ChatGPT to know the exact enumerations and their meanings of this column without being given this extra information.
Table Schemas Related to Your Question
First, you need to obtain the string representation of your table structures.
Taking MySQL as an example, we use the "DESCRIBE" command to retrieve the table schema. By executing this command, you can get a detailed view of the table's structure:
It becomes cumbersome when your query involves several tables; you have to manually execute repetitive commands and concatenate them. This process becomes even more tedious if your schema changes, forcing you to redo this.
However, if you are using tools like AskYourDatabase (the ChatGPT for SQL), you don't have to go through these steps at all. It automatically fetches all related table schemas for you, including table relations, comments, data types, and then feeds this information into ChatGPT, just like this:
It saves you a lot of time, and will save you more when you have tons of tables.
AskYourDatabase automatically fetches related schemas, syncing with the latest version against your database, and let ChatGPT understand your database automatically.
It's worth noting that the clearer and more self-explanatory your schemas are, the more likely you are to obtain better SQL results.
So, how can you make your schema more AI-friendly? Here are some tips:
- Use semantic naming for tables and columns so that ChatGPT can better understand your schema. For example,
user
is better thanu
,sales_amount
is better thansa
. - Use foreign constraints to link tables, so that ChatGPT can understand the relationships between them.
- Add descriptive comments to schema when it's not self-explanatory or has implicit conventions.
For more information about how to get better generated SQL, kindly refer to these two articles:
Background documentation about tables and columns
Why this matters?
Imagine ChatGPT is a new intern who just came to your office yesterday, although he/she is very smart and capable of writing complex SQL queries, it's still hard for he/she to deeply know the business meaning behind the table schema.
Usually only the backend or BI engineer who knows the business meaning behind the schema so well, so giving these background documentations to ChatGPT will help it better understand your database's business logic.
What should the background documentations include? Here's our advice:
- The business meaning of each table.
- SQL examples. These examples are more important when you wanna ask some quick questions like "What's the revenue of yesterday", if you have given the right SQL of "What's the revenue of the last month", then ChatGPT will quickly come up with the write SQL instead of thinking from scratch.
If you are using AskYourDatabase, adding these docs is really easy and straightforward, just go to the training page, and add the docs you need.
First, click the 'Train' button:
Then add all the docs and SQL examples:
After adding all of this information, you are ready to go without any extra setup, and ChatGPT will know all of the information you specified and hopefully give you better results.
Your question
The thing is, the more information you provide in the documentation, the shorter and more ambiguous your question can be.
If you have a well-defined SQL example of "What are the best five customer acquisition channels last month?"
You can directly ask another question about "What are the top 4 customer acquisition channels last week?" without having to specify any columns and tables.
But if you have no docs, you have to manually specify all the columns, like "Give me the top five customer channels (the source is stored in the source column) sorted by the signup users divided by the visitors, and the signup users are in the users table, and the visitors are in the log table....)"
So you see, if you have some often asked questions that are complex but have some fixed patterns, you'd better write a SQL example. It's like a "Quick manual" for ChatGPT.
SQL Generation vs Chat with databases
Actually, there are two main ways of interacting with databases:
- Generate SQL, copy the SQL, execute the SQL, view the response in a database client.
- Chat with databases in ChatGPT - ChatGPT will fetch the data, explain the data for you, and even do further data analysis & visualization.
It's obvious that chatting with databases is a more preferable way to interact with databases because you don't need to do a lot of trivial things like copy & paste, and ChatGPT serves as an AI data analyst who can connect to the database and perform any possible data analysis.
If you are using raw ChatGPT or other tools like text2sql.io, you are just doing SQL generation. To chat with databases, it's highly recommended to use tools like AskYourDatabase, as they let you get any data insights without leaving your chat thread:
Data analysis & visualization
SQL work always involves further data tasks, including analysis and visualization.
Instead of toggling between several tools, you can directly do all of these things in AskYourDatabase.
Below is a visualization example:
You can even let ChatGPT to do data analysis tasks like:
Question: Analyze the order details to identify potential problems.
Answer:
Conclusions
With enough context, ChatGPT can indeed generate relatively accurate SQL.
Using tools like AskYourDatabase, which is highly optimized for querying SQL databases, will enable you to chat with your databases, saving you a significant amount of time.
More importantly, AskYourDatabase is a desktop app that keeps your database credentials safe and even supports SSH tunneling connections.
If you are looking for effective ways to generate SQL using AI, then tools like AskYourDatabase might be a suitable solution for you.
You can also learn more about SQL AI articles here:
How does AskYourDatabase protects your data
Comparison of the Top 4 SQL AI Tools in 2024