SQL Performance Tuning
SQL Textbooks teach us how to get the results by writing queries. In the production environment, query execution time matters just as much since it influences user experience or even leads to a production incident of “Runtime Out” due to platform threshold setup. For example, Tableau Server Admin usually set up a “runtime threshold” to automatically fail the backgrounder extract refresh if a query runs too long.
There are two major topics when we talk about SQL performance: Index Utilization and Avoiding Whole Table Reference. Before you apply any of these practices to your slow query, try to use
EXPLAIN QUERY PLAN on the query to see how the database interprets it.
Here are the tips/tricks/best practices I have utilized in my SQL journey:
- String search
When we perform LIKE %%, SQL automatically gives up index reference.
SELECT * FROM USER WHERE USER_NAME LIKE '%Sean%'
SELECT * FROM USER WHERE USER_NAME LIKE 'Sean%'
2. Avoid OR and IN Operator
OR operator is the “behind logic” of IN operator. E.g.
USER_ID IN (1, 3, 5) is translated into USER_ID = 1 OR USER_ID = 3 OR USER_ID = 5. Both IN and OR operators require SQL to rebuild the execution plan, which can be an expensive step sometimes.
--For OR Operator
SELECT * FROM USER WHERE USER_ID = 1 OR USER_ID = 5--After Tuning
SELECT * FROM USER WHERE USER_ID = 1
SELECT * FROM USER WHERE USER_ID = 5--For IN or NOT IN operators
----If it is numeric filed:
SELECT * FROM USER WHERE USER_ID IN (1,2,3,4,6)
SELECT * FROM USER WHERE USER_ID BETWEEN (1,6) AND USER_ID!=5----If it is string field:
USE EXISTS in a subquery
Utilizing these practices can be grammarly redundant sometimes. You don’t have to abandon the SQL-told way if the query runs within seconds. These practices can help if you have a long execution time.
3. WHERE 1=1
This query has been used by me for quite a long time since it makes the indent looks much neater in SQL. I just learned that
WHERE 1=1 can actually cause SQL to do a table scan instead of an index scan. Need to validate this one!
4. ORDER BY aligning with WHERE
When the fields in ORDER BY and WHERE are the same, the index can be utilized more efficiently compared to when they are not.
A lot of times ORDER BY can be unnecessary when your query result is just fed into another platform like Tableau, where they will be sorted again, either on the same field or even a different field. If ORDER BY is needed, the numeric field is faster than the string(varchar) field.
5. UNION ALL vs UNION
UNION ALL is faster since UNION removes duplicates. If you are sure that there is no duplicate, use UNION ALL. If you are not sure about the duplicate, you can use UNION ALL and then remove duplicates in your code later, especially in the situation when you have to perform multiple UNIONs.
6. Multiple JOINs
When you are joining multiple tables together, try to put the big tables at first and smaller tables at last. MS SQL and Oracle SQL read tables from the right to the left and reading larger tables first makes the query faster. However, MySQL uses the opposite read sequence: you should put smaller tables first for MySQL.
A stored Procedure generally performs faster than a function. It is also more powerful compared to functions.
8. The WHERE magic
SQL interprets WHERE clause from the bottom to the top. Putting the WHERE clause which can filter out most data at last.