Efficient Ways to Update Data in a Table: Simple `CASE` vs. Temporary Tables.
Many times, we are required to update data on a table. This is particularly relevant when the table already exists, and we want to modify a column based on certain conditions. Whether it’s a few rows or thousands, I have found myself coming to a point of choosing the most efficient method for updating data. 🥂📊
In this article, I’ll share the two common methods I have applied: one that works well when the number of rows to update is small (using a `CASE` statement) and another approach that handles large datasets more efficiently (using a temporary table).
- Using the `CASE` Statement
When I only need to update a small number of rows let’s say 300–500, the `CASE` statement is an option I adopt. The `CASE` statement is part of the SQL `UPDATE` query and allows us to specify different values for a column based on various conditions.
Example Scenario:
Imagine you need to update the `current_balance` column for some specific account numbers in a table. Let’s call the table `customer_details`. Here’s how you could do it using a `CASE` statement:
SELECT * FROM customer_details;
account_no created_time current_balance
4018 2024-11-06 NULL
0221 2024-11-06 NULL
7739 2024-11-06 NULL
0644 2024-11-06 NULL
6735 2024-11-07 NULL
UPDATE customer_details
SET current_balance = CASE account_no
WHEN 4018 THEN -16505.01
WHEN 0221 THEN -4050
WHEN 7739 THEN -2152
WHEN 0644 THEN -6353
WHEN 6735 THEN -1052
ELSE current_balance -- In case the account number is not in the list, keep the current balance
END
WHERE account_no IN (4018, 0221, 7739, 0644, 6735);
-- The `CASE` statement checks the value of the `account_no` column for each row and set to the corresponding value.
2. Using a Temporary Table for Large Datasets
When dealing with large numbers in their thousands, using a `CASE` statement may be difficult to use or manage correctly to avoid errors. In such cases, creating a temporary table to store the new values and performing the update is a much better solution. This approach is usually my preference, maybe because it’s cleaner and prone to less errors.
Step 1: Create a Temporary Table
Create a table that will hold the `account_no` numbers and their corresponding `new_balance` values.
CREATE TEMPORARY TABLE account_balances (
account INT,
new_balance DECIMAL(15, 2)
);
Step 2: Insert the Data into the Table
INSERT INTO account_balances (account, new_balance) VALUES
(4018, -16505.01),
(0221, -4050),
(7739, -2152),
(0644, -6353),
(6735, -1052);
Step 3: Update the Main Table Using the Data from Temporary Table
After loading the temporary table, we then run our `UPDATE` by joining the main table with the temporary table on the `account` column. This ensures that only the rows that match in both tables get updated.
UPDATE customer_details c
JOIN account_balances tmp ON c.account_no = tmp.account
SET c.current_balance = tmp.new_balance;
Step 4: Clean Up
Even though temporary tables are dropped automatically at the end of the session, it becomes necessary to explicitly drop them earlier to free up resources.
DROP TEMPORARY TABLE account_balances;
I hope I’ve been able to clearly simplify these two methods for updating data in a table. Whether you’re working with a small set of rows or dealing with a large dataset, choosing the right approach can make all the difference in terms of performance and maintainability.
Have you used either of these methods in your own work, or are there other approaches you’ve found effective for updating data? I’d love to hear your thoughts and experiences. Feel free to share in the comments!🥰