How to Review and Refactor Code with GPT-4 (and ChatGPT)

March 28, 2023

It’s time to enhance your coding process with AI-driven code improvement. The GPT family of models from OpenAI offers developers a wealth of opportunities to improve, review, fix, and even outsource code writing. Knowing how to use these large language models during development will soon be an essential skill in the developer toolkit.

Ever wondered how AI could transform your coding process? In this guide, we’ll demonstrate techniques for using ChatGPT or GPT-4 to review and refactor code, as well as discuss some limitations and provide handy resources for using these LLMs in your programming workflow.

We’ll start with ways to review and refactor code. For each of these, you may want to experiment with different prompts and the techniques provided to unlock the full potential of AI-driven code improvement.

Note: I use the term GPT-4 throughout, as it is the model used for the examples. ChatGPT (GPT-3.5) works in a similar way and can be used interchangeably.

Refactoring Code with GPT-4 and ChatGPT
Tips for Better Refactoring
A Multi Prompt Approach (Prompt Chaining)
Limitations
Conclusion

Refactoring Code with GPT-4 and ChatGPT

Refactoring code is crucial for several reasons: it enhances readability and maintainability, making it easier for developers to understand and modify the code. It also helps prevent potential bugs and ensures that your code adheres to established best practices. Ultimately, refactoring can lead to more efficient, reliable, and robust software.

GPT-4 can significantly contribute to this process. By using GPT-4 to review and assess your code, you can quickly identify areas for improvement and create cleaner, more efficient code. Leveraging GPT-4’s capabilities can save time and reduce human errors in the process, resulting in more consistent adherence to coding standards.

Modernizing Old Code

You can quickly modernize your code by giving your old function to GPT-4 and asking it to refactor the function to modern coding practices.

Note: In this example, the refactor introduced a logical error into the function. We’ve kept it intentionally as a good example of ‘you get what you ask for’. When asked to review the code for logical errors, ChatGPT is able to identify it.

Prompt: Refactor the following code to modern es6 programming standards

Before:

function tempConvert(temp, scale) { // Vague function name
  var result = 0; // 'var' is so pre-ES2015
  if (scale == "C") { // Is this the correct way to compare strings?
    result = (temp - 32) * (5 / 9); // Magic numbers
  } else {
    result = (temp * (9 / 5)) + 32;
  }
  return result.toFixed(1) + "°" + (scale == "C" ? "F" : "C"); // This line feels a bit cluttered...
}

After:

/**
 * Converts a temperature between Celsius and Fahrenheit.
 * @param {number} temp - The temperature value to convert.
 * @param {string} scale - The temperature scale ('C' or 'F') to convert from.
 * @return {string} The converted temperature value, formatted with one decimal place and the appropriate unit.
 */
const tempConvert = (temp, scale) => {
  const isValidScale = scale === "C" || scale === "F";
  if (!isValidScale) {
    throw new Error("Invalid scale: must be 'C' or 'F'.");
  }

  const conversionFactor = 5 / 9; // <-- Note: This is not correct when converting to Fahrenheit (which needs 9 / 5)
  const result = scale === "C"
    ? (temp - 32) * conversionFactor
    : (temp * conversionFactor) + 32;

  const resultUnit = scale === "C" ? "F" : "C";
  return `${result.toFixed(1)}°${resultUnit}`;
}

In this refactored version, GPT-4 quickly made the following improvements:

Added JSDoc comments.
Converted the tempConvert function to an arrow function.
Replaced var with const for immutable variables.
Used strict equality checks (===) for string comparisons.
Introduced a constant to eliminate magic numbers.
Simplified the return statement with template literals.

Issues:

Logic: This technique and prompt are adept at updating code to known coding practices but can turn a blind eye to business logic. The example above incorrectly refactors the conversion logic: it reuses the 5 / 9 factor when converting to Fahrenheit, where 9 / 5 is needed.

After (Take 2):

We used the same prompt from … and voilà, it returned a much better version of the function. This two-pass (or multi-pass) approach is extremely effective.

...
function convertTemperature(temperature, scale) {
  const isValidScale = scale === "C" || scale === "F";
  if (!isValidScale) {
    throw new Error("Invalid scale: must be 'C' or 'F'.");
  }

  const convertedTemperature = scale === "C"
    ? (temperature * (9 / 5)) + 32
    : (temperature - 32) * (5 / 9);

  const resultUnit = scale === "C" ? "F" : "C";
  return `${convertedTemperature.toFixed(1)}°${resultUnit}`;
}
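A cheap way to catch the kind of logic error in the first refactor, without relying on another prompt, is to check a converter against a few well-known reference points. The sketch below reproduces both refactored versions from above and reads scale as the unit being converted from, as the first version's JSDoc states. The names referencePoints and findMismatches are made up for this example.

```javascript
// Well-known reference points. `scale` is the unit being converted FROM,
// matching the JSDoc on the first refactor.
const referencePoints = [
  { temp: 0, scale: "C", expected: "32.0°F" },
  { temp: 100, scale: "C", expected: "212.0°F" },
  { temp: 212, scale: "F", expected: "100.0°C" },
  { temp: -40, scale: "F", expected: "-40.0°C" },
];

// Runs a converter over every reference point and describes each mismatch.
function findMismatches(convert) {
  return referencePoints
    .map((point) => ({ ...point, actual: convert(point.temp, point.scale) }))
    .filter(({ expected, actual }) => actual !== expected)
    .map(({ temp, scale, expected, actual }) =>
      `${temp}°${scale}: expected ${expected}, got ${actual}`);
}

// First refactor from above, including its single 5 / 9 conversion factor.
const tempConvert = (temp, scale) => {
  const isValidScale = scale === "C" || scale === "F";
  if (!isValidScale) {
    throw new Error("Invalid scale: must be 'C' or 'F'.");
  }
  const conversionFactor = 5 / 9;
  const result = scale === "C"
    ? (temp - 32) * conversionFactor
    : (temp * conversionFactor) + 32;
  const resultUnit = scale === "C" ? "F" : "C";
  return `${result.toFixed(1)}°${resultUnit}`;
};

// "Take 2" version from above.
function convertTemperature(temperature, scale) {
  const isValidScale = scale === "C" || scale === "F";
  if (!isValidScale) {
    throw new Error("Invalid scale: must be 'C' or 'F'.");
  }
  const convertedTemperature = scale === "C"
    ? (temperature * (9 / 5)) + 32
    : (temperature - 32) * (5 / 9);
  const resultUnit = scale === "C" ? "F" : "C";
  return `${convertedTemperature.toFixed(1)}°${resultUnit}`;
}

console.log(findMismatches(tempConvert));
console.log(findMismatches(convertTemperature));
```

Against the first refactor, all four reference points are reported as mismatches; the Take 2 version returns an empty array. The same few lines make a quick gate before accepting any conversion logic GPT-4 suggests.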

Adding Coding Best Practices or Principles

Your organisation or code base might use certain coding practices and styles that you want to keep. In situations like this, tell GPT-4 which coding practices or styles you’d like it to focus on.

Prompt: Review the following code and refactor it to make it more DRY and adopt the SOLID programming principles.

Note that, with the same input, it has adopted the Single Responsibility Principle from SOLID and split the function into three. Unlike the ‘Modernize’ step earlier, it did not convert the function to an arrow function.

“By extracting the conversion logic into separate functions, we’ve improved the code’s adherence to the Single Responsibility Principle (part of SOLID), as each function now has a clear, focused purpose.”

function tempConvert(temp, scale) {
  const isValidScale = scale === "C" || scale === "F";
  if (!isValidScale) {
    throw new Error("Invalid scale: must be 'C' or 'F'.");
  }

  const result = scale === "C" ? celsiusToFahrenheit(temp) : fahrenheitToCelsius(temp);
  const resultUnit = scale === "C" ? "F" : "C";
  return `${result.toFixed(1)}°${resultUnit}`;
}

function celsiusToFahrenheit(temp) {
  const conversionFactor = 9 / 5;
  return (temp * conversionFactor) + 32;
}

function fahrenheitToCelsius(temp) {
  const conversionFactor = 5 / 9;
  return (temp - 32) * conversionFactor;
}
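The SOLID output still checks scale === "C" twice, once for the value and once for the unit. If you want to push the DRY request further, one option (a sketch, not GPT-4's output) is to keep each conversion and its target unit together in a lookup table keyed by the scale you are converting from. The conversions map is an illustrative name, not part of the original code.

```javascript
// Each entry maps the scale being converted FROM to its target unit and formula.
const conversions = new Map([
  ["C", { targetUnit: "F", convert: (temp) => temp * (9 / 5) + 32 }],
  ["F", { targetUnit: "C", convert: (temp) => (temp - 32) * (5 / 9) }],
]);

function tempConvert(temp, scale) {
  // A single lookup both validates the scale and selects the conversion.
  const conversion = conversions.get(scale);
  if (!conversion) {
    throw new Error("Invalid scale: must be 'C' or 'F'.");
  }
  const result = conversion.convert(temp);
  return `${result.toFixed(1)}°${conversion.targetUnit}`;
}

console.log(tempConvert(100, "C")); // 212.0°F
console.log(tempConvert(32, "F")); // 0.0°C
```

Using a Map rather than a plain object means inherited keys such as "toString" are never mistaken for valid scales.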

Detecting and Fixing Errors

Sometimes we are blind to the vulnerabilities or potential issues our code can create. Getting GPT-4 to review and address code issues can save you more than just time.

Prompt: Review this code for errors and refactor to fix any issues: ...

Input (Flawed) Function:

function updateTotalPrice(items, discountPercentage) {
  let totalPrice = 0;

  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    totalPrice += item.price * item.quantity;
  }

  // Potential error: discountPercentage is not validated or checked for boundary conditions
  const discount = totalPrice * (discountPercentage / 100);
  totalPrice -= discount;

  return totalPrice;
}

Output:

function updateTotalPrice(items, discountPercentage) {
  // Fix 1: Validate input parameters
  if (!Array.isArray(items) || typeof discountPercentage !== 'number') {
    throw new Error('Invalid input parameters');
  }

  // Fix 2: Check for boundary conditions of discountPercentage
  if (discountPercentage < 0 || discountPercentage > 100) {
    throw new Error('Invalid discount percentage value. It should be between 0 and 100');
  }

  let totalPrice = 0;

  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    
    // Fix 3: Validate item properties and types
    if (!item.hasOwnProperty('price') || !item.hasOwnProperty('quantity') || 
        typeof item.price !== 'number' || typeof item.quantity !== 'number') {
      throw new Error('Invalid item properties');
    }

    totalPrice += item.price * item.quantity;
  }

  const discount = totalPrice * (discountPercentage / 100);
  totalPrice -= discount;

  return totalPrice;
}

Changes that GPT-4 made:

Added input parameter validation to ensure items is an array and discountPercentage is a number.
Checked boundary conditions for discountPercentage to ensure it’s within the 0-100 range.
Added validation for item properties and their types to ensure price and quantity are numbers.

Create Boilerplate Code (*my favourite)

Starting new projects can be painful. GPT-4 doesn’t know your business logic, but it can kick off the boring boilerplate code on day 1. This isn’t technically refactoring, but it’s amazing and can be part of a process in the programming lifecycle.

Prompt: Write me a boilerplate Node.js function that will take a variable of type User, validate that the user has the right permissions, fetch an array of item type Posts from a postgres database and return them. Leave comments for business logic.
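The exact output will vary between runs, but a boilerplate function along those lines might look like the sketch below. It assumes a User shaped as { id, permissions }, a hypothetical "posts:read" permission, and a hypothetical posts table with id, title and body columns. The database client is passed in, so anything with a node-postgres style query(text, values) method that resolves to { rows } will work.

```javascript
// Hypothetical permission name; replace with your own authorisation model.
const REQUIRED_PERMISSION = "posts:read";

/**
 * Fetches posts for a user after checking their permissions.
 * @param {{ id: number, permissions: string[] }} user - assumed User shape
 * @param {{ query: Function }} db - node-postgres style client, e.g. a pg Pool
 * @returns {Promise<object[]>} rows from the (hypothetical) posts table
 */
async function getPostsForUser(user, db) {
  if (!user || !Array.isArray(user.permissions)) {
    throw new Error("Invalid user");
  }
  if (!user.permissions.includes(REQUIRED_PERMISSION)) {
    throw new Error("User does not have permission to read posts");
  }

  // TODO (business logic): decide which posts this user is allowed to see,
  // e.g. by adding a WHERE clause with parameters ($1, $2, ...).
  const { rows } = await db.query(
    "SELECT id, title, body FROM posts ORDER BY id",
    []
  );

  // TODO (business logic): map rows to your Post type if needed.
  return rows;
}
```

In an application you might pass in new Pool() from the pg package, which reads its connection settings from the standard PG* environment variables; in tests, a plain object with an async query method is enough.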

Create Unit Tests

Unit tests are not only important for your code base; they are also a quick way to validate anything produced by GPT-4.

Prompt: Create 2 unit tests for the provided code. One for a successful condition and one for failure.

Transpiling Code

There are many reasons you may need to convert code from one language to another. You’ve found a repo with code in one language that you need in another, you’re moving code bases, or maybe your boss read an article on the latest front-end framework and now you’re moving to {divisive new library}.

In any case, GPT-4 can provide assistance with a simple prompt.

Prompt: Rewrite the following code in Rust: ...

Add Comments to Code

If your code makes sense to you but still needs comments for others, this can be a huge time-saver.

Prompt: Add comments to the following code ...
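The result depends on the model and the code, but for the "Take 2" convertTemperature function above you might get back something like this sketch: a JSDoc block describing the contract, plus short inline comments on the less obvious lines.

```javascript
/**
 * Converts a temperature from one scale to the other.
 * @param {number} temperature - The value to convert.
 * @param {string} scale - The scale of `temperature`: "C" or "F".
 * @returns {string} The converted value to one decimal place, with its unit.
 * @throws {Error} If `scale` is not "C" or "F".
 */
function convertTemperature(temperature, scale) {
  // Reject anything other than the two supported scales up front.
  const isValidScale = scale === "C" || scale === "F";
  if (!isValidScale) {
    throw new Error("Invalid scale: must be 'C' or 'F'.");
  }

  // Celsius to Fahrenheit: F = C * 9/5 + 32. Fahrenheit to Celsius: C = (F - 32) * 5/9.
  const convertedTemperature = scale === "C"
    ? (temperature * (9 / 5)) + 32
    : (temperature - 32) * (5 / 9);

  // The result is always expressed in the opposite unit to the input.
  const resultUnit = scale === "C" ? "F" : "C";
  return `${convertedTemperature.toFixed(1)}°${resultUnit}`;
}
```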

Tips for Better Refactoring

Like many things in life, with GPT-4, you get out what you put in. In this case, providing more context, instructions, and guidance will usually produce better results.

Here’s a short list of tips and techniques to improve your code refactoring:

Split your prompts: Try breaking your prompts and desired outcome across multiple steps. Keeping each prompt to a single outcome tends to produce better results than combining several outcomes into one prompt. For example, ask for a review, then ask for a refactor based on the review response. This may become less important over time as LLMs' token limits increase.
Give Examples: Provide expected inputs, data and outputs to improve accuracy and quality.
Be Specific: Don’t be afraid to list exactly what you want, what you know, what is needed, and what not to include.
Ask it to Reflect: A technique called Reflexion has been shown to increase GPT-4’s accuracy. Basically, ask it ‘Why were you wrong?’ or get it to reflect on and review its own response.
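One way to apply "Give Examples" and "Be Specific" together is to attach real input/output pairs to the prompt, produced by running the code you want refactored. The helper below is a hypothetical sketch (its name and prompt wording are made up). It relies on Function.prototype.toString(), which returns a function's source text, so the model sees both the code and its current behaviour.

```javascript
// Builds a refactoring prompt containing the code plus observed input/output pairs.
function buildRefactorPrompt(fn, sampleArgs, instruction) {
  const examples = sampleArgs.map((args) => {
    const call = `${fn.name}(${args.map((arg) => JSON.stringify(arg)).join(", ")})`;
    // Record thrown errors too, so error behaviour is part of the contract.
    try {
      return `- ${call} returns ${JSON.stringify(fn(...args))}`;
    } catch (error) {
      return `- ${call} throws "${error.message}"`;
    }
  });
  return [
    instruction,
    "",
    "Code:",
    fn.toString(),
    "",
    "Current behaviour that must be preserved:",
    ...examples,
  ].join("\n");
}

// Example input: the celsiusToFahrenheit helper from the SOLID refactor above.
function celsiusToFahrenheit(temp) {
  const conversionFactor = 9 / 5;
  return (temp * conversionFactor) + 32;
}

console.log(buildRefactorPrompt(
  celsiusToFahrenheit,
  [[0], [100]],
  "Refactor the following code to modern es6 programming standards."
));
```

Writing the samples out has a side benefit: if one of them returns something you did not expect, you have found a bug before sending anything to the model.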

A Multi Prompt Approach (Prompt Chaining)

Writing prompts for GPT-4 is much like writing functions. A well-designed set of prompts is one where each prompt has a single, separate responsibility. Here’s a simple chain of prompts to update, refactor, and review a piece of code. You may not need every step in your case.

You will notice that the function produced at the end is very different from the one we created earlier. In each step of the process, we either improve the code or validate its efficacy before running the code on our system.

Note: The same input function from earlier was used but run on a different model and different day.
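If you find yourself running the same chain often, it can be scripted. The sketch below keeps the whole conversation and sends each prompt from the steps that follow in turn. askModel is a hypothetical function you would write around whichever chat API or SDK you use: it takes the message history and resolves to the model's reply text.

```javascript
// Prompts from the steps that follow; {{code}} is replaced with the input function.
const chain = [
  "Review the following code and re-write it to modern es6 programming standards and formatting: {{code}}",
  "Review your provided code 'tempConvert' for any logical or security concerns and provide a list of recommendations.",
  "Review your above recommendations. Tell me why you were wrong and if any recommendations were overlooked or incorrectly added?",
  "Re-write the tempConvert function based off your review and recommendations.",
  "Create two vanilla javascript tests for the above tempConvert function. One that is expected to pass and one that is expected to fail.",
];

/**
 * Runs each prompt in order, keeping the full conversation so later steps can
 * refer to earlier answers ("your provided code", "your above recommendations").
 * @param {(messages: {role: string, content: string}[]) => Promise<string>} askModel
 *   Hypothetical wrapper around your chat API of choice.
 */
async function runChain(askModel, prompts, code) {
  const messages = [];
  const replies = [];
  for (const template of prompts) {
    // A replacer function avoids String.replace treating "$" in the code as a pattern.
    messages.push({ role: "user", content: template.replace("{{code}}", () => code) });
    const reply = await askModel([...messages]); // pass a copy of the history so far
    messages.push({ role: "assistant", content: reply });
    replies.push(reply);
  }
  return replies;
}
```

In practice you would still read each reply, and run the generated tests, before trusting the final function.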

1. Modernize and Add Best Practices

Get GPT-4 to re-write your code into the style you want. As seen above, this step will generally result in coherent output, in the style you want, but may introduce errors, so we do it first.

Prompt: Review the following code and re-write it to modern es6 programming standards and formatting: {{tempConvert function from earlier}}

GPT-4 Response:

const tempConvert = (temp, scale) => { // More descriptive function name
  let result = 0; // Use 'let' instead of 'var'
  const fahrenheitToCelsius = (temp - 32) * (5 / 9);
  const celsiusToFahrenheit = (temp * (9 / 5)) + 32;

  if (scale === "C") { // Use strict equality operator (===) for comparison
    result = fahrenheitToCelsius;
  } else {
    result = celsiusToFahrenheit;
  }

  const convertedScale = scale === "C" ? "F" : "C";
  return `${result.toFixed(1)}°${convertedScale}`; // Use template literals for cleaner string concatenation
};

2. Review for Logical Errors and Security Concerns

Get a review of your code with recommendations to improve any logical or security concerns introduced. It’s important that we don’t ask for a refactor, just the reasoning behind wanting the refactor.

Prompt: Review your provided code 'tempConvert' for any logical or security concerns and provide a list of recommendations.

GPT-4 Response: Here are some recommendations for improving the tempConvert function to address logical concerns and make it more robust:

Input validation: The function does not perform any input validation for the provided temp and scale parameters. This could result in unexpected behavior if the input is not in the expected format. To address this, you could add input validation checks:

if (typeof temp !== 'number' || (scale !== 'C' && scale !== 'F')) {
  throw new Error('Invalid input. Temp must be a number and scale must be either "C" or "F".');
}

Unhandled scale value: If the scale parameter is not “C” or “F”, the function will still perform the Celsius to Fahrenheit conversion, which could be misleading. You could add a check to handle this case:

if (scale === 'C') {
  result = fahrenheitToCelsius;
} else if (scale === 'F') {
  result = celsiusToFahrenheit;
} else {
  throw new Error('Invalid scale value. Scale must be either "C" or "F".');
}

Code readability: To improve readability and maintainability, you can separate the conversion logic into two separate functions for each conversion (Fahrenheit to Celsius and Celsius to Fahrenheit). This will make the code easier to understand and maintain:

const fahrenheitToCelsius = temp => (temp - 32) * (5 / 9);
const celsiusToFahrenheit = temp => (temp * (9 / 5)) + 32;

3. Validate the recommendations (Reflexion)

Validate the provided recommendations. As mentioned in the tips above, Reflexion is a powerful technique for improving the accuracy of the initial recommendations and weeding out hallucinations.

This step is not always required, but it is worth running if you are unsure about any recommendation. In our example, it simply repeated the same recommendations: it added no new information, but it did confirm the recommendations were valid.

Prompt: Review your above recommendations. Tell me why you were wrong and if any recommendations were overlooked or incorrectly added?

GPT-4 Response: I apologize for any confusion. After reviewing the tempConvert function I provided earlier, I realized that I made an error in my previous response. I already incorporated strict equality (===) and template literals in the provided function.

However, my initial recommendations still apply to improve the function’s robustness and readability. To summarize, here are the updated recommendations:

Input validation: Add input validation checks for the temp and scale parameters to ensure they are of the expected types and values: {code}

Unhandled scale value: Handle cases where the scale parameter is not “C” or “F” to avoid misleading conversions: {code}

Code readability: Separate the conversion logic into two separate functions to improve readability and maintainability: {code}

I hope this clears up any confusion. The above recommendations should enhance the logical flow of the tempConvert function and make it more robust in handling various input scenarios.

4. Write the code

Combine your reviews, recommendations and feedback and get GPT-4 to write your fancy new function.

Prompt: Re-write the convertTemp function based off your review and recommendations.


const fahrenheitToCelsius = temp => (temp - 32) * (5 / 9);
const celsiusToFahrenheit = temp => (temp * (9 / 5)) + 32;

const tempConvert = (temp, scale) => {
  if (typeof temp !== 'number' || (scale !== 'C' && scale !== 'F')) {
    throw new Error('Invalid input. Temp must be a number and scale must be either "C" or "F".');
  }

  let result = 0;

  if (scale === 'C') {
    result = fahrenheitToCelsius(temp);
  } else if (scale === 'F') {
    result = celsiusToFahrenheit(temp);
  } else {
    throw new Error('Invalid scale value. Scale must be either "C" or "F".');
  }

  const convertedScale = scale === "C" ? "F" : "C";
  return `${result.toFixed(1)}°${convertedScale}`;
};

5. Create Tests

Finally, we ask it to create some simple JavaScript tests that we can run locally to validate the results of our fancy new function.

Prompt: Create two vanilla javascript tests for the above tempConvert function. One that is expected to pass and one that is expected to fail.

GPT-4 Response: Here are two simple vanilla JavaScript tests for the tempConvert function, one expected to pass and the other expected to fail:

const testTempConvertPass = () => {
  const inputTemp = 32;
  const inputScale = 'C';
  const expectedResult = '0.0°C';

  const result = tempConvert(inputTemp, inputScale);

  if (result === expectedResult) {
    console.log('Test pass: tempConvert with valid input');
  } else {
    console.error(`Test fail: tempConvert with valid input. Expected ${expectedResult} but got ${result}`);
  }
};

const testTempConvertFail = () => {
  try {
    const inputTemp = 32;
    const inputScale = 'InvalidScale';
    tempConvert(inputTemp, inputScale);
    console.error('Test fail: tempConvert with invalid input should throw an error');
  } catch (error) {
    console.log('Test pass: tempConvert with invalid input');
  }
};

// Run the tests
testTempConvertPass();
testTempConvertFail();
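It is worth actually running these tests. Against the function from step 4, the "pass" test reports a failure: tempConvert(32, 'C') converts 32°F to 0 but labels it with the opposite unit, returning '0.0°F' instead of '0.0°C', a mislabelled unit that none of the earlier steps flagged. A minimal corrected sketch, assuming scale names the unit to convert to (which is how the test reads it), could look like this:

```javascript
const fahrenheitToCelsius = (temp) => (temp - 32) * (5 / 9);
const celsiusToFahrenheit = (temp) => (temp * (9 / 5)) + 32;

// Assumption: `scale` is the unit to convert TO ("C" or "F"), so the
// returned label is simply `scale` itself.
const tempConvert = (temp, scale) => {
  if (typeof temp !== "number" || (scale !== "C" && scale !== "F")) {
    throw new Error('Invalid input. Temp must be a number and scale must be either "C" or "F".');
  }

  // The check above already rules out other scales, so no extra else branch is needed.
  const result = scale === "C" ? fahrenheitToCelsius(temp) : celsiusToFahrenheit(temp);
  return `${result.toFixed(1)}°${scale}`;
};

console.log(tempConvert(32, "C")); // 0.0°C
console.log(tempConvert(100, "F")); // 212.0°F
```

With this version, both testTempConvertPass and testTempConvertFail from above log a pass.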

Limitations

This article is very pro-AI; however, these models are not perfect and cannot (yet) accurately replicate business logic, among other things. Here’s a list of things to look out for and avoid when using GPT-4 to review or refactor your code:

It can be (confidently) wrong: GPT-4 is trained to sound convincing, but that doesn’t mean it’s always right. Another great article on refactoring Golang with ChatGPT reported ‘It got rid of the type checks with the confident explanation that type-asserting a non-int value to an int type will return the zero value for the int type, but this is not correct and will panic’.
Saving time upfront may not be worth it in the long run: Sure, GPT-4 can generate 50 lines of code in a minute, but if that code is not a good fit for your codebase, it may take you 45 minutes to debug and tweak it. In that case, you would have been better off writing it yourself.
It can be out of date: The technology world moves fast. “GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its data cuts off (September 2021)”. You may encounter issues with any newly updated library, framework, or technology.

Conclusion

AI-powered programming is still new, but it is here to stay. When used correctly, it can save time and help us write better code. I hope you’ve enjoyed this article and have taken away some new skills to boost your programming productivity or error handling.