5 Reasons Why AI-generated Code Smells
LLMs excel at code generation, enhancing efficiency and quality. However, unguided use can lead to pitfalls like poor architecture, duplicated code, and security flaws. By avoiding common mistakes and establishing governance frameworks, developers can leverage AI effectively. This guide outlines five common mistakes in AI-powered software development and how to avoid them.

One of the most impressive abilities of LLMs is code generation. It can boost the efficiency and speed of coding, help bring ideas to life quickly, and improve overall code quality far more easily than before.
However, many have pointed out—and I’ve personally experienced—that unguided code generation might look appealing initially, but over time, the real cost could actually become even higher than without AI. Common issues include:
- Bad software architecture
- Monolithic structures that are hard to maintain
- Code duplication across the project
- Code that’s hard to read or understand (sometimes overcomplicated)
- No clear architecture (more patchwork than well-structured software)
- Lack of input validation and other vulnerabilities
Even more risks may arise with code generation. However, many can be avoided by steering clear of common mistakes and building a governance framework, rather than relying solely on the AI code assistant and the base foundation model (fine-tuned models don’t necessarily produce perfect code, either).
Below are some of the most common mistakes made in AI-powered software development (in no particular order).
Note that your specific enterprise policies and industry best practices will differ, so here we’ll focus on the general problems many face with AI and code generation.
I’ve used many code generation tools, but I’ve mostly settled on Cline (previously Claude-Dev). It’s a real alternative to commercial products like Copilot and Cursor, and in my opinion, it’s even better because of its agent-like capabilities.
---
Problem #1: Using an Outdated Model
Sometimes you set the model in the configuration and then forget about it. You end up working with an outdated model that doesn’t know about the latest changes in programming languages and frameworks, and it produces code that is as outdated as the model itself.
Always ensure you’re using the latest model.
Here’s an example of the model identifier in an up-to-date OpenRouter configuration:

anthropic/claude-3.5-sonnet
A list of models can be found here (if you use Claude):
https://docs.anthropic.com/en/docs/about-claude/models
And for OpenRouter (my preference because it has fewer daily limits):
https://openrouter.ai/anthropic/claude-3.5-sonnet
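If you call the model through an API rather than through a code assistant, the same rule applies: pin the model explicitly and review it regularly. Here’s a minimal sketch, assuming the OpenAI-compatible Python client pointed at OpenRouter (the prompt text is just a placeholder):

```python
# Minimal sketch: pin the model explicitly instead of relying on an old default.
# Assumes the OpenAI Python client (>= 1.0) and an API key in OPENROUTER_API_KEY.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # the model you actually want, not a forgotten default
    messages=[{"role": "user", "content": "Refactor this function for readability: ..."}],
)
print(response.choices[0].message.content)
```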
---
Problem #2: Too Little Context
Imagine you start developing a piece of software, build one component, then another, and days later you need to restart the session. Your model basically starts from scratch (models forget everything when you reset the session in your code assistant).
How do you get the model up to speed?
You could ask a question and hope it reads all the relevant files on its own, but that’s risky because it might:
- Produce duplicated code (already existing in the project but not recognized by the model)
- Ignore edge cases or important constraints in files it hasn’t read
- Generate code that’s inconsistent with the rest of the codebase
To avoid this, I recommend giving the LLM as much context as possible at the beginning and relying on prompt caching during the session to save costs on tokens. Of course, this only works if your codebase doesn’t exceed the token limit.
You might argue you only need to show it part of the codebase to save money, and that can work. But if you plan many modifications and you’re unsure which files it needs to consider, spend a bit more to avoid bad code quality by giving the model more context.
One approach is generating a single file with your entire repo (excluding libraries, env files, etc.) via repomix. This tool is super simple:
npx repomix .
Run this in your project’s root, and it’ll create repomix-output.txt. Make sure not to version-control this file (add it to .gitignore); you can always generate it again.
Then you can tell Cline, Cursor, or whichever tool you use to read the file:
please read repomix-output.txt
This technique only works for projects up to a certain size. If your project contains millions of lines of code, you will need a different approach, e.g., generating the repomix output for just one component or module you want to work on (see the sketch below).
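If you can’t or don’t want to run repomix over a large codebase, the same idea is easy to sketch yourself. The following Python snippet is a hypothetical, stripped-down alternative that packs just one module into a single context file; the directory name, extensions, and output file are assumptions you’d adjust to your project:

```python
# Hypothetical repomix-style packer for a single module.
# SRC_DIR, EXTENSIONS, SKIP_DIRS, and the output file name are placeholders.
from pathlib import Path

SRC_DIR = Path("src/payments")                      # the component you want to work on
EXTENSIONS = {".py", ".ts", ".tsx"}                 # source files worth including
SKIP_DIRS = {"node_modules", ".git", "__pycache__", "dist"}

with open("context-payments.txt", "w", encoding="utf-8") as out:
    for path in sorted(SRC_DIR.rglob("*")):
        if not path.is_file() or path.suffix not in EXTENSIONS:
            continue
        if set(path.parts) & SKIP_DIRS:             # skip vendored and generated code
            continue
        out.write(f"\n===== {path} =====\n")        # file header so the LLM sees boundaries
        out.write(path.read_text(encoding="utf-8", errors="ignore"))
```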
---
Problem #3: Not Being Explicit About Quality
When you ask a foundation model with coding capabilities to write code, the output might initially be good: safe, working, and following best practices. But as projects grow and become more complex, at some point the LLM starts producing code that doesn’t strictly meet the quality metrics you expect (e.g., input validation, code reuse, modularity).
Don’t assume the LLM will always produce high-quality code. Be explicit about your quality expectations.
Here are some prompts I use to increase code quality and communication with the LLM (specifically Claude Sonnet 3.5 in Cline):
- Use Meaningful Names: Choose clear and descriptive variable, function, and class names.
- Follow Consistent Indentation: Stick to a uniform indentation style for better readability.
- Write Modular Code: Break code into small, reusable functions and modules.
- Document Your Code: Use comments and docstrings to explain the purpose and logic.
- Avoid Hardcoding: Use constants or configuration files instead of hardcoded values.
- Handle Errors Gracefully: Implement proper error handling and use exceptions appropriately.
- Follow Naming Conventions: Adhere to language-specific naming conventions (e.g., camelCase, snake_case).
- Optimize for Readability: Write clean and easy-to-understand code, even over compactness.
- Use Version Control: Maintain code history and collaboration using tools like Git.
- Test Thoroughly: Write unit tests and ensure the code is well-tested before deployment.
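To make these expectations concrete, it helps to show the model (and your team) what “good” looks like. Here’s a small, hypothetical Python example that follows several of the rules above: meaningful names, no hardcoded values, input validation, and graceful error handling.

```python
# Hypothetical example of the quality bar: named constants, validation, clear errors.
import os

DEFAULT_TIMEOUT_SECONDS = 30  # a named constant instead of a magic number


def load_timeout_seconds(env_var: str = "REQUEST_TIMEOUT_SECONDS") -> int:
    """Read the request timeout from the environment, falling back to a safe default."""
    raw_value = os.getenv(env_var)
    if raw_value is None:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        timeout = int(raw_value)
    except ValueError as exc:
        raise ValueError(f"{env_var} must be an integer, got {raw_value!r}") from exc
    if timeout <= 0:
        raise ValueError(f"{env_var} must be positive, got {timeout}")
    return timeout
```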
---
Problem #4: Misunderstanding
LLMs, like humans, sometimes misunderstand requirements. Due to the large amount of code generated, it can be hard to catch that the code doesn’t meet expectations.
You might notice a drift from requirements at some point. The later it’s discovered, the more expensive and difficult it is to fix. So try to catch misunderstandings as soon as possible, ideally before code is generated.
One way to avoid this is to do what we do with peers in software development: let them paraphrase what they understood and generate documentation as the foundation for the code to be produced.
Please make a proposal. Don't code yet.
This proposal can be a single document that includes everything: the problem description and a solution blueprint for the new feature or change you want to make.

I especially prefer visual descriptions (sequence diagrams, state charts) because they quickly show whether the LLM understood correctly or not.
With this, you can iterate until the requirements understanding matches what you expect. This can save a ton of time, unnecessary token usage, and orphaned code—ultimately improving code quality.
Early Test Generation is Key
For each proposal, also consider generating test cases. This keeps the test cases close to the requirements and lets you catch breakage in newly added functionality later by running unit tests, regression tests, etc.
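For example, if the proposal describes a small text helper, I’d have the LLM generate tests for it in the same step, before any implementation exists. The function name, module, and expected behavior below are assumptions taken from an imagined proposal (pytest sketch):

```python
# Hypothetical pytest sketch generated from the proposal, before the implementation.
# slugify() and its module are assumptions; adjust to whatever the proposal defines.
import pytest

from myproject.text_utils import slugify  # hypothetical module


def test_slugify_lowercases_and_replaces_spaces():
    assert slugify("Hello World") == "hello-world"


def test_slugify_rejects_empty_input():
    with pytest.raises(ValueError):
        slugify("")
```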
Tip for Cline users: You can save these hints/guidelines in a .cline file or in Cline’s configuration so you don’t have to type them each time you start a new session.
---
Problem #5: Not Learning from Previous Mistakes
LLMs hold a huge amount of information on programming languages and frameworks, but they do have knowledge gaps—not just because of outdated data on new frameworks but also due to missing or incorrect training data.
For example, generating Next.js applications with Sonnet 3.5 might produce lots of lint errors when building. As mentioned before, you have to be explicit and ask it to adhere to lint rules. As the session grows, you’ll see recurring issues with code generation and communication. They’ll keep popping up in upcoming sessions unless you capture those mistakes and explicitly tell the LLM to avoid them.
One solution is creating a lessons learned file at the end of the session:
Based on all the coding and communication mistakes, create a lessons learned file to avoid making the same mistakes next time.
Here is what I got out of a recent coding session with Cline / Claude Sonnet:
Let me analyze my mistakes in this session:
Initial Misdiagnosis:
- I wrongly thought protocol-relative URLs were the problem
- I tried to "fix" the URL format instead of checking the actual code
- Wasted time trying to modify test data without understanding the issue

Poor Investigation:
- Started making random grep searches instead of directly checking image_service.py
- Tried multiple wrong approaches before finally reading the actual code
- Should have checked image_service.py FIRST since it handles image processing

Test Data Mistakes:
- Initially tried to remove test cases instead of fixing them
- Created invalid test expectations for SVG processing
- Had to be told not to make up test data

Code Understanding:
- Didn't realize SVG rejection was built into the API
- Assumed URL resolution was the issue when it was image format support
- Mixed up content resolution with image processing

Investigation Approach:
- Made changes without proper investigation
- Tried quick fixes instead of understanding the root cause
- Should have started by reading the relevant code

Communication:
- Didn't explain my thinking clearly
- Made changes without proper explanation
- Should have been more direct about findings

The core issue was not taking time to properly understand the code before making changes. I should have:
- Read image_service.py first
- Understood SVG rejection is by design
- Updated tests to reflect actual behavior
- Explained findings clearly
Amazing how much it knows about its own mistakes. I wish humans would do the same :)
Applying Lessons Learned to Future Sessions
Then ask the LLM to read this file when you start your next session. This can save a lot of money and effort, because the model (which isn’t updated automatically) will probably make the same mistakes again if you don’t remind it.
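If you script your sessions against an API instead of working in a code assistant, this step is easy to automate by prepending the lessons-learned file to every session. A hedged Python sketch, with the file name and prompts as assumptions:

```python
# Hedged sketch: feed the lessons-learned file back in at the start of each session.
# The file name and the user prompt are placeholders.
import os
from pathlib import Path

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

lessons = Path("lessons-learned.md").read_text(encoding="utf-8")

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[
        {"role": "system", "content": f"Avoid repeating these past mistakes:\n{lessons}"},
        {"role": "user", "content": "Continue implementing the image upload feature."},
    ],
)
print(response.choices[0].message.content)
```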

Ultima Ratio: Resolving Deadlocks with O1
Based on our experience with over 60 AI projects, we found that Claude often got stuck in repetitive loops, making the same mistakes repeatedly without any real progress. Waiting for it to improve sometimes felt endless. What worked best for us was copying the relevant part of the code and asking O1 to resolve it. Surprisingly, despite what the benchmarks suggest, O1 solved the issue about 90% of the time, something Claude Sonnet 3.5 (the latest version) struggled with. Here's the process:
- Copy the relevant code segment or have Sonnet summarize the problematic part (it often recognizes when it’s stuck if prompted to reflect).
- Paste the code and error log into O1.
- Copy O1’s recommendations and solutions back into Claude.
- Allow Claude to implement the fix and retest.
Wrap-Up
We’ve applied these strategies in dozens of projects, adjusting them to create a more controlled and predictable environment for AI-powered code generation. This has significantly improved the code quality produced by large language models. I hope it can help you as well.
Happy Coding :)