4 AI coding risks and how to address them
June 13, 2024
96% of developers use AI coding tools to generate code, detect bugs, and offer documentation or coding suggestions. Developers rely on tools like ChatGPT and GitHub Copilot so much that roughly 80% of them bypass security protocols to use them.
That means that whether you discourage AI-generated code in your organization or not, developers will probably use it. And it comes with its fair share of risks.
On one hand, AI-generated code helps developers save time. They can then dedicate this time to more creative and high-level aspects of development, such as designing architecture and experimenting with innovative features.
On the other hand, an overreliance on AI-generated code leads to security vulnerabilities. It also leads to issues with code uniqueness and intellectual property disputes. Let’s look at these risks in detail and see how you can tread the fine line between human oversight and AI autonomy.
1. Lack of explainability and transparency
Developers who use AI-generated code must review, debug, and improve it. They also need to be able to justify decisions made by AI systems. This is especially true in the case of healthcare, finance, and legal applications, where AI decisions have significant implications. And yet, there’s an inherent lack of explainability and transparency that comes with AI-generated code.
Explainability is the developer’s ability to understand and trace how AI models arrive at their outputs.
Transparency is the AI tool’s ability to provide stakeholders with clear, understandable, and accessible insights into its processes and decisions.
The absence of explainability and transparency in AI-generated code can lead to:
Uninterpretable code that is challenging to debug and maintain, which raises costs and reduces efficiency in the software development lifecycle.
Issues with regulatory compliance, especially in industries regulated for safety, privacy, or ethics.
In the case of publicly available generative AI, it is difficult to understand the rationale behind its output. After all, you didn’t train the AI, so you don’t have complete insight into the data it trained on. This is why maintenance challenges usually arise with AI-generated code.
In some cases, such code may come with inherent biases that are hard to detect. For example, an AI system used in healthcare for predicting patient risks was found to be biased and inaccurate back in 2019. The system's decision-making processes were not transparent or easily interpretable, and it took a considerable amount of time before researchers could identify and rectify the biases caused by flawed associations in the training data.
ChatGPT, GitHub Copilot, and similar tools can code quickly, but these tools don’t always explain their outputs. To overcome these challenges:
Review and assess the AI-generated outputs. Watch for inconsistencies, illogical patterns, or responses that don't align with best coding practices.
Cross-reference the generated code with trusted documentation or authoritative sources to verify its accuracy and logic. Official documentation of your programming languages, frameworks, or libraries will be useful here.
Document any significant AI-generated code or decisions. Include how the AI was used, what inputs were provided, and any modifications made to the output. This documentation will be vital for future reference or if questions arise about the decision-making process.
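The documentation step above can be sketched as a small structured record stored next to the code. This is only an illustration, not a standard format: the `AIProvenanceRecord` class, its field names, and the example values (file path, prompt summary, reviewer) are all hypothetical.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import json


@dataclass
class AIProvenanceRecord:
    """Hypothetical record describing one piece of AI-generated code."""
    file_path: str
    tool: str                # which AI tool produced the code
    prompt_summary: str      # what inputs were provided
    modifications: list[str] = field(default_factory=list)  # human edits to the output
    reviewed_by: str = ""
    recorded_on: str = field(default_factory=lambda: date.today().isoformat())

    def to_json(self) -> str:
        # Serialize the record so it can be committed alongside the code.
        return json.dumps(asdict(self), indent=2)


# Example values are made up for illustration.
record = AIProvenanceRecord(
    file_path="billing/invoice.py",
    tool="GitHub Copilot",
    prompt_summary="Generate a function that rounds invoice totals to two decimals",
    modifications=["Replaced float arithmetic with decimal.Decimal"],
    reviewed_by="jdoe",
)
print(record.to_json())
```

A record like this gives reviewers something concrete to consult later if questions come up about how a piece of code was produced.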
2. Security vulnerabilities
A study conducted by researchers at Stanford University found that developers using AI systems to generate code were more likely to produce insecure apps. Much like human-written code, AI-generated code can contain flaws or bugs that lead to security vulnerabilities. However, unlike human-written code, such flaws may not follow predictable or well-understood patterns, making such vulnerabilities more challenging to detect and rectify.
Some of the common security issues found in AI-generated code include:
Injection vulnerabilities: AI-generated code may lack comprehensive validation of user input or handle data improperly, which lets attackers inject malicious code or queries into the system.
Authentication weaknesses: Flawed authentication mechanisms in the generated code allow unauthorized access.
Misconfigured permissions: AI may not configure permissions properly for different components or users, which can lead to privilege escalation attacks.
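To make the first item concrete, here is a minimal sketch of an injection flaw of the kind a code assistant might plausibly produce, next to a safer version. It uses Python's built-in `sqlite3` module with an in-memory database; the `users` table and the function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_user_unsafe(name: str) -> list:
    # Vulnerable: user input is concatenated directly into the SQL text.
    query = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str) -> list:
    # Safer: the value is bound as a parameter, so it is treated as data only.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"
print(find_user_unsafe(payload))  # the injected OR clause matches every row
print(find_user_safe(payload))    # no user has this literal name, so no rows
```

Both functions look similar at a glance, which is why this class of flaw can slip through a quick review.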
Without proper oversight, AI-generated code can be inconsistent and may not adhere to standardized security practices across different parts of the application. To overcome this AI coding risk:
Conduct audits and peer reviews of AI-generated code.
Automate static application security testing (SAST) so that AI-generated code is tested as it is generated and deployed.
Train your development teams in the potential security pitfalls of AI-generated code. Security for Developers, a certified course by Snyk and New York University, is a good training program for this purpose.
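As a rough illustration of what static analysis means in practice, the sketch below uses Python's standard `ast` module to flag direct calls to `eval` and `exec` without running the code. It is a toy check under simple assumptions, not a substitute for a real SAST tool: the `RISKY_CALLS` set and the `find_risky_calls` helper are hypothetical, and the check only catches direct calls by name.

```python
import ast

RISKY_CALLS = {"eval", "exec"}  # illustrative only, far from a complete list


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each direct call to a risky builtin."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


# A made-up snippet standing in for AI-generated code.
generated = """
def calculate(expression):
    return eval(expression)
"""
print(find_risky_calls(generated))  # [(3, 'eval')]
```

A check like this could run in a pre-commit hook or CI job, so that flagged code is reviewed by a person before it is merged.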
3. Intellectual property infringement
Recently, a group of authors sued OpenAI for allegedly using copyrighted material to train the large language model behind ChatGPT. While federal judges have recently declared that the output of the generative AI system does not violate the rights of copyright holders, the laws around AI and IP are still ambiguous at best.
AI tools generate code based on vast datasets, including publicly available code and potentially proprietary or confidential information. That means some of their output may inadvertently infringe on IP rights.
To avoid this scenario, take the following steps:
Audit AI-generated code for potential IP issues, especially in the case of publicly available AI systems like ChatGPT and GitHub Copilot.
Keep an eye on AI news and stay aware of laws related to AI-generated content.
Use mechanisms like watermarking or other digital rights management (DRM) techniques to trace AI-generated code.
4. Lack of policies around AI-generated code
Since generative AI is new and exciting, most organizations are rushing to adopt it, but most haven’t taken the time to create policies around it. Only 10% of companies have a formal policy around AI-generated content.
Without clear policies, the use of AI will vary significantly across teams and projects. This will lead to inconsistent outputs and compromised code integrity and application security. Additionally, you risk reputational damage if your AI-generated code causes harm or exhibits bias.
With the right policies, you will provide clear guidelines to your developers on using AI-generated code. To create such policies and reduce your AI coding risks, you can begin with the following steps:
Define specific areas where AI-generated content can be used within your company. These areas could include code generation, documentation, and/or automated testing. This will help you set boundaries to prevent misuse.
Use tools like Snyk Code to analyze AI-generated code to identify and fix security vulnerabilities.
Assign responsibility for reviewing and approving AI-generated content. Implement peer review systems where developers verify the integrity and reliability of AI-generated code before it's deployed.
Develop protocols for addressing issues arising from AI-generated content, such as potential IP infringements or security vulnerabilities. This should include communication strategies as well as steps for mitigation.
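Parts of a policy like this can be encoded as an automated check. The sketch below is one hypothetical way to do it: the allowed areas mirror the examples above, while the `Change` class, its fields, and the `policy_violations` function are assumptions made for illustration rather than part of any real tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: areas where AI-generated content is permitted.
ALLOWED_AREAS = {"code_generation", "documentation", "automated_testing"}


@dataclass
class Change:
    area: str
    ai_generated: bool
    security_scan_passed: bool
    approved_by: Optional[str] = None  # reviewer responsible for sign-off


def policy_violations(change: Change) -> list[str]:
    """List the policy rules that an AI-generated change does not yet satisfy."""
    if not change.ai_generated:
        return []  # this sketch only covers AI-generated content
    problems = []
    if change.area not in ALLOWED_AREAS:
        problems.append(f"AI-generated content is not permitted in area '{change.area}'")
    if not change.security_scan_passed:
        problems.append("security scan has not passed")
    if not change.approved_by:
        problems.append("no reviewer has approved the change")
    return problems


change = Change(area="payments", ai_generated=True, security_scan_passed=True)
for problem in policy_violations(change):
    print(problem)
```

Keeping the rules in code makes them easier to apply consistently across teams, though the written policy should remain the source of truth.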
Once you have a policy, educate developers and other relevant stakeholders about it. Also, review and update your policies based on new regulations as they come out.
Reduce your AI coding risks with Snyk
Snyk Code is an easy-to-use SAST tool that provides real-time security analyses with full application context for any type of code, including AI-generated code. Our software uses extensive, expert human-in-the-loop feedback for superior quality and accuracy.
Try checking your AI-generated code for vulnerabilities with Snyk.