Building Security Using Generative AI

Trust or try, security using GenAI

Generative Artificial Intelligence, also known as Generative AI, is a cutting-edge technology that creates new content for users by learning from existing data and the information it is fed.

AI-generated output is not limited to text; it also spans images, videos, and even code. This mixture of data-driven creation and new content generated from learned knowledge is what lets AI unlock a whole new level of possibilities in any given field.

Some very high-level examples of “what generative AI tools can do” are shared below.

  • Writing realistic and creative text formats: Imagine an AI that can write poems, scripts, musical pieces, emails, or even letters in different styles.
  • Generating realistic images: This could involve creating new photos, editing existing ones, or even creating entirely new objects or scenes.

Developers and Generative AI

Now that you have understood the basic concept of Generative AI and its limitless potential, there is no doubt that it is making immense progress in the field of software engineering as well.

Imagine a world where building applications is not just about coding functionality, but about creating new and innovative features for your application with a clear thought process and rapid development speed. This is where Generative AI steps in.
In the traditional app development process, developers had to craft each element of the application (user interface, content, media, etc.) with careful attention to detail. Leveraging Generative AI tools can ease much of this process, from developing features and training machine learning algorithms to delivering intelligent, personalized user experiences. Exciting possibilities arise from leveraging Generative AI.

Generative AI can create entirely new data. For developers, this data is nothing but code or script. By analyzing massive datasets of existing code, generative AI models can learn the patterns and syntax of different programming languages. This allows AI models to not only understand the code functionality but also generate new code snippets or even complete functions based on the given instructions.

Generating code using Generative AI can potentially help developers build infrastructure, automate repetitive coding tasks, boost productivity, assist junior developers, improve creativity, and many other areas.

What are the security concerns of generative-AI-generated code?

Even though the approach toward development might have changed, one critical aspect matters more than ever: security. We are aware that Generative AI can automate some coding tasks and speed up the SDLC, but it also introduces a new attack surface.

We have listed some of the top security concerns associated with code generated using generative AI. Let us take a deeper look at them:

Black box problem

One of the biggest challenges of generative AI is the lack of transparency in how generative AI models work. These models are complex, and their decision-making process is opaque. This makes it difficult to understand how they generate code and to identify potential security weaknesses in the output.

For example, suppose a programmer uses a GenAI tool to create a login function. The generated code might work as intended on the surface, but the developer may never know whether the model has introduced hidden vulnerabilities due to its internal workings.
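
To make this concrete, here is a hypothetical sketch (our own illustration, not output from any specific tool) of a login check that works on the surface but hides subtle flaws a reviewer could easily miss:

```python
import hashlib

def login(username: str, password: str, user_db: dict) -> bool:
    # Looks correct: hash the supplied password and compare it with the stored hash.
    stored_hash = user_db.get(username)
    if stored_hash is None:
        return False
    supplied_hash = hashlib.sha256(password.encode()).hexdigest()
    # Hidden weaknesses: unsalted SHA-256 is a poor password hash, and the
    # `==` comparison is not constant-time, so it leaks timing information.
    # A safer sketch would use a dedicated password hasher and hmac.compare_digest.
    return supplied_hash == stored_hash
```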

Potential for malicious code injection

Malicious actors could potentially exploit vulnerabilities within the generative models themselves or trick them into generating code with hidden security flaws. This could involve manipulating the training data or crafting specific prompts to influence the AI output.

For example, an attacker might discover how to manipulate a GenAI model with a specific prompt, causing it to generate code containing a backdoor that allows unauthorized access.
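
As an illustration of what a reviewer should watch for, here is a hypothetical sketch of such a backdoor; the function names and values are our own and do not come from any real incident:

```python
def check_against_user_store(username: str, password: str) -> bool:
    # Stand-in for the application's real credential check.
    return False

def authenticate(username: str, password: str) -> bool:
    # Red flag: a hardcoded "maintenance" credential that silently bypasses
    # the real authentication path. Generated code containing patterns like
    # this should never pass review.
    if username == "maint" and password == "s3rv1ce!":
        return True
    return check_against_user_store(username, password)
```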

Insecure coding practices

We know that GenAI models can learn from existing code, but they may not always understand the subtle difference between code that merely works and code that follows secure coding practices. The generated code may lack proper security measures, giving attackers an entry point.

For example, a GenAI model might generate code that doesn't properly sanitize user input, leaving the application vulnerable to SQL injection attacks.
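
A minimal sketch of that difference, using Python's built-in sqlite3 module (our choice of example, not output from any particular model):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

def find_user_vulnerable(name: str):
    # Unsanitized input is concatenated straight into the query:
    # passing "x' OR '1'='1" returns every row in the table.
    query = f"SELECT * FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver treats the input purely as data.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_vulnerable("x' OR '1'='1"))  # leaks all rows
print(find_user_safe("x' OR '1'='1"))        # returns []
```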

Difficulty in code review and auditing

You should know that AI-written code can be more complex than human-written code. This can make it challenging for human programmers to review and audit the code for security vulnerabilities, potentially delaying the identification and mitigation of risks.

Can you trust the AI-generated code and security recommendations?

You may be familiar with tools like GitHub Copilot and Amazon CodeWhisperer, which many developers use to generate code snippets or get security recommendations. These models were trained on millions of lines of code from open-source platforms. So, are these recommendations reliable enough to meet your security requirements?

Earlier, in one of our ScaletoZero podcast recordings, we asked AI security expert Jim Manico what his confidence score was for AI-based code and security recommendations. This is what Jim said:

“Well, it depends on what you ask!” - Jim Manico - Founder, Manicode Security

Let us help you understand what Jim meant by walking through what was discussed. If you have been using GenAI tools for a while, you have probably noticed that the depth of the output depends on the clarity and depth of the prompt you give the tool. Now let us break down the process that Jim shared with us.

Specific and Clear Command

Vague requests like “Give me a script to perform XYZ” will not help you; they tend to produce weak or less secure code. However, if you ask a clear and detailed question such as “Give me a script to perform XYZ task while keeping rigorous security best practices baked in”, the output is likely to be a more secure script. You can also break down your prompts further according to your specific needs.
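
As a simple illustration (both prompts below are our own hypothetical examples), the difference can be as small as how much security context you include:

```python
# Hypothetical prompts showing the difference in specificity.
vague_prompt = "Give me a Python script to upload a file to a server."

specific_prompt = (
    "Give me a Python script to upload a file to a server over HTTPS. "
    "Validate the file type and size before sending, never log credentials, "
    "read the API token from an environment variable, and handle timeouts "
    "and TLS certificate verification explicitly."
)
```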

Don’t Rely Solely on GenAI

Blindly trusting AI-generated code and using it in your development processes can land you in a situation you would never want. There is also the possibility of licensing issues with AI-generated code, and it is important to perform rigorous security checks regardless.

Security Best Practices

Software developers, with the help of security practitioners, should perform a thorough security review of the AI-generated output. You can strengthen these reviews with static analysis tools, third-party library scanning, and dynamic security scanners.
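
As one possible sketch of such a review step, a small script could chain a static analyzer and a dependency scanner before AI-generated code is merged; Bandit and pip-audit are our example tool choices, and your pipeline may use different ones:

```python
import subprocess
import sys

# Run a static analyzer (Bandit) and a dependency scanner (pip-audit)
# against AI-generated code before it is merged. Tool names and flags
# are assumptions; substitute whatever your pipeline already uses.
def review(path: str) -> int:
    checks = [
        ["bandit", "-r", path],  # static analysis for common Python flaws
        ["pip-audit"],           # scan installed dependencies for known CVEs
    ]
    failed = 0
    for cmd in checks:
        print(f"Running: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed += 1
    return failed

if __name__ == "__main__":
    sys.exit(review(sys.argv[1] if len(sys.argv) > 1 else "."))
```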

Focus on Critical Code

This is as simple as: the more critical the code is for security, the more rigorous the review process should be. For such cases, we recommend following a deeper, manual code review process.

Static Analysis

At the very least, we recommend using a static analysis tool like ours (Cloudanix) to identify and fix security vulnerabilities in AI-generated code.

Code Complexities

It is recommended practice to look at code complexity metrics such as cyclomatic complexity. AI-generated code can be needlessly complex, which makes it harder to understand and maintain and can lead to security risks. Remember that lower complexity means better understandability and maintainability.
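
As a sketch, the radon library (our choice; exact API details may vary across versions) can flag functions whose cyclomatic complexity exceeds a review threshold:

```python
# Requires: pip install radon  (library choice and threshold are assumptions)
from radon.complexity import cc_visit

MAX_COMPLEXITY = 10  # arbitrary review threshold

def flag_complex_functions(source_path: str):
    with open(source_path) as f:
        source = f.read()
    for block in cc_visit(source):
        if block.complexity > MAX_COMPLEXITY:
            print(f"{block.name} (line {block.lineno}): "
                  f"cyclomatic complexity {block.complexity} - review carefully")

flag_complex_functions("generated_module.py")  # placeholder path
```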

Standard Security Checks

It is recommended to use standard code security tools that are commonly used in DevOps pipelines to review any code, especially AI-generated code, before deploying it to production.

OWASP Top 10 Risks for LLM Applications

Recently, the OWASP Top 10 for Large Language Model (LLM) Applications was introduced to raise awareness and provide a common framework for understanding and mitigating the security risks associated with LLMs.

This OWASP Top 10 for LLM Applications Cybersecurity and Governance Checklist is for leaders across executive, tech, cybersecurity, privacy, compliance, and legal areas, as well as DevSecOps, MLSecOps, and cybersecurity teams and defenders. Here's the list with explanations:

  • LLM01 Prompt Injection: Attackers can manipulate LLMs through carefully crafted prompts or inputs, causing the LLM to execute unintended actions. This could lead to unauthorized access, data breaches, or the generation of misleading or harmful content.
    For example, an attacker might craft a prompt that tricks the LLM into revealing sensitive information or performing actions that violate security policies.
  • LLM02 Insecure Output Handling: Treating LLM outputs as completely reliable can be dangerous. If outputs are not carefully validated and sanitized, they could be used for downstream attacks like code injection or social engineering scams.
    For example, an LLM used to generate marketing copy might create content containing malicious scripts if the output isn't properly reviewed and filtered (see the sketch after this list).
  • LLM03 Training Data Poisoning: The quality and security of an LLM heavily depend on the data it is trained on. Attackers might try to inject biased or poisoned data into the training data, influencing the LLM's behavior and potentially causing it to generate harmful content or perpetuate biases.
    For example, training an LLM on a dataset containing a lot of racist or offensive language could lead to the model generating similar outputs, perpetuating hate speech.
  • LLM04 Model Denial-of-Service (DoS): Attackers might attempt to overload an LLM with excessive requests, causing it to become unavailable to legitimate users. This could disrupt critical services or applications that rely on the LLM.
    For example, a coordinated attack flooding an LLM with millions of useless prompts could exhaust its resources and prevent legitimate users from accessing its functionality.
  • LLM05 Supply Chain Vulnerabilities: LLMs often rely on external libraries, frameworks, and other components. Vulnerabilities in these dependencies can be exploited to compromise the LLM itself or the applications that use it.
    For example, an LLM that uses a text processing library with a critical security flaw might be susceptible to attacks that exploit that vulnerability.
  • LLM06 Sensitive Information Disclosure: LLMs can inadvertently disclose sensitive information if they are not properly configured or if they are trained on data containing confidential details.
    For example, an LLM used for customer service chatbots might accidentally reveal a customer's credit card number if it wasn't trained to handle such information securely.
  • LLM07 Insecure Plugin Design: Many platforms allow extending LLM functionality with plugins. Poorly designed or insecure plugins can introduce vulnerabilities that attackers can exploit to compromise the entire system.
    For example, a malicious plugin for an LLM used for content generation might inject malware into the generated content without detection.
  • LLM08 Excessive Agency: Assigning too much autonomy or decision-making power to an LLM can be risky. LLMs can make mistakes or be biased based on their training data. Human oversight and control mechanisms are crucial.
    For example, an LLM used for automated financial trading decisions might make risky investments due to limitations in its understanding of the financial market.
  • LLM09 Overreliance: Solely relying on LLMs for critical tasks without proper validation and human involvement can be dangerous. Overreliance can lead to missed errors, biases, and security vulnerabilities.
    For example, using an LLM to write legal documents without human review could result in inaccurate or legally problematic contracts.
  • LLM10 Model Theft: LLMs represent valuable intellectual property. Measures need to be in place to prevent unauthorized access, copying, or theft of the LLM model itself.
    For example, an attacker might steal a trained LLM model and use it for malicious purposes, such as generating fake news or creating deepfakes.
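
To make LLM02 concrete, here is a minimal sketch of treating model output as untrusted before rendering it into a web page; generate_marketing_copy is a hypothetical stand-in for whatever LLM call your application makes:

```python
import html

def generate_marketing_copy(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; imagine the model was tricked
    # into embedding a script tag in its output.
    return 'Try our new product! <script>steal_cookies()</script>'

def render_copy(prompt: str) -> str:
    raw_output = generate_marketing_copy(prompt)
    # Treat the output as untrusted data: escape it before it reaches the
    # browser so any injected markup is rendered as inert text.
    return f"<p>{html.escape(raw_output)}</p>"

print(render_copy("Write a tagline for our product"))
```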

Use of generative AI to build more secure application architectures

We have seen best practices for generating code that is not overly complex and that you can understand. After going through all the risks and threats posed by AI-generated code, you may ask, “What is the use of AI if it cannot take care of the security side of the architecture?” We felt the same, and found that we can leverage AI to build secure application architectures as well! Here are some examples:

Threat Modeling Using Generative AI

As we said earlier, Generative AI models can be trained on datasets. Similarly, we can train AI models on vast datasets of security vulnerabilities and attack patterns. This allows them to flag potential threats in application designs during the early planning stages. By simulating attacks and analyzing weaknesses, AI models can guide developers toward more secure architectural choices.
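
As a rough sketch of how this could look in practice (the openai client and model name below are assumptions; any LLM provider would work similarly), a team could feed an architecture description to a model and ask for STRIDE-style threats during design review:

```python
# Assumes the `openai` package is installed and OPENAI_API_KEY is set;
# the model name is an assumption - substitute your provider's equivalent.
from openai import OpenAI

client = OpenAI()

design = """
Web app: React frontend, REST API behind an API gateway,
PostgreSQL database, S3 bucket for user uploads, JWT-based auth.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "You are a threat modeling assistant. Use STRIDE and "
                    "list concrete threats with suggested mitigations."},
        {"role": "user", "content": f"Threat model this design:\n{design}"},
    ],
)

print(response.choices[0].message.content)
```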

Read more about Threat Modeling here.

Generating Secure Code

Maybe not the entire code, but AI models can assist in generating code snippets with built-in security best practices. For instance, they could suggest secure coding patterns or identify common pitfalls to avoid during development. This can improve the overall security posture of the codebase.

Automated Security Testing

Generative AI can be used to create a wider variety of automated security tests. It can automatically generate test cases that target different attack vectors and scenarios, and it can help uncover vulnerabilities that traditional static analysis tools might miss.
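
As a small sketch, the attack payloads below could themselves be expanded by a GenAI model, while the sanitize_filename function under test is a hypothetical stand-in:

```python
# Hypothetical example: a GenAI model could be asked to extend the payload
# list with more path-traversal variants; the function under test is a stand-in.
import pytest

def sanitize_filename(name: str) -> str:
    # Stand-in implementation: strip parent references and path separators.
    return name.replace("..", "").replace("/", "").replace("\\", "")

ATTACK_PAYLOADS = [
    "../../etc/passwd",
    "..\\..\\windows\\system32\\config\\sam",
    "report.pdf/../../secret.txt",
]

@pytest.mark.parametrize("payload", ATTACK_PAYLOADS)
def test_sanitizer_blocks_path_traversal(payload):
    cleaned = sanitize_filename(payload)
    assert ".." not in cleaned and "/" not in cleaned and "\\" not in cleaned
```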

Security Configuration Optimization

Even if you have already configured your system for security, GenAI can help you optimize that configuration and suggest improvements. Identifying weaknesses or redundant settings helps optimize security controls and ensures they are aligned with best practices for the specific application architecture.

Penetration Testing Assistance

AI models can be leveraged to assist penetration testing teams by creating customized test scripts or simulating specific attacker behaviors. This can streamline the testing process and uncover hidden vulnerabilities that might be difficult to find through manual penetration testing alone.

Continuous Security Monitoring

AI can now be integrated into security monitoring systems to analyze network traffic and application logs for suspicious activity. By continuously learning and adapting, GenAI models can potentially detect novel attacks or zero-day vulnerabilities that traditional signature-based detection might miss.

We also want you to remember that GenAI is still a developing field, and security considerations are of utmost importance. Again, do not rely solely on AI models for the security of your cloud-based assets. Leverage AI to automate and ease your repetitive tasks and to get more ideas for securing your cloud environments smartly. And do not forget to use tools like ours, Cloudanix, which delivers exceptional code security for your crown jewels from PR to runtime.

