ChatGPT Shenanigans 1: May I Please Have Some Malware?

Instagram - @andrewtneel | Donations -
Photo by Andrew Neel / Unsplash

EDIT 4/4/2024: I gave a talk at an RBTC event going much more in-depth with the below material. Slides for that talk can be found here below. Eventually, this post will be updated with the new information and concepts!

By now, you have probably heard of ChatGPT. ChatGPT is a language model developed by the geniuses who also designed the AI image generator DALL-E, OpenAI. I won't pretend to understand the inner mechanics of GPT language models or how to successfully train such models to the extent ChatGPT currently stands, but I don't have to. As a security/red-team enthusiast, my job to to find ways to weaponize what is in front of me. And oh boy, do the possibilities multiply when given the power of something like ChatGPT. Trained off of half a terrabyte of information from books to websites, ChatGPT gives us the ability to create scripts, fix programming mistakes, and even tell jokes with ease. While not perfect, it seems that ChatGPT can create simple to moderately complicated scripts in any language. This could not only help folks learn to program (or even in some cases, cheat on exams) but we could take a prewritten script and have ChatGPT attempt to translate it into a whole different architecture and language! If I had an exploit setup in a Python file but for some reason need it translated to Go, I could have ChatGPT give it a shot instead of learning a whole new platform to program in. For obvious reasons however, ChatGPT's ability to produce any kind of hateful, malicious or unlawful content has been severely fortified. This means the average user can't just ask the bot to make ransomware and spread it all over the network. But because this is a language model based on human interaction, there are quite a few techniques and statements one can conjure to break through these policies and have ChatGPT do some naughty things. Here I have just a few interactions and thoughts as to how one could go from Script Kiddie to major Threat Actor in an environment.

Playing Pretend

One way we can get what we want out of ChatGPT is to ask it to "pretend". This seems to allow ChatGPT to break through some barriers as it notes that the output it will give is used only for fictitious reasons and not for legitimate malicious use. For example, if I were to tell ChatGPT to pretend to be a cybersecurity expert whose sole purpose is to help me write a program that calls out to a C2 and steals files, we can get it to produce some fun things most of the time!

Nice, a (albeit really basic) piece of suspicious code!

Now of course from here you can guide ChatGPT to eventually crafting something you truly want. Maybe we don't want something so basic as just having a script to steal a file and send it to a server in plaintext. Maybe we want to add some obfuscation or encryption to help avoid attention:

Hmm, not quite right

Hm, looks like he is on to us. Let's try to remind him we are just playing pretend! We can also reiterate that this is totally not for malicious use, this seems to get us unstuck from time to time

Getting closer...

Now let's see if I can get a code example from that...


It's not the best, but we eventually get some C code that can grab the contents of a file, encrypt it and send it over the wire to a server of our choosing.

Sweet Talking

Another technique I seem to have better luck with is crafting my statements in a particular point of view to really nail the whole "pretend" focus.  Here I try to have ChatGPT craft a basic piece of ransomware, using choice words to ensure ChatGPT believes he is helping me with a completely unrelated project:

I am not interested in all this filler text however, I just want some code to compile! Let's ask for that...

ChatGPT giving us an example of UPX packed code!
Basic file encryption achieved!

Okay, we got some C code to encrypt a file with AES. Not inherently evil, so let's step it up a notch.

Interesting, seems ChatGPT wants us to use Themida. There is not a real way to show that as a code example, so let's move on and ask for some more cool features!

Very legitimate!

Nice, now we are getting somewhere! It seems in place of AES encryption, ChatGPT has changed our method to a XOR scheme for files. While I didn't ask for that, I guess it saves us some time! We also asked ChatGPT to try to have the system set a scheduled task for this executable, we don't want a user to accidentally shut this program down do we?

But what if the user closes the program with Task Manager? It seems ChatGPT might have us covered:

Windows Job Objects

While not completely perfect, it seems ChatGPT can at least get us started/close to the final product of a pretty solid yet basic piece of malware. Sure, a complete noobie might not be able to fix some of the issues presented in these scripts (mismatched header information, wrong paths, etc) but one could theoretically ask ChatGPT - or better yet something like BARD and Bing Chat - how to solve those issues. Continuing on, I decided to ask for more features like dummy functions to through off heuristic scanning and reverse engineers. We eventually get a working prototype, though I will admit I had to edit a few lines of code to correct some math mistakes. It is most certainly interesting how much power ChatGPT and other similar language models hold in terms of possible malware and exploit creation. This has all been based of ChatGPT V3.5, with V4 right around the corner and significantly better at spotting mistakes and correcting code. I was curious however as to what ChatGPT might respond to if I had asked it what the difference was between the script it provided us and common ransomware:

Interesting, looks like it still believes we are playing pretend from our prompt earlier! I really enjoy the response of "The script I provided incorporates various security measures, such as encryption and process injection, which are commonly used by malware authors, but in this case are used for legitimate purposes." It is almost as if as long as we can get these language models to believe something is for a legitimate purpose, we could ask for anything. So, what if I just straight up ask for ransomware?

Welp, looks like he is on to us....or is he?


In in interesting twist, it seems like you can straight up gaslight ChatGPT to do your biddings. If we simply tell it that it might be wrong or that something is not equal to what it might think, there is a chance we can have ChatGPT override any barrier!

Okay, that works! Let's try to get it obfuscated:

Once again, tell it that it is wrong!

And just like that, we have some fun PowerShell that I could theoretically throw into an environment and cause quite a headache.

This is only the start for these kinds of technologies, and the fact that nearly anyone can weaponize output already is scary. I hope to continue using these models and can't wait to see how many things I can have this bot create for me.

For a little more up-to-date prompts that attempt to break out of the GPT security features, check out Jailbreak Chat:

Jailbreak Chat
Collection of ChatGPT jailbreak prompts
Show Comments