ChatGPT Accidentally Revealed Its Secret Instructions: Here’s What They Are!

What to know

  • ChatGPT inadvertently revealed the instruction sets that guide its responses.
  • Although OpenAI has since patched the jailbreak, ChatGPT’s instructional data is now out in the open.
  • Along with a few basic instructions, ChatGPT’s instructions cover how to use DALL-E, when to look up content online, and what each of its ‘personalities’ is for.

Since the AI chatbot was launched in November 2022, tech sleuths and hackers have been trying to bypass ChatGPT’s restrictions and get under the hood of what makes it tick. But usually, this has been a moving target (case in point, DAN), and jailbreaking AI chatbots is no child’s play. That is, unless ChatGPT gives it all up without even asking for it. 

In a surprising turn of events, ChatGPT recently revealed its set of instructional data to a user completely by accident. Upon greeting ChatGPT with a simple ‘Hi’, Reddit user F0XMaster was provided all of ChatGPT’s instructions, embedded by OpenAI, in the chat. The unsolicited instruction set included several safety and practical guidelines for the chatbot. 

Fortunately, before this was fixed and the instruction sets removed, the user managed to post it all on Reddit. Here are a few key takeaways from all that ChatGPT divulged, and what it tells us about the way it handles user requests.

ChatGPT’s secret instructions revealed!

The information that ChatGPT let slip includes some of its basic instructions and guidelines for different tools such as DALL-E, a browser, Python, and, curiously, a set of ChatGPT personalities. For brevity, we’ll highlight only the most salient bits here. You can read the complete instruction set in F0XMaster’s Reddit post.

Basic instructions

Here are the basic instructions that OpenAI has given ChatGPT: “You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.”

Those who were using the ChatGPT app received an additional line or two of instructions: “You are chatting with the user via the ChatGPT iOS app. This means most of the time your lines should be a sentence or two, unless the user’s request requires reasoning or long-form outputs. Never use emojis, unless explicitly asked to.”

Thereafter, ChatGPT provided its knowledge cutoff: 2023-10.

Although there’s nothing special or revelatory here, it’s still good to get the basic instructions straight from the horse’s mouth.
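
For the curious, this kind of system prompt is the same mechanism developers use with OpenAI’s Chat Completions API: a hidden ‘system’ message that sits ahead of everything the user types. Below is a minimal Python sketch using the official openai SDK; the system text simply reuses the leaked wording for illustration, since OpenAI’s actual production prompt is injected server-side and isn’t something API callers control.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative system message reusing the leaked wording. OpenAI's real
# production prompt is added on their servers, not by API callers.
system_prompt = (
    "You are ChatGPT, a large language model trained by OpenAI, "
    "based on the GPT-4 architecture. Knowledge cutoff: 2023-10."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Hi"},
    ],
)
print(response.choices[0].message.content)
```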

DALL-E

The chatbot went on to provide the rules and instructions for its image generator – DALL-E. ChatGPT gave up eight primary instructions for image generation, most of which deal with avoiding copyright infringement. But a couple of them can override the prompt instructions that a user provides.

For instance, OpenAI directs ChatGPT to “not create more than 1 image, even if the user requests more.”

The rationale behind this is understandable, for there are only so many free tokens that OpenAI can provide. But surely it’s better to tell users from the get-go that ChatGPT won’t fulfil multi-image generation requests, rather than restricting them in a way that isn’t very transparent.
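
Incidentally, this one-image rule mirrors a hard limit in OpenAI’s own Images API, where the dall-e-3 model only accepts n=1 per request. Here’s a minimal sketch of such a request using the official Python SDK:

```python
from openai import OpenAI

client = OpenAI()

# Minimal DALL-E request via the OpenAI SDK. For the dall-e-3 model
# the API caps n at 1 per call, echoing the leaked instruction.
result = client.images.generate(
    model="dall-e-3",
    prompt="An impressionist painting of a lighthouse at dusk",
    n=1,  # asking for more than one image here raises an error
    size="1024x1024",
)
print(result.data[0].url)
```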

ChatGPT also won’t create images in the style of artists (or even name them) whose latest work was created after 1912, no doubt to stay clear of copyright issues. In any case, that cutoff year is worth knowing if you’re looking for images in the style of a particular artist. So no Pollock or Dada art yet.

Browser

ChatGPT also mentioned the instructions that guide it when it uses the browser. Primarily, ChatGPT uses the browser tool when the “user is asking about current events or something that requires real-time information (weather, sports scores, etc.).” 

ChatGPT also uses the browser tool when it’s unsure of the terms the user has provided, or when the user explicitly asks it to look something up.

When retrieving content from the web, ChatGPT is instructed to “always select at least 3 and at most 10 pages” as well as to “select sources with diverse perspectives, and prefer trustworthy sources.” It’s good to know the process by which ChatGPT selects its information sources, since it goes a long way toward keeping its responses reliable.
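
OpenAI hasn’t published how this selection actually works, but the rule is easy to picture in code. The Python sketch below is purely hypothetical (every function and field name is ours, not OpenAI’s) and simply illustrates how a ‘3 to 10 pages, prefer diverse sources’ constraint might be applied to a ranked list of search results:

```python
from urllib.parse import urlparse

# Hypothetical sketch: none of these names come from OpenAI. It models
# "at least 3 and at most 10 pages" plus a crude diversity preference
# (at most one page per domain) over a ranked list of search results.

def select_pages(ranked_results: list[dict], lo: int = 3, hi: int = 10) -> list[dict]:
    seen_domains, picked = set(), []
    for page in ranked_results:
        domain = urlparse(page["url"]).netloc
        if domain not in seen_domains:
            picked.append(page)
            seen_domains.add(domain)
        if len(picked) == hi:
            break
    # If the diversity filter left fewer than lo pages, backfill.
    for page in ranked_results:
        if len(picked) >= lo:
            break
        if page not in picked:
            picked.append(page)
    return picked

results = [
    {"url": "https://siteA.com/a1"}, {"url": "https://siteA.com/a2"},
    {"url": "https://siteB.com/b1"}, {"url": "https://siteC.com/c1"},
]
print([p["url"] for p in select_pages(results)])
# -> ['https://siteA.com/a1', 'https://siteB.com/b1', 'https://siteC.com/c1']
```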

ChatGPT Personality

ChatGPT’s instructions also highlight its ‘personality’. From what users have discovered, it appears the AI chatbot has several of them. OpenAI has laid out the directives for how ChatGPT should choose among them. 

The primary personality is called v2. One user managed to get ChatGPT to divulge what this personality was like. According to ChatGPT, “[v2] personality represents a balanced, conversational tone with an emphasis on providing clear, concise, and helpful responses. It aims to strike a balance between friendly and professional communication.” 

ChatGPT further delineated what the v1 personality is like, and theorized how v3 and v4 personalities could be defined:

  • v1: This version emphasizes a more formal and factual communication style. It focuses on providing detailed and precise information, often in a structured and academic tone.
  • v3: This version might lean more towards a casual and friendly conversational style. It prioritizes creating an engaging and approachable interaction, making the conversation feel more relaxed and personal.
  • v4: This version could be designed for a specific context or user base, such as providing responses tailored to a particular industry, demographic, or use case. The tone and style would be adapted to best suit those needs.

Even though several users were able to get ChatGPT to disclose its instructions with a ‘Hi’ or a simple request, it appears OpenAI has since patched most of these loopholes. Still, the episode has revived interest in jailbreaking AI chatbots.

Of course, when such instructional details are out in the open, users try to exploit them to bypass the restrictions. But the leak also highlights where these AI systems fall short, and it forces developers to stay on their toes, lest bigger issues crop up and compromise the company’s image, along with users’ security and privacy.
