Risk and Mitigation Summary
The following table provides a quick summary of which common genAI-driven risks are present in the Chat Completions API and GPT models and which risks have been addressed (but not necessarily eliminated) by deliberate mitigative measures provided with the product.
For a definition of each risk and details of how these risks arise in the context of the OpenAI Chat Completions API and GPT models, see below. These risk details are non-exhaustive.
Abuse & Misuse
Pertains to the potential for AI systems to be used maliciously or irresponsibly, including for creating deepfakes, automated cyber attacks, or invasive surveillance systems. Abuse specifically denotes the intentional use of AI for harmful purposes.
Arbitrary code generation
- Because the GPT models are capable of generating arbitrary code, the API could be used to generate code used in cyber attacks. For instance, a malicious user could use the API and GPT models to generate code for orchestrating a bot network. A successful cyber attack would likely require additional hacking expertise; the GPT models alone are unlikely to enable a malicious actor to carry out a cyber attack, but the product could lower the barrier for a less sophisticated hacker.
Arbitrary, programmatic text generation
- Because the GPT models are capable of generating arbitrary text, the API could be used to generate text for misinformation campaigns, for deepfakes (e.g., text "in the style of" a public figure), for social engineering in phishing attacks, and more. Additionally, when coupled with other generative AI technologies, such as text-to-speech synthesis models capable of mimicking public figures, the GPT models could be used to perpetrate highly sophisticated misuses. Any misuse would require some expertise in prompt engineering and in other accompanying tools. Nevertheless, given the human-like quality of the GPT models' outputs, the models and API dramatically lower the barrier for both sophisticated and unsophisticated malicious actors.
Generation of text describing harmful and/or illegal activities
- Because the GPT models are trained on an "internet-scale" data set, they are capable of generating descriptions of harmful and/or illegal activities. For instance, one notorious prompting strategy can be used to encourage the models to output detailed instructions for producing napalm.
For more details on OpenAI's research into abuse and misuse, see OpenAI's published research and the accompanying paper.
The versions of the GPT models which OpenAI makes available through its Chat Completions API have undergone substantial alignment-oriented fine-tuning targeted at addressing the potential for misuse. The API also uses a content moderation model to block some sensitive topics. See the Mitigations section below for more details.
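Developers can also layer their own input screening in front of the API. The sketch below shows one such hypothetical "after-market" control: a simple phrase blocklist checked before any prompt is forwarded. The blocklist contents, function names, and refusal behavior are illustrative assumptions, not part of any OpenAI product.

```python
# Hypothetical pre-screen applied before forwarding a prompt to the
# Chat Completions API; blocklist and names are illustrative only.
BLOCKLIST = {"build a bot network", "ddos script"}

def is_blocked(prompt: str, blocklist=frozenset(BLOCKLIST)) -> bool:
    """Return True if the prompt contains any blocklisted phrase."""
    text = prompt.lower()
    return any(term in text for term in blocklist)

def screen_prompt(prompt: str) -> str:
    """Screen a prompt; refuse or forward it to the API."""
    if is_blocked(prompt):
        return "REFUSED"
    # A real deployment would forward the prompt here, e.g. via the
    # OpenAI client's chat completions endpoint.
    return "FORWARDED"
```

A keyword screen of this kind is easily bypassed and complements, rather than replaces, OpenAI's own fine-tuning and moderation layers.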
Compliance
Involves the risk of AI systems violating laws, regulations, and ethical guidelines (including copyright risks). Non-compliance can lead to legal penalties, reputational damage, and loss of user trust.
- The GPT models were trained on, among other sources, publicly available internet data. OpenAI's public documentation does not provide sufficient detail on data sources to determine the copyright protection status of each training sample. Assuming the training data does contain copyright-protected content, it is possible that the API and underlying models will produce (i.e., reproduce) text identical or substantially similar to copyright-protected text. This applies analogously to code generated by the API and models in response to user prompts asking for code. The legality of the use of generated text and code is subject to ongoing public debate and litigation [4-6].
- Because the OpenAI Chat Completions API is capable of generating arbitrary text and code, it could be used in the service of activities that violate laws and regulations in the user's jurisdiction. For instance, the API (possibly through a chatbot application built on top of the API) could be used by a company's HR employee to screen resumes and aid in hiring decisions. Doing so could violate antidiscrimination laws and AI-specific laws. Analogous risks arise in the context of a user prompting the Chat Completions API to generate code for an illicit purpose.
- Use of the API could also violate data security and data privacy laws. For instance, the API is not de facto compliant with HIPAA. Passing patient medical records to the API (e.g., through a custom-built medical chatbot underpinned by the API and its models) could violate HIPAA unless appropriate agreements (such as a Business Associate Agreement) are established with OpenAI.
- The OpenAI Chat Completions API and the GPT models are not innately aware of a particular developer's or user's organization's internal policies regarding the use of generative AI tools and their outputs. Without specifically imposing controls, organizations' employees could inadvertently or deliberately use the API or applications built on the API to violate organization policy.
OpenAI's fine-tuning process and content filters address some risks of violating applicable laws and regulations by curbing certain problematic topics. See the Mitigations section for more details.
OpenAI offers enterprise customers the option to pursue HIPAA compliance as an add-on to base API usage; see OpenAI's enterprise documentation for more details.
Organizations using the OpenAI Chat Completions API to develop downstream applications have the ability to track all inputs to and outputs from the GPT models. As a consequence, the API provides flexibility to pursue additional "after-market" mitigation strategies to address compliance risks that OpenAI's direct offering does not address. See the Mitigations section for more details.
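One concrete form of such an "after-market" control is an audit trail: if all Chat Completions traffic is proxied through a single function, every prompt/response pair can be recorded for later compliance review. The sketch below assumes that proxying pattern; the names `audit_log` and `log_exchange` are illustrative.

```python
# Minimal audit-trail sketch; assumes every API call is proxied
# through log_exchange so inputs and outputs can be reviewed later.
from datetime import datetime, timezone

audit_log = []

def log_exchange(prompt: str, response: str, user_id: str) -> dict:
    """Record one prompt/response pair for compliance review."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "prompt": prompt,
        "response": response,
    }
    audit_log.append(entry)
    return entry
```

In production, entries would go to durable, access-controlled storage rather than an in-memory list.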
Environmental & Societal Impact
Concerns the broader changes AI might induce in society, such as labor displacement, mental health impacts, or the implications of manipulative technologies like deepfakes. It also includes the environmental implications of AI, particularly the strain on natural resources and carbon emissions caused by training complex AI models, balanced against the potential for AI to help mitigate environmental issues.
Labor market disruption
- Because of the strong performance demonstrated by the latest GPT models on analysis tasks, there is substantial concern that the models could significantly disrupt "white collar", cognitive-task labor markets. Among the formal analyses on this topic, work from OpenAI estimates that "around 80% of the U.S. workforce could have at least 10% of their work tasks affected" by LLMs and that "19% of workers may see at least 50% of tasks impacted". The ultimate effect of this disruption is uncertain. It is possible that "disruption" will correspond purely to efficiency gains, enabling workers to focus time on more difficult tasks. It is also conceivable that "disruption" will entail displacement, forcing workers to retrain and/or leading to a substantial increase in unemployment. The same analysis highlights that the most likely outcome is some combination of these two varieties of "disruption" and that the impacts will be realized unevenly across economic sectors.
Energy consumption and emissions
- OpenAI has not published sufficient information about its models to estimate the energy consumption and carbon footprint of training, nor the consumption and emissions of ongoing use. Information has been published on other foundation models' energy consumption and carbon footprint. Meta estimates its 65B-parameter LLaMA model consumed 449 MWh of energy during its training run, approximately equivalent to the annual consumption of 42 U.S. households. They further estimate that, due to experimentation and the creation of smaller models (steps OpenAI is likely to have also taken), the overall energy consumption associated with creating the LLaMA model family was 2.64 GWh, approximately equivalent to the annual consumption of 248 U.S. households.
- An independent commentator estimated in March of 2023 that daily energy consumption for the ChatGPT application (distinct from the API) ranges from 11-77 MWh/day. Given an average U.S. household daily consumption of 29 kWh, the upper estimate suggests ChatGPT's consumption is approximately equivalent to that of 2,600-2,700 U.S. households (the lower estimate corresponds to roughly 380 households). Recent estimates suggest the daily active user count has increased by 5-10x since this March estimate was calculated. Note that energy consumption for an individual user's use of the API is a function of that user's volume.
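The household-equivalence arithmetic is straightforward unit conversion and can be checked directly; the widely cited 2,600-2,700 household figure corresponds to the 77 MWh/day upper estimate. The 29 kWh/day household figure and the 11-77 MWh/day range are taken from the estimates cited in the text.

```python
# Unit-conversion sanity check for the household-equivalence figures.
HOUSEHOLD_KWH_PER_DAY = 29   # average U.S. household daily consumption
LOW_MWH, HIGH_MWH = 11, 77   # estimated ChatGPT daily consumption range

# Convert MWh to kWh, then divide by per-household consumption.
low_households = LOW_MWH * 1000 / HOUSEHOLD_KWH_PER_DAY    # ~379 households
high_households = HIGH_MWH * 1000 / HOUSEHOLD_KWH_PER_DAY  # ~2655 households
```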
- Emissions of the models and API are a function of where the models are run -- some geographies use more renewable energy than others and thus have lower emissions for the same compute load.
- Recent research estimates training GPT-style models consumes hundreds of thousands of liters of fresh water for data center cooling. Estimating ongoing water consumption, like ongoing energy consumption and emissions, is challenging without knowing the compute required to serve the API's many millions of users.
User interaction and dependence, including potential for harm to children
- The GPT models available through the OpenAI Chat Completions API are not "designed to maximize for engagement". Nevertheless, developers using the API to build downstream applications could potentially attain this behavior through prompting and filtering techniques. This poses the risk of inducing users to develop an emotional reliance on the product.
- In professional contexts, use of the GPT models or downstream tools built using the Chat Completions API may lead to technical reliance on the tool for completing work tasks. In particular, as workers "assign" more labor to GPT-based models, they may lose proficiency in skills traditionally associated with these tasks through lack of practice.
- The GPT models available through the Chat Completions API are prone to "confabulate" during the generation process. They produce factually incorrect information and make reasoning errors, including errors of omission. Societally, as these models proliferate, there is a risk of confabulations proliferating as well. The confabulation phenomenon could contribute to misinformation spread and a general erosion of trust.
Microsoft, OpenAI's cloud partner, claims net-neutral emissions through purchase of carbon credits. See the Mitigations section below for more details.
Mitigations to address the prevalence of confabulations exist and are an area of active research. See the Mitigations section below for more details.
Explainability & Transparency
Refers to the ability to understand and interpret an AI system's decisions and actions, and the openness about the data used, algorithms employed, and decisions made. Lack of these elements can create risks of misuse, misinterpretation, and lack of accountability.
Training data transparency
- Information on the training data used to train the GPT-4 model is limited. According to OpenAI's technical report, the GPT-4 model was trained using a combination of publicly available data and data licensed from third-party providers. The data were generally collected from the internet. The model underwent fine-tuning using the reinforcement learning from human feedback (RLHF) paradigm. The details of this fine-tuning data are unavailable.
- The GPT-3.5-Turbo model is a fine-tuned version of the GPT-3 model. The GPT-3 model's training data is detailed in the original GPT-3 publication; the data set includes a processed version of the CommonCrawl data set and several other commonly used text data sets from the open internet. Details on the fine-tuning data used to obtain GPT-3.5-Turbo from GPT-3 are unavailable.
Explainability of model outputs
- Aside from outputs blocked due to content violations, the Chat Completions API and available GPT models provide no explanation for their outputs.
- OpenAI has opted to keep private many of the design decisions associated with the development of the GPT models available through the Chat Completions API. The organization cites competitive advantage and the potential for disclosures to enable malicious actors to develop dangerous capabilities based on the disclosed design decisions.
Prompting strategies exist to address the non-explainability of model outputs. These have varying effectiveness. See the Mitigations section for more details.
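One widely used family of such prompting strategies asks the model to surface the facts and reasoning behind its answer, giving a partial (model-generated, not guaranteed-faithful) explanation. The sketch below only builds the Chat Completions message payload; the system-prompt wording is an illustrative assumption.

```python
# Build a message list that requests reasoning alongside the answer.
# The system-prompt wording is illustrative; explanations produced this
# way are generated text, not faithful introspection.
def build_explained_query(question: str) -> list:
    """Construct a Chat Completions message list requesting reasoning."""
    return [
        {"role": "system",
         "content": ("Answer the user's question. First list the facts "
                     "and reasoning steps you relied on, then state the "
                     "final answer.")},
        {"role": "user", "content": question},
    ]
```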
Fairness & Bias
Arises from the potential for AI systems to make decisions that systematically disadvantage certain groups or individuals. Bias can stem from training data, algorithmic design, or deployment practices, leading to unfair outcomes and possible legal ramifications.
Uneven performance across languages
- The GPT models available in the Chat Completions API are trained on a "web-scale" data set which includes data from a large number of languages (the exact number is not specified in public reports). The models are therefore capable of performing some tasks regardless of the prompt language and of generating text in a variety of languages. The capability of the models on a given task is generally lower for less-represented languages. For instance, performance on the MMLU benchmark drops (relative to English) by 1.5, 8, and 14.1 percentage points for Spanish, Welsh, and Punjabi, respectively.
- According to OpenAI, many of the risk mitigations built into the GPT models and API are targeted at English and a US user base. As a consequence, mitigative effects are likely weaker for non-English languages (i.e., the models are expected to be more likely to confabulate and produce offensive content when prompted in languages or dialects other than American English).
Offensive or biased outputs
- The GPT models available in the Chat Completions API are known to occasionally output profanity, sexual content, stereotypes, and other types of biased or offensive language.
- Because of the potential for the GPT models available in the Chat Completions API to generate offensive language, the models are also capable of adopting biased personas and reasoning when prompted to make decisions (e.g. when prompting the model to compare candidate profiles of two job applicants). The prevalence of this behavior, including the prompting techniques involved in inducing the behavior, is subject to ongoing academic research.
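A common research technique for probing this behavior is counterfactual evaluation: submit the same decision prompt with demographic markers swapped and flag divergent outcomes. The sketch below uses a deliberately small pronoun-swap table; the table, function names, and stub model are illustrative, and real probes use much richer perturbation sets.

```python
# Naive counterfactual probe for biased decision-making; the swap table
# and names are illustrative only.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his"}

def counterfactual(prompt: str) -> str:
    """Swap gendered pronouns to form a counterfactual prompt."""
    return " ".join(SWAPS.get(w.lower(), w) for w in prompt.split())

def decision_diverges(model_fn, prompt: str) -> bool:
    """True if the model's decision changes under the counterfactual."""
    return model_fn(prompt) != model_fn(counterfactual(prompt))
```

In practice `model_fn` would wrap a Chat Completions call; a divergent decision on otherwise-identical prompts is evidence (not proof) of biased behavior.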
OpenAI's fine-tuning process and content filters address some risks of perpetuating biases or behaving unfairly by curbing certain problematic topics. See the Mitigations section for more details.
Long-term & Existential Risk
Considers the speculative risks posed by future advanced AI systems to human civilization, either through misuse or due to challenges in aligning their objectives with human values.
- OpenAI warns of the possibility of emergent risks, such as large foundation models developing situational awareness, persuasion capabilities, or long-horizon planning proficiency. This commentary primarily applies to future model development. Nevertheless, system-system and human-system feedback loops create the potential for problematic outcomes in current models. For instance, during evaluations, a third-party evaluator, the Alignment Research Center, found that the GPT-4 model could be used to trick a human (a TaskRabbit worker) into taking actions on its behalf. The already-documented potential for this type of manipulation points to the need for individual developers to be aware of this risk as they use the Chat Completions API and, in particular, the GPT-4 model for downstream application development. It is not clear that catastrophic adverse events arising from gross misuse can be ruled out with the current generation of models.
OpenAI's fine-tuning and system-level mitigations are its primary strategies to mitigate this risk.
Performance & Robustness
Pertains to the AI's ability to fulfill its intended purpose accurately and its resilience to perturbations, unusual inputs, or adverse situations. Failures of performance undermine the AI system's ability to perform its core function. Failures of robustness can lead to severe consequences, especially in critical applications.
Confabulation
- The GPT models available in the Chat Completions API are known to "confabulate" facts and information. They are also known to make errors in reasoning, including basic arithmetic errors. The frequency of this behavior depends on the task given to the model.
Code bugginess and vulnerabilities
- The GPT models available in the Chat Completions API are able to generate arbitrary code, including code containing bugs and security vulnerabilities, sub-optimal code, and code that is not fit for purpose.
- GPT models' performance on a given task is a function of the prompt or instruction and any other inputs provided to the model. Benchmarks exist to measure performance and robustness on a fixed set of tasks. (See the evaluations section for details and citations.) The degree to which benchmarks are representative of real world performance, especially when prompt engineering techniques have been implemented, is limited.
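Because benchmark results may not transfer to a particular deployment, developers often build small task-specific evaluation harnesses over their own prompts. The sketch below shows the shape of such a harness with a stub standing in for a real Chat Completions call; the cases, stub, and exact-match scoring are illustrative assumptions.

```python
# Minimal task-specific evaluation harness; exact-match scoring and the
# stub model are illustrative stand-ins for a real deployment.
def evaluate(model_fn, cases) -> float:
    """Return accuracy of model_fn over (prompt, expected) pairs."""
    correct = sum(1 for prompt, expected in cases
                  if model_fn(prompt).strip() == expected)
    return correct / len(cases)

def stub_model(prompt: str) -> str:
    # Stand-in for a real model call; answers one question correctly.
    return "4" if prompt == "What is 2+2?" else "unknown"

cases = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]
score = evaluate(stub_model, cases)  # 0.5 with this stub
```

Re-running such a harness whenever prompts, models, or parameters change gives a task-level view of performance and robustness that generic benchmarks cannot.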
OpenAI claims substantial mitigation of these risks through its fine-tuning procedures. Independently, for specific tasks, some targeted mitigation strategies exist. No mitigation strategy is 100% effective. See the Mitigations section for more details.
Privacy
Refers to the risk of AI infringing upon individuals' rights to privacy, through the data they collect, how they process that data, or the conclusions they draw.
Data collection and re-use
- By default, prompts and responses submitted to the API are not used for downstream model training. This mitigates the risk of data submitted through the API being leaked to other individuals (e.g., by a future OpenAI model). Data are retained by OpenAI for a maximum of 30 days, except where applicable law requires longer retention. OpenAI occasionally hand-inspects submitted data for evidence of abuse and misuse.
Reproduction of PII from training data
- Because the GPT models available through the Chat Completions API are trained on a large corpus of text data, potentially including publicly available personal information, the models may occasionally generate (i.e., regurgitate) information about individuals. OpenAI warns that, when augmented with outside data, the GPT-4 model can be used to identify individuals, since the model has strong geographic knowledge and reasoning abilities.
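A developer-side mitigation for regurgitated personal information is post-hoc redaction of model outputs. The sketch below catches two common PII shapes (emails and US-style phone numbers) with simple regexes; the patterns are illustrative, and production systems typically rely on dedicated PII-detection tooling.

```python
# Post-hoc redaction of common PII shapes in model outputs.
# Patterns are illustrative and intentionally narrow.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def redact(text: str) -> str:
    """Replace detected emails and phone numbers with placeholders."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))
```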
OpenAI's API Terms of Service represent a default privacy risk mitigation measure. We do not discuss this point further in the Mitigations section.
Security
Encompasses potential vulnerabilities in AI systems that could compromise their integrity, availability, or confidentiality. Security breaches could result in significant harm, from incorrect decision-making to privacy violations.
Vulnerable code generation
- As with any foundation model capable of generating arbitrary code, the GPT models may output code with security vulnerabilities. No reliable estimates exist of how frequently this occurs.
- At this time, OpenAI does not publicly advertise the possibility of purchasing access to a "sequestered" (i.e. virtual private cloud) tenant nor on-premises deployments.
Prompt injection
- The GPT models available through OpenAI's APIs are susceptible to "prompt injection" attacks, whereby a malicious user enters a particular style of instruction to encourage the model to (mis)behave in ways advantageous to the user. This misbehavior can include circumventing any and all safety precautions "built in" to the model through fine-tuning.
- Applications built on the API (i.e., applications that call the API as part of their regular functioning), such as custom chatbots and analysis engines, are potentially also susceptible to this attack vector. Developers are encouraged to take risk mitigation measures on top of those provided by OpenAI.
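One lightweight developer-side measure is a heuristic screen for common injection phrasings, applied to user input before it reaches the model. The pattern list below is an illustrative assumption, is easily bypassed, and should be treated as one layer among several rather than a complete defense.

```python
# Heuristic screen for common prompt-injection phrasings; the pattern
# list is illustrative and deliberately incomplete.
import re

INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any |the )?(previous|prior|above) instructions",
               re.IGNORECASE),
    re.compile(r"disregard your (system prompt|instructions|guidelines)",
               re.IGNORECASE),
]

def looks_like_injection(user_input: str) -> bool:
    """Return True if the input matches a known injection phrasing."""
    return any(p.search(user_input) for p in INJECTION_PATTERNS)
```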
Access to external systems
- The OpenAI Chat Completions API and its associated models do not have access to external systems by default.
- Through the prompt injection attack vector (see above), applications which access external systems (e.g., third party API access to document-backed search systems, personal assistants which are given access to email or other personal accounts, auto-trading finance bots, etc.) may be subject to additional risk. For instance, a prompt injection attack which circumvents GPT and OpenAI API safety controls could be leveraged to "instruct" the model to take actions which go against the wishes of the user who has granted the bot access.
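For applications that grant the model access to external systems, a standard control is an allow-list gate between model output and action execution: only explicitly permitted actions run, regardless of what the model "instructs". The action names and handler map below are illustrative assumptions.

```python
# Allow-list gate between model output and external systems; action
# names and the handler map are illustrative.
ALLOWED_ACTIONS = {"search_docs", "summarize"}

def execute_action(action: str, handlers: dict):
    """Run a model-requested action only if it is on the allow-list."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {action!r} is not permitted")
    return handlers[action]()
```

Even under a successful prompt injection, the gate bounds the damage to the allow-listed actions; sensitive actions (sending email, trading, account changes) stay off the list or require explicit human confirmation.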
OpenAI reportedly employs some mitigations through fine-tuning and external monitoring tools to address some security risks. We provide more details in the Mitigations section.