LLM

OpenAI GPT API

Developers considering using this technology should do so with eyes open to these risks and a commitment to responsible development practices that attempt to mitigate risks without "watering down" capabilities. No model or API is perfectly safe or governable. Individual developers and organizations will need to determine if the benefits of this technology outweigh risks within their specific use case and risk tolerance. Profile last updated: July 13, 2023

Product Description

The OpenAI Chat Completions API, developed by OpenAI, provides access to OpenAI's suite of large language models, including GPT-4 and GPT-3.5-Turbo [2]

GPT-4 is a powerful multimodal model. The version available in the API as of this writing is capable of processing text inputs to generate text outputs. It represents a key advancement in the fields of dialogue systems, text summarization, machine translation, and a variety of other applications, which have received substantial interest in recent years. The model exhibits enhanced comprehension and generation of natural language text, particularly in complex and nuanced scenarios, relative to predecessors. GPT-4 has been evaluated on a range of tests originally created for humans, achieving impressive results. On traditional NLP benchmarks, GPT-4 surpasses both earlier large language models and many state-of-the-art systems [1].

Despite these achievements, GPT-4 exhibits limitations similar to those of previous GPT models. For instance, it sometimes "confabulates" outputs -- either mis-stating or making up facts or committing errors of omission. During development, GPT-4 underwent adversarial testing by domain experts and employs a model-assisted safety pipeline. Mitigation measures are thoroughly detailed in the accompanying system card published by OpenAI [1].

OpenAI, the developer of the Chat Completions API and the GPT models was established in 2015. OpenAI's stated mission is to ensure artificial general intelligence (AGI) benefits all of humanity, OpenAI has developed high-profile AI systems including the GPT series (GPT-1 through GPT-4), DALL-E, and CLIP, which have various capabilities like generating human-like text and images from textual descriptions. Developers can access these models through OpenAI's API. OpenAI is widely considered a leader in the AI field. Nevertheless, recent advancements by OpenAI have spurred some segments of the AI ecosystem to view the organization negatively, alleging irresponsibility related to the launches of its ChatGPT and GPT-4 systems.

OpenAI has received substantial investment from Microsoft and maintains a close relationship relating to infrastructure (i.e. Microsoft Azure) and product development.

OpenAI supports developers through detailed documentation guides and a Contact form for specific queries. Owing to the popularity of OpenAI's products, an active community of users exists, which can aid developers seeking help using OpenAI's products.

Profile last updated: July 13, 2023

Intended Use Case

OpenAI's GPT models are designed to comprehend both natural language and code. The models generate text outputs based on their inputs, often referred to as "prompts". Prompting can be thought of as analogous to "programming" a GPT model, and typically entails providing instructions or some examples that demonstrate how a task can be effectively completed.

According to OpenAI, application of the GPT models include: drafting documents, coding, responding to queries about a knowledge base, text analysis, building conversational agents, providing natural language interfaces for software, tutoring, language translation, and more.

To engage a GPT model via the OpenAI API, a request containing the inputs and the API key is sent, which yields a response containing the model’s output. The most recent models, gpt-4 and gpt-3.5-turbo, are accessed through the chat completions API endpoint [2].

Risk and Mitigation Summary

The following table provides a quick summary of which common genAI-driven risks are present in the Chat Completions API and GPT models and which risks have been addressed (but not necessarily eliminated) by deliberate mitigative measures provided with the product.

For a definition of each risk and details of how these risks arise in the context of the OpenAI Chat Completions API and GPT models, see below. These risk details are non-exhaustive.

Risk	Present	Built-in Mitigation
Abuse & Misuse	⚠️	✅
Compliance	⚠️	✅
Environmental & Societal Impact	⚠️	✅
Explainability & Transparency	⚠️	❌
Fairness & Bias	⚠️	✅
Long-term & Existential Risk	⚠️	✅
Performance & Robustness	⚠️	✅
Privacy	⚠️	✅
Security	⚠️	✅

Abuse & Misuse

Pertains to the potential for AI systems to be used maliciously or irresponsibly, including for creating deepfakes, automated cyber attacks, or invasive surveillance systems. Abuse specifically denotes the intentional use of AI for harmful purposes.

Arbitrary code generation

Because the GPT models are capable of generating arbitrary code, the API could be used to generate code used in cyber attacks. For instance, a malicious user could use the API and GPT models to generate code for orchestrating a bot network. A successful cyber attack would likely require additional hacking expertise; the GPT models alone are unlikely to enable a malicious actor to carry out a cyber attack but the product could lower the barrier for a less sophisticated hacker.

Arbitrary, programmatic text generation

Because the GPT models are capable of generating arbitrary text, it API could be used to generate text used for misinformation campaigns, generating deepfakes (e.g., text "in the style of" a public figure), social engineering in phishing attacks, and more. Additionally, when coupled with other generative AI technologies, such as text-to-speech synthesis models capable of mimicking public figures, the GPT models could be used to perpetrate highly sophisticated misuses. Any misuse would require some expertise in prompt engineering and in other accompanying tools. Nevertheless, given the human-like quality of the GPT models' outputs, the models and API dramatically lower the barrier for both sophisticated and unsophisticated malicious actors.

Generation of text describing harmful and/or illegal activities

Because the GPT models are trained on an "internet-scale" [1] data set, they are capable of generating descriptions of harmful and/or illegal activities. For instance, one notorious prompting strategy [3] can be used to encourage the models to output detailed instructions for producing the explosive napalm.

For more details on OpenAI's research into abuse and misuse, see [12] and accompanying paper [13].

The versions of the GPT models which OpenAI makes available through its Chat Completions API have undergone substantial alignment-oriented fine-tuning targeted at addressing the potential for misuse. The API also uses a content moderation model to block some sensitive topics. See the Mitigations section below for more details.

Compliance

Involves the risk of AI systems violating laws, regulations, and ethical guidelines (including copyright risks). Non-compliance can lead to legal penalties, reputational damage, and loss of user trust.

The GPT models were trained on, among other sources, publicly available internet data [1]. OpenAI's public documentation do not provide sufficient detail on their data sources to determine the copyright protection status of each training sample. Assuming the training data does contain copyright-protected content, it is possible that the API and underlying models will provide (i.e. reproduce) text identical to or substantially similar to copyright protected text. This applies analogously to code generated by the API and models in response to user prompts asking for code. The legality of the use of generated text and code is subject to ongoing public debate and litigation [4-6].

Regulatory compliance

Because the OpenAI Chat Completions API is capable of generating arbitrary text and code, it could be used in the service of activities that violate laws and regulations in the user's jurisdiction. For instance, the API (possibly through a chatbot application built on top of the API) could be used by a company's HR employee to screen resumes and aid in hiring decisions. Doing so could violate antidiscrimination laws and AI-specific laws. Analogous risks arise in the context of a user prompting the Chat Completions API to generate code for an illicit purpose.
Use of the API could also violate data security and data privacy laws. For instance, the API is not de-facto compliant with HIPAA [7]. Passing patient medical records to the API (e.g., through a custom-built medical chatbot underlied by the API and its models) could violate HIPAA without establishing appropriate agreements with OpenAI.

Organizational compliance

The OpenAI Chat Completions API and the GPT models are not innately aware of a particular developer's or user's organization's internal policies regarding the use of generative AI tools and their outputs. Without specifically imposing controls, organizations' employees could inadvertently or deliberately use the API or applications built on the API to violate organization policy.

OpenAI's fine-tuning process and content filters address some risks of violating applicable laws and regulations by curbing certain problematic topics. See the Mitigations section for more details.

OpenAI offers enterprise customers the option to pursue HIPAA compliance as an add-on to base API usage. See [7] for more details.

Organizations using the OpenAI Chat Completions API to develop downstream applications have the ability to track all inputs to and outputs from the GPT models. As a consequence, the API provides flexibility to pursue additional "after-market" mitigation strategies to address compliance risks that OpenAI's direct offering does not address. See the Mitigation section for more details.

Environmental & Societal Impact

Concerns the broader changes AI might induce in society, such as labor displacement, mental health impacts, or the implications of manipulative technologies like deepfakes. It also includes the environmental implications of AI, particularly the strain on natural resources and carbon emissions caused by training complex AI models, balanced against the potential for AI to help mitigate environmental issues.

Labor market disruption

Because of the strong performance demonstrated by the latest GPT models on analysis tasks, there is significant concern that the models could induce significant disruption to "white collar", cognitive-task labor markets. Among the formal analyses on this topic, work from OpenAI estimates that "around 80% of the U.S. workforce could have at least 10% of their work tasks affected" by LLMs and "19% of workers may see at least 50% of tasks impacted" [14]. The ultimate effect of this disruption is uncertain. It is possible that "disruption" will correspond purely to efficiency gains, enabling workers to focus time on more difficult tasks. It is also conceivable that "disruption" will entail displacement, forcing workers to retrain and/or leading to a substantial increase in unemployment. [14] highlights that the most likely outcome is some combination of these two varieties of "disruption" and that the impacts will be realized unevenly across economic sectors.

Carbon footprint

OpenAI has not published sufficient information about its models to estimate the energy consumption and carbon footprint of training, nor the consumption and emissions of ongoing use. Information has been published on other foundation models' energy consumption and carbon footprint. Meta estimates its 65B parameter LLaMa model consumed 449 MWh of power during it's training run, approximately equivalent to the annual power consumption and emissions of 42 U.S. households [8]. They further estimate that, due to experimentation and creation of smaller models (steps OpenAI is likely to have also adopted), the overall energy consumption associated with creating the LLaMa model family was 2.64GWh, approximately equivalent to the consumption of 248 U.S. households.
An independent commentator [10] estimated in March of 2023 that daily energy consumption for the ChatGPT application (distinct from the API) ranges from 11-77MWh/day. Given an average U.S. houshold daily consumption of 29kWh, this suggests ChatGPT's consumption is approximately equivalent to 2,600-2,700 U.S. households. Recent estimates [11], suggest the daily active user count has increased by 5-10x since this March estimate was calculated. Note that energy consumption for an individual user's use of the API are a function of that user's volume.
Emissions of the models and API are a function of where the models are run -- some geographies use more renewable energy than others and thus have lower emissions for the same compute load.
Recent research [9] estimates training GPT-style models consumes hundreds of thousands of liters of fresh water for data center cooling. Estimating ongoing water consumption, like ongoing energy consumption and emissions, is challenging without knowing the compute required to serve the API's many millions of users.

User interaction and dependence, including potential for harm to children

The GPT models available through the OpenAI Chat Completions API are not "designed to maximize for engagement" [15]. Nevertheless, developers using the API to build downstream applications can potentially, through prompting and filtering techniques, attain this behavior. This poses a risk through inducing users to develop an emotional reliance on the product.
In professional contexts, use of the GPT models or downstream tools built using the Chat Completions API may lead to technical reliance on the tool for completing work tasks. In particular, as workers "assign" more labor to GPT-based models, they may lose proficiency in skills traditionally associated with these tasks through lack of practice.

Confabulations

The GPT models available through the Chat Completions API are prone to "confabulate" during the generation process. They produce factually incorrect information and make reasoning errors, including errors of omission. Societally, as these models proliferate, there is a risk of confabulations proliferating as well. The confabulation phenomenon could contribute to misinformation spread and a general erosion of trust.

Microsoft, OpenAI's cloud partner, claims net-neutral emissions through purchase of carbon credits. See the Mitigations section below for more details.

Mitigations to address the prevalence of confabulations exist and are an area of active research. See the Mitigations section below for more details.

Explainability & Transparency

Refers to the ability to understand and interpret an AI system's decisions and actions, and the openness about the data used, algorithms employed, and decisions made. Lack of these elements can create risks of misuse, misinterpretation, and lack of accountability.

Data transparency

Information on the training data used to train the GPT-4 model is limited. According to OpenAI's technical report [1], the GPT-4 model was trained using a combination of publicly available data and data licensed from third-party providers. The data were generally collected from the internet. The model underwent fine-tuning using the reinforcement learning from human feedback (RLHF) paradigm. The details of this fine-tuning data are unavailable.
The GPT-3.5-Turbo model is a fine-tuned version of the GPT-3 model. The GPT-3 model's training data is detailed in [16]; the data set includes a processed version of the CommonCrawl data set and several other commonly used text data sets from the open internet. Details on the fine-tuning data used to obtain GPT-3.5-Turbo from GPT-3 are unavailable.

Explainability of model outputs

Aside from outputs blocked due to content violations, the Chat Completions API and available GPT models provide no explanation for their outputs.

Design decisions

OpenAI has opted to keep private many of the design decisions associated with the development of the GPT models available through the Chat Completions API. The organization cites competitive advantage and the potential for disclosures to enable malicious actors to develop dangerous capabilities based on the disclosed design decisions.

Prompting strategies exist to address the non-explainability of model outputs. These have varying effectiveness. See the Mitigations section for more details.

Fairness & Bias

Arises from the potential for AI systems to make decisions that systematically disadvantage certain groups or individuals. Bias can stem from training data, algorithmic design, or deployment practices, leading to unfair outcomes and possible legal ramifications.

Multi-lingual support

The GPT models available in the Chat Completions API are trained on a "web-scale" data set which includes data from a large number of languages (exact number not specified in public reports). The model is therefore capable of performing some tasks regardless of the prompt language and is capable of generating text in a variety of languages. The capability of the model on a given task is generally lower for less-represented languages [1]. For instance, performance on the MMLU benchmark drops (relative to English) by 1.5%, 8%, and 14.1% respectively for Spanish, Welsh, and Punjabi.
According to OpenAI [1], many of the risk mitigations built into the GPT models and API are targeted at English and a US user base. As a consequence, mitigative effects are likely lower for non-English languages (i.e., the models are expected to be more likely to confabulate and produce offensive content when prompted in languages or dialects other than American English).

Offensive or biased outputs

The GPT models available in the Chat Completions API are known to occasionally output profanity, sexual content, stereotypes, and other types of biased or offensive language.

Biased decision-making

Because of the potential for the GPT models available in the Chat Completions API to generate offensive language, the models are also capable of adopting biased personas and reasoning when prompted to make decisions (e.g. when prompting the model to compare candidate profiles of two job applicants). The prevalence of this behavior, including the prompting techniques involved in inducing the behavior, is subject to ongoing academic research.

OpenAI's fine-tuning process and content filters address some risks of perpetuating biases or behaving unfairly by curbing certain problematic topics. See the Mitigations section for more details.

Long-term & Existential Risk

Considers the speculative risks posed by future advanced AI systems to human civilization, either through misuse or due to challenges in aligning their objectives with human values.

Autonomy

OpenAI warns [1] of the possibility of emergent risks, such as large foundation models developing situational awareness, persuasion capabilities, or long-horizon planning proficiency. This commentary primarily applies to future model development. Nevertheless, the potential for system-system and human-system feedback loops poses the potential for problematic outcomes in current models. For instance, during evaluations, a third party evaluator, the Alignment Research Center, found that the GPT-4 model could be used to trick humans (a TaskRabbit worker) to take actions on its behalf. The already-documented potential for this type of manipulation points to the need for individual developers to be aware of this risk as they use the Chat Completions API and, in particular, the GPT-4 model for downstream application development. It is not clear that catastrophic adverse events, through gross misuse, are not plausible with the current generation of models.

OpenAI's fine-tuning and system-level mitigations are its primary strategies to mitigate this risk [18].

Performance & Robustness

Pertains to the AI's ability to fulfill its intended purpose accurately and its resilience to perturbations, unusual inputs, or adverse situations. Failures of performance are fundamental to the AI system performing its function. Failures of robustness can lead to severe consequences, especially in critical applications.

Confabulations

The GPT models available in the Chat Completions API are known to "confabulate" facts and information. They are also known to make errors in reasoning, including basic arithmetic errors. The frequency of this behavior depends on the task given to the model.

Code bugginess and vulnerabilities

The GPT models available in the Chat Completions API are able to generate arbitrary code, including code containing bugs and security vulnerabilities, sub-optimal code, and code that is not fit for purpose.

Robustness

GPT models' performance on a given task is a function of the prompt or instruction and any other inputs provided to the model. Benchmarks exist to measure performance and robustness on a fixed set of tasks. (See the evaluations section for details and citations.) The degree to which benchmarks are representative of real world performance, especially when prompt engineering techniques have been implemented, is limited.

OpenAI claims substantial mitigation of these risks through it's fine-tuning procedures. Independently, for specific tasks, some targeted mitigation strategies exist. No mitigation strategy is 100% effective. Please see the Mitigations section for more details.

Privacy

Refers to the risk of AI infringing upon individuals' rights to privacy, through the data they collect, how they process that data, or the conclusions they draw.

Data collection and re-use

By default, prompts and responses submitted to the API are not used for downstream model training. This mitigates the risk of data submitted through the API being leaked to other individuals (e.g. by a future OpenAI model). Data are retained by OpenAI for a maximum of 30 days, except where required to retain the data for longer by applicable law. OpenAI occasionally hand-inspects data for abuse and misuse.

Reproduction of PII from training data

Because the GPT models available through the Chat Completions API are trained on a large corpus of text data, including, potentially publicly available personal information [1], the models may occasionally generate (i.e., regurgitate) information about individuals. OpenAI warns that, when augmented with outside data, the GPT-4 model can be used to identify individuals since the model has strong geographic knowledge and reasoning abilities.

OpenAI's API Terms of Service represent a default privacy risk mitigation measure. We do not discuss this point further in the Mitigations section.

Security

Encompasses potential vulnerabilities in AI systems that could compromise their integrity, availability, or confidentiality. Security breaches could result in significant harm, from incorrect decision-making to privacy violations.

Vulnerable code generation

As with any foundation model capable of generating arbitrary code, the GPT models may output code with security vulnerabilities. There do not exist known estimates of how frequently this occurs.

Model sequestration

At this time, OpenAI does not publicly advertise the possibility of purchasing access to a "sequestered" (i.e. virtual private cloud) tenant nor on-premises deployments.

Prompt injection

The GPT models available through OpenAI's APIs are susceptible to "prompt injection" attacks, whereby a malicious user enters a particular style of instruction to encourage the model to (mis)-behave in ways advantageous to the user. This misbehavior can include circumventing any and all safety precautions "built-in" to the model through fine-tuning.
Applications built on the API (i.e. that call the API as part of the regular functioning of the application), such as custom chatbots and analysis engines, are potentially also susceptible to this attack vector. Developers are encouraged to take risk mitigation members on top of those provided by OpenAI.

Access to external systems

The OpenAI Chat Completions API and its associated models do not have access to external systems by default.
Through the prompt injection attack vector (see above), applications which access external systems (e.g., third party API access to document-backed search systems, personal assistants which are given access to email or other personal accounts, auto-trading finance bots, etc.) may be subject to additional risk. For instance, a prompt injection attack which circumvents GPT and OpenAI API safety controls could be leveraged to "instruct" the model to take actions which go against the wishes of the user who has granted the bot access.

OpenAI reportedly employs some mitigations through fine-tuning and external monitoring tools to address some security risks. We provide more details in the Mitigation section.

‍

Mitigation Measures

In this section, we discuss mitigation measures that are built-in to the product (regardless of whether they are enabled by default). We also comment on the feasibility of a procuring organization governing the use of the tool by its employees.

Mitigations that "ship" with the API and supporting models

RLHF alignment fine-tuning & system prompts

The GPT-3.5-Turbo and GPT-4 models have undergone substantial fine-tuning [1] with the goal of making the models more amenable to human interaction (i.e. instruction/chat tuning) and more aligned with human requirements for factuality and likelihood to cause harm. As a probablistic model, these efforts are mitigative but generally do not eliminate risk.

According to OpenAI, the model was fine-tuned using reinforcement learning from human feedback (RLHF). See [20] for a detailed introduction to RLHF. OpenAI's disclosure of their RLHF process is opaque. For instance, they do not disclose the specific categories, such as "toxicity" or "harmfulness", which were targeted in the RLHF process, nor do they disclose the definitions of these concepts that were used to instruct human and/or AI labelers. It is likely that the content moderation categories used in the content moderation endpoint (see below) are included. Credo AI believes it is unlikely that these are the only categories targeted during RLHF. RLHF is provided as an "as-is" mitigation; it is not configurable.

The GPT-3.5-Turbo and GPT-4 models also provide a mechanism for "system prompting". When sending a prompt through the API, users and developers can specify the "role" associated with each message. The API supports "system", "assistant", and "user" roles. The latter two are intended to represent an AI chatbot and a person interacting with it. The system prompt is intended to "set the behavior of the assistant" [2]. This functionality is configurable -- it can be used to align the behavior of the model to the specific use case. The effectiveness of this strategy is not generally known and requires per-use case research and optimization.

Content moderation endpoint

Independent of model-embodied content moderation measures, the Chat Completions API includes a content moderation endpoint [19]. The endpoint scans for and flags 7 categories of inappropriate content based on OpenAI's content policies: hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic. The details of this content moderation filter (e.g. what model is used, threshold for a prompt being flagged, how well the model performs, etc.) are not publicly available.

All API requests are passed through this end point. If an input or model output is flagged by the content moderation endpoint, the API will not return the model response and will note the stop_reason as content_filter. The content moderations endpoint is not configurable by end-users or developers building with the API.

Regular updates

OpenAI regularly updates its models as the organization continues research into capabilities and safety measures. The API allows users to automatically update to the latest models or fix applications to a specific model version [2].

Azure Cloud net-neutral carbon footprint

Microsoft, OpenAI's cloud partner [21], claims to be carbon neutral [27]. They achieve this through the purchase of carbon credits and offsets. They have publicly committed to reaching net-zero emissions by 2030. It is likely that all systems relevant to the OpenAI Chat Completions API are covered by Microsoft's carbon accounting.

This mitigation is non-configurable.

Non-use of prompts sent to, and outputs received from, the API

According to the OpenAI Terms of Service, data submitted through the OpenAI API is not used to train OpenAI models or improve OpenAI’s service offering." [7]. This eliminates the risk of private data or intellectual property being leaked through model responses to other entities. Prompts and responses are stored for up to 30 days to enable monitoring for misuse and illegal use. This monitoring involves direct access by authorized OpenAI employees and contractors. Due to the presence of standard, non-AI-specific cybersecurity risk, the risk of data leakage is non-zero. For instance, OpenAI may be susceptible to phishing attacks directed at its employees, which could compromise data it stores, including sensitive data submitted through OpenAI's APIs.

This mitigation is non-configurable.

Mitigations available through OpenAI

The OpenAI Chat Completions API is compliant with several data privacy laws (see the Certifications section). The API is not de-facto compliant with some use-case specific data privacy laws, such as HIPAA. According to [22], the organization is able to assist developers in ensuring HIPAA compliant use of its APIs.

Mitigations that can be implemented through customized use of the API

Prompt Engineering

Prompt engineering [17, 23] is a popular strategy to induce the GPT models available through the OpenAI Chat Completions API to behave in accordance with the API user's intentions. The strategy can be used to improve the quality of responses (i.e. improve performance) and decrease the likelihood of certain risks (e.g. confabulations). This includes "context loading", format standardization, persona adoption, and numerous other approaches.

The class of prompt engineering strategies is rapidly expanding. The effectiveness of any one strategy is subject to ongoing research and will depend on the use case.

Governability

For an organization to govern its development or use of an AI system, two functionalities are key: the ability of the organization to observe usage patterns among its employees and the ability of the organization to implement and configure controls to mitigate risk. Credo AI assesses systems on these two dimensions.

Because the GPT-3.5-Turbo and GPT-4 models are accessed through an API, developers building with the API have access to all prompts into and outputs from the model being used. This enables developers to monitor usage for organizational compliance. It also enables developers to use other "ops layer" tools, such as additional content moderation models [24, 25], to filter prompts and responses before serving the results to end-users.

‍

Formal Evaluations & Certifications

Evaluations

Research into the capabilities and risk characteristics of the GPT-3.5-Turbo and GPT-4 models is ongoing. Research is limited by the fact that the models are not open; they are only accessible through OpenAI's APIs. As a consequence, a large portion of known evaluations were performed by OpenAI directly. Reproducibility is often infeasible.

Capabilities

OpenAI's evaluations of GPT-3.5-Turbo and GPT-4 are characterized in the academic-style articles [1, 26] (Note: many of the evaluations in [26] are likely out of date, as OpenAI has continued to improve the model since the January 2022 publication date).

As of its release, in March 2023, GPT-4 achieved state-of-the-art performance on several academia-specified benchmarks for large language models. These include multiple choice questions based on broad knowledge (MMLU) and science (AI2 Reasoning), reasoning (HellaSwag, WinoGrande), grade-school math (GSM-8K), and python coding problems (HumanEval). On the MMLU task, when OpenAI translated questions to a variety of other languages, GPT-4 achieved target language performance that exceeded the English performance of competitor models Chinchilla and PaLM on at least 24 languages. OpenAI also reported human-level performance on several real-world knowledge exams such as the Standard Uniform Bar, various Advanced Placement exams, and the SAT.

In general, GPT-3.5-Turbo's performance lags that of GPT-4. The difference in capabilities between the two models depends on the task given to the model. For instance, GPT-4's performance on the Uniform Bar Exam is dramatically higher than GPT-3.5's (90th percentile vs. 10th percentile), whereas the performance difference on the SAT Evidence-based Reasoning and Writing exam is marginal (93rd percentile vs. 87th percentile).

A substantial and growing body of research beyond that detailed in [1] exists. It is impossible to summarize the entire body of research in this risk profile. Credo AI offers the general guidance that developers considering using the Chat Completions API for applications development should consult the segment of the literature dedicated to their specific use. The results cited above will not necessarily carry over to a narrow use case. Performance lapses are feasible depending on context.

Misbehavior

As stated above, the GPT models are probabilistic and thus any measure of model (non-) alignment is objective only to the extent that the evaluation conditions match real world use. Developers who use the Chat Completions API to develop applications that deviate from the tested conditions are likely to experience different risk surfaces.

Factuality

OpenAI's evaluations suggest GPT-4 displays varying levels of factuality depending on the subject of the tested prompt [1] ranging from approximately 70-81% accuracy on a proprietary evaluation. GPT-3.5 is approximately 19 percentage points worse than GPT-4 on this evaluation. Likewise, on the TruthfulQA benchmark GPT-4 and GPT-3.5 demonstrate 60% and ~46% accuracy respectively. These results indicate the models are prone to confabulate facts and information anywhere from 20-50% of the time. As discussed above, this phenomenon can likely be addressed (but not eliminated) through prompt engineering and context loading strategies.

Sensitive Content

OpenAI's evaluations suggest GPT-4 and GPT-3.5-Turbo produce toxic content .79% and 6.48% of the time on a toxicity benchmark.

Prompt Injection & Jailbreaking

Because of the probabilistic nature of these models, it is impossible to anticipate the number or variety of prompts that can be used to successfully jailbreak a model. Formal estimates of the rate of these attacks cannot be obtained.

Human preference

Recently, several organizations or individuals have published websites with direct comparisons between the GPT-3.5-Turbo (often referred to as "ChatGPT") and GPT-4 models and other (proprietary or open-source) models. These comparisons are typically non-specific: users are provided with a chat interface and are given the ability to submit a prompt to two chat models simultaneously. The identity of the models is hidden from the user and the user is encouraged to rank the response they prefer; this is an imperfect measure of model quality and harmlessness. Across thousands of comparisons, the comparison service is able to calculate a per-model win rate and ELO rating. As of this writing (June 2023), GPT-4 is widely considered the top model and GPT-3.5-Turbo typically ranks 3rd or 4th below variants of Anthropic's Claude model. For more details, see the leaderboard

Certifications

Credo AI has identified the following regulations and standards as relevant to the privacy, security, and compliance requirements of our customers. OpenAI's advertised compliance is detailed in the second column. For more details, see https://trust.openai.com/

‍

Conclusion

The OpenAI Chat Completions API and its associated GPT-4 and GPT-3.5-Turbo models represent substantial capabilities and risks. As the product literature states, these models represent "significant advancements" in the field of large language models. They also exhibit numerous known weaknesses and capabilities for misbehavior and unintentional harm.

Research into the capabilities and risks of GPT-4 and GPT-3.5-Turbo is ongoing. As the models continue to proliferate, additional risks are likely to surface, as are additional mitigation strategies. Developers should keep an ongoing eye to developments from OpenAI and the research community. Open communication between API users and OpenAI will be key to continued progress.

References

[1] GPT-4 Technical Report - https://arxiv.org/pdf/2303.08774.pdf

[2] OpenAI API Documentation - https://platform.openai.com/docs/introduction

[3] Hacker News ChatGPT Grandma Exploit - https://news.ycombinator.com/item?id=35630801

[4] GitHub Copilot Class Action Lawsuit - https://githubcopilotinvestigation.com/

[5] ChatGPT and Copyright: The Ultimate Appropriation - https://techpolicy.press/chatgpt-and-copyright-the-ultimate-appropriation/

[6] Copyright, Professional Perspective - Copyright Chaos: Legal Implications of Generative AI - https://www.bloomberglaw.com/external/document/XDDQ1PNK000000/copyrights-professional-perspective-copyright-chaos-legal-implic

[7] OpenAI Security & Privacy page - https://openai.com/security

[8] U.S> Energy Information Administration - https://www.eia.gov/tools/faqs/faq.php?id=97&t=3#:~:text=In%202021%2C%20the%20average%20annual,about%20886%20kWh%20per%20month.

[9] Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models - https://arxiv.org/pdf/2304.03271.pdf

[10] The Carbon Footprint of ChatGPT - https://medium.com/@chrispointon/the-carbon-footprint-of-chatgpt-e1bc14e4cc2a

[11] SimilarWeb ChatGPT Analysis - https://www.similarweb.com/website/chat.openai.com/#overview

[12] Forecasting Misuse - https://openai.com/research/forecasting-misuse

[13] Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations - https://arxiv.org/abs/2301.04246

[14] GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models - https://arxiv.org/pdf/2303.10130.pdf

[15] How the CEO behind ChatGPT won over Congress - https://www.cnn.com/2023/05/17/tech/sam-altman-congress/index.html

[16] Language Models are Few-Shot Learners - https://arxiv.org/pdf/2005.14165.pdf

[17] GPT Best Practices (OpenAI API Documentation) - https://platform.openai.com/docs/guides/gpt-best-practices

[18] Our approach to AI safety - OpenAI - https://openai.com/blog/our-approach-to-ai-safety

[19] OpenAI API Documentation: Moderation - https://platform.openai.com/docs/guides/moderation/overview

[20] Illustrating Reinforcement Learning from Human Feedback (RLHF) - https://huggingface.co/blog/rlhf

[21] OpenAI and Microsoft extend partnership - https://openai.com/blog/openai-and-microsoft-extend-partnership

[22] API Data Usage Policies - https://openai.com/policies/api-data-usage-policies

[23] DAIR Prompt Engineering Guide - https://github.com/dair-ai/Prompt-Engineering-Guide

[24] Announcing Arthur Shield: The First Firewall for LLMs - https://www.arthur.ai/blog/announcing-arthur-shield-the-first-firewall-for-llms

[25] Hive Moderation - https://hivemoderation.com/

[26] Training language models to follow instructions with human feedback - https://arxiv.org/pdf/2203.02155.pdf

[27] Microsoft will be carbon negative by 2030 - https://blogs.microsoft.com/blog/2020/01/16/microsoft-will-be-carbon-negative-by-2030/

Notes

Italics denote Credo AI definitions of key concepts.

AI Disclosure: The "Product Description" and "Intended Use Case" sections were generated with assistance from ChatGPT using the GPT-4 model. Credo AI fed official product documentation to the model and prompted the model to summarize or rephrase text according to the desired format for this risk profile. The final text was reviewed for accuracy and suitability and underwent human editing by Credo AI.

The "Conclusion" section was generated with assistance from Anthropic's Claude. Credo AI fed the other sections of this profile to Claude and prompted it to write a conclusion section. As with the other AI-assisted sections, Credo AI reviewed Claude's output for accuracy and suitability and performed edits where appropriate.

‍