Last week OpenAI held its inaugural developer conference, bringing with it a slew of new product announcements. There’s a lot to wade through, so we’ve pulled out the four most relevant to businesses that are already dabbling in the AI space or are looking at dipping their toes in the water.
If you are interested in some of the other, more general announcements, these are also worth a look:
- DALL-E 3 is now available for ChatGPT Plus and enterprise customers
- OpenAI is rolling out new image and voice functionality for ChatGPT
OpenAI unveils GPT-4 Turbo, doubles tokens and extends its knowledge to 2023
First up is the release of GPT-4 Turbo, the most advanced of OpenAI’s GPT-4 models. This is relevant to businesses using the GPT-4 API.
This service allows businesses to use a sophisticated language model to develop interactive and conversational experiences with their customers. Think of a language model as a type of software that can interpret and generate text in response to given inputs.
With the GPT-4 API, businesses can create tools like chatbots or virtual assistants that can engage in natural and fluid conversations with users, enhancing customer interaction and support.
The API pricing is based on tokens, which are essentially small pieces of information. In the context of GPT-4 Turbo, a token can be a word, part of a word, or even punctuation.
Tokens are used as the basic units for the AI to process and generate text or analyse images. When you input text into GPT-4 Turbo, it breaks down your input into these tokens. The model then uses these tokens to understand the input and generate a response, which is also broken down into tokens.
The pricing is based on how many tokens are processed, both for the input and the output. So users only pay for what they use.
GPT-4 Turbo has a capacity of up to 128,000 tokens, which equates to about 100,000 words. This is four times the largest capacity offered by the regular GPT-4 (32,000 tokens).
It also has a huge upgrade when it comes to knowledge cutoff, coming in at April 2023. By comparison, regular GPT-4 only goes up to September 2021, making it less reliable for questions about more recent events.
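To get a feel for what that capacity means in practice, here is a minimal sketch that estimates whether a piece of text would fit. It assumes the rough ratio implied by the figures above (128,000 tokens to about 100,000 words) and counts words by splitting on whitespace; real token counts vary with the text, and the function names here are hypothetical helpers, not part of any OpenAI library.

```python
# Capacity figure quoted in the article for GPT-4 Turbo.
CONTEXT_WINDOW_TOKENS = 128_000

# Rough ratio implied by "128,000 tokens is about 100,000 words".
# This is an approximation for planning purposes, not an exact rule.
TOKENS_PER_WORD = 128_000 / 100_000


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its whitespace word count."""
    word_count = len(text.split())
    return round(word_count * TOKENS_PER_WORD)


def fits_in_context(text: str, reserved_for_output: int = 0) -> bool:
    """Check whether the estimated input, plus tokens set aside for the reply, fits."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    sample = "word " * 50_000  # a 50,000-word document
    print(estimate_tokens(sample), fits_in_context(sample))
```

For billing or hard limits, an actual tokenizer would give a more reliable count than this word-based estimate.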
GPT-4 Turbo is available in two versions: one for text analysis and another for both text and image comprehension. The text-only model is priced at US$0.01 per 1,000 input tokens and US$0.03 per 1,000 output tokens, with the image processing function priced at US$0.00765 per 1080×1080 pixel image.
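As a rough illustration of how that pricing adds up, the sketch below estimates the cost of a single request using only the prices quoted above. The function name and parameters are hypothetical, and it assumes every image is billed at the quoted 1080×1080 rate; other image sizes and any later price changes are not covered.

```python
from decimal import Decimal

# Prices quoted in the article, in US dollars.
INPUT_PRICE_PER_1K = Decimal("0.01")    # per 1,000 input tokens
OUTPUT_PRICE_PER_1K = Decimal("0.03")   # per 1,000 output tokens
IMAGE_PRICE_1080 = Decimal("0.00765")   # per 1080x1080 pixel image


def estimate_cost(input_tokens: int, output_tokens: int, images_1080: int = 0) -> Decimal:
    """Estimate the US dollar cost of one request (hypothetical helper)."""
    token_cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_1K
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_1K
    ) / 1000
    # Assumption: each image is charged at the flat 1080x1080 rate.
    image_cost = Decimal(images_1080) * IMAGE_PRICE_1080
    return token_cost + image_cost


if __name__ == "__main__":
    # A chatbot exchange with a 2,000-token prompt and a 500-token reply.
    print(estimate_cost(2_000, 500))
```

Decimal is used instead of floats so that small per-token amounts add up without rounding drift. On these figures, filling the whole 128,000-token capacity with input would cost about US$1.28 before any output is generated.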
Custom GPTs you may be able to monetise
OpenAI announced that users will be able to build custom GPTs that, according to CEO Sam Altman, “you can create for a specific purpose”. The kicker here is that OpenAI says businesses will be able to do this without any coding skills.
“You can in effect program a GPT with language just by talking to it. It’s easy to customise the behaviour so they do what you want. This makes them very accessible and gives agency to everyone,” Altman said.
This means you could create your own GPT for, say, a startup accelerator, and program it to be relevant specifically to the cohorts coming through.
In addition to this, businesses will be able to publish their own AI bots in the upcoming GPT Store, which was also announced during the event.
“Later this month, we’re launching the GPT Store, featuring creations by verified builders. Once in the store, GPTs become searchable and may climb the leaderboards,” OpenAI said in a blog post.
“We will also spotlight the most useful and delightful GPTs we come across in categories like productivity, education, and ‘just for fun’. In the coming months, you’ll also be able to earn money based on how many people are using your GPT.”
OpenAI’s Copyright Shield Program
Intellectual property protection has been a huge concern in the generative AI space. Artists and authors alike have been rallying against their work being included in datasets used to train large language models (LLMs), with allegations levelled against both OpenAI and Google.
On top of that, smaller businesses have raised concerns about infringing on the intellectual property of others by using AI tools that utilise LLMs that have been trained on potentially copyrighted materials. This is the part of this whole mess that OpenAI is addressing.
Copyright Shield aims to cover legal costs for businesses using OpenAI’s developer platform and ChatGPT Enterprise against copyright infringement claims related to work generated by OpenAI’s products.
However, the coverage does not extend to all of the company’s tools. So if you’re using the Plus or free tiers of ChatGPT, you won’t be protected. It’s also currently unclear whether OpenAI will offer indemnity for training data usage.
OpenAI Data Partnerships
OpenAI is launching Data Partnerships, an initiative aimed at collaborating with third-party organisations to create public and private datasets for training AI models.
According to the company, the goal is to build AI models that comprehensively understand diverse subjects, industries, cultures, and languages.
“Modern AI technology learns skills and aspects of our world — of people, our motivations, interactions, and the way we communicate — by making sense of the data on which it’s trained,” OpenAI said in a blog post.
“To ultimately make [artificial general intelligence] that is safe and beneficial to all of humanity, we’d like AI models to deeply understand all subject matters, industries, cultures, and languages, which requires as broad a training dataset as possible.”
OpenAI says it is looking for extensive datasets that represent various aspects of human society, particularly data that isn’t readily available online. It wants a range of modalities, including text, images, audio, and video. The datasets should not include personal or sensitive information, and OpenAI has offered to help partners remove this type of content.
OpenAI says it is currently looking at two avenues for partnership. The first is an open-source archive, aimed at creating a public dataset for AI model training, contributing to the open-source ecosystem. The second is for private datasets, where organisations can contribute data for training proprietary AI models while maintaining privacy and control over their data.