AI gets smarter, safer, more visual with GPT-4 update, OpenAI says
The hottest AI technology foundation got a big upgrade on Tuesday with OpenAI’s release of GPT-4, now available in the premium version of the ChatGPT chatbot.
GPT-4 can generate much longer strings of text and respond when people feed it images, and it’s designed to do a better job of avoiding artificial intelligence pitfalls visible in the earlier GPT-3.5, OpenAI said Tuesday. For example, when taking bar exams that lawyers must pass to practice law, GPT-4 ranks in the top 10% of scores compared to the bottom 10% for GPT-3.5, the AI research company said.
GPT stands for Generative Pretrained Transformer, a reference to the fact that it can generate text on its own – now up to 25,000 words with GPT-4 – and that it uses an AI technology called transformers that Google pioneered. It’s a type of AI called a large language model, or LLM, that’s trained on large amounts of data pulled from the internet, and learns mathematically to detect patterns and reproduce styles. Human supervisors evaluate results to steer GPT in the right direction, and GPT-4 has more of this feedback.
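For a concrete sense of that scale, here is a minimal Python sketch that counts how many tokens a prompt consumes using tiktoken, the open-source tokenizer OpenAI publishes. The 32,768-token limit below is the larger GPT-4 context variant OpenAI announced at launch, roughly those 25,000 words; the prompt text is an invented example.

import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    # Look up the tokenizer tiktoken associates with the model name,
    # then count how many tokens the text breaks into.
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

prompt = "Explain the transformer architecture in plain English."
used = count_tokens(prompt)
print(f"{used} tokens used; about {32768 - used} left in a 32k context.")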
OpenAI has made GPT available to developers for years, but ChatGPT, which debuted in November, offered a simple interface ordinary people could use. It sparked an explosion of interest, experimentation and concern about the downsides of the technology. It can do everything from generating programming code and answering exam questions to writing poetry and delivering basic facts. It is remarkable if not always reliable.
ChatGPT is free, but it can falter when demand is high. In January, OpenAI began offering ChatGPT Plus for $20 per month, with guaranteed availability and, now, GPT-4 as its foundation. Developers can join a waiting list for their own API access to GPT-4.
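For developers who clear the waiting list, a request looks roughly like the following minimal sketch, which uses the chat interface of the openai Python package as it existed at launch. The API key and the prompt are placeholders.

import openai

openai.api_key = "sk-..."  # replace with your own API key

# Send a single user message to GPT-4 and print the reply.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "What does GPT stand for?"},
    ],
)

print(response["choices"][0]["message"]["content"])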
GPT-4 advances
“In casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold,” OpenAI said. “GPT-4 is more reliable, creative and capable of handling much more nuanced instructions than GPT-3.5.”
Another major advance in GPT-4 is the ability to accept input that includes text and images. OpenAI’s example is asking the chatbot to explain a joke that shows a bulky decades-old computer cable connected to a modern iPhone’s tiny Lightning port. This feature also helps GPT take tests that are not text only, but it is not yet available in ChatGPT Plus.
Another is improved performance at avoiding AI problems like hallucinations — erroneous fabricated answers, often offered with as much apparent authority as answers the AI gets right. GPT-4 is also better at thwarting attempts to make it say the wrong thing: “GPT-4 scores 40% higher than our latest GPT-3.5 on our internal adversarial factuality evaluations,” OpenAI said.
GPT-4 also adds new “steerability” options. Users of large language models today often have to engage in elaborate “prompt engineering,” learning how to embed specific cues in their prompts to get the right kind of response. GPT-4 adds a system message option that lets users set a specific tone or style for a session, such as answering only in programming code or playing a Socratic tutor: “You are a tutor who always answers in the Socratic style. You never give the student the answer, but always try to ask exactly the right question to help them learn to think for themselves.”
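In API terms, that instruction rides along as a system message ahead of the user’s prompt. Here is a minimal sketch building on the request above, reusing the Socratic-tutor text OpenAI quoted; the student’s question is an invented example.

import openai

openai.api_key = "sk-..."  # replace with your own API key

# The system message fixes the tone; the user message is the actual query.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You are a tutor who always answers in the Socratic "
                       "style. You never give the student the answer, but "
                       "always try to ask exactly the right question to help "
                       "them learn to think for themselves.",
        },
        {"role": "user", "content": "How do I solve 3x + 5 = 20?"},
    ],
)

print(response["choices"][0]["message"]["content"])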
“Stochastic parrots” and other problems
OpenAI acknowledges significant shortcomings that persist with GPT-4, although it also suggests progress toward avoiding them.
“It can sometimes make simple reasoning errors … or be overly gullible in accepting obvious false statements from a user. And it can sometimes fail at hard problems the same way humans do, such as introducing security vulnerabilities into code it produces,” OpenAI said. In addition, “GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake.”
Large language models can deliver impressive results, and appear to understand vast amounts of subject matter and conversation in human-like, if somewhat stilted, language. Basically, however, LLM AIs don’t know anything. They are only capable of putting words together in statistically very sophisticated ways.
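A toy bigram model makes the point in miniature: it “writes” by repeatedly picking a word that happened to follow the previous word in its training text. GPT-4’s statistics are incomparably richer, but this sketch, with a made-up two-sentence corpus, shows the word-by-word prediction idea.

import random
from collections import defaultdict

corpus = ("the model reads text and the model finds patterns "
          "and the model reproduces patterns in new text").split()

# Record which words follow which in the corpus.
following = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev].append(nxt)

# Generate text by repeatedly sampling a statistically plausible next word.
word, output = "the", ["the"]
for _ in range(8):
    options = following.get(word)
    if not options:
        break
    word = random.choice(options)
    output.append(word)

print(" ".join(output))

The output reads vaguely like the corpus without the program understanding a word of it, which is the critics’ point: the difference is one of scale and sophistication, not of comprehension.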
This statistical but fundamentally hollow approach to knowledge led researchers, including former Google AI researchers Emily Bender and Timnit Gebru, to warn of the “dangers of stochastic parrots” that come with large language models. Language model AIs tend to encode biases, stereotypes, and negative sentiment present in training data, and researchers and other people using these models tend to “mistake … performance gains for actual natural language understanding.”
OpenAI CEO Sam Altman acknowledges the problems but is generally pleased with the progress shown in GPT-4. “It’s more creative than previous models, it hallucinates significantly less, and it’s less biased. It can pass a bar exam and get 5s on several AP exams,” Altman tweeted Tuesday.
One concern with AI is that students will use it to cheat, such as when answering essay questions. That’s a real risk, even as some teachers actively embrace LLMs as tools, much as they did search engines and Wikipedia. Plagiarism detection companies are adapting to AI by training their own detection models. One such company, Crossplag, said Wednesday that after testing about 50 documents that GPT-4 generated, “our accuracy rate was over 98.5%.”
OpenAI, Microsoft and Nvidia collaborate
OpenAI got a big boost when Microsoft said in February that it’s using GPT technology in its Bing search engine, including a chat feature similar to ChatGPT. On Tuesday, Microsoft said it’s using GPT-4 for that Bing work. Together, OpenAI and Microsoft pose a major search threat to Google, but Google has its own large language model technology, too, including a chatbot called Bard that it’s testing privately.
Also on Tuesday, Google announced that it will begin limited testing of its own AI technology to boost the writing of Gmail emails and Google Docs word processing documents. “With your collaborative AI partner, you can continue to refine and edit, getting more suggestions as needed,” Google said.
This wording reflects Microsoft’s “co-pilot” positioning of AI technology. Calling it an aid to human-directed work is a common stance, given the problems with the technology and the necessity of close human supervision.
Microsoft uses GPT technology both to evaluate the searches people type into Bing and, in some cases, to offer more elaborate conversational responses. The results can be much more informative than those of previous search engines, but the more conversational interface that can be invoked as an alternative has had problems that make it look unhinged.
To train GPT, OpenAI used Microsoft’s Azure cloud computing service, including thousands of Nvidia’s A100 graphics processing units, or GPUs, connected together. Azure can now use Nvidia’s new H100 processors, which include specific circuitry to accelerate AI transformer calculations.
AI chatbots everywhere
Another large language model developer, Anthropic, also unveiled an AI chatbot, called Claude, on Tuesday. The company, which counts Google as an investor, opened a waiting list for Claude.
“Claude is capable of a wide range of conversational and word processing tasks while maintaining a high degree of reliability and predictability,” Anthropic said in a blog post. “Claude can help with use cases including summarizing, searching, creative and collaborative writing, Q&A, coding and more.”
It is one of a growing crowd. The Chinese search and technology giant Baidu is working on a chatbot called Ernie Bot. Meta, parent of Facebook and Instagram, consolidated its AI business into a larger team and plans to build more generative AI into its products. Even Snapchat is getting into the game with a GPT-based chatbot called My AI.
Expect more improvements in the future.
“We’ve had the initial training of GPT-4 done for a while, but it’s taken us a long time and a lot of work to feel ready to release it,” Altman tweeted. “We hope you like it, and we greatly appreciate your feedback on its shortcomings.”
Editor’s note: CNET uses an AI engine to create some personal finance explanations that are edited and fact-checked by our editors. For more, see this post.