Interview with OpenAI’s Greg Brockman: GPT-4 isn’t perfect, but neither are you

OpenAI shipped GPT-4 yesterday, the much-anticipated text-generating AI model, and it’s a strange piece of work.

GPT-4 improves on its predecessor, GPT-3, in important ways, such as making more factually accurate statements and allowing developers to steer its style and behavior more easily. It is also multimodal in the sense that it can understand images, so it can caption and even explain in detail the content of an image.

But GPT-4 has serious shortcomings. Like GPT-3, the model “hallucinates” facts and makes basic reasoning errors. In one example on OpenAI’s own blog, GPT-4 describes Elvis Presley as “the son of an actor.” (Neither of his parents was an actor.)

To get a better handle on GPT-4’s development cycle and its capabilities, as well as its limitations, TechCrunch spoke with Greg Brockman, one of the co-founders of OpenAI and its president, via video call on Tuesday.

Asked to compare GPT-4 to GPT-3, Brockman had one word: Different.

“It’s just different,” he told TechCrunch. “There are still a lot of problems and mistakes [the model] makes … but you can really see the jump in skill in things like calculus or law, where it went from being really bad in certain domains to actually being pretty good relative to humans.”

Test results support his case. On the AP Calculus BC exam, GPT-4 scores a 4 out of 5, while GPT-3.5, the intermediate model between GPT-3 and GPT-4, scores a 1. On a simulated bar exam, GPT-4 passed with a score around the top 10% of test takers; GPT-3.5’s score hovered around the bottom 10%.

Shifting gears, one of GPT-4’s more exciting aspects is the aforementioned multimodality. Unlike GPT-3 and GPT-3.5, which could only accept text prompts (e.g. “Write an essay about giraffes”), GPT-4 can take both images and text as input to perform a task (e.g. a picture of giraffes in the Serengeti with the prompt “How many giraffes are shown here?”).

That’s because GPT-4 was trained on image and text data, while its predecessors were trained only on text. OpenAI says the training data came from “a variety of licensed, created and publicly available data sources, which may include publicly available personal information,” but Brockman demurred when I asked for specifics. (Training data has gotten OpenAI into legal trouble before.)

GPT-4’s image understanding capabilities are quite impressive. For example, fed the prompt “What’s funny about this image? Describe it panel by panel” plus a three-panel image showing a fake VGA cable connecting to an iPhone, GPT-4 provides a breakdown of each panel and correctly explains the joke (“The humor in this image comes from the absurdity of connecting a large, outdated VGA connector to a small, modern smartphone charging port”).

Only a single launch partner has access to GPT-4’s image analysis capabilities at the moment: an assistive app for the visually impaired called Be My Eyes. Brockman says the wider rollout, whenever it happens, will be “slow and deliberate” as OpenAI evaluates the risks and benefits.

“There are policy issues like facial recognition and how to process images of people that we need to address and work through,” Brockman said. “We need to figure out, for example, where the types of danger zones are — where the red lines are — and then clarify that over time.”

OpenAI dealt with similar ethical dilemmas surrounding DALL-E 2, its text-to-image system. After initially disabling the feature, OpenAI allowed customers to upload people’s faces to edit them using the AI-powered image generation system. At the time, OpenAI claimed that upgrades to its security system made the face-editing feature possible by “minimizing the potential for harm” from deep fakes as well as attempts to create sexual, political and violent content.

Another perennial challenge is preventing GPT-4 from being used in unintended ways that could cause harm, whether psychological, financial or otherwise. Hours after the model’s release, Israeli cybersecurity startup Adversa AI published a blog post demonstrating methods to bypass OpenAI’s content filters and get GPT-4 to generate phishing emails, offensive descriptions of gay people and other highly objectionable text.

This is not a new phenomenon in the language model domain. Meta’s BlenderBot and OpenAI’s ChatGPT have also been prompted to say wildly offensive things, and even to reveal sensitive details about their inner workings. But many, including this reporter, had hoped that GPT-4 would deliver significant improvements on the moderation front.

When asked about GPT-4’s robustness, Brockman emphasized that the model has gone through six months of safety training, and that in internal tests it was 82% less likely to respond to requests for content not allowed by OpenAI’s usage policy and 40% more likely to produce “factual” responses than GPT-3.5.

“We spent a lot of time trying to understand what GPT-4 is capable of,” Brockman said. “Getting it out into the world is how we learn. We’re constantly making updates, including a bunch of improvements, so the model is much more steerable to whatever personality or mode you want it to be in.”

The early real-world results aren’t that promising, frankly. Beyond the Adversa AI tests, Bing Chat, Microsoft’s chatbot powered by GPT-4, has been shown to be highly susceptible to jailbreaking. Using carefully tailored inputs, users have been able to get the bot to profess love, threaten harm, defend the Holocaust, and invent conspiracy theories.

Brockman did not deny that GPT-4 falls short here. But he emphasized the model’s new tools for steering and mitigation, including an API-level feature called “system” messages. System messages are essentially instructions that set the tone, and set limits, for GPT-4’s interactions. For example, a system message might read: “You are a tutor who always responds in the Socratic style. You never give the student the answer, but always try to ask just the right question to help them learn to think for themselves.”

The idea is that the system messages act as a guardrail to prevent GPT-4 from veering off course.
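To make that concrete, here is a minimal sketch of how a system message frames a request, using the `messages` format the OpenAI Python client’s chat-completion call accepted at GPT-4’s launch. The tutor prompt is the example quoted above; the student question is invented for illustration, and the actual API call is left commented out since it requires an API key.

```python
# Minimal sketch: a "system" message is the first entry in the messages
# array sent to the chat API, so its instructions frame every exchange.
messages = [
    {
        "role": "system",
        "content": (
            "You are a tutor who always responds in the Socratic style. "
            "You never give the student the answer, but always try to ask "
            "just the right question to help them learn to think for themselves."
        ),
    },
    # The user turn below is an invented example, not from the article.
    {"role": "user", "content": "What is the solution to 2x + 3 = 7?"},
]

# Sending the request needs an API key, so the call is shown but not run:
# import openai
# response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
# print(response["choices"][0]["message"]["content"])
```

Because the system message travels with every request, the guardrail persists across the whole conversation rather than depending on the user’s prompt.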

“Really figuring out GPT-4’s tone, style and substance has been a big focus for us,” said Brockman. “I think we’re starting to understand a little bit more of how to do engineering, about how to have a repeatable process that kind of gets you to predictable results that are going to be very useful to people.”

Brockman also pointed to Evals, OpenAI’s newly open-sourced software framework for evaluating the performance of its AI models, as a sign of OpenAI’s commitment to “robustifying” its models. Evals lets users develop and run benchmarks for evaluating models like GPT-4 while inspecting their performance, a kind of crowdsourced approach to model testing.

“With Evals, we can see the [use cases] that users care about in a systematic form that we can test against,” Brockman said. “Part of why we [open-sourced] it is because we’re moving away from releasing a new model every three months — whatever it was before — to making constant improvements. You don’t make what you don’t measure, right? As we create new versions [of the model], we can at least be aware of what those changes are.”
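To give a sense of what a crowdsourced benchmark looks like in practice, here is a hedged sketch of a sample file for a basic Evals benchmark. The JSONL `input`/`ideal` schema follows the simple match-style evals in the open-sourced `openai/evals` repository; the file name and the question itself are invented for illustration.

```python
import json

# One benchmark sample: a chat-style prompt plus the ideal answer that a
# match-style eval compares the model's completion against.
samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "ideal": "Paris",
    },
]

# Evals reads samples from a JSONL file, one JSON object per line.
with open("my_eval_samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```

A contributed benchmark is then just a directory of such samples plus a short registry entry, which is what makes it easy for outside users to add the cases they care about.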

I asked Brockman if OpenAI would ever compensate people for testing its models with Evals. He wouldn’t commit to it, but he noted that — for a limited time — OpenAI is giving select Evals users early access to the GPT-4 API.

My conversation with Brockman also touched on GPT-4’s context window, which refers to the text the model can consider before generating additional text. OpenAI is testing a version of GPT-4 that can “remember” about 50 pages of content, or five times as much as vanilla GPT-4 can hold in “memory” and eight times as much as GPT-3.
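The “pages” figure can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes roughly 0.75 English words per token, about 500 words per dense page, and an 8K/32K-token split between the vanilla and expanded GPT-4 variants; all of these are common rules of thumb, not figures from OpenAI or Brockman.

```python
# Back-of-the-envelope: convert a context window in tokens to pages of text.
WORDS_PER_TOKEN = 0.75  # common rule of thumb for English prose
WORDS_PER_PAGE = 500    # a dense, single-spaced page

def tokens_to_pages(tokens: int) -> float:
    """Estimate how many pages of prose fit in a context window."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

for name, tokens in [("vanilla GPT-4 (8K tokens)", 8192),
                     ("expanded GPT-4 (32K tokens)", 32768)]:
    print(f"{name}: ~{tokens_to_pages(tokens):.0f} pages")
```

Under these assumptions the 32K-token window works out to roughly 49 pages, consistent with the “about 50 pages” figure, while the vanilla window holds around a dozen.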

Brockman believes that the expanded context window leads to new, previously unexplored applications, especially in the enterprise. He envisions an AI chatbot built for a company that leverages context and knowledge from various sources, including cross-departmental employees, to answer questions in a very informed yet conversational way.

It is not a new concept. But Brockman claims that GPT-4’s answers will be far more useful than those from chatbots and search engines today.

“In the past, the model didn’t have any knowledge of who you are, what you’re interested in, and so on,” Brockman said. “Having such a history [with the larger context window] is definitely going to make it more capable … It will supercharge what people can do.”
