Technology

OpenAI unveils a new ChatGPT agent for ‘deep research’

OpenAI is announcing a new AI “agent” designed to help people conduct in-depth, complex research using ChatGPT, the company’s AI-powered chatbot platform.

Appropriately enough, it’s called deep research.

OpenAI said in a blog post published Sunday that the new capability was designed for “people who do intensive knowledge work in areas like finance, science, policy, and engineering and need thorough, precise, and reliable research.” It could also be useful, the company added, for anyone making “purchases that typically require careful research, like cars, appliances, and furniture.”

Basically, ChatGPT deep research is intended for instances where you don’t just want a quick answer or summary, but instead need to assiduously consider information from multiple websites and other sources.

OpenAI said it’s making deep research available to ChatGPT Pro users today, limited to 100 queries per month, with support for Plus and Team users coming next, followed by Enterprise. (OpenAI is targeting a Plus rollout in about a month, the company said, and the query limits for paid users should be “significantly higher” soon.) It’s a geo-targeted launch; OpenAI had no release timeline to share for ChatGPT customers in the U.K., Switzerland, and the European Economic Area.


To use ChatGPT deep research, you’ll just select “deep research” in the composer and then enter a query, with the option to attach files or spreadsheets. (It’s a web-only experience for now, with mobile and desktop app integration to come later this month.) Deep research could then take anywhere from 5 to 30 minutes to answer the question, and you’ll get a notification when the search completes.

Currently, ChatGPT deep research’s outputs are text-only. But OpenAI said that it intends to add embedded images, data visualizations, and other “analytic” outputs soon. Also on the roadmap is the ability to connect “more specialized data sources,” including “subscription-based” and internal resources, OpenAI added.

The big question is, just how precise is ChatGPT deep research? AI is imperfect, after all. It’s prone to hallucinations and other types of errors that could be particularly harmful in a “deep research” scenario. That’s perhaps why OpenAI said every ChatGPT deep research output will be “fully documented, with clear citations and a summary of [the] thinking, making it easy to reference and verify the information.”

The jury’s out on whether those mitigations will be sufficient to combat AI mistakes. OpenAI’s AI-powered web search feature in ChatGPT, ChatGPT Search, not infrequently makes gaffes and gives wrong answers to questions. TechCrunch’s testing found that ChatGPT Search produced less useful results than Google Search for certain queries.

To beef up deep research’s accuracy, OpenAI is using a special version of its recently announced o3 “reasoning” AI model that was trained through reinforcement learning on “real-world tasks requiring browser and Python tool use.” Reinforcement learning essentially “teaches” a model via trial and error to achieve a specific goal. As the model gets closer to the goal, it receives virtual “rewards” that, ideally, make it better at the task going forward.
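To illustrate the trial-and-error idea in miniature (this is a toy sketch, not OpenAI’s actual training setup, which involves large models and browser/Python tool use), consider an agent repeatedly choosing between options with unknown payoffs. It mostly picks whichever option has produced the most reward so far, occasionally explores a random one, and updates its estimates after each “reward” signal — gradually getting better at the task:

```python
import random

def train_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy reinforcement learning on a multi-armed bandit.

    reward_probs: hypothetical per-arm probabilities of receiving a reward.
    The agent learns, by trial and error, which arm pays off most often.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # learned value of each arm
    counts = [0] * len(reward_probs)       # how often each arm was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to keep gathering information.
            arm = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the arm that currently looks best.
            arm = max(range(len(reward_probs)), key=lambda a: estimates[a])

        # The environment hands out a virtual "reward" (1) or nothing (0).
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0

        # Incremental average: nudge the estimate toward the observed reward.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates

# After training, the agent's estimates should single out the best arm (index 1).
learned = train_bandit([0.2, 0.8, 0.5])
```

The same loop — act, receive reward, adjust — underlies far larger reinforcement-learning systems; the difference is scale, with model weights rather than a small table being updated.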

OpenAI said this version of the o3 model is “optimized for web browsing and data analysis,” adding that “it leverages reasoning to search, interpret, and analyze massive amounts of text, images, and PDFs on the internet, pivoting as needed in reaction to information it encounters […] The model is also able to browse over user uploaded files, plot and iterate on graphs using the python tool, embed both generated graphs and images from websites in its responses, and cite specific sentences or passages from its sources.”


The company said that it tested ChatGPT deep research using Humanity’s Last Exam, an evaluation that includes more than 3,000 expert-level questions in a variety of academic fields. The o3 model powering deep research achieved an accuracy of 26.6%, which might look like a failing grade — but Humanity’s Last Exam was designed to be tougher than other benchmarks to stay ahead of model advancements. According to OpenAI, the deep research o3 model came in way ahead of Gemini Thinking (6.2%), Grok-2 (3.8%), and OpenAI’s own GPT-4o (3.3%).

Still, OpenAI notes that ChatGPT deep research has limitations, sometimes making mistakes and incorrect inferences. Deep research may struggle to distinguish authoritative information from rumors, the company said, and often fails to convey when it’s uncertain about something — and it can also make formatting errors in reports and citations.

For anyone worried about the impact of generative AI on students, or on anyone trying to find information online, this type of in-depth, well-cited output probably sounds more appealing than a deceptively simple chatbot summary with no citations. But we’ll see whether most users will actually subject the output to real analysis and double-checking, or if they simply treat it as a more professional-looking text to copy-paste.

And if this all sounds familiar, Google actually announced a similar AI feature with the exact same name less than two months ago.


