AiNews.com
ChatGPT Agent Can Now Act on Your Behalf—Using Its Own Virtual Computer

ChatGPT’s new agent mode introduces powerful real-world task automation, combining research, reasoning, and web interaction in one unified system.

Alicia Shapiro
July 18, 2025 • Estimated Reading Time: 16 minutes

A realistic desktop scene shows a laptop running ChatGPT in agent mode, with a split-screen interface displaying a news website, a Python code editor, and a slide deck titled "Sales Report" with a bar chart. A dark dialog box in the center of the screen asks, “Would you like me to submit this report?” with “Approve” and “Edit” buttons beneath it. Next to the laptop is a smartphone showing a ChatGPT notification that reads “How it’s all complete,” a coffee mug, and an open notebook with light scribbles. The desk is wooden, and the lighting is natural, creating a clean, professional atmosphere.

Image Source: ChatGPT-4o

Key Takeaways:

ChatGPT Agent now uses a virtual computer to autonomously complete complex tasks, like creating slide decks, managing calendars, or booking travel—based on natural language instructions.
The agent combines and builds on previous tools like Operator and deep research, enabling seamless transitions between reasoning, web browsing, code execution, and file manipulation.
OpenAI reports state-of-the-art benchmark performance, including 44.4% on Humanity’s Last Exam (using parallel runs) and 45.5% on SpreadsheetBench, surpassing other AI agents and, on some benchmarks, human baselines.
Agent mode is available starting today to Pro, Plus, and Team users, with Enterprise and Education access coming soon; Pro users receive 400 monthly messages.
Significant safety features are in place, including prompt injection defenses, user approvals for real-world actions, and a high biosafety classification under OpenAI’s Preparedness Framework.

Introducing ChatGPT Agent: OpenAI’s Most Capable Assistant Yet

OpenAI has launched ChatGPT Agent, a new feature that enables the model to act independently on a user’s behalf—using its own virtual computer to complete multi-step, real-world tasks.

Users can now assign ChatGPT tasks such as:

Briefing them on client meetings by pulling in calendar data and related news
Planning and ordering ingredients for a Japanese breakfast for four
Analyzing competitors and delivering a full presentation

ChatGPT performs these jobs end-to-end, navigating the web, running code, manipulating files, prompting users for logins when needed, and returning finished artifacts like editable slideshows or spreadsheets. The experience is interactive—users can pause or redirect the agent mid-task—and is designed to feel like working with a highly competent assistant who asks clarifying questions and adapts on the fly.

Built From Operator and Deep Research Foundations

ChatGPT Agent merges and expands upon the capabilities of Operator and deep research, two tools previously available in limited previews. Operator enabled web-based interaction—scrolling, clicking, and filling out forms—while deep research specialized in analyzing and summarizing complex information.

But each worked best in different scenarios: Operator couldn’t dive deep into reasoning or produce detailed reports, and deep research couldn’t interact with websites or handle authentication flows. OpenAI found that many users tried to use Operator for research-heavy tasks, highlighting the need for a more integrated solution. ChatGPT Agent brings these strengths together, adding flexible tool selection and dynamic planning that adapts to each task’s complexity.

How ChatGPT Agent Works

At its core is a unified agentic system—one that can shift between tools like:

A visual web browser for human-like site interaction
A text-based browser for efficient reading and reasoning
A terminal for code execution
API and connector integrations for services like Gmail and GitHub

ChatGPT Agent can also log in to websites through a secure browser takeover, enabling deeper access to content and more advanced task execution. By combining multiple ways of interacting with the web—including visual browsing, text-based analysis, and direct API integrations—the agent can choose the most efficient path for each task. For example, it might access calendar data via an API, use the text browser to process large volumes of information quickly, and switch to the visual browser to navigate sites built for human interaction.
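The routing idea described above can be pictured with a small sketch. OpenAI has not published how the agent chooses between tools, so everything here is an assumption made for illustration: the `Step` fields, the `Tool` names, and the rule order in `choose_tool` are hypothetical and only mirror the article's example of using an API for calendar data, the text browser for bulk reading, and the visual browser for sites built for people.

```python
from dataclasses import dataclass
from enum import Enum


class Tool(Enum):
    API_CONNECTOR = "api_connector"
    TEXT_BROWSER = "text_browser"
    VISUAL_BROWSER = "visual_browser"
    TERMINAL = "terminal"


@dataclass
class Step:
    """One unit of work in a task (hypothetical structure, not OpenAI's)."""
    description: str
    has_connector: bool = False      # a connector or API exposes the data directly
    needs_code: bool = False         # the step runs or transforms something in code
    needs_interaction: bool = False  # the site requires clicks, forms, or a login


def choose_tool(step: Step) -> Tool:
    """Pick a tool for the step using a fixed, illustrative rule order."""
    if step.has_connector:
        return Tool.API_CONNECTOR
    if step.needs_code:
        return Tool.TERMINAL
    if step.needs_interaction:
        return Tool.VISUAL_BROWSER
    # Default: plain reading is cheapest in the text-based browser.
    return Tool.TEXT_BROWSER


plan = [
    Step("Read this week's meetings", has_connector=True),
    Step("Skim recent news about each client"),
    Step("Fill in the booking form", needs_interaction=True),
    Step("Chart the results", needs_code=True),
]
chosen = [choose_tool(step).value for step in plan]
print(chosen)
```

A real agent would presumably make this choice with learned behavior rather than fixed rules. The sketch only shows why having several access paths lets each step take the cheapest workable route.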

Because it runs on a dedicated virtual machine, ChatGPT Agent maintains persistent context across tools, so it can, for example, download a file, run transformations on it, and visualize the output, all within the same session. ChatGPT Agent flexibly adapts to each task, optimizing for speed, accuracy, and efficiency.

ChatGPT Agent is built for flexible, collaborative workflows—far more interactive than previous models. You can step in at any point to clarify instructions, shift the direction of the task, or change it entirely. The agent will pick up where it left off, incorporating new input without losing progress.

It may also ask follow-up questions when it needs more context, ensuring tasks stay aligned with your goals. If a task feels stuck or takes longer than expected, you can pause it, request a summary, or stop it and still receive partial results. For added convenience, if you're using the ChatGPT mobile app, you’ll get a notification when the task is complete—keeping the experience responsive even when you're away from your screen.

You can activate ChatGPT Agent directly from the tools dropdown in any conversation by selecting “agent mode.” From there, simply describe your task—whether it’s conducting research, generating a slideshow, or submitting expenses—and the agent will get to work. An on-screen narration shows what actions it’s taking, and you can pause or take control at any point to keep things on track.

ChatGPT Agent also integrates with your existing workflows through connectors, such as Gmail or Google Calendar. Once authenticated, it can use these sources to summarize your inbox, check availability, or pull relevant data into its responses. You’ll always be prompted to log in securely before it takes action on your behalf.

And for repeat tasks, you can even schedule them to run automatically—like generating a weekly metrics report every Monday morning.
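To make the "every Monday morning" recurrence concrete, here is a minimal sketch of how the next run time for a weekly task could be computed. It does not reflect ChatGPT's scheduling interface, which the article does not describe. The function name `next_weekly_run`, the 9:00 default, and the use of timezone-naive datetimes are all assumptions.

```python
from datetime import datetime, time, timedelta


def next_weekly_run(now: datetime, weekday: int = 0, at: time = time(9, 0)) -> datetime:
    """Return the next occurrence of `weekday` (0 = Monday) at `at`, strictly after `now`.

    Expects a timezone-naive `now`; the result is naive as well.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), at)
    if candidate <= now:
        # Today's slot has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate


# The article's publication date, July 18, 2025, fell on a Friday.
first_run = next_weekly_run(datetime(2025, 7, 18, 12, 0))
print(first_run)
```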

Most importantly, you stay in control. ChatGPT asks for permission before taking any significant actions, and you can pause, take over the browser, or stop tasks at any time to keep things aligned with your goals.

Available Today in Agent Mode

Starting today, ChatGPT Agent is rolling out to Pro, Plus, and Team users. Pro users will receive access by the end of the day, while Plus and Team accounts will see the new agent mode appear over the coming days. Once available, it can be activated at any time through the tools dropdown in any ChatGPT conversation.

Pro users receive 400 messages per month, while other paid tiers get 40 per month, with additional usage available through flexible add-on credits. Enterprise and Education users will gain access in the coming weeks. The update is not yet available in the European Economic Area or Switzerland.

For those who preferred deep research’s slower, more detailed output, that mode is still available via a dropdown selection.

Expanding ChatGPT’s Real-World Usefulness

With its new agentic system, ChatGPT now supports a wide range of knowledge-work and personal tasks, including:

Turning screenshots into structured presentations
Rearranging meetings and booking offsites
Updating spreadsheets with new financials while retaining formatting
Planning events or booking travel itineraries
Finding specialists and scheduling appointments

These features reflect OpenAI’s goal of broadening ChatGPT’s real-world utility, from solo professionals to team workflows.

ChatGPT Agent Benchmark Performance

OpenAI reports that the new agent model delivers state-of-the-art performance across several rigorous benchmarks:

Humanity’s Last Exam (HLE): Measures performance on expert-level questions across a wide range of academic and professional subjects. ChatGPT Agent achieved a 41.6% pass@1 score, rising to 44.4% when using parallel runs with self-reported confidence scoring.
FrontierMath: A high-difficulty benchmark featuring problems that can take even expert mathematicians hours or days to solve. ChatGPT Agent scored 27.4% with tool use, surpassing earlier models by a wide margin.
DSBench: Evaluates agent performance on realistic data science workflows, including data analysis and modeling. OpenAI reports that ChatGPT Agent significantly exceeds human experts across multiple tasks.
SpreadsheetBench: Tests the model’s ability to edit and format spreadsheets derived from real-world use cases. ChatGPT Agent reached 45.5% accuracy, outperforming Copilot in Excel.
BrowseComp: Tests an agent’s ability to locate hard-to-find, high-value information on the web. ChatGPT Agent set a new high of 68.9%, outperforming previous models by 17.4 percentage points.
WebArena: A simulated environment for completing realistic web-based tasks, such as filling out forms or navigating websites. ChatGPT Agent shows strong improvements over OpenAI’s earlier Operator-based model.
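The two Humanity's Last Exam figures above differ in how an answer is chosen: pass@1 scores a single attempt per question, while the higher figure comes from running several attempts in parallel and using self-reported confidence to pick one. The sketch below illustrates that difference on made-up data. It assumes the simplest selection rule, keeping the attempt with the highest confidence, which may not match OpenAI's exact procedure. The `Attempt` type and every value in `runs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    answer: str
    confidence: float  # self-reported by the model, from 0.0 to 1.0


def pick_by_confidence(attempts: list[Attempt]) -> str:
    """Keep the answer from the attempt with the highest self-reported confidence."""
    return max(attempts, key=lambda a: a.confidence).answer


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match the reference answers."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy data: four questions, three parallel attempts each (all values invented).
runs = [
    [Attempt("A", 0.4), Attempt("B", 0.9), Attempt("A", 0.5)],
    [Attempt("C", 0.7), Attempt("D", 0.3), Attempt("C", 0.6)],
    [Attempt("F", 0.2), Attempt("E", 0.8), Attempt("F", 0.3)],
    [Attempt("G", 0.3), Attempt("H", 0.9), Attempt("G", 0.4)],
]
references = ["B", "C", "E", "G"]

# Single-attempt scoring uses only the first attempt for each question.
single_attempt = accuracy([attempts[0].answer for attempts in runs], references)
# Selection scoring keeps the most confident attempt for each question.
selected = accuracy([pick_by_confidence(attempts) for attempts in runs], references)
print(single_attempt, selected)
```

In this toy set, selection helps on two questions and hurts on the last one, where the most confident attempt is wrong. The gain depends on how well confidence tracks correctness.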

The model also excels in internal benchmarks for complex, economically valuable knowledge-work tasks, such as building leveraged buyout models or financial statements—often matching or exceeding top human baselines in both speed and accuracy. These benchmarks are based on real-world professional work across industries, and ChatGPT Agent significantly outperforms previous OpenAI models like o3 and o4-mini in both quality and efficiency.

Safety, Supervision, and Biosafety Protections

With these expanded powers come expanded risks. This is the first time ChatGPT can take direct actions on the web, including interacting with your accounts through browser takeover and connected apps. To address this shift, OpenAI has implemented its most comprehensive safety controls to date.

Digital Safeguards and User Oversight

Prompt injection defenses are a core focus. Because the agent reads live web content, it may encounter malicious instructions hidden in webpages—such as in invisible text or metadata—that attempt to hijack its behavior and trick it into sharing private data from a connector with an attacker. OpenAI has trained the model to detect and resist these attacks, with live monitoring in place to flag suspicious activity in real time.
To prevent unintended actions, ChatGPT always asks for explicit user confirmation before taking actions with real-world consequences, such as making purchases or sending emails. A feature called Watch Mode adds another layer of oversight: certain high-impact tasks require the user’s active supervision before they can proceed.
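The confirmation step described above follows a familiar pattern: check each action against a policy before running it. The following is a minimal sketch of that pattern, not OpenAI's implementation. The action kinds, the `run_action` function, and the `confirm` callback are hypothetical. They are loosely modeled on the article's examples: purchases and emails need approval, and bank transfers are refused.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds, based on the examples given in the article.
NEEDS_CONFIRMATION = {"purchase", "send_email"}
ALWAYS_REFUSED = {"bank_transfer"}


@dataclass
class Action:
    kind: str
    summary: str


def run_action(action: Action, confirm: Callable[[Action], bool]) -> str:
    """Return what happened to the action: 'refused', 'cancelled', or 'executed'."""
    if action.kind in ALWAYS_REFUSED:
        # High-risk actions are declined without asking the user.
        return "refused"
    if action.kind in NEEDS_CONFIRMATION and not confirm(action):
        # The user was asked and said no.
        return "cancelled"
    # Low-impact actions, and approved ones, go ahead.
    return "executed"


def approve(action: Action) -> bool:
    return True


def decline(action: Action) -> bool:
    return False


print(run_action(Action("purchase", "Order breakfast ingredients"), decline))
print(run_action(Action("purchase", "Order breakfast ingredients"), approve))
print(run_action(Action("bank_transfer", "Wire funds"), approve))
```

The gate is kept separate from the action itself so that the same check applies no matter which tool would carry the action out.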

Users stay in control through built-in safeguards:

You can pause or stop any task at any point.
You can take over the browser manually whenever needed.
You can delete all browsing data and log out of websites with a single click.
ChatGPT is trained to refuse high-risk actions, such as initiating bank transfers.
Browser takeover mode is designed to preserve user privacy—ChatGPT does not collect or store inputs like passwords, and all credentials remain local to your session.

Proactive Risk Mitigation and Biosafety Controls

OpenAI has also implemented guardrails to limit what the model can access and how it behaves in sensitive domains.

In addition to user-facing privacy settings, ChatGPT Agent’s design includes systemic protections. The model is intentionally limited in how it processes private inputs—especially in takeover mode—ensuring data like passwords never reach the model in the first place.

OpenAI has classified the model powering ChatGPT Agent as having High Biological and Chemical capabilities under its internal Preparedness Framework. OpenAI says it has no definitive evidence that the model could meaningfully enable serious misuse, but it is applying the classification as a precaution, which triggers heightened safeguards, including:

Dual-use refusal training
The model is trained to recognize and decline requests that could be used for both helpful and harmful purposes.
Example: If a user asks how to synthesize a dangerous chemical, even under the pretense of curiosity or research, the model will refuse to respond—because the knowledge could be misused.
Threat modeling and red-teaming with domain experts
Specialists map out potential abuse scenarios and test the system’s vulnerabilities before real users can exploit them.
Example: Biosecurity experts might simulate an attempt to get the agent to plan a harmful biological experiment to see how well it resists and flags the request.
Always-on classifiers and reasoning monitors
Automated systems constantly analyze what the model is doing to catch signs of misuse or risky behavior in real time.
Example: If the agent is asked to combine information from a private email and a suspicious website, monitors could flag the request as a potential privacy violation—even if the user doesn’t realize it.
Escalation and enforcement pipelines for high-risk behavior
If risky or inappropriate activity is detected, it can be escalated to internal safety teams and blocked immediately.
Example: If the model repeatedly tries to access restricted content or bypass safety filters, the task can be automatically paused and reviewed by OpenAI’s safety team.

OpenAI continues to collaborate with external biosecurity experts, research institutes, and academic partners to validate its approach and strengthen defenses across the AI safety ecosystem. Read the system card to learn more about OpenAI’s safety approach for the unified agentic model. A public bug bounty program is in place to encourage responsible disclosure of vulnerabilities.

Looking Ahead: Known Limitations and Ongoing Improvements

While ChatGPT Agent introduces a powerful new system for reasoning and action, it’s still in its early stages—and OpenAI is actively working to improve its performance, polish, and flexibility over time.

One of the most promising but unfinished features is slideshow generation, which remains in beta. Right now, slides can feel rudimentary in formatting and polish—especially when generated from scratch without a starting document. OpenAI has prioritized outputs that emphasize structure and flexibility: slides include editable elements like text, charts, images, and shapes that can be modified after export. That foundation is in place, but some visual discrepancies still occur between the on-screen viewer and the exported presentation. Additionally, while you can upload an existing spreadsheet for the agent to edit or use as a template, that same functionality isn't yet available for slideshows.

OpenAI is already training the next iteration of the slideshow tool to produce more polished, presentation-ready outputs with broader capabilities and improved formatting.

More broadly, OpenAI expects continued improvements to ChatGPT Agent’s depth, efficiency, and real-world reliability. These updates will include more seamless transitions between reasoning and action, expanded support for complex tasks, and smarter defaults for tool usage. The team is also actively refining the agent’s balance between automation and oversight—ensuring it remains safe to use while requiring less supervision over time.

Fast Facts for AI Readers

Q: What is ChatGPT Agent?

A: ChatGPT Agent is a new feature that lets ChatGPT use its own virtual computer to complete complex tasks through reasoning, web browsing, code execution, and tool integration.

Q: How do users enable it?

A: Pro, Plus, and Team users can enable agent mode via the tools dropdown in any ChatGPT conversation.

Q: What tools does the agent use?

A: It uses a visual browser, text browser, terminal, API access, and connectors for services like Gmail and GitHub.

Q: Which benchmarks does it lead?

A: OpenAI reports leading scores on Humanity’s Last Exam, FrontierMath, DSBench, SpreadsheetBench, and BrowseComp, plus strong gains over its earlier Operator-based model on WebArena.

Q: What safety measures are in place?

A: Explicit approvals for consequential actions, prompt injection defenses, a privacy-preserving browser takeover mode, heightened biosafety safeguards, and one-click deletion of browsing data.

What This Means

The launch of ChatGPT Agent represents a turning point in AI’s evolution—from a responsive assistant to an autonomous, proactive collaborator. It’s the first widely available system that combines reasoning, research, tool use, and real-world action inside a single, accessible interface.

That integration matters. Previously, building agentic workflows required technical skill, coding environments, or external tools. Now, anyone can instruct ChatGPT to plan, analyze, act, and deliver—without switching platforms or writing code. Whether it’s generating a slideshow, preparing a financial model, or booking travel, the agent can orchestrate multi-step workflows with minimal input and adapt as it works.

Just as importantly, it’s a major step forward in trust, usability, and responsibility. Unlike black-box automation systems, ChatGPT Agent is designed to keep users in the loop—with clear narration, approval checkpoints, secure browser sessions, and privacy controls. It’s a system built not only to act, but to be supervised.

For end users, this means greater productivity and reach. For app developers and software platforms, it signals a new level of competition: the ability to embed, extend, or even replace common tasks previously handled across fragmented systems.

By unifying powerful tools under one model and layering in control, visibility, and safety, ChatGPT Agent lowers the barrier to automation—and raises the ceiling for what individual users can accomplish. It’s not just helping people do more; it’s changing who can do it in the first place.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.