Are there security risks with an AI controlling screens?

Yes, there are significant security and operational risks. An AI with screen control could, if misconfigured or compromised, potentially access sensitive data, execute unauthorized transactions, or cause irreversible changes to systems. Google notes that enterprise safeguards designed to prevent these issues, such as confirmation steps for irreversible actions, are currently optional for organizations.

How does this differ from older automation tools like RPA?

Traditional Robotic Process Automation (RPA) tools are typically script-based, following rigid, pre-programmed rules. Gemini 3.5 Flash, with its integrated 'computer use' and reasoning capabilities, is designed to be more intelligent and adaptive. It can understand context, make decisions, and adjust its actions in real-time based on what it 'sees' on the screen, making it more flexible for complex, variable tasks than traditional RPA.

Is Gemini 3.5 Flash with screen control available to everyone?

Yes, Gemini 3.5 Flash is generally available through Google Antigravity, the Gemini API (in Google AI Studio and Android Studio), and the Gemini Enterprise Agent Platform. It's also accessible to individual users via the Gemini app and AI Mode in Google Search as of June 24, 2026.

Image: courtesy of Thenextweb

Tech | June 25, 2026 | By Veridact Editorial | Updated Jun 25

Google's Gemini 3.5 Flash Can Now See and Control Your Screen, Raising Both Automation Potential and Enterprise Security Questions

Google has rolled out a significant update to its Gemini 3.5 Flash AI model, giving it the native ability to 'see' and control computer screens. This means the AI can interact with graphical interfaces, execute tasks, and navigate digital environments much like a human user. While Google is pushing this capability towards enterprise customers, offering it through its Gemini API and Enterprise Agent Platform, the company has made crucial security safeguards for these powerful new agents optional, creating a tension between advanced automation and the need for robust control within corporate environments. The update, made generally available on June 24, 2026, marks a notable step in AI's integration into daily digital workflows, but also introduces complex considerations for businesses weighing efficiency against operational risk.

Outlook

The core change is that Gemini 3.5 Flash now includes 'computer use' as a built-in function. Previously, developers building AI agents that needed to interact with graphical interfaces often had to call upon a separate, dedicated model for such tasks. With this update, that capability is integrated directly into Flash. Developers can activate this screen interaction feature as one of several tools within the model, alongside its existing abilities for code execution, search, and function calling.

What does this mean in practice? Google product manager Mateo Quiros described it as giving Flash the capacity to observe a screen, understand what it's seeing, make decisions based on that understanding, and then execute actions on that screen. This could include filling out complex forms, navigating software applications, or automating multi-step processes that typically require human input across various digital platforms.
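The observe, decide, act cycle described above can be illustrated with a toy loop. This is a minimal sketch and not Google's implementation or the Gemini API: `FakeScreen`, `Action`, `decide`, and `run_agent` are hypothetical names invented for illustration, the "screen" is an in-memory form rather than a screenshot, and a hand-written rule stands in for the model's reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # label of the UI element to act on
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeScreen:
    """Hypothetical in-memory stand-in for a GUI: a form with named fields."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def observe(self) -> dict:
        # A real agent would receive a screenshot; here it is a plain dict.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def decide(observation: dict, goal: dict) -> Action:
    """Hand-written rule standing in for the model's reasoning step."""
    if observation["submitted"]:
        return Action("done")
    for name, wanted in goal.items():
        if observation["fields"].get(name) != wanted:
            return Action("type", target=name, text=wanted)
    return Action("click", target="submit")


def run_agent(screen: FakeScreen, goal: dict, max_steps: int = 10) -> list:
    """Repeat: observe the screen, decide on one action, execute it."""
    history = []
    for _ in range(max_steps):
        action = decide(screen.observe(), goal)
        if action.kind == "done":
            break
        screen.apply(action)
        history.append(action)
    return history
```

The `max_steps` cap is a design choice worth noting: an agent that re-observes after every action needs some bound so that a misread screen cannot keep it looping indefinitely.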

This enhanced version of Gemini 3.5 Flash became generally available on June 24, 2026. It is accessible through several Google platforms, including Google Antigravity, the Gemini API within Google AI Studio and Android Studio, and crucially, the Gemini Enterprise Agent Platform and Gemini Enterprise offerings. For individual users, the updated Flash model is also available directly in the Gemini app and through AI Mode in Google Search.

For enterprise users of the Gemini Enterprise app, there's an important operational detail: the feature management toggle for Gemini 3.5 Flash was removed after June 8, 2026. This means that as of that date, Gemini 3.5 Flash is enabled by default and cannot be individually turned off by users within the enterprise app. This default-on approach signals Google's intent to rapidly integrate these capabilities into business workflows.

Background

The introduction of native screen control within Gemini 3.5 Flash represents a significant evolution in how AI models are designed and deployed. Historically, automating tasks on a computer interface often involved robotic process automation (RPA) tools, which are typically script-based and follow rigid, pre-programmed steps. While effective for repetitive, predictable tasks, RPA struggles with variations or unexpected changes in an interface.

Earlier AI models with 'computer vision' capabilities could 'see' screens, but their ability to 'reason' about what they saw and then 'act' on it was often limited or required extensive custom development. Google's move with Gemini 3.5 Flash aims to bridge this gap, offering a more intelligent, adaptive form of automation. By integrating screen interaction directly into a large language model, Google is positioning Flash as a more versatile 'agent' that can understand context and adapt its actions, moving beyond simple scripting.

This development comes as Google faces intense competition in the enterprise AI space. Rivals are also pushing advanced AI agents capable of automating complex workflows. Google's strategy appears to be about offering a comprehensive, integrated suite of AI tools that can handle a broader range of tasks natively, without requiring developers to stitch together multiple specialized models. The company's internal testing with partners like Armadin, which reported a 19.6% improvement over Gemini 3 Flash on Box’s enterprise work evaluation set, suggests a focus on real-world business performance.

However, the decision to make enterprise safeguards for this screen-controlling AI optional introduces a layer of complexity. While Google aims to build enterprise trust, allowing organizations to skip confirmation steps for irreversible actions presents a clear operational exposure. This choice reflects a balancing act: offering maximum flexibility and speed for rapid deployment, while also placing the burden of risk management squarely on the adopting enterprise.
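To make the idea of a confirmation step concrete, the following is a minimal sketch of how an organization might gate irreversible actions in its own agent wrapper. It does not reflect how Google's safeguards are actually configured: the `IRREVERSIBLE` set, the `execute_with_gate` function, and the `require_confirmation` flag are all assumptions made up for illustration.

```python
from typing import Callable

# Assumed list of action kinds that cannot be undone; a real deployment
# would define its own.
IRREVERSIBLE = {"delete_record", "submit_payment", "send_email"}


class ConfirmationRequired(Exception):
    """Raised when an irreversible action is blocked for lack of approval."""


def execute_with_gate(
    action: str,
    perform: Callable[[str], str],
    approve: Callable[[str], bool],
    require_confirmation: bool = True,
) -> str:
    """Run `perform(action)`, asking `approve` first for irreversible actions.

    Setting `require_confirmation=False` models the optional-safeguard case:
    the action runs without any human check.
    """
    if action in IRREVERSIBLE and require_confirmation:
        if not approve(action):
            raise ConfirmationRequired(f"'{action}' was not approved")
    return perform(action)
```

In this sketch the safeguard defaults to on, so skipping it takes an explicit `require_confirmation=False`. That is one way an enterprise could make the riskier path a deliberate decision rather than an oversight.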

Precedents

The tension between technological power and institutional control is a recurring theme in the history of enterprise software. From the early days of networked computing to the adoption of cloud services, businesses have consistently grappled with the trade-offs between enhanced capabilities and the potential for new vulnerabilities.

When new, powerful technologies emerge, companies often rush to adopt them for competitive advantage. The initial phase typically sees a focus on raw functionality and speed of deployment. Security and governance, while acknowledged, can sometimes become secondary considerations or optional add-ons, particularly if they are perceived to slow down innovation or implementation. This was evident in the early adoption of public cloud infrastructure, where many companies initially prioritized agility and cost savings, only later grappling with complex data sovereignty and security challenges.

Similarly, the rise of Robotic Process Automation (RPA) over the past decade saw businesses rapidly deploy software robots to automate mundane tasks. While RPA offered significant efficiency gains, early implementations sometimes lacked robust controls, leading to instances where 'bots' executed incorrect transactions or inadvertently exposed sensitive data. The lessons learned from RPA highlighted the critical need for comprehensive auditing, human oversight, and clear exception handling when automating processes that touch core business operations.

Google's approach with Gemini 3.5 Flash, where powerful screen control capabilities are offered with optional safeguards, echoes these historical patterns. The default-on nature for enterprise app users after June 8, 2026, combined with the option to bypass confirmation for irreversible actions, creates an environment where rapid adoption could potentially outpace robust governance. This mirrors previous cycles where the 'move fast' mentality of technology providers has collided with the 'control risk' imperative of large enterprises. The industry has a history of eventually converging on more stringent default security, often driven by high-profile incidents or regulatory pressure.

Analysis

This update to Gemini 3.5 Flash carries significant implications for how businesses operate and for the broader trajectory of AI adoption within the enterprise. At its core, the ability for an AI to 'see' and control screens opens up vast new possibilities for automation that were previously difficult or impossible to achieve with existing tools.

For businesses, this could mean a dramatic increase in efficiency for tasks that span multiple applications, require human-like navigation, or involve complex data entry and extraction from graphical interfaces. Imagine an AI agent that can automatically process invoices across different vendor portals, reconcile financial statements by logging into various accounting systems, or manage customer support queries by interacting directly with CRM software. The potential for cost savings and accelerated workflows is substantial.

However, the real stakes lie in the balance between this newfound power and the inherent risks. The fact that enterprise safeguards are optional means that organizations, if not careful, could deploy AI agents capable of taking 'irreversible actions without a confirmation step' in environments that touch sensitive systems. This is not a theoretical concern; it's a direct operational exposure. An incorrectly configured or malfunctioning AI agent could, for example, mistakenly delete critical data, approve fraudulent transactions, or inadvertently expose confidential information, all without human intervention to stop it.

This places a heavy burden of responsibility on enterprises to thoroughly understand, configure, and monitor these AI agents. It challenges IT departments and compliance officers to develop new governance frameworks for AI that can operate autonomously. The question for many will be: how much trust can truly be placed in an AI that can act on its own, especially when the very controls designed to prevent errors are not mandatory?

What this changes is the fundamental relationship between human operators and AI. It moves AI beyond being a sophisticated tool or assistant and closer to being an autonomous digital employee. This shift demands a re-evaluation of cybersecurity protocols, data privacy policies, and even the legal liability associated with automated actions. Google's move pushes the boundary of what AI can do in the enterprise, but it also forces companies to confront the hard questions about how much autonomy they are willing to grant these powerful new digital agents.

Scenarios

The introduction of Gemini 3.5 Flash's screen control capabilities could lead to several distinct outcomes across the enterprise technology landscape.

One likely outcome is a rapid acceleration of AI-driven automation projects within businesses. Companies that have been waiting for more sophisticated, adaptable AI agents to handle complex, multi-application workflows will likely view this as a significant enabler. This could lead to increased investment in Google's AI platforms and a push to integrate these agents into various operational silos, from finance and HR to customer service and IT support. The initial adopters may prioritize speed and efficiency, potentially overlooking some of the optional safeguards in their eagerness to gain a competitive edge. This could drive significant productivity gains for those who implement effectively.

Conversely, a second outcome could be a period of heightened scrutiny and cautious adoption, particularly among larger, more regulated enterprises. The inherent risks associated with an AI taking irreversible actions without confirmation, coupled with the optional nature of safeguards, may prompt compliance and security teams to implement rigorous internal policies and testing protocols before widespread deployment. This could slow down adoption in sectors like financial services, healthcare, or government, where regulatory compliance and data security are paramount. These organizations may demand more robust, mandatory safeguards or seek third-party solutions that offer stricter control mechanisms, potentially pushing Google to re-evaluate its default security posture in future iterations.

A third possibility involves increased competition and innovation in the AI agent market. Other major cloud providers and specialized AI companies will likely respond by developing or enhancing their own screen-controlling AI capabilities, aiming to offer more robust security features, better performance benchmarks, or more tailored solutions for specific industries. This competitive pressure could drive down costs, improve the accuracy and reliability of AI agents, and ultimately lead to a more diverse ecosystem of automation tools, each with varying levels of autonomy and control. We may also see new categories of 'AI governance' or 'AI auditing' software emerge to help enterprises manage the risks inherent in these powerful agents.

Timeline

2026-05-20

Google NotebookLM Enterprise: Podcast API Deprecated

Google announced the deprecation of the Podcast API for its NotebookLM Enterprise platform, signaling ongoing adjustments to its enterprise service offerings.

2026-06-08

Gemini 3.5 Flash Default-Enabled for Enterprise

After this date, the feature management toggle for Gemini 3.5 Flash was no longer available, meaning the model became enabled by default and could not be turned off for users in the Gemini Enterprise app.

2026-06-24

Gemini 3.5 Flash General Availability with Screen Control

Google officially made Gemini 3.5 Flash generally available, integrating native 'computer use' capabilities that allow it to see, reason about, and take action on computer screens. It became accessible via Google Antigravity, the Gemini API, and the Gemini Enterprise Agent Platform, among others.

Frequently Asked Questions

What does 'screen control' mean for Gemini 3.5 Flash?

'Screen control' means Gemini 3.5 Flash can visually interpret what is displayed on a computer screen and then interact with it, much like a human user. This includes clicking buttons, typing text, navigating menus, and performing actions within any graphical interface. It allows the AI to automate tasks across different software applications and web pages.
