A feature that lets users activate or schedule maintenance mode for on-premises physical servers, vCentres, and ESXi hosts in a few simple interactions, minimising manual effort.

Context

Cohesity is a data security and management company that does not traditionally provide primary storage. Instead, it focuses on secondary data workloads, helping enterprises protect (back up), restore, and manage copies of production data, optimised for cost, security, and scale across cloud-native, virtual, physical, and SaaS environments. It also provides disaster recovery, data security, AI insights, vaulting, governance, compliance, and other value-added services, all under a single architecture.

Cohesity serves over 12,000 enterprise customers, including 85 of the Fortune 100, three of the top five U.S. banks, and U.S. health insurers. Publicly referenced customers such as Nasdaq, Broadcom, Nationwide, Salesforce, Cisco, and Siemens illustrate adoption across public and private enterprises in finance, tech, healthcare, education, insurance, and aerospace.

Backup and Recovery are the two core activities of the Cohesity Helios product:

  • Backup ensures that enterprise data from diverse environments—cloud, virtual, physical, and SaaS—is regularly and efficiently protected, creating secure copies for disaster scenarios or data loss.

  • Recovery enables organisations to quickly restore data, minimising downtime and ensuring business continuity after events like cyberattacks, accidental deletions, or system failures.

The Problem

Managing secondary storage for on-premises clusters presents its own set of maintenance-related challenges for customers. Preserving backup and recovery integrity while minimising the downtime (blackout) window during maintenance operations becomes harder as data volumes grow.

 Stakeholders // End Users 

The Storage Architect leads the technical design and architecture of the backup infrastructure, ensuring it aligns with organisational data protection goals and the strategic deployment of Cohesity systems at scale across hybrid data environments. They develop blueprints, sizing, integration, and automation strategies. Additionally, the backup infras

The Backup Admin sets up and manages day-to-day infrastructure operations such as backup (policy configuration), recovery, compliance, error troubleshooting, and cluster health monitoring. They oversee maintenance operations and infrastructure consistency, and coordinate audits and compliance checks for seamless backup operations.

 Pain-Points 

1

Lack of Maintenance-Aware Source Management

Backup and recovery failures occur because sources cannot be safely taken offline during planned or on-demand maintenance activities.

2

Manual and Error-Prone Workarounds

Administrators are forced to delete and re-add sources during maintenance, making the process labour-intensive, risky, and unscalable—especially in large or frequently maintained environments.

3

Increased Downtime and Data Risk

These manual workflows increase downtime windows and the likelihood of data loss, directly impacting business continuity and operational reliability.

4

Maintenance as an Operational Bottleneck

Routine activities such as software patching, OS upgrades, hardware replacements, and migrations require excessive coordination across backup, storage, and IT teams, slowing down execution and increasing failure points.

 Key Insights 

1

Maintenance Is a Team Sport, Not a Solo Action

Maintenance isn’t owned by one role. Backup Admins, Storage Architects, and Infra/App teams must coordinate before, during, and after maintenance. Users expect the system to understand and support this shared responsibility, especially when backups or recoveries must be restarted to meet SLAs.

2

Maintenance Is Duration-Driven, Not Date-Driven

Maintenance rarely fits neatly into fixed start and end times. It’s more often defined by how long an activity takes—sometimes ending early, sometimes running longer than planned. In unplanned or reactive scenarios, users prioritise speed over scheduling, opting for immediate action.

3

Backup Windows Dictate Maintenance Decisions

Users plan maintenance around predefined backup windows to avoid failures and workload impact. Since backups can overrun, maintenance is often delayed, shortened, or squeezed in—indicating hesitation and lack of confidence in scheduling maintenance upfront.

4

Fear of Failure Drives Maintenance Avoidance

Non-critical maintenance is frequently postponed due to fear of cascading backup failures, extended downtime, and complex recovery—especially in large, multi-source environments. Maintenance is seen as risky, not routine.

5

Workarounds Have Become the “Normal” Way of Working

Deleting and re-adding sources is widely accepted as a maintenance workaround, despite being inefficient and error-prone. Users also switch across multiple screens to assess impact, increasing cognitive load—highlighting a gap between real operational needs and system support.
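
The second insight above (maintenance is duration-driven, not date-driven) can be illustrated with a small data model. This is only a sketch under stated assumptions: `MaintenanceWindow`, `quick_maintenance`, and every field name are hypothetical and do not reflect Cohesity's actual API or data model. The window is defined by a start time plus a planned duration, can be extended if work overruns, and can be ended early.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MaintenanceWindow:
    """Hypothetical model: a window is a start time plus a planned duration."""
    source_id: str
    start: datetime
    planned_duration: timedelta
    ended_at: Optional[datetime] = None  # set when an admin ends maintenance early

    @property
    def planned_end(self) -> datetime:
        # The end is derived from the duration, never stored as a fixed date.
        return self.start + self.planned_duration

    def extend(self, extra: timedelta) -> None:
        # Maintenance ran longer than planned: lengthen the duration.
        self.planned_duration += extra

    def end_now(self, now: datetime) -> None:
        # Maintenance finished early: close the window explicitly.
        self.ended_at = now

    def is_active(self, now: datetime) -> bool:
        if self.ended_at is not None and now >= self.ended_at:
            return False
        return self.start <= now < self.planned_end


def quick_maintenance(source_id: str, now: datetime,
                      duration: timedelta) -> MaintenanceWindow:
    """Hypothetical 'quick maintenance': starts immediately for a given duration."""
    return MaintenanceWindow(source_id, now, duration)
```

Modelling the end as derived from the duration keeps the immediate "quick" action and the scheduled action on the same structure; only the start time differs.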

 Touch Points 

Design Brief

 "Design a solution that enables Backup Admins and Storage Architects to safely place sources under maintenance to support both planned and on-demand maintenance activities without relying on manual workarounds—while ensuring zero data loss and uninterrupted protection integrity." 

 The experience should be seamlessly integrated into existing protection and recovery workflows, providing clear visibility, easy monitoring, reporting and a smooth transition back to normal operations once maintenance is completed. 

Proposed UX Flow

Explorations

Prototype

 Quick Maintenance 

 Schedule Maintenance 

 Edit Scheduled Maintenance 

 Status Tooltips // Notifications 

 Reporting Modules 

 New Protection Group Flow 

 Existing Protection Group 

 New Recovery Flow 

 Recovery In Progress 

User Feedback

Preference for Explicit Status Indicators and Guardrails

Admins rely heavily on visual cues (banners, warnings, filters, and tooltips) to ensure sources are correctly under maintenance and not accidentally protected or recovered during this period.

It is hard to know whether database instances are under maintenance without opening up the hierarchy. For example, if physical servers inside an ESXi host are under maintenance, this is not reflected on the main Sources listing page, so other backup admins may not spot it quickly or may be unaware of it altogether.

1

Improve Hierarchy-Level Maintenance Visibility

It is currently difficult to identify when lower-level resources, such as an ESXi host under a vCentre or database instances within a source, are under maintenance without drilling down through multiple hierarchy levels. This gap makes it easy to overlook active maintenance states from the main Sources list, especially in large environments with multiple administrators.

2

Provide a Centralized Maintenance Overview

A centralized dashboard widget or global summary panel displaying all sources currently under maintenance would significantly improve situational awareness. This would allow backup administrators to quickly understand ongoing maintenance activities across clusters without navigating through multiple modules.

3

Introduce Maintenance Notifications and Role-Based Alerts

Scheduled maintenance reminders (start and end notifications) would help prevent maintenance windows from being unintentionally extended. Additionally, enabling role-based alerts—such as email or Slack notifications—would ensure that all relevant administrators are informed when maintenance actions are scheduled, started, or ended, improving cross-team coordination.
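
The hierarchy-visibility feedback (item 1 above) amounts to rolling maintenance state up from child sources to the top-level listing. The following is a minimal sketch of that roll-up, assuming a simple tree of sources; the `Source` class, `maintenance_count`, `listing_badge`, and the badge wording are all hypothetical and are not taken from the product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    """Hypothetical node in the source hierarchy (e.g. vCentre -> ESXi host)."""
    name: str
    under_maintenance: bool = False
    children: List["Source"] = field(default_factory=list)


def maintenance_count(node: Source) -> int:
    """Count this node and all of its descendants that are under maintenance."""
    own = 1 if node.under_maintenance else 0
    return own + sum(maintenance_count(child) for child in node.children)


def listing_badge(node: Source) -> str:
    """Text a top-level listing row could show without drilling down."""
    if node.under_maintenance:
        return "Under maintenance"
    nested = maintenance_count(node)
    if nested:
        return f"{nested} child source(s) under maintenance"
    return ""


# Usage: one ESXi host under a vCentre is in maintenance.
host_a = Source("esxi-a", under_maintenance=True)
host_b = Source("esxi-b")
vcentre = Source("vcentre-1", children=[host_a, host_b])
print(listing_badge(vcentre))
```

The same count could also feed a centralised overview widget, since it is computed per top-level source.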

Next Steps

1

Broader Source Coverage

Extend Maintenance Mode support across additional source types such as Databases, SQL workloads, AWS services, and MongoDB—ensuring a consistent and scalable maintenance experience across the entire data ecosystem.

2

Intelligent Auto-Closure of Maintenance

Introduce rule-based and condition-driven auto-closure of Maintenance Mode, tightly aligned with patch updates or system readiness signals, to eliminate manual intervention and reduce the risk of extended downtime and backup failures.

3

AI-Assisted Maintenance Scheduling

Enable smarter planning by factoring in background activities like backups, replication, recovery, and threat scans. Leverage AI to recommend optimal maintenance windows based on workload behaviour and the nature of the maintenance task.
