Microsoft Experiences Second Major Cloud Outage in 2 Weeks
'Recent Changes' Blamed as Outlook.com Webmail and Calendar APIs Left Inaccessible
Microsoft has suffered its second major cloud outage in less than two weeks.
Late Monday night in Redmond, Washington, where the technology giant is based, Microsoft reported that some services, including Outlook.com webmail, had become inaccessible for users in North America and beyond. The outage continued into Tuesday.
"Users primarily located in the North American region attempting to access Outlook.com may be unable to send, receive, or search email. Additional functionality such as the calendar consumed by other services such as Microsoft Teams would also be affected," Microsoft's Office.com service status page read.
The Downdetector site, which crowdsources reports of website and service outages, shows a spike in user reports of Outlook problems beginning at 3:24 a.m. UTC.
The outage appears to affect only Microsoft's consumer-focused services. Outlook.com is its free webmail service, formerly known as Hotmail, and is not the same as Outlook on the web, or OWA, which is corporate-focused webmail.
Microsoft says, "Outlook.com functionality such as Calendar APIs consumed by other services such as Microsoft Teams are also affected." This appears to be a reference only to its consumer version of Teams.
Microsoft last suffered a major outage just 13 days ago, when "a wide-area networking routing change" that its internal teams made led to a worldwide disruption for Microsoft 365 users. Specifically, numerous Azure cloud services became inaccessible, including Outlook, Microsoft Teams, SharePoint Online, OneDrive for Business and more (see: Microsoft 365 Cloud Service Outage Disrupts Users Worldwide).
'Access and Service Issues for Outlook'
Microsoft first confirmed its latest outage Tuesday at 4:04 a.m. UTC, tweeting 20 minutes later: "We're investigating access and service issues for Outlook."
Shortly thereafter, Microsoft said the problem appeared to involve "recent changes" that had affected "service functionality." After confirming that the unspecified changes were to blame, it began "targeted restarts to portions of our infrastructure that are impacted by the recent change" to try to resolve the problem.
"Our targeted resources are progressing, and we've seen slight improvement in some environments," Microsoft tweeted at 6:46 a.m. UTC. "Alternatively, we're exploring additional steps to expedite the resolution."
While the problem appears to involve North American infrastructure, disruptions are still being seen globally. "Users in additional regions beyond North America may experience some residual impact due to the affected portions of infrastructure in North America," Microsoft reported.
But as Microsoft continued restarting numerous systems, it reported seeing "gradual improvement from this issue for users located in some of the additional affected regions."
As of 9:37 a.m. UTC, Microsoft reported that services had yet to be fully restored. "We're applying targeted mitigations to a subset of affected infrastructure and validating that it has mitigated impact. We're also making traffic optimization efforts to alleviate user impact and expedite recovery," it said.
Later on Tuesday, Microsoft reported that the problem appeared to have largely been fixed, about 12 hours after it began. "Current status: We can see from telemetry that the majority of impact has been remediated, with service availability at 99.9%," Microsoft said. "We're continuing to monitor the environment and perform targeted restarts on back-end mailbox components which show residual impact to ensure recovery for all users."
Feb. 7, 2023 15:53 UTC: This story has been updated to include Microsoft's assessment that the incident has been almost fully remediated.