2024-03-18 - AMER & EMEA - DNSWatch DNS resolution issues
Incident Report for WatchGuard Technologies
Postmortem



Initial Event Summary: The WatchGuard DNSWatch DNS Resolution service experienced a service disruption in the Americas and Europe regions between approximately 09:55 UTC and 14:49 UTC on March 18th, 2024. During this period, the service failed to process DNS requests. The event is resolved, and all DNSWatch services are operating normally for all users.



Initial Event Findings: At approximately 09:55 UTC on March 18th, 2024, our on-call engineers were alerted to a potential issue in the EMEA Gateway service and found that the resolvers in our Americas region were affected as well. During our investigation, we observed high CPU usage on several backend servers, which left them unable to process DNS requests. The high CPU load was related to a recently updated service component. Our teams worked to mitigate the impact and immediately began to roll back the changes. The Europe DNS Resolution service was restored at approximately 13:40 UTC, and the Americas DNS Resolution service was restored at 14:50 UTC.



We're working on completing an in-depth analysis of this event.



We sincerely apologize for the impact on our affected customers. We know the stability of DNSWatch is important to you and your business. At WatchGuard, we will never be satisfied with anything less than perfect operational performance and will continue to do everything we can to drive improvements across our services.

Posted Mar 22, 2024 - 14:15 UTC

Resolved
We are no longer experiencing issues with DNSWatch resolution and this incident is now resolved. We apologize for any impact this may have had on you or your customers.
Posted Mar 18, 2024 - 15:35 UTC
Monitoring
DNSWatch resolvers are operational in the EMEA and AMER regions. We're monitoring to ensure system stability. We'll post our next update in 60 minutes, if not sooner.
Posted Mar 18, 2024 - 14:31 UTC
Update
Our teams have deployed a fix in the EMEA region, which is now operational. DNSWatch is still experiencing intermittent resolution issues in the AMER region, and a fix is underway. We apologize for the impact this event has caused. We'll post our next update in 1 hour, if not sooner. As a workaround, customers experiencing errors can switch off DNSWatch.
Posted Mar 18, 2024 - 14:02 UTC
Update
We continue to work on intermittent resolution issues in DNSWatch in the EMEA and AMER regions. Our teams are working to restore normal operations and we apologize for the impact this event has caused. We'll post our next update in 1 hour, if not sooner. As a workaround, customers experiencing errors can switch off DNSWatch.
Posted Mar 18, 2024 - 12:37 UTC
Identified
We continue to work on DNSWatch DNS resolution issues in EMEA. Our teams are working to restore normal operations and we apologize for the impact this event has caused. We'll post our next update in 1 hour, if not sooner. As a workaround, customers experiencing errors can switch off DNSWatch.
Posted Mar 18, 2024 - 12:02 UTC
Update
AMER resolvers are operational. We are still experiencing issues in EMEA. Please check back in 1 hour for updates. As a workaround, customers experiencing errors can switch off DNSWatch. Thanks for your patience.
Posted Mar 18, 2024 - 11:01 UTC
Investigating
We are currently experiencing issues with our DNS server resolvers in EMEA and AMER. As a workaround, customers experiencing errors can switch off DNSWatch.
Posted Mar 18, 2024 - 10:26 UTC
This incident affected: DNSWatch:::AMER (DNS:::AMER) and DNSWatch:::EMEA (DNS:::EMEA).