Inbound and outbound call failures

Incident Report for Gradwell Communications Ltd

Postmortem

Description of outage and impact:

On Friday 30th May 2025, an alert was triggered indicating a problem on our core platform. We immediately began experiencing failures in both inbound and outbound call setup across the platform, resulting in a Major Service Outage. During the incident, customers would have experienced delays or timeouts when attempting to place or receive calls. Importantly, in-flight calls remained unaffected. The incident was immediately escalated to priority one status, and Gradwell engineers were fully engaged in the investigation and resolution process.

Cause & Resolution:

The initial cause of the outage was traced to a deadlock in the call handling process. This deadlock rendered part of the system unresponsive, which led to significant delays and timeouts in call setup. The resolution occurred automatically when our tooling identified the failure and terminated the connections, allowing services to recover without manual intervention. This action restored call traffic in both directions. However, after restoring core functionality, it was observed that a small subset of customers continued to experience issues with inbound call traffic. Further investigation revealed that the issue stemmed from an upstream supplier who had begun routing traffic through new IP addresses without notifying Gradwell. Once the new IPs were identified and whitelisted, normal service was restored for the affected customers.
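For readers who want a concrete picture of the allowlisting step, the following is a minimal illustrative sketch only, not Gradwell's actual tooling or configuration. The function names and the address ranges (reserved documentation ranges) are hypothetical. It shows how traffic from a source address outside the approved ranges is rejected until that range is added.

```python
import ipaddress

# Hypothetical allowlist using reserved documentation ranges,
# not real Gradwell or supplier addresses.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_allowed(source_ip, networks):
    """Return True if source_ip falls inside any approved network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)


def add_network(cidr, networks):
    """Approve a new range, skipping it if it is already listed."""
    net = ipaddress.ip_network(cidr)
    if net not in networks:
        networks.append(net)
    return networks


# Traffic from a previously unknown supplier range is rejected...
print(is_allowed("198.51.100.7", ALLOWED_NETWORKS))
# ...until that range has been identified and approved.
add_network("198.51.100.0/24", ALLOWED_NETWORKS)
print(is_allowed("198.51.100.7", ALLOWED_NETWORKS))
```

In this sketch, a supplier that starts sending from a new range is blocked until the range is approved, which matches the pattern of a small subset of inbound calls failing after core service had recovered.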

Root Cause:

The root cause of the incident was a deadlock in the call handling process that led to system-wide timeouts. A contributing factor to the extended impact for some customers was the unannounced change by an upstream supplier, who began sending traffic through new IP addresses that had not been pre-approved or communicated.

Prevention of recurrence:

Our alerting systems operated as expected and will remain in place. However, to further reduce the risk of recurrence, development work is already underway to redesign the post-call processing components of the call handling system, thereby preventing future deadlocks. Additionally, steps are being taken to improve coordination with upstream suppliers to ensure any changes in routing, such as new IP addresses, are communicated in advance and properly integrated into our systems.
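The details of the redesign are not covered in this report. Purely as a general illustration of one common way to stop a stuck component from blocking others indefinitely, the sketch below acquires a lock with a timeout and gives up rather than waiting forever. The helper name and timeout value are hypothetical, and this should not be read as the approach Gradwell is implementing.

```python
import threading


def run_with_lock_timeout(lock, work, timeout_seconds=2.0):
    """Run work() while holding lock, giving up if the lock stays busy.

    Returns True if work() ran, or False if the lock could not be
    acquired within timeout_seconds. Hypothetical helper for illustration.
    """
    if not lock.acquire(timeout=timeout_seconds):
        # Fail fast so the caller can alert, retry, or shed load
        # instead of waiting indefinitely on a stuck resource.
        return False
    try:
        work()
        return True
    finally:
        lock.release()


processed = []
call_lock = threading.Lock()
run_with_lock_timeout(call_lock, lambda: processed.append("call-1"))
```

The design choice here is to bound the wait: a caller that cannot obtain the lock gets a clear failure it can act on, rather than joining a queue of blocked work.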

Please accept Gradwell’s sincere apologies for the service disruption and the impact it has had on your business.

Posted Jun 02, 2025 - 16:28 BST

Resolved

This incident has been resolved.
Posted May 30, 2025 - 17:40 BST

Monitoring

Hello,

We are seeing inbound traffic return to normal and calls are successfully routing across our platform.

We sincerely apologise for any inconvenience caused by these issues.

If you continue to experience any issues, please contact our support team at support@gradwell.com or call 01225 800888.

Kind Regards,
Gradwell Communications
Posted May 30, 2025 - 13:27 BST

Update

Hello,

We are continuing to work on resolving this and apologise for the ongoing issues.

A further update will be provided in 30 minutes.

Kind Regards,
Gradwell Communications
Posted May 30, 2025 - 13:15 BST

Investigating

Hello,

Whilst outbound traffic is currently normal across the platform, we are seeing further inbound call failures and are investigating these now.

We apologise for the inconvenience caused.

We will provide a further update shortly.

Kind Regards,
Gradwell Communications
Posted May 30, 2025 - 12:40 BST

Monitoring

Hello,

We are seeing traffic return to normal and calls are successfully routing out across our platform.

We sincerely apologise for any inconvenience this may have caused. To provide full transparency, we will publish a Reason for Outage (RFO) report within ten working days.

If you continue to experience any issues or have questions, please don’t hesitate to contact our support team at support@gradwell.com or call 01225 800888.
Posted May 30, 2025 - 12:27 BST

Investigating

Hello,

We are currently investigating reports of inbound and outbound call failures.

We apologise for the inconvenience this is causing.

We will provide a further update shortly.

Kind Regards,
Gradwell Communications
Posted May 30, 2025 - 12:12 BST
This incident affected: Voice & Calls Services (Multi User VoIP, Outbound SIP Trunking, Outbound IAX Trunking, Inbound SIP Trunking, Inbound IAX Trunking, Single User VoIP, International Phone Numbers, Wave, Teams Direct Routing).