Inbound Call Failures

Incident Report for Gradwell Communications Ltd

Postmortem

Description of outage and impact:

On Friday 2 May, we received reports from customers of intermittent issues with inbound SIP calls, beginning at around 14:15. We isolated the fault to one of our upstream carriers and escalated it to them.

While numbers with failover routing remained functional, the sporadic nature of the issue initially obscured its severity. Our upstream supplier issued a service notice followed by an initial restoration of service and an all-clear notice at 16:34. However, the issue recurred intermittently.

Later, at approximately 17:50, additional issues were reported, including number translation targets being out of sync and some outbound calls failing. These affected a smaller group of clients but were accompanied by a return of the inbound SIP call failures. Service was fully restored by 19:48, and after continued monitoring a final all-clear was issued at 21:00.

Cause & Resolution:

The primary cause of the outage was a memory leak in the monitoring and error reporting systems. This leak depleted the memory available to core network routing systems, causing them to malfunction. Because the fault lay within the monitoring infrastructure itself, early detection and resolution were hindered. Additionally, because the problem was intermittent, it did not affect all four independent systems equally, which complicated diagnosis.

A separate, minor fault also occurred during this period, affecting the propagation of changes made via the numbering API. This was traced to a new network component in the final stages of testing. It was not resolved promptly because attention was focused on the more critical routing issue.

Prevention of recurrence:

To prevent a recurrence, immediate corrective actions were taken to address the memory leak and ensure monitoring systems no longer pose a risk to routing operations. Furthermore, alternative methods are being developed to detect similar issues in other parts of the network more effectively. The separate issue with the numbering API has been isolated, and further testing will be conducted to ensure stability before deployment.

Our upstream supplier passes on their apologies.

Gradwell sincerely apologises for the inconvenience caused. If you are still experiencing inbound call failures, please contact our support team at support@gradwell.com.

Posted May 06, 2025 - 09:21 BST

Resolved

Hello,

The issue affecting inbound calls now appears to be resolved. Our upstream carrier is continuing to investigate and will share a full Reason for Outage (RFO) as soon as it is available.

We sincerely apologise for the disruption this may have caused you and your customers this afternoon, and we appreciate your patience and understanding.

Kind Regards,
Lisa
Posted May 02, 2025 - 16:43 BST

Update

Hello,

I’m afraid the intermittent issue with our inbound services is ongoing. The upstream carrier is still investigating.

Please accept our sincerest apologies. We will provide a further update by 5pm.
Posted May 02, 2025 - 16:11 BST

Investigating

Hello,

We are currently aware of an ongoing issue affecting inbound calls via one of our upstream carriers. They have reported a disruption impacting inbound traffic to their network.

They are actively investigating the issue, and we are in contact with them for further updates. At this time, the incident has not yet been resolved.

Please accept our sincerest apologies. We will provide a further update by 4pm.

Kind Regards,
Gradwell Communications
Posted May 02, 2025 - 15:31 BST
This incident affected: Voice & Calls Services (Inbound SIP trunking, Inbound IAX Trunking).