On Friday, we received reports from customers experiencing intermittent issues with inbound SIP calls, beginning around 14:15. We isolated the problem to one specific carrier and escalated it upstream.
While numbers with failover routing remained functional, the sporadic nature of the issue initially obscured its severity. Our upstream supplier issued a service notice followed by an initial restoration of service and an all-clear notice at 16:34. However, the issue recurred intermittently.
Later, at approximately 17:50, additional issues were reported, including number translation targets being out of sync and some outbound calls failing. These affected a smaller group of clients but were accompanied by a return of the inbound SIP call failures. Full resolution was achieved by 19:48, with continued monitoring leading to a final all-clear issued at 21:00.
The primary cause of the outage was a memory leak in the monitoring and error reporting systems. This leak depleted memory available to core network routing systems, resulting in their malfunction. The issue was exacerbated by the fact that the fault lay within the monitoring infrastructure itself, hindering early detection and resolution. Additionally, because the problem was intermittent, it did not affect all four independent systems equally, complicating diagnosis.
A separate, minor fault also occurred during this period, affecting the propagation of changes made via the numbering API. This was traced to a new network component in the final stages of testing and was not resolved promptly due to focus on the more critical routing issue.
To prevent a recurrence, immediate corrective actions were taken to address the memory leak and ensure monitoring systems no longer pose a risk to routing operations. Furthermore, alternative methods are being developed to detect similar issues in other parts of the network more effectively. The separate issue with the numbering API has been isolated, and further testing will be conducted to ensure stability before deployment.
Our upstream supplier passes on their apologies.
Gradwell sincerely apologises for the inconvenience caused. If you are still experiencing inbound call failures, please contact our support team at support@gradwell.com.