Broadband Authentication Issues
Incident Report for Gradwell Communications Ltd
Postmortem

Please find below an abridged version of the reason for outage [RFO] supplied by our supplier in relation to the Broadband Authentication outage. We are working with our supplier to ensure they mitigate the risk of recurrence, and we apologise for any inconvenience this may have caused.

A full copy of the RFO is available upon request.

Description of outage and impact:

Our upstream supplier, Wavenet, experienced an outage on one of their core routing devices. This affected Gradwell customers with TalkTalk Business or Zen Internet connections: their lines dropped authentication and were unable to re-authenticate until the issue was resolved.

The supplier outage lasted from 11:05 GMT to 13:15 GMT on 30th January.

Cause & Resolution:

Wavenet network operations centre [NOC] engineers performed full diagnostics and, following a thorough investigation, identified that a routing change to a BGP peer on an edge router had caused the BGP process to stop running. They immediately corrected the change and restored the BGP process on the affected device at 11:13 GMT. Monitoring showed that over 75% of sessions reconnected immediately after the restoration work.

Further alerts were received at 11:20 GMT, identifying slow responses from the primary authentication (RADIUS) server due to increased demand. Load balancing was adjusted to steer more authentication requests to the secondary authentication server, and at approximately 11:50 GMT NOC engineers confirmed that the primary RADIUS server had stabilised.

Root Cause:

The root cause has been identified as human error during a standard network change to an edge router.

Prevention of recurrence:

A detailed review of the change process within the Wavenet network engineering team will be completed. It will include an assessment of whether manually executed commands should continue to be used in future.

Posted Feb 04, 2020 - 13:00 GMT

Resolved
This incident has been resolved.
Posted Jan 30, 2020 - 16:46 GMT
Monitoring
We have received confirmation from our upstream supplier that all services have been restored.
If you are still experiencing issues with your broadband service, please contact our support team on 01225 800888.

We sincerely apologise for the inconvenience and impact this has caused you.

An RFO will be posted in due course.
Posted Jan 30, 2020 - 14:09 GMT
Identified
Our upstream supplier has confirmed that services are slowly recovering for broadband customers affected by the major incident who connect into one of their Manchester Data Centres.

We expect full restoration of all affected services by 14:00.

We sincerely apologise for the inconvenience caused.

A further update will be provided at 14:30.
Posted Jan 30, 2020 - 13:19 GMT
Update
Our supplier's NOC engineers are still investigating. Their Major Incident team is continuing to work to resolve the issue, which is affecting service to broadband customers connecting into one of the Manchester Data Centres.

A further update will be provided at 13:30.

We sincerely apologise for the inconvenience caused.
Posted Jan 30, 2020 - 11:46 GMT
Investigating
We have received reports from customers who are experiencing issues with their broadband services.

We are currently investigating the issue and a further update will be provided at 11:45.

We apologise for the inconvenience caused.
Posted Jan 30, 2020 - 11:16 GMT
This incident affected: Connectivity (ADSL Broadband, FTTC).