Gradwell Internet for business people

Gradwell Service News

RESOLVED: Customer MySQL DB server and PBX updates

PROBLEM DESCRIPTION

voip-manager and mysqldb are currently experiencing an outage.

At approximately 18:00 BST, our systems team became aware of an issue affecting both our PBX provisioning system and one of our customer-facing database servers. The fault appears to have started at around 17:00 and was progressive, with the affected servers becoming less responsive over time. We are investigating the fault and working with our Telehouse operations team to rectify the problem as soon as possible.

Affected Services:

PBX provisioning

Customer databases on mysqldb.gradwell.com

Customer Impact:

Whilst all VoIP services are up, changes cannot be processed, so any updates performed via the control panels will not become live.

All customer databases hosted on the mysqldb.gradwell.com service are affected. Other servers, such as mysql5db-1 and our other two MySQL 4-based servers, are not affected.


Estimated Resolution Time:

We are awaiting on-site engineers to restart the affected services and will provide an update within an hour.

***Update 19:38***

Our on-site ops team have been unable to restart the machine, so this appears to be a hardware failure. Our sysadmin and VoIP dev teams are currently building a replacement for this machine. At present we are unable to offer a concrete ETA, but we will post further information as soon as it is available. Our next scheduled update will be in approximately one hour.

***Update 21:26***

VoIP: Our team have now restored our VoIP provisioning systems.  If any further issues are seen with VoIP-related updates, please contact our support team.

Hosting: Due to this hardware failure, we are restoring the databases on mysqldb.gradwell.com from a recent backup (taken at approximately 1AM this morning). Any changes made to these databases today will have been lost. Other database servers are not affected. We apologise for any inconvenience this may cause you. This work should complete shortly.

*** Update ***

This issue was resolved on the day the alert was announced on www.gradwellstatus.com. It has now been closed as a historical problem. Please note that the replacement MySQL server runs MySQL 5.

Scheduled Maintenance for DSL Platform; 10/06 @ 02.00

This is advance notice that we will be performing maintenance on our DSL platform on 10/06/2010 at 02.00 am.

We do not expect this maintenance to last for more than 2 hours.

Thursday, 10th June, 2010

2AM to 4AM BST

Affected Services:

DSL platform: customers may not be able to establish a connection if their PPP session terminates.

Customer Impact:

The maintenance window is scheduled to last 2 hours. During this time NEW connections to our DSL platform may not be possible; existing connections SHOULD NOT be affected.

Maintenance Action:

This work is being performed to expand capacity on the DSL platform and to introduce additional QoS services on our network.

To benefit from these enhancements, you will need to re-establish your DSL session after the maintenance window closes, which you can do simply by power cycling your router.

Connections that are not re-established will continue to work on the existing service.

Resolved: Call drops on hosted VoIP platform - SIP only

Affected Services:

VoIP (SIP): intermittent call drops on the hosted VoIP platform (sip.gradwell.com/sip.trunk.gradwell.com)

Customer Impact:

Customers may experience intermittent call drops. IAX trunk users should be unaffected.

Action:

We are implementing a code fix to the live platform and are rolling out these changes now.

***Update 12:20***

We have rolled out the code fix to all live servers and it appears to have worked as expected. We will continue to monitor the platform for the next few hours to make sure there are no further problems.

***Update 15:10***

We have observed no further issues with the hosted SIP VoIP platform and are therefore closing this status update. We apologise for any problems these earlier issues may have caused you.

Resolved: Customer services telephone line problem

Affected Services:

All services are available. However, due to an onsite hardware failure at our Bath office, our customer services line is currently unavailable.

Customer Impact:

Customers will not be able to contact the customer services team by telephone; email is working as normal.

Action:

We are investigating the hardware failure and will update again before 10:00.

***Update 10:20***

All telephone lines into our support office have been restored and are now functioning correctly. We apologise for any problems this has caused you.

RESOLVED: POP3 outage for Mailboxes on Glacier

Affected Services:

Email: customers accessing their mailboxes by POP3. Connections via IMAP are working normally.

Customer Impact:

Affected customers will receive an error when trying to access their mailboxes by POP3. If you need access to your mailbox, you can log in to our webmail service at: https://webmail.gradwell.com/horde/imp/login.php

This only affects customers who are hosted on our Glacier file storage. You can check which file storage you are hosted on in the email standard section of your control panel.

Action:

Our System Administration Team are currently investigating the cause of this and hope to restore service as soon as possible.

UPDATE:

Our System Administration Team have identified the cause of the problem and are now working on a solution for this issue.

Next Update: 10:00

Closing Update:

The issue has now been resolved by our system admins and the service is accepting POP3 connections correctly. We apologise for the inconvenience this has caused and want to assure you that we are deploying additional monitoring to prevent this from happening in future.

COMPLETED: Scheduled maintenance for master database, Friday 4th June 2010 01:00 - 03:00

We will be performing maintenance on our master database during the above window.
We do not expect this maintenance to last for more than 2 hours.

Affected Services:
Customers may not be able to make prepay credit calls for a short (10-minute) period while the master database is offline. Other services, including our web hosting, email and voice platforms, may also be affected by this maintenance.

Customer Impact:
The maintenance window is scheduled to last 2 hours, but we expect customers to experience only a 10-minute partial loss of service.

Maintenance Action:
The database server must be taken offline to allow us to manually repair a corrupted table within an internal database.

COMPLETED: Hosted PBX/single line registration maintenance

Starts: 2010-06-01 23:00 Ends: 2010-06-01 23:59

This evening we will be performing maintenance on the VoIP registration platform to ensure that our state databases are consistent. These checks should not interrupt service for customers, although the network should be considered ‘at risk’ during this time. These consistency checks do not affect SIP or IAX trunk customers.

This maintenance was completed successfully.

Virtualisation SAN outage

We have been alerted to a fault on one of our virtualisation SANs which is affecting a large part of our infrastructure. An engineer is on his way to repair the fault and we hope to update this notice shortly with further information.

UPDATE 1727: Phones using the Gradwell SIP platform should now be working again. If you are still having problems, please try rebooting your device.

UPDATE 1957: Most services should now have returned fully to normal. Secondary DNS customers may find that, due to some zone file corruption, their zones are not yet being served from our authoritative servers. The zone files are currently being rebuilt and we will post an update when this is complete.

UPDATE 2024: Some residual issues with outbound call setup times have now been resolved.

UPDATE 0035: All services appear to have been running normally for some time. If you are experiencing any continuing issues please raise a ticket with support.

RESOLVED: empty home folders

It has been brought to our attention that, following the storage issues we experienced yesterday, some customers have been left with empty home folders. This was caused by a backend script creating duplicate home folders. These duplicates have now been removed and all home folders should show the correct data. If you are still having issues, please contact support.

RESOLVED: Problems with fileserver flathead

We are currently experiencing ongoing issues with one of our fileservers, flathead. The server is currently completely offline; however, we hope to bring it back online shortly.

When it comes back online, some customers may find that their files are out of date, as we will initially be starting the server from our latest backup. During the day we will be attempting to restore more up-to-date data and will post further updates as more details become available.

Please accept our apologies for any inconvenience this causes; unfortunately the situation is unavoidable. Engineers have been working through the night trying to restart the old partition, but this is not possible under full traffic.

***Update 15:04***

‘Flathead’ is now back up and running and has been for a while. Websites should be displaying correctly with no errors shown. Our sysadmin team are still onsite monitoring this server closely and we will update again shortly.

Again our apologies for any inconvenience this has caused you.

*** Update 17:20 ***

We were aware of a number of intermittent web page access problems. We have spent the last hour checking individual sites and are confident that we have ironed out any remaining issues in our web cluster and that this problem is now resolved.