Gradwell Internet for business people

Gradwell Service News

RESOLVED: Power outage in Telehouse North

PROBLEM DESCRIPTION

Telehouse Power Failure

Several rooms in Telehouse North lost power at approximately 14:10 today.
This means that:

Some networks, or portions of networks with particular dependency on
Telehouse North will be off the air.
Your connection to some of your ISPs may terminate on equipment in
Telehouse North which has failed.
All networks that are unaffected will be handling much higher volumes
of traffic, leading to higher latency and packet loss even on connections
that are still available.

Affected Services:

All services

Customer Impact:

Dependent on the peering and routing of each user's ISP

Estimated Resolution Time:

We will update again by 16:30

***Update 15:13***

All systems should now be clearing and getting back to normal in Telehouse. Some routing issues will still remain for a little while longer and links will be seeing a much higher throughput as traffic has been re-routed.

***Closing Update 15:51***

Most peering links are now stable and disruption is minimal, but please be aware that there may still be slight congestion. Our systems remained online and are not experiencing any problems caused by the earlier power outage.

RESOLVED: Email delays

PROBLEM DESCRIPTION

A backlog of mail is queued on our internal mail delivery servers, causing long delays in delivering mail to customer mailboxes.

Affected Services:

Email.

Customer Impact:

Email which has been sent to your mailboxes may be delayed or appear to be missing.

Estimated Resolution Time:

We are currently investigating this problem and will update by 13:00 with more information. Resolved at 11:42.

***Update 09:55***

Two of our four servers have now cleared their backlog of email. We are working to clear the backlog on the remaining two servers. Any new mail now being received should not be experiencing any delay.

***Update 11:10***

The two remaining mail servers are continuing to clear their mail queues. We are currently investigating why this is taking longer than expected.

***Update 11:42***

All mail queues are now fully cleared and mail should be flowing as expected. Some users may see duplicate messages. This is caused by inbound mail servers redelivering partially deferred messages, which results in multiple copies. We apologise for any problems caused by these delays.
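For anyone who wants to tidy up duplicate copies in a local mail export, the sketch below shows one possible approach: keep the first copy of each message, matched on its Message-ID header. It is an illustration only and not a tool we provide. The function name `drop_duplicates` and the sample messages are hypothetical, and it assumes redelivered copies carry the same Message-ID as the original.

```python
from email import message_from_string


def drop_duplicates(raw_messages):
    """Return the messages in order, keeping the first copy of each Message-ID."""
    seen = set()
    unique = []
    for raw in raw_messages:
        message_id = message_from_string(raw)["Message-ID"]
        # Messages without a Message-ID cannot be matched safely, so keep them all.
        if message_id is not None:
            key = message_id.strip()
            if key in seen:
                continue
            seen.add(key)
        unique.append(raw)
    return unique


if __name__ == "__main__":
    # Hypothetical sample: the first message was delivered twice.
    first = "Message-ID: <1@example.com>\nSubject: Hello\n\nBody\n"
    second = "Message-ID: <2@example.com>\nSubject: Other\n\nBody\n"
    print(len(drop_duplicates([first, second, first])))
```

Matching on Message-ID rather than on subject or body avoids discarding distinct messages that happen to have the same text.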

RESOLVED: Loss of storage SAN causing network wide problems

PROBLEM DESCRIPTION

At approximately 16:35 BST, Gradwell’s system administrators were alerted to a problem with one of our storage SANs.

Affected Services:

All services

Customer Impact:

All services may be affected, including VoIP and Hosting. Phones may lose registration and be unable to make or receive calls. Some websites will be offline.

Estimated Resolution Time:

We are investigating this now and will update by 17:45 with more information.

***Update 17:00***

All lines into our support office are also offline at present.

***Update 17:20***

Services are now starting to return to normal as we restart the affected services and servers. Our support line is now back online.

***Update 18:15***

Most servers have now been restarted and most services are back on line. We are continuing to resolve any remaining issues.

***Update 18:31***

All services should now be available.

RESOLVED: Hosting and VoIP platform issues

PROBLEM DESCRIPTION

At approximately 06:50 BST, Gradwell’s system administrators were alerted to some system issues affecting a multitude of Gradwell services.

Affected Services:

Web services

Control panels

VoIP services

Customer Impact:

Affected customers will see errors when accessing hosted websites or control panels, and when attempting to make outbound calls.

Estimated Resolution Time:

Our system admin team are working on this now and will update again at or before 09:30.

***Update 09:35***

VoIP services and control panels should now be working as expected. We are still working on the web clusters and expect to have these back online shortly. We will update again at or before 10:30.

***Update 10:36***

The web cluster is now back online and all services should be running as expected.

There may be some slowdown on control panels as systems are busy processing any backlogs.

We are continuing to monitor and will update again at 11:30.

***Update 11:29***

All systems are running correctly and remain stable. We will continue to monitor closely for the next few hours and will update or close this status at 13:30.

***Update 13:20***

All systems are now running correctly and we are now closing this status update.

The problem has been traced to one of our DNS cache servers. This cache server, 193.111.200.191, stopped responding, which in turn caused our master MySQL server to effectively lock up and fail to respond to queries correctly. The majority of our infrastructure relies on this database, so parts of it became unstable.

We apologise for any problems this has caused you.
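The closing update above describes a service stalling because a DNS cache server it depended on stopped responding. As a general illustration only (it does not describe the configuration of our systems or the fix that was applied), the sketch below shows one way a client can bound the time it waits on a name lookup. The function name `resolve_with_timeout`, its parameters and the hostname in the usage example are hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as LookupTimeout

# A small shared pool, so that a hung lookup ties up a worker thread
# rather than the caller.
_pool = ThreadPoolExecutor(max_workers=4)


def resolve_with_timeout(hostname, timeout_seconds=2.0,
                         resolver=socket.gethostbyname):
    """Return the resolved address, or None if the lookup fails or is too slow."""
    future = _pool.submit(resolver, hostname)
    try:
        return future.result(timeout=timeout_seconds)
    except LookupTimeout:
        # The worker may still be blocked on the resolver; the caller is not.
        return None
    except OSError:
        # socket.gaierror (lookup failed outright) is a subclass of OSError.
        return None


if __name__ == "__main__":
    # Hypothetical hostname; prints an address, or None on failure or timeout.
    print(resolve_with_timeout("db.example.com", timeout_seconds=1.0))
```

The trade-off is that a lookup which times out still occupies a worker thread until the resolver gives up, so this limits the caller's wait but does not replace monitoring of the resolver itself.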

UPCOMING: Mail Server Maintenance

Scheduled Maintenance for mail servers; 29/06 @ 10:00pm

This is advance notice that we will be performing maintenance on the mail server “glacier” on 29/06/2010 at 10:00pm.

We do not expect this maintenance to last for more than 1 hour.

Tuesday, 29th June, 2010

Affected Services:

Customers whose mailboxes are stored on the “glacier” mail server.

Customer Impact:

Customers on the affected mail server may not be able to access their mailboxes for a short period during the maintenance window. Customers on all other mail servers will not be affected.

Maintenance Action:

This work is being performed to increase the storage space available on the mail server.

RESOLVED: POP3 outage for Mailboxes on Glacier

Affected Services:

Email - Customers accessing their mailboxes by POP3. Connections by IMAP are working.

Customer Impact:

Customers will receive an error when trying to access their mailboxes. If you need access to the mailbox you may do so by logging into our webmail service here: https://webmail.gradwell.com/horde/imp/login.php

This only affects customers who are hosted on our Glacier file storage. You can check which file storage you are hosted on in the Email Standard section of your control panel.

Action:

Our System Administration Team are currently investigating the cause of this and hope to restore service as soon as possible.

UPDATE:

Our System Administration Team have identified the cause of the problem and are now working on a solution for this issue.

Next Update: 10:00

Closing Update:

The issue has now been resolved by our system admins and the service is accepting POP3 connections correctly. We apologise for the inconvenience caused and wish to assure you that we are deploying additional monitoring to stop this from happening in the future.
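Customers who would like to check for themselves whether a POP3 service is accepting connections could use something like the sketch below. It is a minimal example using Python's standard `poplib` module and is not the monitoring we are deploying. The function name `pop3_accepts_connections` and the hostname in the usage example are hypothetical.

```python
import poplib


def pop3_accepts_connections(host, port=110, timeout_seconds=5.0):
    """Return True if the server accepts a connection and sends a '+OK' greeting."""
    try:
        # poplib reads the greeting on connect and raises error_proto
        # if the server does not answer with '+OK'.
        connection = poplib.POP3(host, port, timeout=timeout_seconds)
    except (OSError, poplib.error_proto):
        return False
    try:
        connection.quit()
    except (OSError, poplib.error_proto):
        # The greeting was fine; a failed QUIT does not change the result.
        pass
    return True


if __name__ == "__main__":
    # Hypothetical hostname; replace it with your own POP3 server name.
    print(pop3_accepts_connections("pop3.example.com"))
```

This only confirms that the server greets new connections. It does not log in, so it cannot detect problems that appear after authentication.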

Virtualisation SAN outage

We have been alerted to a fault on one of our virtualisation SANs which is affecting a large part of our infrastructure. An engineer is on his way to repair the fault and we hope to update this notice shortly with further information.

UPDATE 1727: Phones using the Gradwell SIP platform should now be working again. If you are continuing to have problems please try rebooting your device.

UPDATE 1957: Most services should have returned fully to normal within the last few moments. Secondary DNS customers may find that, due to some zone file corruption, their zones are not yet being served from our authoritative servers. The zone files are rebuilding at the moment and we will update when this is completed.

UPDATE 2024: Some residual issues with outbound call setup times have now been resolved.

UPDATE 0035: All services appear to have been running normally for some time. If you are experiencing any continuing issues please raise a ticket with support.

RESOLVED: Control panel and gradwell.com issues

We are currently seeing some issues with our control panels and our website, gradwell.com.

Our sysadmin team are working on this now and will implement a fix as soon as possible. We will update here as soon as we have more information. We apologise for any problems this is causing.

***Update 16:18***

This also seems to be affecting the ability of our SMTP outbound-only users to send mail. Our server admin team are still looking into this and we will post an update as soon as possible.

***Update 17:15***

All SMTP outbound mail is now flowing correctly and service has been restored to gradwell.com. Our server admin team are still working on the control panel issues.

***Update 17:40***

All control panels are now functioning correctly but we will continue to monitor closely to ensure no further problems develop. We will close off this status update once we are happy the control panels are totally stable.

***Update 15:35***

We are seeing some further issues with the control panels at present. Our server admin team are working on this now and expect to have service restored shortly.

***Update 15:55***

All control panels have now been returned to full operational status. We apologise for any problems this might have caused you.

COMPLETED: Filestore maintenance - Friday 28th May 00:01 to 02:00

Update 06:00: The filesystem on flathead is offline for a consistency check.

Update 04:55: All services are currently online. We will continue to monitor the situation.

Update 03:30: We are seeing some unusual errors on the home file partition on sawtooth that are causing performance issues with the customer web clusters. The home partition on sawtooth is currently offline for a full consistency check.

Update 02:00: All services are back online.

Update 01:41: Work on flathead is not going to complete before 2am. We will return flathead to service within the next few minutes and schedule a further window to complete this work at a later date.

Update 00:51: Work on all servers except flathead is now complete. Flathead will be unavailable for some time longer while it completes essential filesystem maintenance.

We are performing some maintenance on all of our user mail and home filestore servers during the above window. We have increased the storage available to each filestore, and each server needs to be rebooted to pick up its new disk quota.

The filestores will be taken offline one at a time, for approximately 10 minutes each, apart from ‘flathead’, which is being moved to another host. This server may be offline for a little longer.

We apologise for any inconvenience caused by this work.

***Update***

The majority of the work has completed.

Please see http://www.gradwellstatus.com/2010/05/28/ongoing-problems-with-fileserver-flathead/ for further updates.

RESOLVED: Yellowstone and Denali mail store problems

Mailboxes hosted on servers ‘Yellowstone’ and ‘Denali’ are currently unavailable. Customers can check whether they are affected by logging into the hosting control panel and clicking on Email Standard. This menu shows which server each mailbox is hosted on.

Our sysadmin team are looking into the issue now and we will post further updates here as they become available.
We apologise for any inconvenience caused.

***Update 10:49***

To fully restore service to these servers, our sysadmin team will have to reboot an entire blade. This may cause some control panels to stop functioning correctly for a few minutes. It may also affect our hosted exchange platform briefly.

***Update 12:11***

The affected blade has been fully rebooted and all systems have now been running as expected for some time. We will continue to monitor closely and only close off this status update once we are happy everything will remain stable.

***Update 16:30***

All servers are still running as expected and no further problems have been seen. We are now happy to close off this status update. If you are still seeing any problems, please raise a support ticket and we will deal with these on a case-by-case basis.