We are currently experiencing a problem with one of our VMware hosts. To restore full functionality we need to reboot one of our blades. This will interrupt several services, including shell.gradwell.net, some websites hosted on ‘dixie’ and hosted Exchange services.
We will post here again upon completion of this emergency work. We apologise in advance for any problems caused by this.
***Update 13:54***
We have fully powered off the problem blade and are restarting services now. Some virtual machines will require a filesystem check so these will take slightly longer to complete.
***Update 14:12***
We have now restored all services. If you are still seeing any problems, please do raise an incident with our support team and we will investigate on a case by case basis. Our apologies again for the loss of service.
Please be aware that we are currently seeing an outage on one of our back-end systems. Our system admins are looking into the issue and we are trying to get this back up as soon as possible.
This will affect some of our customers’ websites and home directories while we attempt to restore full service.
***Update 14:24***
This issue was cleared in full yesterday shortly after posting the initial status update. We have been closely monitoring the storage since then and have seen no further errors.
We are now happy to close this status notification and we apologise for any problems this might have caused you.
Our network engineers are investigating an outage with our connectivity to Sovereign House, which may prevent customers from reaching our network and services, although we have not yet received independent confirmation of this. We will update this page as soon as further information is available.
Update 05:51 : This problem was first detected by our monitoring systems just after 4AM. We have confirmed that this is likely to cause VoIP registration issues for some customers. Our network engineers are working with our transit providers to restore connectivity. We are not able to provide an ETA for restoration of our link with Sovereign House at the moment. We apologise for any inconvenience this will be causing customers and are working to ensure the link is restored by UK business hours.
Update: 06:14: We have opened an internal issue - 3108 for linking customer incidents where problems are proven to be related to this fault. Typical problems involve not being able to send or receive email, and phones not registering. You are still able to reach our website at www.gradwell.com.
Update 06:50 - This issue appears to be resolved, however, we are awaiting an update from our network team to confirm the status of the resolution.
Update: This issue was resolved just before 07:00 and was caused by overnight maintenance on one of our transit providers’ routers. If you are experiencing any issues with phones this morning, please try rebooting them, as many phones will not keep retrying after a network issue. If problems persist, please contact our support team, who will be happy to advise.
Some users and partners have seen a slowdown in control panel access. To combat this we have added another CPU to the server that handles control panel access for customers and partners.
This should result in a quicker, more stable service.
We are currently investigating a file-system problem which is preventing some of our control panels from functioning correctly. Apologies for any inconvenience caused. We aim to have this resolved very shortly.
Update 13:47: The file-system issue with our primary control panel cannot be automatically repaired. Our server team is continuing to work to try to bring this server back online, however, we are also working to provision a new machine to ensure we can restore control panel services, should the original control panels not be recoverable.
Update: 14:57: Hosting and VoIP control panels have been restored. Logins have been tested. However, we now need to ensure all partner portals are functioning and restore internal monitoring. If customers continue to see any issues with the control panels, please submit an incident via our portal and we will look into the issue for you. Again, we apologise for any inconvenience caused by this issue.
We are currently experiencing an issue with our backend CRM that is preventing customers from using our control panels. Customers may receive an error “Unable to connect to backend CRM database” when attempting to log in to or use our control panels. This issue has been escalated to our engineering team.
Update (10:40): We have now confirmed this appears to be a fault with our CRM platform, and we have raised a support ticket with the vendor.
Update (20:40): We have still not had any satisfactory response from our vendor. We have however applied a workaround which seems to have fixed the immediate problem of control panel access failing, and will continue to monitor the situation, and work with our vendor for a full solution.
Update (23:40): Our vendor has now fixed this issue at their end and all services are working normally.
Our server team are currently investigating an issue with our back-end storage which is affecting access to the portal, Mailman connectivity and customer scheduled jobs (cron). Further information will be posted as soon as it is available.
Update 17:00: We are currently experiencing a problem with inbound calls due to this issue, and call setup may be affected. We are working to bring back the affected VoIP SIP routers.
Update 17:15: This issue has now been resolved. We have restarted several services which rely on the ‘Io’ back-end storage. We apologise for any inconvenience caused.
Our server team are currently investigating an issue with our DNS update system, which updates customer zone files after control panel changes. Please note that this issue also affects secondary DNS. Our servers will continue to answer with older zone files; however, any changes will not be honoured. We apologise for any inconvenience this will cause. We are currently following this issue up with our virtualisation vendor.
This issue was resolved by our systems team on Sat Oct 31 at 02:01:29 GMT and was related to a VMware fault, which we are following up with the vendor.
We are in the process of rebooting one of our ESX machines, ‘Mars’. This will affect several services, including our public DNS resolvers.
VoIP Customers may experience DNS related issues if phones are configured to use 193.111.200.91 and 193.111.200.191 as their DNS Servers. DSL Customers may experience an outage where web sites cannot be reached.
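For customers who want to check whether these resolvers are answering, the following is a minimal sketch using only the Python standard library. It sends a single hand-built DNS query over UDP to each address and reports whether any reply comes back. The function names, the test hostname and the two-second timeout are illustrative assumptions, and the addresses are simply those quoted in this notice, so they may not be in service when you read this.

```python
import socket
import struct

# Resolver addresses quoted in the notice; they may no longer be in service.
RESOLVERS = ["193.111.200.91", "193.111.200.191"]


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 (A record), QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(server: str, hostname: str = "www.gradwell.com",
                     timeout: float = 2.0) -> bool:
    """Return True if server sends back a reply carrying our query id."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts and unreachable-network errors alike.
            return False
    return len(reply) >= 2 and reply[:2] == query[:2]


if __name__ == "__main__":
    for server in RESOLVERS:
        state = "answering" if resolver_answers(server) else "not answering"
        print(f"{server}: {state}")
```

A "not answering" result only shows that no reply arrived within the timeout from where the script was run; it does not by itself prove the resolver is down.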
Our server admin team are working on this now and will try to keep disruption to a minimum. Our apologies for any problems experienced because of this.
Update 12:51: Our DNS resolvers are now back online and working correctly. This issue will have resulted in a problem setting up calls. We are currently bringing back other services such as control panels and are working as quickly as possible to restore any affected services.
Update 13:16: Control panel access has been restored.
We are seeing some ISPs having routing issues to our old NAT proxy, lon-ppc-3.gradwell.net. Our network admin team are working on this now and we will post any updates here as soon as they are available.
If you are seeing any errors, please change your NAT proxy to nat.gradwell.net:5082 and this will return service.
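Before changing a phone's settings, it can help to confirm that the new proxy hostname resolves from your own network. The sketch below is one way to do that in Python: it splits the `host:port` value given above and looks the hostname up. The helper names are hypothetical, and the lookup assumes SIP signalling over UDP and IPv4, which this notice does not state. A successful lookup shows only that DNS is working, not that the proxy is handling calls.

```python
from __future__ import annotations

import socket

# Value quoted in the notice.
NAT_PROXY = "nat.gradwell.net:5082"


def split_host_port(setting: str) -> tuple[str, int]:
    """Split a 'host:port' setting into its hostname and numeric port."""
    host, sep, port = setting.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {setting!r}")
    return host, int(port)


def resolve(host: str, port: int) -> list[str]:
    """Return the distinct IPv4 addresses for host, or [] if lookup fails."""
    try:
        # Assumption: SIP over UDP, hence SOCK_DGRAM.
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_DGRAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host, port = split_host_port(NAT_PROXY)
    addresses = resolve(host, port)
    if addresses:
        print(f"{host} resolves to {', '.join(addresses)} (port {port})")
    else:
        print(f"{host} did not resolve from this network")
```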
***UPDATE - 11:24***
We have now resolved the routing issue and traffic from all sources is routing without problem. We apologise for any problems caused by this.