We are experiencing further storage issues with our iSCSI-2 SAN (its second outage today), and as such we are moving all services off it and onto our new HP storage servers. This may take some time; however, as many of our services are clustered, we are focusing on moving the more important services first, such as email file servers.
UPDATE 13:30 1st Feb: We are currently experiencing storage issues with our iSCSI-3 SAN, and have multiple services down. We are currently recovering affected services.
As of 10am on Sat 31st, we are experiencing recurrences of the issues from the beginning of last week. Our server admins are working on this as a matter of urgency. Access to some mailboxes is currently unavailable for customers.
UPDATE?: Customers may be experiencing problems trying to call fixed or mobile destinations via our VoIP service. This is due to the above fault; devices will be returning 603 error codes. Our technical team are aware of this issue and further updates will be posted once information is available.
UPDATE: As of 13:00 hours all systems are back online and services are available again. If, however, you are still experiencing problems, please raise a support ticket either via email or the portal, and a member of our technical team will investigate.
We have restarted our outbound IAX load-balancer due to a fault and our developers are investigating. We apologise for any inconvenience caused.
Starts: 2009/01/29 22:00 Ends: 2009/01/29 23:00
Tonight we are installing a configuration update on our NewSIP VoIP platform which should resolve problems some customers have been experiencing with attended transfers. This maintenance may cause problems with call setup; however, no active calls should be dropped.
January 29th, 2009, 04:10 pm by Ben Smithurst Uncategorized
Customers should be aware that if websites are hosted on the legacy PHP 4 cluster, then the log files used to generate Webalizer / Analog reports will report incorrect IP addresses for visitors.
There is no current ETA for this issue to be resolved, and we encourage customers to migrate their websites to either the PHP 4.4 cluster or alternatively the PHP 5.2 cluster. To do so, please raise a ticket with our Customer Services.
We will shortly be announcing an end-of-life date for the legacy PHP 4 cluster.
We have just encountered a problem with our VoIP Centrex Server lon-pbx-11, which has been restarted due to a resource utilisation issue. If customers are still experiencing an issue with this server, please restart any phones which are affected before contacting our customer services team. Thank you.
As at noon on Sat 24th, we are currently experiencing recurrences of the issues from the beginning of the week. Our server admins are working on this as a matter of urgency. The initial issue has been resolved and now services are being brought back online. The additional storage units mentioned before are in place and data is being migrated to improve service quality.
Update: 13:10. All systems have returned to normal.
Update: 14:50. We are continuing to see a number of failures occur whilst we are implementing our changes to stabilise the system for next week. We are currently working to resolve these outages, but there may be further interruption this afternoon.
Update: 20:15. We have completed the major part of our SAN reconfiguration, and all services are currently online. However, we have further work underway overnight on Saturday and on Sunday to migrate some of the load from one SAN to the others, and therefore, services continue to be at risk.
Update 20:05 29/01/09. We have been experiencing another outage across services this evening, from 7:15pm onwards. Our server admins have been looking into this and are working to restore the service.
Update 22:45 29/01/09. We have restored all affected services.
I want to let our customers know about our progress in dealing with the service disruption that has been experienced since Monday. This disruption has been caused by issues between VMware ESX and our iSCSI storage, and we’ve taken the following steps to reduce and stop the service disruption.
- We have upgraded the RAM and firmware inside our storage unit called ‘iSCSI-2’, and it is now back in service. We are currently moving some services across from iSCSI-3, and once we have reduced the load on iSCSI-3 enough, there should be no more disruption. (The root cause of the disruption is that we were trying to do too much with iSCSI-3, because we lost iSCSI-2 for a while.)
- We have resolved the other configuration issues with our ESX cluster.
- We were unable to install additional storage on Wednesday. Unfortunately, the hardware that we were reprovisioning didn’t work reliably enough during our testing; we’ve had to abandon this plan for now.
- Due to stock shortages, it will be Wednesday 28th Jan before we take delivery of the 4 x HP DL160 servers we’ve ordered to use as even more additional storage. We’re aiming to install these a week on Sunday (Sun 1st Feb).
We will keep you informed as we make more progress with building and installing the additional storage.
We are conducting a lessons-learned exercise early next week, and we will publish the results here once that exercise is complete.
January 24th, 2009, 11:15 am by Stuart Herbert News
We are currently experiencing an outage with customer websites and have restarted our customer shell server ‘shell.gradwell.net’. Our alternative shell servers, newred and ochre, remain available.
Our server team are aware of the issue and are working to restore service as soon as possible.
Update 14:40: This issue has been resolved and is related to the network problems we have been seeing within the last few days.
January 23rd, 2009, 12:19 pm by David Palmer Uncategorized
We have experienced a further interruption to service on our VMware platform this morning, and are currently completing the recovery from that.
We expect all services to be fully operational by 8am.
Thank you for your patience during this time. We would like to reassure customers that good progress has been made on our plans for preventing these problems and additional equipment has arrived and is being configured.