Gradwell Internet for business people

Gradwell Service News

Monthly Archive July, 2009

RESOLVED: Internal DNS issue affecting VoIP calls

We are presently experiencing an issue with internal name resolution which is preventing inbound and outbound calls from working.  Our server team are investigating this issue and are aiming to restore service as soon as possible.

Update: This issue has now been resolved. It was detected by our monitoring systems at 15:48 and corrected within six minutes of that time.

RESOLVED: Customer home storage repair

Our monitoring systems have detected a file-system write problem with the customer home directory storage virtual server - Dixie.

Web-sites on this server will be unavailable temporarily.  We apologise for any inconvenience caused and aim to restore service within an hour.

Update 00:54: This issue has now been resolved and Dixie has been brought back online.  Web-sites may be slow for a few minutes as queued accesses are processed by the server.

**Update** 9:15

We are seeing further problems with the filesystem on Dixie this morning. Our server admin team are working on this now and we will post an update as soon as possible.

We apologise for any problems these filesystem issues are causing.

**Update** 12:25

To get Dixie back to stable working levels our server admin team need to remove it from active service to perform some block level disk checking. This will cause a disruption for any customer who is serving a website or is accessing file(s) within their home filespace if located on Dixie. We apologise for the continuing interruption and we will post an update here as soon as possible.

Update 15:05: Because the home file server, Dixie, is failing intermittently, our server team will be restoring the file system in a read-only state.  This will allow the majority of sites to work normally until the service can be fully restored on an alternative server.

Update 20:05: Backups for file storage on Dixie have been recovered, and we are now recovering available data from the last 24 hours from Dixie.  We will shortly be switching over to the new server.

Update 13:06: This issue was resolved at midnight. However, shell and our legacy cluster could not reach the new server; this was resolved before 10:00 today.

RESOLVED: Annex M DSLAM Upgrades

Starts: 2009/07/29 01:00 Ends: 2009/07/29 05:00

Our ADSL carrier has informed us that a number of exchanges (102) will experience intermittent connectivity issues while the DSLAM software is upgraded. This upgrade will allow us to expand the availability of our Premier Plus broadband package.

Users connected to the affected exchanges will be disconnected for up to 20 minutes while the software is upgraded. Customers' routers should reconnect automatically once the upgrade has been performed.

RESOLVED: Mail and web issues

15:50: We have had to take one of our core mail servers (Badlands) offline temporarily due to high load, which we are currently investigating. Customers with mail on this server will not be able to view or download new emails temporarily. We will update customers as soon as further information is available and apologise for any inconvenience.

Update 16:35: The Badlands server was brought back online after a hardware upgrade at 16:10 and is now processing a backlog of emails. We have tested customer mailbox access and this is working correctly. If you are continuing to experience issues which are not related to delayed mail, please contact our customer services team.

Update 17:42: We are currently experiencing issues with our PHP 4.4 web load balancers and continuing to experience a backlog of mail which cannot be processed. Our server team are currently upgrading hardware to ensure that more resources are available and will continue to update customers via this website.

Update 20:55: Our server team are still investigating this issue. As an emergency measure, some load-intensive services such as AV scanning have been switched off. We are aiming to restore these services as quickly as possible; however, the performance of mail and web in general will need to be resolved first. We apologise for the ongoing inconvenience.

Update 21:25: Our technicians have now diagnosed and resolved a networking configuration issue which was impacting on the performance of our back-end storage SANs, causing our mail and web systems to have difficulty serving requests and handling mail in a timely fashion.

We are monitoring traffic and will restore AV services as soon as we confirm all queues are cleared.

Update Sat 25th July 11:00: We have made an emergency fix to flush outbound mail queues, and these queues have now been cleared. We are aware that an issue remains with our core network and are awaiting a fix as soon as possible.  Further details will follow as soon as they are available.  At this time we believe all customer-facing issues are resolved. However, should you encounter an issue, please submit an incident via our support portal or email support@gradwell.com

Update 16:41: Full anti-virus and spam scanning services have now been restored to the network. Our server and network teams will be working next week to re-balance load and set up additional network monitoring to provide faster answers should an issue like this arise in future.

RESOLVED: Mail Delays

Earlier today we were seeing some mail queues forming. The underlying issue has now been fully rectified and mail queues are reducing. Some customers may still experience delays on inbound and internally relayed mail as the queues drop back down to normal levels.

We apologise for any problems these delays are causing.

RESOLVED: Mail issues

We are currently seeing some issues with both POP3 connections and IMAP over SSL connections. Our server admin team are working on this now. Normal IMAP use will be unaffected.

We will post an update here as soon as one is available.

Sorry for any problems this is causing.

Update 12:41: The service has now been restored. We have reconfigured our mail routing load balancer and restarted it on our new virtualised infrastructure.

RESOLVED: lon-pbx-1 and control panels

At 22:20 we experienced a power issue with one of the server hosts which runs several important services, such as lon-pbx-1 (VoIP) and our customer control panels. Although the machine recovered quickly, these services could not be made to work as quickly as we hoped on our next generation of servers, and so have been rolled back as an emergency measure. We apologise for any inconvenience caused by this. The problem was resolved and all services brought back online by 23:30. Our backup VoIP server covered for lon-pbx-1 during this time. Customers may need to restart their VoIP equipment to ensure they are using the primary server again.

RESOLVED: lon-pbx-5 restarted

Our automatic monitoring has picked up a call quality and responsiveness issue with our VoIP Centrex server lon-pbx-5 which has now been automatically restarted.  We apologise for any inconvenience caused.

This issue was seen at 15:06 and our systems made an automatic decision to restart the server at 15:23.

RESOLVED: lon-pbx-5

We are seeing some issues with one of our legacy PBX servers - lon-pbx-5.gradwell.net.

Users on this server may be seeing problems registering VoIP devices and intermittent call problems.  Our server admin team are working on this now and will post all updates here as soon as possible.

We apologise for any problems caused by this.

**Update** 11:42

Our admin team have restarted lon-pbx-5 and it is now handling calls correctly.

If you are still seeing problems, please reboot your device and it should re-register. If problems continue after this, please contact support.

RESOLVED: NewSIP

We are currently seeing some problems with the machines in the NewSIP cluster. Customers using this server may be seeing some issues making and receiving calls.

Our server admin team are working on this now and will post an update here as soon as possible.

We apologise for any problems this may be causing you.

Update 12:51: Our server cluster is now running and our server team are investigating the hardware fault that is intermittently causing the service to fail.

Closed issue - we are continuing to investigate the root cause of this issue; however, it is not currently seriously affecting service.