Some users on ‘glacier’ are seeing issues collecting mail via POP3/IMAP this morning. Our system admin team are working on this now and we will update here as soon as further information is available.
We apologise for any problems this may be causing you.
***Update 13:55***
We have implemented a fix for the earlier mail problems and have rebalanced all mail stores. We also have further capacity coming online within the next two weeks to ensure these problems do not reoccur.
We apologise again for any problems caused by these earlier issues.
We are currently experiencing a problem with one of our VMware hosts. To return full functionality we need to reboot one of our blades. This will cause an interruption to several services including shell.gradwell.net, some websites hosted on ‘dixie’ and hosted exchange services.
We will post here again upon completion of this emergency work. We apologise in advance for any problems caused by this.
***Update 13:54***
We have fully powered off the problem blade and are restarting services now. Some virtual machines will require a filesystem check so these will take slightly longer to complete.
***Update 14:12***
We have now restored all services. If you are still seeing any problems, please raise an incident with our support team and we will investigate on a case-by-case basis. Our apologies again for the loss of service.
We have discovered that Microsoft Hotmail/Yahoo/Live networks have decided to block our mailservers from sending outbound mail directly to their MX servers. We have requested removal from this blocklist and are waiting for Microsoft to action the request.
We apologise for any inconvenience this causes and would like to remind anyone using us to forward mail to one of these accounts to not ‘report as spam’ if the mail is only forwarded through us. This blacklists our network, not the originating network.
***Update 10:10***
We are now only subject to the normal Microsoft ‘throttling’ for all of their inbound servers so the vast majority of emails should now be reaching their intended target. We apologise for any problems resulting from this and we have asked for our sending limits to be raised across all Microsoft servers.
We are currently seeing a massive influx of mail through our inbound mail servers and this is causing delays on mail delivery across the network. Our system admin team are working on this now and we will update here with further details as soon as possible. We apologise for any problems caused by this.
UPDATE: All mail queues on our systems have now cleared and mail delivery is proceeding as normal. Our investigations have highlighted a number of mail sources delivering abnormally large volumes of email, against which we have taken action. We have also identified some architectural improvements within our own systems to mitigate these types of attacks in the future.
***Update 13:20 12/2***
We are once more seeing a massive amount of inbound mail, which is causing some queues on our mail platform. Our system administration team are working on this now and hope to have any backlog cleared as soon as possible. Again we apologise for any problems caused by this and will update here again once the queues have been fully cleared.
***Update 15:58***
Our queues have held at a low level for quite a while now but we are still seeing a huge number of inbound connections to our edge nodes. Our system admin team are bringing extra resources online to help keep delays to a minimum. We are continuing to monitor the situation and will update here again.
***Update 13/02/10 13:00***
Systems are currently processing new mail well; however, we have identified two servers which have messages stuck in their local delivery queues from Friday. We are manually flushing those messages at the moment, so customers may see some delayed messages arriving in their inbox.
***Update 15:40 16/2/10***
We have been closely monitoring the mail platform and the previous delays have been fully cleared and no further delays have been seen.
This was partially caused by header corruption breaking our mail loop detection code. This code has now been rewritten and fully tested. We apologise again for any problems caused by these delays.
Our network engineers are investigating an outage with our connectivity to Sovereign House, which may prevent customers from reaching our network and services, although we have not yet received independent confirmation of this. We will update this page as soon as further information is available.
Update 05:51: This problem was first detected by our monitoring systems just after 4AM. We have confirmed that this is likely to cause VoIP registration issues for some customers. Our network engineers are working with our transit providers to restore connectivity. We are not able to provide an ETA for restoration of our link with Sovereign House at the moment. We apologise for any inconvenience this will be causing customers and are working to ensure the link is restored by UK business hours.
Update 06:14: We have opened an internal issue (3108) for linking customer incidents where problems are proven to be related to this fault. Typical problems involve not being able to send or receive email, and phones not registering. You are still able to reach our website at www.gradwell.com.
Update 06:50: This issue appears to be resolved; however, we are awaiting an update from our network team to confirm the status of the resolution.
Update: This issue was resolved just before 07:00 and was related to overnight maintenance on one of our transit providers’ routers. If you are experiencing any issues with phones this morning, please try rebooting them, as many phones will not continually retry after a network issue. If problems persist, please contact our support team, who will be happy to advise.
We are seeing some mail delays at present. These are due to a backlog caused by spam, and our system admin team are clearing mail queues now.
We apologise for any problems this is causing you.
***UPDATE 11:05***
We have cleared all spam mail from our mail cluster and our mail queues are now returning to normal.
We are currently experiencing an issue where email delivery to our mailing list servers is failing.
This will affect customers who run mailing lists under the subdomain “list.yourdomainname.dom”.
An on-call system engineer has been alerted to this issue by our internal monitoring and further updates will be posted once available.
Update: This issue was resolved at 11:45 this morning and was caused by a low-level MTA port conflict on our Mailman server. We cannot be sure why this happened; however, we have changed the configuration to ensure only Exim runs in future. We apologise for any inconvenience caused. Mailman is now processing the backlogged queue and will attempt re-delivery for all unprocessed messages.
Mailboxes stored on our ‘Yellowstone’ mail store have temporarily been unavailable due to a memory problem. The server has been restarted and is now operational. Customers may be prompted by their email clients for a password when sending or receiving. Our engineers will be performing further maintenance to increase memory of this server this evening at 22:00. We do not expect this maintenance to last more than a few minutes, during which mailboxes will be temporarily unavailable.
Update: This maintenance was successfully completed between 22:25 and 22:35.
We are currently investigating an issue related to outbound email and will update customers as soon as further information is available.
This problem has now been resolved and was identified as a temporary issue with our auditing database. We apologise for any inconvenience.
We are currently investigating an issue with mail forwarding rules. Our mail forwarding system reloads customer changes periodically and no updates to mail forwardings have been processed since midday. Our server team are investigating this in order to process changes and will update gradwellstatus as soon as possible. We apologise for any inconvenience caused.
Update: This issue was resolved by 20:00 yesterday evening. Difficulties talking to our databases were preventing rules from updating, and our engineering team made some emergency code changes to our mail system to correct this. The problem was active between 12:00 and 20:00.