We will be switching off parts of our core network in the very early hours of Friday morning to perform some upgrades and to increase power resiliency. Some parts of our network will be unavailable for short periods of time between 00:00 and 05:00. During this period all phones will re-register at some point, some websites will be offline and all Gradwell DSL lines will lose connection for up to 20 minutes.
Whilst the whole network will be ‘at risk’ for the entire period, the main windows for major outages differ for each type of device. Our work plan is as follows:
Core Switching - 01:00-02:00
ADSL Routers - 02:00-02:30
Border Routers - 02:30-03:30
Peering Routers - 03:30-04:00
We apologise in advance for any loss of service caused by this work. If you experience any problems after the work has been completed, please contact support@gradwell.com.
***Update 07:05***
We are investigating a problem with the web cluster following this maintenance. Status will be updated here as soon as further information is available. We apologise for any inconvenience caused.
***Update 13:38***
We have been monitoring all upgrades and they are performing as expected. All work was successfully completed. We apologise again for any loss of service during these upgrades.
Some users who are connecting via our NAT proxy are seeing some registration issues. Our server admin team are working on this now and we will post an update as soon as possible.
We apologise for any problems this may be causing you.
***Update 12:30***
Our system admin team have fully restored service and all of our NAT proxies are fully functioning. We again apologise for any problems this has caused.
We are currently seeing a massive influx of mail through our inbound mail servers and this is causing delays on mail delivery across the network. Our system admin team are working on this now and we will update here with further details as soon as possible. We apologise for any problems caused by this.
UPDATE: All mail queues on our systems have now cleared and mail delivery is proceeding as normal. Our investigations have highlighted a number of mail sources delivering abnormally large volumes of email, against which we have taken action. We have also identified some architectural improvements within our own systems to mitigate these types of attacks in the future.
***Update 13:20 12/2***
We are once more seeing a massive amount of inbound mail, which is causing queues to build on our mail platform. Our system administration team are working on this now and hope to have any backlog cleared as soon as possible. Again, we apologise for any problems caused by this and will update here once the queues have been fully cleared.
***Update 15:58***
Our queues have held at a low level for quite a while now, but we are still seeing a huge number of inbound connections to our edge nodes. Our system admin team are bringing extra resources online to help keep delays to a minimum. We are continuing to monitor the situation and will update here again.
***Update 13/02/10 13:00***
Systems are currently processing new mail well; however, we have identified two servers which have messages stuck in their local delivery queues from Friday. We are manually flushing those messages at the moment, so customers may see some delayed messages arriving in their inboxes.
***Update 15:40 16/2/10***
We have been closely monitoring the mail platform. The previous delays have been fully cleared and no further delays have been seen.
The delays were partially caused by header corruption breaking our mail loop detection code. This code has now been rewritten and fully tested. We apologise again for any problems caused by these delays.
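For readers unfamiliar with mail loop detection, the following is a minimal sketch of one common approach: counting the Received headers on a message and treating an unusually high count as a likely loop. It is an illustration only and not the code used on our platform. The function names and the MAX_HOPS threshold are assumptions made for the example.

```python
from email import message_from_string

# Assumed threshold for this sketch; real mail servers use their own limits.
MAX_HOPS = 25


def count_received_hops(raw_message: str) -> int:
    """Return the number of Received headers found on the message."""
    msg = message_from_string(raw_message)
    return len(msg.get_all("Received", []))


def looks_like_mail_loop(raw_message: str, max_hops: int = MAX_HOPS) -> bool:
    """Flag a message whose hop count exceeds the allowed maximum."""
    return count_received_hops(raw_message) > max_hops
```

A check of this kind relies on the Received headers being intact and parseable, which may be one way in which corrupted headers can defeat loop detection.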
We will be performing some further maintenance on one of our hosted unified comms nodes tonight from now until 23:30. This might cause a few minutes of downtime whilst we fail over the primary node. We will post here again when the maintenance work is completed.
We apologise for any problems this might cause you, but would like to assure all users that any downtime will be kept to a minimum and in most cases won't even be noticed due to our clustered server setup.
Users of other hosted VoIP products including Centrex/multi user/SIP/IAX trunks will be unaffected.
***Update 22:50***
We have finished the maintenance ahead of schedule and our hosted unified comms platform is now running in full high availability. We apologise if anyone experienced a dropped call due to this.
Starts: 2010/02/09 22:00 Ends: 2010/02/09 23:00
On the evening of Tuesday 9th February we are performing some maintenance on our VoIP NAT proxy cluster to improve its resilience to hardware and software failures.
We do not expect any significant downtime during this work; however, if you do experience problems making or receiving calls, we recommend you reboot your phone in the first instance.
We apologise for any inconvenience this work may cause you.
Update (Wednesday, 10:23): This maintenance was completed last night with minimal downtime, and the NAT proxy is functioning normally today.
Some users may be seeing problems making or receiving calls using the Unified Comms platform. Our system admin team are looking at this now and we will post any updates here.
We apologise for any problems caused by this.
***Update 13:55***
The Unified Comms platform is now running at 100%. We apologise again for any problems caused.
We are performing some routine maintenance on our Unified Comms platform at around 22:00 this evening. This will cause downtime of approximately 20 minutes, during which calls may not be routed correctly. We will update here when the work is started and again when the work is completed.
Users of our hosted VoIP platform, and SIP/IAX trunk users will be unaffected by this maintenance.
***Update 22:00***
We are about to start the planned maintenance on schedule. The Unified Comms platform is now offline.
***Update 22:12***
We have finished the maintenance a little early. Our Unified Comms platform is now back up and running in full High Availability mode.
We apologise for any inconvenience caused by this maintenance.
Our engineering team are currently investigating an issue with our web portal. We apologise for any inconvenience caused. Access to our VoIP and hosting control panels is not affected.
Update 18:15: This issue has now been resolved. A new revision of the portal was due for release on Sunday, and in order to minimise downtime, this has been brought forward. If you experience any problems with the new portal, please raise an incident with our support team. Most functionality is also available in our VoIP and hosting control panels, with the exception of broadband controls.
One of our customer-facing caching DNS resolvers, at address 193.111.200.91, was not responding to DNS queries from around 04:00 until just after 07:00 today. Customers are advised to configure computers and phones with at least two resolver IP addresses:
193.111.200.91 and 193.111.200.191
We have corrected the problem and reported this fault back to the engineers who set up these machines. We apologise for any inconvenience caused.
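The advice to configure two resolvers works because a client can fall back to the second address when the first does not answer. The sketch below illustrates that fallback logic in Python. It is not a description of how any particular phone or operating system behaves. The `query` argument is a hypothetical caller-supplied function that performs the actual lookup against one resolver and raises OSError (for example a timeout) on failure.

```python
from typing import Callable, Sequence

# Resolver addresses taken from the notice above.
RESOLVERS = ["193.111.200.91", "193.111.200.191"]


def resolve_with_fallback(
    name: str,
    resolvers: Sequence[str],
    query: Callable[[str, str], str],
) -> str:
    """Try each resolver in order and return the first successful answer.

    `query(resolver_ip, name)` is a hypothetical lookup function supplied
    by the caller; it should raise OSError if the resolver does not respond.
    """
    errors = []
    for ip in resolvers:
        try:
            return query(ip, name)
        except OSError as exc:
            # Record the failure and move on to the next resolver.
            errors.append(f"{ip}: {exc}")
    raise OSError("all resolvers failed: " + "; ".join(errors))
```

With only one resolver configured, the loop has nothing to fall back to, so a single unresponsive machine is enough to stop all lookups.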
Some customers are seeing problems making and receiving calls using our IAX platform. Our server admin team are working on this now and we will update here as soon as possible.
SIP and Centrex users are unaffected by this.
We apologise for any problems this is causing.
***Update 14:00***
We removed one of the IAX machines from the cluster and rebooted it. We have now put this machine back into the cluster and tested it, and it is handling calls as expected. We have opened an internal issue for both our VoIP development and sysadmin teams to investigate the cause so that we can better protect against problems like this in future.
We apologise again for any problems this has caused you.