Our Catalog of Ideas, Sweat, Inspiration, Observations and Mea Culpas for IT that just didn't work out, aka: It Works Awesome Except It Didn't Work (for a moment, at least).
Post Mortem of Outage – Thursday 12/26/2019
Description of Outage:
At 8:00 AM ET on 12/26/2019 we identified through our morning testing that SureDesk users were unable to log in from the website so5. Anyone attempting to log in would receive a message that login failed. ...
Post Mortem of Outage – 12/23/2019
Description of Outage:
At 12 PM ET on 12/23/2019 we identified that users on one of our Standard server pool servers had an unexpected disconnect from the file server. Anyone on that server was unable to connect to their files and all ...
Post Mortem of Outage – 09/06/2018
Description of Outage:
At 5 PM ET on 9/6/2018 we identified that one of our Standard server pool servers had an unexpected shutdown. Anyone on that server immediately had a session closed and all work in progress was closed as well. ...
Post Mortem of Outage – Monday 07/02/2018
Description of Outage:
At 6:30 AM ET on 7/2/2018 we identified through our morning testing that SureDesk users were unable to log in. Anyone attempting to log in would receive a message that apps and desktops were unavailable.
Outage Review:
We determined ...
Post Mortem of Outage – Friday 08/18/2017
Description of Outage:
Standard SureDesks and some Custom Server SureDesks rebooted without warning. Overall outage was 30 minutes. Customers were working post reboot without any major problems.
Outage Review:
From the review of Logs and talking with Citrix the issue seems ...
Dear SureDesk™ Users,
At 11:15 am ET on June 13, 2017 we identified an overload on our File Server. We immediately messaged users requesting an emergency log off from the server at 11:30 am ET. All users were fully logged off by 11:45 am ET. We then performed an ...
Dear SureDesk™ Users,
At 1:45 pm ET this morning we identified a processing issue on the server causing performance issues which required a server reboot. Adobe changed their autoupdate process for Adobe Reader, which required a server reboot to effect the update, and this may be the root cause ...
Dear SureMail™ Users,
January 12, 2016 starting at approximately 3:25pm we identified an issue on our SureMail Exchange servers that is causing intermittent disconnections from Outlook. Email access from smart phones and webmail is performing as usual.
While the issue is being addressed, we recommend you use ...
Dear SureDesk™ Users,
At 7:45 am ET this morning we identified an issue affecting QuickBooks usage. We investigated the issue and resolved the application processing issue by 11:15 am ET. We determined the underlying cause was an internal QuickBooks application service that was in a ...
Dear SureMail Customers,
We are writing to give a further update on the "Undeliverable Message" alerts some of you may be experiencing today.
We have determined that some outbound emails were affected by an intermittent gateway issue between Friday, December 11 at 3:43 pm ET and Monday, December ...
Dear SureDesk™ Users,
At 8:45 am EST this morning we identified an issue interfering with SureDesk usage for some users. A pop up message from our systems reported a “Server Certificate out of date. Connection dropped” error.
Our engineers researched and were able to identify an issue with one ...
Dear SureMail™ Users,
On February 10th, 2015 we had connectivity slowness from 2 pm EST to 3:30 pm EST.
The slowness on the system was caused by network hacking attempts which slowed down the server farm. Security is our utmost priority and we are pleased we were able to ...
Dear SureMail™ Users,
On November 5th, 2014 latencies were observed on eight Exchange 2013 servers around 8:00 am EDT.
These latencies reduced the speed of the mail flow and affected the Outlook Web Application (OWA).
We diagnosed the issue by working with our equipment vendor and Microsoft. A maintenance ...
Dear SureMail™ Users,
On October 30th, 2014 we partially lost connectivity to our site 2 datacenter at 3:39 pm EDT. Four minutes later, at 3:43 pm EDT, connectivity was completely lost. Network connectivity went back to normal at 3:52 pm EDT but services took longer to recover.
The network ...
Dear SureDesk™ Users,
This morning at 1:32 am the web server that hosts the portal for our Citrix login access to the servers had a CPU overload. This caused a failure to launch the so4.suretech.com portal for Citrix and RDC services. Additionally we had a systems ...
Dear SureDesk™ Users,
This morning at 9:33 am we identified an issue for some users logging in to the SureDesk environment due to an issue with Citrix license deployment. We rebooted the servers at 10:15 am to resolve the issue and confirmed all logins were re-enabled by 11 ...
Dear SureMail™ Users,
On May 28, 2014 we experienced downtime of our MS Exchange 2010 servers starting at approximately 10:50 am.
The issue started at around 10:50 am EST and the issue was resolved by 11:30 am EST.
There was a problem with a piece of the firewall equipment ...
Dear SureMail™ Users,
On April 7, 2014 we experienced outbound and inbound delay for some users on our MS Exchange 2010 servers starting at approximately 1:30 pm.
The issue started at around 1:30 pm EST and we identified the issue by 2:45 pm EST.
There was an issue with ...
Dear SureMail™ Users, we experienced MS Outlook connectivity issues for some users on our SureMail™ MS Exchange 2013 from March 1 to March 6, 2014.
From March 1 to March 5 some of our Hosted Exchange 2013 users experienced difficulty in accessing their mailboxes through MAPI (Outlook) or ActiveSync protocol. The ...
Dear SureMail™ Users, On February 4, 2014 we experienced MS Outlook connectivity issues for some users on our SureMail™ system.
The issue started at about 12:30 PM EST and services were fully restored for all users by 1:40 PM EST.
The issue came from one of the SureMail™ Front ...
Dear SureMail™ Users, On November 13, 2013 we experienced intermittent connectivity issues for some users on our SureMail™ system.
The issue started at about 12:00 PM EST and services were fully restored for all users by 12:10 PM EST.
There was hardware failure on one of our Exchange servers ...
Dear SureMail™ Users,
On August 5, 2013 we experienced intermittent connectivity issues for some users on our SureMail™ system.
The issue started at 9:45 AM EST and became more consistent at 10:02 AM EST. Services were fully restored for all users by 11:10 AM EST.
There was an issue ...
Dear SureDesk™ Users,
On April 22, 2013 we experienced a service outage for some users on our SureDesk™ environment.
At approximately 8:15 AM EST we began experiencing log-in issues for some users due to a Citrix networking issue with the Netscaler device, which enables log-ins to the hosted desktop and all ...
As amazing as software power and features have become, we are sometimes amazed in the wrong way.
We recently spent more than 20 hours troubleshooting OneNote on our SureDesk™ in an attempt to address crashes and notebook corruptions. We used several different repositories for the OneNote files including, ...
Dear SureDesk™ Users,
On February 25, 2013 we experienced a service outage for some users on our SureDesk™ environment.
At approximately 12:45 PM EST we began experiencing a service slowness for some users on our SureDesk environment. Issues were found with the network hubs in Houston, Texas where the was ...
Dear SureMail™ Users,
On February 14, 2013 we experienced a service outage for some users on our SureMail™ system.
The issue started at 8:45 AM EST and services were resumed by 8:55 AM EST. There was an issue with one of our application firewalls which resulted in Exchange connectivity loss ...
Dear SureMail™ Users,
On February 1, 2013 we experienced a service outage for all users on our SureMail™ system.
The issue started at 7:22 AM EST and services were largely resumed by 7:59 AM EST. There was a residual issue with 1 server that caused some service connectivity problems to ...
Dear SureDesk™ Users,
From October 5th through October 6th we had a confluence of unrelated issues causing downtime for some users on our SureDesk environment.
We had a File Server Corruption which caused File Access issues for some users. In addition, several of our streaming Apps including LaCerte, QuickBooks and ...
Dear SureMail™ Users,
On October 4th, 2012 we experienced slowness for all users on our SureMail™ system.
The issue started at 11:12 AM EST and subsided at approximately 5:16 PM EDT. We are continuing to troubleshoot and have opened an incident with Microsoft to determine the root cause of the ...
Dear SureMail™ Customers,
Some users may be experiencing periodic slowness in their email connection this morning.
We wanted to let all users know that we are aware of the issue and are in the process of troubleshooting and will be determining a full fix asap. We will have a status update on the resolution within the next 2 hours.
If you have any questions do not hesitate to contact us at 1-800-882-8701 x1.
-Your SureTech™ Solutions Team
Dear SureMail™ Customers,
We are happy to report the SureMail slowness issues some users experienced earlier today have been resolved.
The problem was caused by an update to 1 of our 5 anti-virus engines triggering a condition on 2 of our Exchange 2010 mailbox servers that resulted in a service outage for mailboxes located on those servers. The condition was first reported at 10:45 AM EST and, due to the complex nature of the issue, it took until 11:23 AM EST to resume normal mailbox access on one of the servers and until 11:38 AM EST to resume normal mailbox access on the second server. There may have been some lag time for some users to feel the full effects of the fix.
For the future, we have fixed the way these updates are triggered. Furthermore, we are adding several system monitors so that we can detect and resolve this type of issue right away should it come up again in the future.
If you are still experiencing any delays or issues with your email, please contact our HelpDesk support so we can help you right away at 1-800-882-8701 x1.
-Your SureTech™ Solutions Team
Dear SureDesk Customer:
Some users may have experienced issues connecting to the gateway to log in to their SureDesk from https://so3.suretech.com starting at approximately 2 pm EST on September 10, 2012.
During this time, the SureDesk service itself was fully functioning and for any users ...
We experienced a load-balancing issue on one of our application firewalls today which resulted in some clients' networks being unable to access our hosted Exchange servers. It did not affect all users or any inbound mail coming to our servers. The issue was identified at 6:25 AM and was resolved at 7:45 AM.
We are taking corrective actions to develop a monitor that can detect the problem proactively in the future.
Should you have any questions about the above, please let us know.
- Your SureTech.com Solutions Team
When our services are failing it hurts. It gives our stomachs knots. When we can't immediately tell our customers what the problem is, we want to pull out our hair.
We work hard every day planning and managing risk for our services not to fail. But they do sometimes, and when ...
A switch was installed in our SureDesk™ Data Center on Sunday 5/8/2011 in order to expand capacity and reliability.
Monday morning (5/9) some SureDesk™ users were experiencing sluggishness and intermittent connection instability, which was determined to be caused by the new switch.
Configuration troubleshooting and failover systems were not responsive to a fix and the entire internet ...
As part of our continued efforts to provide a secure SureMail environment we updated the configuration of a firewall on the Exchange Server today at approximately 4:20 AM EDT. This update went smoothly and our monitoring indicated no issues with the new configuration.
However, some SureMail clients did begin to experience connectivity issues at that time; this was due to a load-balancing problem caused by our updates. This issue was resolved at 8:20 AM EDT. No mail was lost; only connectivity to local clients was affected.
We are currently updating our monitoring to detect this type of load-balancing issue in the future to prevent further connection issues of this kind.
Should you have any questions about the above, please let us know.
- Your SureTech.com Solutions Team
Friday afternoon, June 17th, our server administrators noticed that the hard disk on one of our web servers was showing signs of potential failure. In the process of transferring to a new web server using our backup drive, we discovered the backup drive was compromised as well. This double effect caused our sites to be down intermittently between 4:30 pm Friday and 2 am Saturday as we restored data from our most recent backups. This is the first hardware failure to impact operations in 8 years. We're happy that no backup data was lost, though please check any posts or event updates made between Friday, June 17 at 3 am and Saturday, June 18 at 2 am as some of these edits may not have been retained. We apologize for any inconvenience this causes. If you have any questions, please do not hesitate to contact us. Sincerely, Your Solutions Team at SureTech.com
We experienced a new failure mode today with one of our application firewalls. It started returning errors to some customer requests from the Internet at approximately 11:46 AM EST. The issue, while seriously affecting some customer organizations, was not detected by our multiple monitoring systems. However, based on some issues we were able to see, we restarted the affected application firewall at 12:09 PM EST. The resulting re-convergence of load balancing affected the other application firewall starting at approximately 12:12 PM EST and ending by 12:19 PM EST. All services returned to normal production availability via the originally affected application firewall by 12:22 PM EST. The aggregate time during which any customer organizations were affected by the issue was 36 minutes.
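For readers who want to check the arithmetic, here is a minimal sketch that recomputes the durations from the clock times quoted above. The helper name `minutes_between` is ours for illustration only, and the sketch assumes all times fall on the same day in the same time zone.

```python
from datetime import datetime

# Clock times copied from the notice above (same day, all EST).
# The helper below is a hypothetical name used only for this illustration.
CLOCK_FORMAT = "%I:%M %p"


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day clock times."""
    delta = datetime.strptime(end, CLOCK_FORMAT) - datetime.strptime(start, CLOCK_FORMAT)
    return int(delta.total_seconds() // 60)


first_errors = "11:46 AM"         # firewall starts returning errors
firewall_restart = "12:09 PM"     # affected firewall restarted
all_services_normal = "12:22 PM"  # normal production availability restored

time_to_restart = minutes_between(first_errors, firewall_restart)
total_impact = minutes_between(first_errors, all_services_normal)

print(f"Minutes from first errors to restart: {time_to_restart}")
print(f"Minutes from first errors to full recovery: {total_impact}")
```

The second figure matches the 36-minute aggregate reported in the notice.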
We are taking corrective actions to detect the memory fragmentation issue that caused inbound requests to fail on the affected application firewall. We will update our monitoring systems to alert us of this issue prior to inbound requests being rejected, so that we can remediate the issue without customer organizations being affected.
Should you have any questions about the above, please let us know.
- Your SureTech.com Solutions Team
Your MS Exchange is fully recovered after Tuesday's service outage. All data has been fully recovered to your mailbox with no loss of data.
The issue we experienced was precipitated by a problem with an HP Storage Area Network that caused an enclosure with 12 drives to fail. Our work to recover from this was followed by an additional drive failure during the rebuild of the degraded array that hosted your mailbox, causing a catastrophic failure in the array.
Please note we regularly deal with upgrades, maintenance and occasional hardware failures transparently and without affecting your service. In this case, however, all data and data redundancy in the production environment was lost causing us to rely on our disaster recovery backup and log systems.
According to this procedure we made your mailboxes available in a 'dial-tone' configuration where you were able to send and receive emails online, but not able to work with older data offline. Then, the original mailbox data was restored, and the 'dial-tone' emails were merged into the mailboxes providing a full data recovery as of 6:30am Wednesday (yesterday).
We fully realize the interruption this caused and will make additional changes to improve our ability to survive a similar enclosure failure in the future without a similar (or ideally any) service interruption.
We appreciate your patience and cooperation during the resolution of this issue. We continue to work to improve the way we manage all our services, including during emergencies and appreciate your feedback.
As always, if you have any questions, suggestions or need support please call us at 609-688-1111
- The Solutions Team
We strive to ensure that all our products are reliable and consistent. Whenever services are interrupted we work hard to get to the bottom of the cause and solutions so that such an event does not happen again.
On Monday, December 28 we experienced an outage for selected clients from 8:58 a.m. to 9:57 a.m. EST.
The underlying cause:
An error in the Storage Area Network (SAN) supporting mailboxes hosted on the MAIL34 server removed client access to the mailboxes. We troubleshot the issue and were able to bring the SAN and server back online within an hour.
Steps taken to prevent reoccurrence:
We have implemented additional monitoring of the SAN in order to be informed quickly of this specific condition, so that if this issue ever occurs again, the downtime associated will be much shorter.
We are researching this issue further in an effort to eliminate the possibility of it occurring again.
Network Stability:
Overall, the entire SureMail™ environment has enjoyed a 99.902% availability rate over the past 365 days, along with very few scheduled maintenance periods. The MAIL34 mailbox server has experienced an availability rate of 99.906% since it was brought into production approximately 7 months ago.
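To put those percentages in more familiar terms, the sketch below converts an availability rate into the hours of downtime it implies. The function name `downtime_hours` is hypothetical, and the MAIL34 figure assumes "approximately 7 months" means seven 30-day months, which is our approximation rather than a number from the notice.

```python
HOURS_PER_DAY = 24


def downtime_hours(availability_pct: float, period_days: float) -> float:
    """Hours of downtime implied by an availability percentage over a period."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_days * HOURS_PER_DAY


# Entire SureMail environment: 99.902% over the past 365 days (from the notice).
environment_downtime = downtime_hours(99.902, 365)

# MAIL34: 99.906% over roughly 7 months.
# ASSUMPTION: 7 months is approximated as 7 * 30 = 210 days.
mail34_downtime = downtime_hours(99.906, 7 * 30)

print(f"Environment: about {environment_downtime:.1f} hours of downtime in a year")
print(f"MAIL34: about {mail34_downtime:.1f} hours of downtime in ~7 months")
```

Under these assumptions, 99.902% over a year works out to a little under nine hours of total downtime.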
Should you have any questions about the above, please let us know.
Best regards,
- The Technical Support Team at SureTech.com
We strive to ensure that all our products are reliable and consistent. Whenever services are interrupted we work hard to get to the bottom of the cause and solutions so that such an event does not happen again.
On Tuesday, July 22 we experienced an outage for selected clients from approximately 5 a.m. to 11 a.m. EST.
The underlying cause:
An error in the behavior of clustering services led to the offlining of a number of mailbox stores which prevented access to those mailboxes. The same event also introduced inconsistencies into the log files that are generated for these mailbox stores which made bringing them back online a lengthy process with some element of risk. Once we had taken steps to ensure that incoming mail would continue to be accepted by our incoming mail servers we made copies of all affected mailbox stores to ensure that existing data was secure before beginning the process of rebuilding the mailbox stores. The rebuild process is resource intensive and to minimise the downtime for our customers we allocated additional hardware resources to the recovery process. Recovery of mailboxes began 3 hours after the initial problem and was complete 9 hours later. Other dependent services were brought up on completion of this work.
Steps taken to prevent reoccurrence:
Should you have any questions about the above, please let us know.
Best regards,
- The Technical Support Team at SureTech.com
At 1:49 AM EDT on 6/30/09, a brief power interruption in our Data Center appears to have severely damaged one of the four UPSs in one of our racks. (This UPS had not exhibited any symptoms of issues going into the power interruption.) The damaged UPS resulted in half of the rack's AC power supply being removed. The infrastructure in that rack was designed to continue to function in this type of partial power outage, but several limitations in this design were exposed yesterday, resulting in the queuing of all inbound email of organizations using the Ultimate Anti-Spam Protector option, an outage of BlackBerry service, and issues with one of our two infrastructure monitoring systems. At 8:42 AM, we re-routed inbound email from the queues to the Ultimate Anti-Spam Protector service, and the inbound email resumed processing. Due to a configuration issue with the re-routing, some organizations' inbound email was 'bounced' back to the email sender, instead of being successfully delivered. We were able to restore most email service by 9:30 AM. Some isolated issues with email and BlackBerry service remained until everything was fully resolved at 12:40 PM.
To prevent this type of issue in the future, we have taken corrective actions so that the Ultimate Anti-Spam Protector processing and BlackBerry services will continue functioning in the event of this type of issue in the future. We are in the process of updating the affected infrastructure monitoring system so that it too will operate properly during this type of issue. And we are replacing the affected UPS with a model that will provide our monitoring system with more diagnostic information, to help reduce the probability of a UPS-caused AC power outage occurring again.
We apologize for the service interruption, and we will build on the corrective actions we have already taken, as we continue to strive to provide the highest possible service level on a proactive basis. If you have any questions or concerns please do not hesitate to contact us at 1-800-882-8701.
- The SureTech.com Solutions Team
On June 22, 2009 we experienced a serious outage on our SureDesk™ systems. While attempting a minor stability upgrade, our systems admins encountered an unfortunate irreversible bug that crashed the connection service to our SureDesk™ Gold environment.
As it happens we also had a parallel upgrade standing by for release this weekend that we were able to move up to be in effect today and include when we restored service.
Service was down from 7:30 am to 3:07 pm and we sincerely regret the inconvenience to all affected SureDesk™ Gold users. Going forward we have adjusted our upgrade policy for bugs to take fewer risks while system upgrades are also being rolled out. Please note SureDesk™ Platinum users were not affected. Our Gold services don’t have a fully redundant failover standby, which contributed to the difficulties in restoring service we saw today.
Also please note that, in addition to policy changes, we are streamlining workarounds in case this were to happen again (which we do NOT expect), including old-school “Terminal Services” access and streamlined local synchronization of SureFiles™. Feel free to contact us for more information.
On the good news Toot-Toot side you should find a number of benefits from the upgrade now that we suffered through the service interruption:
General reliability and performance improvements:
· Multi-monitor support
· Additional intelligent printing and reliability
· Graphics and color resolution improvements - certain video, such as youtube.com, now works better on the SureDesk™
Thanks for your patience and please let us know if we can do anything to be of help or if you need help restoring or upgrading your connection.
We strive to ensure that all our products are reliable and consistent. Whenever services are interrupted we work hard to get to the bottom of the cause and solutions so that such an event does not happen again.
On Monday, August 2 we experienced an outage for selected clients from 3:27 p.m. to 5:20 p.m. EST.
The underlying cause:
The partial outage today was caused by a problem with one of the application firewalls. It failed in such a way as to 'lock' those sessions that had been using it, and it required an on-site intervention to correct the issue.
Steps taken to prevent reoccurrence:
We have taken action to prevent this failure mode from happening again, and also to enable remotely correcting this issue so that if this issue ever occurs again, the downtime associated will be much shorter.
Network Stability:
Overall, the entire SureMail™ environment has enjoyed a 99.902% availability rate over the past 365 days, along with very few scheduled maintenance periods. The MAIL34 mailbox server has experienced an availability rate of 99.906% since it was brought into production approximately 7 months ago.
Should you have any questions about the above, please let us know.
Best regards,
- The Technical Support Team at SureTech.com
13 Hours and $1,400.00 To upgrade my Hard Drive?!?
We’ve always said that Managed Services for IT is usually a flawed business model. Pretty much the better job you do the less you make. Kinda like lawyers, I guess, except at least we talk about ...
Xobni, which is inbox spelled backwards, is an absolutely terrific plug-in for Microsoft Outlook except for the small fact that it doesn't work...
American-based, for American customers - and email.
That's pretty much the price of excellent service these days. If you outsource your service to a place that doesn't care about your customers ...