Incident log

We do our best to keep our services up and running at all times. Despite all our commitment and caution, you may still experience a disruption of our services. Because both you and we rely on a number of third-party services, the problem (and its solution) may even be beyond our control. On this page we log incidents with background information about what happened; once the cause is known, we share the details.

Interrupted Uptrends service (April 6, 2022)

On 6 April 2022, the Uptrends platform experienced two unrelated issues, both affecting check execution, alerting and access to the platform.

The first issue started around 8:15 UTC and lasted until around 9:00 UTC. It was caused by a problem in the underlying infrastructure of AMS-IX, which affected a large number of their customers, including both of Uptrends' datacenters. For more information about this incident, please refer to the outage report published by AMS-IX.

The second issue started around 13:30 UTC and was resolved at 14:30 UTC. It was caused by a software bug introduced in a new version that had been released earlier the same day. As a result, the performance of the Uptrends database was severely impacted. As soon as the issue became apparent, our software engineers worked to identify the problem and released a new version to mitigate it.

Incomplete waterfall caused by Chrome service worker issues (November 16, 2021)

Note (March 16, 2022): the issues with service workers in Chrome have since been resolved. There will no longer be missing elements in waterfall charts.

With the release of Chrome 96, service workers no longer install correctly, which can result in elements missing in the waterfall of Full Page Check monitors. Uptrends always runs its checkpoints on the latest stable version of Chrome, which can sometimes expose the Uptrends application to bugs like these.

Background: Service workers

A service worker is a script that runs in the background of your browser, independently of a web page. It allows you to implement features such as caching, push notifications or synchronizing data in the background. Service workers are able to intercept network traffic and programmatically retrieve results from caches.

What is the problem?

As of Chrome 96, the way to register for service worker-related events has changed. However, this change has not been properly implemented in ChromeDriver. ChromeDriver, which is also maintained by the Google team, is the tooling Uptrends uses to automate browser checks. As a result, the service worker hangs while being installed, which causes missing elements in the Full Page Check waterfall. We are working with the relevant teams to fix this issue.

Impact and mitigation

Most web sites using service workers will still load the page correctly. However, some or many elements will be missing in the waterfall, or the behavior of the page can change. As a result, information you need, for example to debug issues, may not be available. It may also affect the total time reported. As a mitigation for incorrect total times, you can switch to load times based on the W3C event. To see more elements, you can switch the browser to, for example, Firefox.

More information

See issue https://bugs.chromium.org/p/chromium/issues/detail?id=1270761.

Let’s Encrypt certificate issues (April 30, 2021)

On Friday April 30, 2021, around 7:40 PM UTC, a significant number of HTTPS monitors of multiple Uptrends customers started reporting errors, stating that the HTTPS certificate could not be validated. Not all monitors showed this problem; it happened only for sites using a TLS certificate issued by the Let’s Encrypt certificate authority.

Background: HTTPS monitors perform certificate checks

HTTPS monitors check the availability of the specified URL. They also check the validity of the HTTPS certificate presented by the server, if the option Check SSL certificate errors on the Advanced tab of the monitor settings is active. Certificates are only valid if they haven’t expired yet. Aside from expiring automatically at some point (typically after a year), certificates can also be revoked by the certificate authority. Therefore, in order to perform a solid check and to ensure that the certificate can be trusted, the HTTPS certificate check also needs to verify that the certificate hasn’t been revoked. Without that, the check is essentially inconclusive.
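
The rule described above can be sketched as a small decision function. This is only an illustration of the reasoning, not the way Uptrends monitors are implemented; the names RevocationStatus, Verdict and judge_certificate, as well as the dates in the example, are hypothetical.

```python
from datetime import datetime, timezone
from enum import Enum


class RevocationStatus(Enum):
    GOOD = "good"        # the certificate authority reports: not revoked
    REVOKED = "revoked"  # the certificate authority has revoked the certificate
    UNKNOWN = "unknown"  # revocation data was unavailable or unusable


class Verdict(Enum):
    VALID = "valid"
    INVALID = "invalid"
    INCONCLUSIVE = "inconclusive"


def judge_certificate(not_before, not_after, revocation, now):
    """Combine the expiry check and the revocation check into one verdict."""
    # Outside its validity period, a certificate is never valid.
    if now < not_before or now > not_after:
        return Verdict.INVALID
    # An explicit revocation overrides an otherwise valid certificate.
    if revocation is RevocationStatus.REVOKED:
        return Verdict.INVALID
    # Without revocation information the check cannot be completed.
    if revocation is RevocationStatus.UNKNOWN:
        return Verdict.INCONCLUSIVE
    return Verdict.VALID


# Hypothetical certificate that has not expired, but whose revocation
# status could not be determined.
verdict = judge_certificate(
    not_before=datetime(2021, 3, 1, tzinfo=timezone.utc),
    not_after=datetime(2021, 5, 30, tzinfo=timezone.utc),
    revocation=RevocationStatus.UNKNOWN,
    now=datetime(2021, 4, 30, 20, 0, tzinfo=timezone.utc),
)
print(verdict.value)
```

In this sketch, a certificate that is still within its validity period but has an unknown revocation status is reported as inconclusive rather than valid.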

What was the problem?

Revocation checks happen in two ways: through the Online Certificate Status Protocol (OCSP), and through a certificate revocation list (CRL). Several hours after the incident started, reports by Let’s Encrypt staff revealed that they had been serving an expired CRL, which caused CRL checks to fail and report errors. Consequently, Uptrends monitors reported a possibly insecure situation, as the validity of these certificates simply couldn’t be determined.
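
To illustrate why an expired CRL leads to an undetermined result rather than a clean pass, here is a minimal sketch. It assumes a simplified CRL model with a validity window and a set of revoked serial numbers; the Crl class, the revocation_status function, the serial numbers and all timestamps are hypothetical and are not the actual values from this incident.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Crl:
    """Simplified stand-in for a certificate revocation list."""
    this_update: datetime  # when the list was issued
    next_update: datetime  # when a newer list is due; the list is stale after this
    revoked_serials: set = field(default_factory=set)


def revocation_status(serial, crl, now):
    """Return 'good', 'revoked', or 'unknown' for a certificate serial number."""
    # A list outside its validity window cannot be trusted either way:
    # a certificate may have been revoked after the list was issued.
    if not (crl.this_update <= now <= crl.next_update):
        return "unknown"
    return "revoked" if serial in crl.revoked_serials else "good"


# Hypothetical CRL whose next_update has already passed at check time.
stale_crl = Crl(
    this_update=datetime(2021, 4, 1, tzinfo=timezone.utc),
    next_update=datetime(2021, 4, 30, 19, 0, tzinfo=timezone.utc),
    revoked_serials={"0a1b"},
)
check_time = datetime(2021, 4, 30, 19, 40, tzinfo=timezone.utc)
print(revocation_status("ffff", stale_crl, check_time))
```

In this sketch, even a serial number that is not on the list is reported as unknown once the list has expired, which corresponds to the situation the monitors flagged.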

This did not affect only Uptrends monitors: anyone using .NET or Java code to access sites and APIs would have run into this issue. Let’s Encrypt solved the problem on Saturday May 1, 2021, at 12:04 AM UTC.

Browsers did not report this problem

Browsers often use their own internal certificate revocation lists, which do not rely on certificate authorities. As a result, affected web sites showed up OK in a browser.

Conclusion, recommendations and follow-up

There was a genuine problem. Therefore, the error messages generated by the Uptrends HTTPS monitors were correct, since we could not guarantee the validity of the certificates or the security they are meant to provide.

However, we realize that it was virtually impossible for you to take any action to solve the issue, as the disruption was entirely caused by external factors. To give you more options in the future, our engineering teams will consider adding settings that let you decide which level of certificate checks you want to execute (with or without revocation checks).

When a problem like this happens, and you’re certain that you want to temporarily ignore this type of error, you can bypass certificate checks by deactivating the option Check SSL certificate errors on the Advanced tab of the monitor settings.

The Let’s Encrypt status report for this issue is posted at https://letsencrypt.status.io/pages/incident/55957a99e800baa4470002da/608c9dd384a5cf052fc6ed24.
