Uptrends incident log

We do our best to keep our services up and running at all times. Despite our commitment and caution, you may occasionally experience a disruption of our services. Because both you and we rely on a number of third-party services, the problem (and its solution) may even be beyond our control. On this page, we log incidents along with background information about what happened, and once the cause is known, we share the details.

Issue with push notifications on the Android mobile app (October 2024)

The Uptrends mobile app for Android is currently experiencing an issue where push notifications may not work correctly. The iOS app is unaffected. We have identified a potential cause and are continuing to investigate. In the meantime, reinstalling the application should resolve the problem.

If the issue persists after you reinstall the application, please don’t hesitate to contact our Support team.

Monitor screenshots security incident (June 2024)

On 28 May, we started deploying a rolling update across Uptrends' monitoring checkpoint locations. The deployed version of the monitoring software contained a problem that caused some web page screenshots created by the software to be displayed to other Uptrends customers. This issue affected only HTTP/HTTPS monitors; browser monitors were not affected.

After discovering the problem on 30 May, we deployed a fix that took effect immediately across the entire checkpoint network. We quickly identified which screenshots had been created by the faulty version of the software and removed them immediately, eliminating any further risk of exposing screenshot information to other people.

Analysis

We have full visibility into the screenshots affected by this error and the monitors they relate to. We also have full visibility into the access logs for all monitoring data and underlying data, including screenshots. We have meticulously analyzed which of the affected screenshots were subsequently accessed. At no point could customers or third parties access anyone’s account data other than the screenshots mentioned above. At no time did anyone have access to operator passwords or to credentials stored in the vaults of the respective accounts. We can also confirm that financial data, including credit card information, was not compromised.

Going forward

We have reviewed our procedures and put appropriate safeguards in place to prevent this type of incident from happening again. We would like to reassure you that, following our investigation, this issue poses no further risk to our customers or the Uptrends software.

Please feel free to contact us if you have any questions or concerns regarding this matter.

Delayed email delivery (March 2024)

An incident that began on March 12, 2024, affected our email service and delayed the delivery of some of the alert emails and reports you were expecting from us.

Incident details

Affected Service: Email delivery system
Start Time: March 12, 2024, 22:00 CET
End Time: March 13, 2024, 11:00 CET
Impact: Delayed sending of alert emails, report emails, and any other emails sent by the Uptrends system.

Note: This problem did not necessarily affect all of your emails: most messages were delivered on time as usual, but some were delayed.

Resolution

As of 11:00 CET on March 13, all alert emails and other emails triggered after that time were sent on time again. At the same time, we started sending out the backlog of unsent emails.

Next steps

We recommend checking your inbox for any delayed emails that may have arrived after the resolution of this incident. Additionally, if you have any concerns or questions about specific alerts or reports, please do not hesitate to contact us.

We sincerely apologize for any inconvenience this may have caused and appreciate your understanding as we worked to resolve this issue. Our team is committed to providing reliable services, and we are taking steps to prevent such incidents from occurring in the future.

Incident affecting transaction monitors (March 2024)

An incident on March 12, 2024, affected a specific subset of our transaction monitoring service. The issue was limited to transactions using the ‘Chrome Standard’ setting and caused Navigation errors, specifically error code 7001, to be reported incorrectly.

Incident details

Affected Service: Transaction Monitors (Chrome Standard setting only)
Start Time: March 12, 2024, 14:00 CET
Resolution Time: March 12, 2024, 18:20 CET
Impact: Intermittent or continuous errors due to incorrect Navigation error reporting, potential unwarranted alerts, and inaccurately recorded downtime.

We understand the importance of accurate monitoring and the inconvenience this may have caused. The problem was fully resolved by 18:20 CET, and we have taken steps to prevent such incidents in the future.

Were you affected?

If your transactions run using the ‘Chrome Standard’ setting, your monitoring may have been affected. The issue did not affect all monitor checks at once, so you may have seen intermittent rather than continuous errors.

Next steps for affected customers

Uptime recalculation: This incident caused inaccurate downtime reporting. For more information about clearing errors and recalculating your uptime, please see our knowledge base.
Alert review: We recommend reviewing any alerts received during this time frame for accuracy.
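To illustrate why clearing the spurious 7001 errors changes your uptime, here is a minimal sketch in Python. It assumes a simplified model in which uptime is the share of checks that did not report an error; Uptrends' own calculation may differ, and in practice errors are cleared in the app as described in the knowledge base. The CheckResult class, the uptime_percentage function and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class CheckResult:
    # Hypothetical record of one monitor check (not the Uptrends data model).
    timestamp: datetime
    error_code: Optional[int]  # None means the check succeeded

def uptime_percentage(results, clear_code=None, window=None):
    """Share of checks without an error, as a percentage.

    If clear_code and window (start, end) are given, errors with that code
    inside the window are cleared, i.e. counted as successful checks.
    """
    if not results:
        return 100.0
    up = 0
    for r in results:
        cleared = (
            clear_code is not None
            and window is not None
            and r.error_code == clear_code
            and window[0] <= r.timestamp <= window[1]
        )
        if r.error_code is None or cleared:
            up += 1
    return 100.0 * up / len(results)

# Hypothetical data: eight hourly checks from 12:00 CET on March 12, 2024,
# three of which reported the spurious navigation error 7001.
base = datetime(2024, 3, 12, 12, 0)
results = [
    CheckResult(base + timedelta(hours=h), 7001 if h in (2, 3, 5) else None)
    for h in range(8)
]
window = (datetime(2024, 3, 12, 14, 0), datetime(2024, 3, 12, 18, 20))

print(uptime_percentage(results))
print(uptime_percentage(results, 7001, window))
```

With this sample data, uptime rises from 62.5% to 100.0% once the 7001 errors inside the incident window are cleared.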

We sincerely apologize for any inconvenience this may have caused and appreciate your understanding as we worked to resolve this issue.

Delayed email delivery (December 2023)

We recently identified an issue in our email delivery system. Over several days, a technical fault caused some emails generated by our service to be queued instead of being sent immediately. The issue was resolved on December 20, 2023, when our team restored the normal email delivery process and all queued emails were sent promptly.

As a result of this incident, you may have experienced a delay in receiving certain emails. We have since enhanced our monitoring protocols for the email delivery process to prevent similar occurrences in the future.

We apologize for any inconvenience this may have caused and appreciate your understanding. Should you have any concerns or require further clarification regarding this matter, please contact our Support team.

Issue with timeline screenshots (October 2023)

To best represent your end users, Uptrends strives to stay up to date with the browser versions used to run the browser-based monitoring in your account. To this end, we follow the Chrome release cadence. Unfortunately, Chrome 118 introduced a bug affecting screenshot capture in Chrome DevTools. In Uptrends, this means that browser checks executed by checkpoints running Chrome 118 are missing timeline screenshots. Instead of a series of screenshots capturing the various phases of the page load, the monitor result shows a single blank screenshot.

Because new Chrome releases roll out gradually, more and more of our checkpoints have been affected by this issue since Chrome 118 was released earlier this month. As a result, a growing number of browser check results are missing timeline screenshots.

We have identified a fix, and are currently rolling out updates across our checkpoint network.

Alerting outage (August 21-22, 2023)

Between August 21 and 22, 2023, the Uptrends platform encountered an issue that prevented us from sending alert messages through any of the available integrations. The issue started on 22 August at 01:47 CEST (21 August, 7:47 PM EDT) and was mitigated on 22 August at 02:52 CEST (21 August, 8:52 PM EDT). During this time, no alert messages were sent. Any alerts generated during this period are still visible in the alert history in your account, as only outgoing messages were affected. Monitoring was not impacted.
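If you need to match this outage window against logs kept in US Eastern time, the conversion can be checked with Python's standard datetime module. This is only a sanity-check sketch; it assumes the fixed summer-time offsets that applied in late August 2023 (CEST is UTC+2, EDT is UTC-4).

```python
from datetime import datetime, timedelta, timezone

# Fixed summer-time offsets: CEST is UTC+2 and EDT is UTC-4.
CEST = timezone(timedelta(hours=2), "CEST")
EDT = timezone(timedelta(hours=-4), "EDT")

start = datetime(2023, 8, 22, 1, 47, tzinfo=CEST)  # issue started
end = datetime(2023, 8, 22, 2, 52, tzinfo=CEST)    # issue mitigated

for label, moment in (("start", start), ("end", end)):
    # astimezone() expresses the same instant in EDT
    print(label, moment.astimezone(EDT).strftime("%d %B %H:%M %Z"))

print("duration:", end - start)
```

This prints 19:47 and 20:52 EDT on 21 August, and a duration of 1:05:00, so no alert messages were sent for about 65 minutes.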

As a result of this outage, platforms that handle incoming Uptrends alert messages (such as incident management or automation tools and communication platforms) may not have received alert messages that should have triggered an action, such as creating a ticket, incident, or notification, even though the subsequent ‘Ok’ messages were received. Conversely, alert messages may have been received without a matching ‘Ok’ message, meaning the alert or incident may still show as ongoing in external platforms. Refer to the alert status overview in your account for the real-time status of your Uptrends alerts.
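If your external tool lets you export the alert and ‘Ok’ messages it received, the pairing logic for spotting out-of-sync incidents can be sketched as follows. This is not an Uptrends API: the (monitor name, kind) message format and the reconcile function are hypothetical, and the set of currently alerting monitors is assumed to come from the alert status overview in your account.

```python
def reconcile(messages, actually_alerting):
    """Find incidents in an external tool that are out of sync.

    messages: (monitor_name, kind) tuples in arrival order, where kind is
        "alert" or "ok" (hypothetical format; adapt to your tool's data).
    actually_alerting: set of monitor names currently in alert according to
        the alert status overview in your Uptrends account.

    Returns three sets:
        stale_open:   open in the tool, but no longer alerting (lost "ok").
        missing_open: alerting, but never opened in the tool (lost "alert").
        orphaned_ok:  "ok" received with no matching open incident.
    """
    open_incidents = set()
    orphaned_ok = set()
    for monitor, kind in messages:
        if kind == "alert":
            open_incidents.add(monitor)
        elif kind == "ok":
            if monitor in open_incidents:
                open_incidents.discard(monitor)
            else:
                orphaned_ok.add(monitor)
    stale_open = open_incidents - actually_alerting
    missing_open = actually_alerting - open_incidents
    return stale_open, missing_open, orphaned_ok

messages = [("shop", "alert"), ("api", "ok"), ("blog", "alert"), ("blog", "ok")]
print(reconcile(messages, {"db"}))
```

Here "shop" is still open in the tool although it is no longer alerting, "db" is alerting but was never opened, and "api" received an ‘Ok’ without a preceding alert.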

Interrupted Uptrends service (April 6, 2022)

On 6 April 2022, the Uptrends platform experienced two unrelated issues, both affecting check execution, alerting and access to the platform.

The first issue started around 8:15 UTC and lasted until around 9:00 UTC. It was caused by a problem in the underlying infrastructure of AMS-IX that affected a large number of their customers, including both of Uptrends' datacenters. For more information about this incident, please refer to the outage report published by AMS-IX.

The second issue started around 13:30 UTC and was resolved at 14:30 UTC. It was caused by a software bug introduced in a new version released earlier the same day, which severely impacted the performance of the Uptrends database. As soon as the issue became apparent, our software engineers worked to identify the problem and released a version that mitigated it.

Incomplete waterfall caused by Chrome service worker issues (November 16, 2021)

Note (March 16, 2022): the issues with service workers in Chrome have since been resolved. Waterfall charts will no longer have missing elements.

With the release of Chrome 96, service workers no longer install correctly, which can cause elements to be missing from the waterfall of Full Page Check monitors. Uptrends always runs its checkpoints on the latest stable version of Chrome, which can sometimes expose the Uptrends application to bugs like this one.

Background: Service workers

A service worker is a script that runs in the background of your browser, independently of a web page. It allows you to implement features such as caching, push notifications or synchronizing data in the background. Service workers are able to intercept network traffic and programmatically retrieve results from caches.

What is the problem?

In Chrome 96, Chrome changed the way clients register for service worker-related events. However, this change has not been properly implemented in ChromeDriver, the tool Uptrends uses to automate browser checks, which is also maintained by Google. As a result, the service worker hangs during installation, which causes elements to be missing from the Full Page Check waterfall. We are working with the relevant teams to fix this issue.

Impact and mitigation

Most websites that use service workers will still load the page correctly. However, some or many elements will be missing from the waterfall, or the behavior of the page can change. As a result, information may not be available, for example to debug issues. The issue may also affect the reported total time. To mitigate incorrect total times, you can switch to load times based on the W3C event. To see more elements, you can switch to a different browser, for example Firefox.

More information

See issue https://bugs.chromium.org/p/chromium/issues/detail?id=1270761.

Let’s Encrypt certificate issues (April 30, 2021)

On Friday, April 30, 2021, around 7:40 PM UTC, a significant number of HTTPS monitors belonging to multiple Uptrends customers started reporting errors stating that the HTTPS certificate could not be validated. Not all monitors showed this problem; it occurred only for sites using a TLS certificate issued by the Let’s Encrypt certificate authority.

Background: HTTPS monitors perform certificate checks

HTTPS monitors check the availability of the specified URL. If the Check SSL certificate errors option on the Advanced tab of the monitor settings is enabled, they also check the validity of the HTTPS certificate presented by the server. A certificate is only valid if it hasn’t expired. Aside from expiring automatically at some point (typically after a year), certificates can also be revoked by the certificate authority. Therefore, to perform a thorough check and ensure that the certificate can be trusted, the HTTPS certificate check also needs to verify that the certificate hasn’t been revoked. Without that, the check is inconclusive.
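As a rough illustration of this order of checks, the sketch below classifies a certificate using Python's standard ssl.cert_time_to_seconds() to read the notBefore and notAfter dates in the format returned by getpeercert(). The check_certificate function and its revocation status values ("good", "revoked", "unknown") are hypothetical, and the sketch does not perform the OCSP or CRL lookup itself. It only shows why a missing revocation answer leads to an inconclusive result rather than a pass; it is not how Uptrends implements the check.

```python
import ssl
import time

def check_certificate(cert, revocation_status, now=None):
    """Classify a certificate the way a simplified HTTPS check might.

    cert: dict shaped like ssl.SSLSocket.getpeercert() output, with
        'notBefore' and 'notAfter' strings such as 'Jan  5 09:34:43 2018 GMT'.
    revocation_status: 'good', 'revoked' or 'unknown' (hypothetical values
        standing in for the outcome of an OCSP or CRL lookup).
    """
    now = time.time() if now is None else now
    if now < ssl.cert_time_to_seconds(cert["notBefore"]):
        return "not yet valid"
    if now > ssl.cert_time_to_seconds(cert["notAfter"]):
        return "expired"
    if revocation_status == "revoked":
        return "revoked"
    if revocation_status != "good":
        # Without a revocation answer, the certificate cannot be trusted.
        return "inconclusive"
    return "valid"

# Hypothetical certificate dates, for illustration only.
cert = {"notBefore": "Apr 10 00:00:00 2021 GMT", "notAfter": "Jul  9 00:00:00 2021 GMT"}
now = ssl.cert_time_to_seconds("Apr 30 19:40:00 2021 GMT")
print(check_certificate(cert, "good", now))
print(check_certificate(cert, "unknown", now))
```

The certificate is within its validity period in both calls, yet the second call is reported as inconclusive because its revocation status could not be determined.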

What was the problem?

Revocation can be checked in two ways: through OCSP, and through a certificate revocation list (CRL). Several hours after the incident started, reports by Let’s Encrypt staff revealed that they had been serving an expired CRL, which caused CRL checks to fail and report errors. Consequently, Uptrends monitors reported a potentially insecure situation, as the validity of these certificates simply couldn’t be determined.
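Why an expired CRL blocks the check can be shown with a small model. A CRL carries thisUpdate and nextUpdate times (field names from the X.509 CRL profile in RFC 5280). Once nextUpdate has passed, a strict checker can no longer tell whether a certificate is revoked. The Crl class and crl_status function below are hypothetical simplifications: real CRLs are signed DER/PEM structures that need a library such as cryptography to parse, and the dates used are illustrative, not those of the actual Let’s Encrypt CRL.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Crl:
    # Simplified stand-in for a parsed CRL (hypothetical, not a real parser).
    this_update: datetime
    next_update: datetime
    revoked_serials: set = field(default_factory=set)

def crl_status(crl, serial, now):
    """Return "good", "revoked" or "unknown" for a certificate serial number."""
    if now < crl.this_update or now > crl.next_update:
        # Outside its validity window, the CRL proves nothing either way.
        return "unknown"
    return "revoked" if serial in crl.revoked_serials else "good"

UTC = timezone.utc
# Illustrative dates only.
crl = Crl(datetime(2021, 4, 23, tzinfo=UTC), datetime(2021, 4, 30, 12, tzinfo=UTC), {0x1234})

print(crl_status(crl, 0x5678, datetime(2021, 4, 29, tzinfo=UTC)))
print(crl_status(crl, 0x1234, datetime(2021, 4, 29, tzinfo=UTC)))
print(crl_status(crl, 0x5678, datetime(2021, 4, 30, 19, 40, tzinfo=UTC)))
```

The three calls return "good", "revoked" and "unknown". The last one mirrors the incident: the certificate itself is fine, but the CRL has expired, so its revocation status can only be reported as unknown.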

This issue didn’t only affect Uptrends monitors: anyone using .NET or Java code to access the affected sites and APIs would have run into it. Let’s Encrypt solved the problem on Saturday, May 1, 2021, at 12:04 AM UTC.

Browsers did not report this problem

Browsers often use their own internal certificate revocation lists instead of relying directly on the CRLs published by certificate authorities. As a result, affected websites showed up as OK in a browser.

Conclusion, recommendations and follow-up

There was a genuine problem. Therefore, the errors generated by the Uptrends HTTPS monitors were correct, since we could not guarantee the validity of the certificates or the security they are meant to provide.

However, we realize that it was virtually impossible for you to take any action to solve the issue, as the disruption was entirely caused by external factors. To give you more options in the future, our engine teams will consider adding settings that let you choose the level of certificate checking you want to perform (for example, whether to include revocation checks).

When a problem like this happens, and you’re certain that you want to temporarily ignore this type of error, you can bypass certificate checks by deactivating the Check SSL certificate errors option on the Advanced tab of the monitor settings.

The Let’s Encrypt status report for this issue is posted at https://letsencrypt.status.io/pages/incident/55957a99e800baa4470002da/608c9dd384a5cf052fc6ed24.