Technical Details:

The Scalyr Agent acts as a client to the Scalyr SaaS log platform. The Scalyr Agent connects to Scalyr server-side APIs via TLS v1.2. However, the Scalyr Agent does not check the host name in the server’s certificate. An attacker could set up an endpoint with the same certificate trust chain as Scalyr but for an alternative domain. If the attacker were able to point the DNS entry for agent.scalyr.com at their own API endpoint (DNS hijacking), the Scalyr Agent would receive the certificate from this malicious endpoint and assume that the endpoint it connected to is a legitimate Scalyr API endpoint.

The Scalyr Agent has three ways of establishing a TLS connection to the API endpoint:

  1. Utilizing native Python modules (ssl, socket)
  2. Utilizing the third-party tlslite Python library
  3. Utilizing the requests Python library

The Scalyr Agent is used in a wide variety of environments, some of which are quite old. These three connection techniques are needed to support the many permutations of openssl, TLS, and/or Python versions our customers rely on.

When using the second approach (tlslite), we have two ways of verifying the CA and the server hostname:

  1. Using certvalidator library
  2. Shelling out to openssl binary

Initially, we found the issue in the tlslite + certvalidator code path. Our code incorrectly fell back to the "shell out to openssl" approach in case certvalidator code threw an exception.
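
The general remedy for this kind of bug is to fail closed: an error inside the verifier is treated as a failed verification rather than a reason to try a weaker check. The following is a minimal sketch of that pattern, not the agent's actual code. The names `verify_fail_closed`, `primary_verifier` and `CertificateVerificationError` are hypothetical and used only for illustration.

```python
class CertificateVerificationError(Exception):
    """Raised when the server certificate cannot be verified."""


def verify_fail_closed(primary_verifier, hostname, cert_chain):
    """Run the primary verifier and refuse the connection on any failure.

    `primary_verifier` is a hypothetical callable that raises an exception on
    an invalid chain or a hostname mismatch. No weaker fallback is attempted.
    """
    try:
        primary_verifier(hostname, cert_chain)
    except Exception as exc:
        # Fail closed: an unexpected error in the verifier (for example a
        # broken dependency) is handled the same way as a bad certificate.
        raise CertificateVerificationError(
            "could not verify certificate for %s: %s" % (hostname, exc)
        )
    return True
```

With this structure, a verifier that fails to import or crashes causes the connection to be refused instead of silently downgrading the check.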

That openssl code path was vulnerable since it didn't pass the "-verify_hostname" flag to the openssl binary.
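
To illustrate what the missing flag does, here is a sketch of building an openssl command line that checks the hostname as well as the trust chain. It assumes the `openssl verify` subcommand and an OpenSSL version that supports "-verify_hostname" (1.0.2 or newer). The exact command the agent used is not shown in this write-up, so the helper name and file arguments are hypothetical.

```python
def build_openssl_verify_command(ca_file, cert_file, hostname):
    """Return an argv list for `openssl verify` that also checks the hostname.

    Assumption: the server certificate has already been written to
    `cert_file` in PEM format, and `ca_file` holds the trusted CA bundle.
    """
    return [
        "openssl", "verify",
        "-CAfile", ca_file,
        # Without this option only the trust chain is checked, so any
        # certificate issued under the same CA would be accepted.
        "-verify_hostname", hostname,
        cert_file,
    ]
```

The resulting list can be passed to `subprocess`, and the caller still has to inspect the command's result and reject the connection when verification fails.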

Additionally, the way we vendor / bundle the cffi dependency used by certvalidator made it very likely that certvalidator would throw an exception. (We bundled the library incorrectly, with the result that the certvalidator library would only work on systems where the same version of the cffi library we bundle is already installed).

This issue is specific to our usage of tlslite. However, if connection establishment using native Python modules (ssl, socket) fails for any reason, the Scalyr Agent would always fall back to tlslite.

After further auditing, we discovered that the native Python implementation is also vulnerable, because it didn't verify that the hostname the client is connecting to matches the one returned in the server certificate (commonName, subjectAltName).
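
For reference, hostname verification with the native modules can be sketched as follows on Python 2.7.9+ or 3.4+, where `ssl.create_default_context` is available. This is an illustrative example rather than the agent's implementation. The function names, default port and timeout are assumptions.

```python
import socket
import ssl


def make_verifying_context(ca_file=None):
    """Build an SSLContext that validates the chain and the hostname."""
    ctx = ssl.create_default_context(cafile=ca_file)
    # Both settings are already the defaults for create_default_context();
    # they are set explicitly here to make the intent obvious.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def open_verified_connection(hostname, port=443, ca_file=None, timeout=10):
    """Connect and complete a TLS handshake, or raise if verification fails."""
    ctx = make_verifying_context(ca_file)
    sock = socket.create_connection((hostname, port), timeout=timeout)
    try:
        # server_hostname is what the certificate's subjectAltName or
        # commonName is matched against during the handshake.
        return ctx.wrap_socket(sock, server_hostname=hostname)
    except Exception:
        sock.close()
        raise
```

If the certificate presented by the server does not match `hostname`, the handshake raises an exception and no application data is sent.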

We recognized that multiple code paths for performing the same security-sensitive task increase the surface area of vulnerability, making it more likely for bugs to be introduced and possibly go undetected. We also removed the tlslite + openssl code path and now have a single way of securely and correctly establishing a TLS connection.

As part of those fixes, we also made code changes to use more secure defaults when the agent is running on Python >= 2.7.9 (it’s highly recommended that people who still run Python 2.6 and 2.7 upgrade to the latest version in the Python 2.7.x release series). When running on Python 2.7.9 or newer, the Scalyr Agent now requires TLS v1.2, and will refuse to connect if the server only supports older versions of TLS.
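
One way to express a "TLS v1.2 or newer" requirement with the standard ssl module is sketched below. It is an assumption about how such a policy could be written, not the agent's code. The function name is hypothetical, and the fallback branch targets interpreters that lack `ssl.TLSVersion` (anything older than Python 3.7, including 2.7.9).

```python
import ssl


def make_tls12_context():
    """Return a verifying SSLContext that refuses protocols older than TLS 1.2."""
    ctx = ssl.create_default_context()
    if hasattr(ssl, "TLSVersion"):
        # Python 3.7+: set the lowest protocol version directly.
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    else:
        # Older interpreters: disable each legacy protocol individually.
        ctx.options |= (
            ssl.OP_NO_SSLv2
            | ssl.OP_NO_SSLv3
            | ssl.OP_NO_TLSv1
            | ssl.OP_NO_TLSv1_1
        )
    return ctx
```

A handshake with a server that only offers TLS 1.0 or 1.1 then fails instead of negotiating down.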

In addition to that, we plan to drop support for Python 2.6 and Python 2.7 < 2.7.9 in the future, since those versions are old and ship with known insecure defaults.