Measuring Performance in Software Engineering

Always add measures to prevent the system from degrading

Ignacio Chiazzo
Better Programming

--

In my previous posts, I described two types of performance optimizations (macro and micro) and the six steps required to improve performance. A fundamental step in that process is preventing performance degradation from happening in the future. In this post, we will dig into this essential step.

How often have you seen a significant performance improvement, only to find that months or years later the performance is worse than before the optimization?

Example: a team improves a system by implementing a new caching layer, cutting the median latency from 40ms to 20ms, and adds great dashboards. Two years later, the latency is 50ms. What could have prevented this? Does the team know the performance is worse than it was two years ago? Do they even know about the optimization that was shipped? Probably not. It could be an entirely new team or, worse, nobody may own that area anymore.

This is a widespread issue. As developers, we must add measures that keep performance from degrading. In this post, you will learn actions you can take to prevent this. I will use examples in Ruby, but the same ideas apply to most languages.

Measure Performance

Developers should know how fast their area of the software is (latency) and how much traffic it can support (throughput). This usually requires a monitoring tool such as Datadog, Rollbar, Splunk, or Sentry.

Image 1. Example Datadog dashboards for Products API latency.

The previous image shows Datadog dashboards for the API latency of a personal project, notably the p50 and p95. Never rely on the average; use percentiles, since the average hides tail latency.

Add Monitors and Alerts

Dashboards are essential, but they become more valuable with monitors. Alerts notify developers when performance surpasses a threshold or when there is an anomaly, via email or Slack, as in the following example:

Image 2. Datadog alert sent to Slack example.

Add baselines to all the dashboards so that developers know the expected latency window.

Tests are key

Each performance optimization change should have tests. To illustrate this, let’s see some examples of the most common performance changes and the actions we can take to prevent performance degradation:

Performance change: Reduced the object allocations of a hot-path function.

Test: Add tests that count the function's object allocations and compare the count against a threshold. Many libraries provide test helpers for this.

Example of a function memory_allocations that asserts the number of allocations is below a threshold.
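
A minimal sketch of such a test, using Minitest and GC.stat to count allocations; the hot-path function build_rows and the threshold of 10 are hypothetical:

    require "minitest/autorun"

    class AllocationTest < Minitest::Test
      MAX_ALLOCATIONS = 10 # hypothetical baseline for this hot path

      # Counts the objects allocated while the block runs.
      def memory_allocations
        GC.disable
        before = GC.stat(:total_allocated_objects)
        yield
        GC.stat(:total_allocated_objects) - before
      ensure
        GC.enable
      end

      # Hypothetical hot-path function under test.
      def build_rows
        Array.new(5) { |i| i.to_s }
      end

      def test_build_rows_stays_below_allocation_threshold
        allocations = memory_allocations { build_rows }
        assert_operator allocations, :<=, MAX_ALLOCATIONS
      end
    end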

Performance change: Reduced the number of database queries executed.

Test: Add tests asserting the number of expected queries or the queries themselves (be careful with flaky tests!).

Example of a function that asserts the number of database queries executed.
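
A minimal sketch for a Rails test, counting queries through ActiveSupport::Notifications; the Product.with_categories scope and the baseline of two queries are hypothetical:

    require "test_helper"

    class ProductQueryCountTest < ActiveSupport::TestCase
      EXPECTED_QUERIES = 2 # hypothetical baseline after the optimization

      # Counts the SQL statements executed while the block runs.
      def count_queries
        count = 0
        subscriber = ActiveSupport::Notifications.subscribe("sql.active_record") do |*, payload|
          count += 1 unless %w[SCHEMA TRANSACTION].include?(payload[:name])
        end
        yield
        count
      ensure
        ActiveSupport::Notifications.unsubscribe(subscriber)
      end

      test "listing products does not regress on query count" do
        queries = count_queries { Product.with_categories.to_a }
        assert_operator queries, :<=, EXPECTED_QUERIES
      end
    end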

Performance change: Used a better database query (e.g., added a database index hint).

Test: Add tests asserting the query executed.

Example of a test that asserts the MySQL database hint is used.
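
A minimal sketch, assuming a Product.recent scope that adds a MySQL index hint; the scope name and index name are hypothetical:

    require "test_helper"

    class ProductIndexHintTest < ActiveSupport::TestCase
      test "recent scope keeps its MySQL index hint" do
        sql = Product.recent.to_sql

        # Fails if a refactor silently drops the hint from the generated SQL.
        assert_includes sql, "USE INDEX (index_products_on_created_at)"
      end
    end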

Performance change: Reduced the number of calls to the cache by batching the queries.

Test: Add tests proving that the cache is called with multiple keys in a single cache call.

Example of a function that executes a fetch_multi cache call instead of a single fetch.
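
A minimal sketch using the Mocha mocking library; the PriceLoader helper and the cache key format are hypothetical:

    require "test_helper"

    class BatchedCacheTest < ActiveSupport::TestCase
      test "prices are read with one fetch_multi instead of N fetches" do
        keys = ["price/1", "price/2", "price/3"]

        # The batched implementation must issue a single multi-key call.
        Rails.cache.expects(:fetch_multi).with(*keys).once.returns({})
        Rails.cache.expects(:fetch).never

        PriceLoader.prices_for(keys)
      end
    end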

Performance change: Reduced the number of database queries by introducing a cache.

Test: Add tests proving the behavior on a cache hit and on a cache miss.

Example of a function that uses the cache instead of the relational database.
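
A minimal sketch, assuming a Product.cached_find helper that wraps the database lookup in Rails.cache.fetch; the helper and the cache key format are hypothetical:

    require "test_helper"

    class ProductCacheTest < ActiveSupport::TestCase
      setup do
        @original_cache = Rails.cache
        Rails.cache = ActiveSupport::Cache::MemoryStore.new
      end

      teardown { Rails.cache = @original_cache }

      test "second lookup is a cache hit and skips the database" do
        product = Product.create!(name: "Widget")

        Product.cached_find(product.id) # miss: reads the DB, writes the cache
        assert Rails.cache.exist?("product/#{product.id}")

        # Deleting the row proves the second call is served from the cache.
        product.delete
        assert_equal "Widget", Product.cached_find(product.id).name
      end
    end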

Performance change: Avoided blocking the main thread by pushing work to background jobs.

Test: Add a unit test checking that the intended jobs are queued.
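
A minimal sketch using ActiveJob's built-in test helpers; the CheckoutService and ReceiptMailerJob are hypothetical:

    require "test_helper"

    class CheckoutJobTest < ActiveSupport::TestCase
      include ActiveJob::TestHelper

      test "checkout enqueues the receipt job instead of running it inline" do
        order = Order.create!(total: 100)

        # Fails if someone later moves this work back onto the request thread.
        assert_enqueued_with(job: ReceiptMailerJob, args: [order.id]) do
          CheckoutService.new(order).complete!
        end
      end
    end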

Performance change: Reduced the frontend bundle size.

Test: Add tooling that measures the bundle size and fails the build when the size exceeds the baseline.

Bundle size check on GitHub using the Bundlesize library.
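
For reference, a minimal sketch of a Bundlesize budget in package.json; the file path and size limit are hypothetical:

    {
      "bundlesize": [
        { "path": "./dist/app.js", "maxSize": "100 kB" }
      ]
    }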

Performance change: (Frontend) Reduced page load by loading less data/UI upfront and deferring queries.

Test: Add unit tests checking which data and DOM elements are loaded.
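
A minimal sketch written as a Rails system test with Capybara; the route, the data-testid attributes, and the page size of 20 are hypothetical:

    require "application_system_test_case"

    class ProductsPageTest < ApplicationSystemTestCase
      test "index renders only the first page of products" do
        visit products_path

        # Only the first batch should be in the DOM; the rest loads on demand.
        assert_selector "[data-testid='product-card']", count: 20
        assert_selector "[data-testid='load-more']"
      end
    end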

Performance change: Improved throughput by adding more caching.

Test: Run periodic stress tests and ensure the numbers match the established baseline.
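
Full stress tests usually run with dedicated load-testing tools, but as a lightweight sketch, a benchmark-ips check can guard a throughput baseline; ProductsRenderer and the baseline of 500 iterations per second are hypothetical:

    require "benchmark/ips"

    BASELINE_IPS = 500 # hypothetical throughput baseline

    report = Benchmark.ips do |x|
      # Hypothetical hot endpoint handler being exercised.
      x.report("render products") { ProductsRenderer.new.call }
    end

    actual_ips = report.entries.first.ips
    abort "Throughput regressed: #{actual_ips.round} ips" if actual_ips < BASELINE_IPS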

Conclusion

Always add measures to prevent the system's performance from degrading. Ensure you have dashboards showing the expected performance and the right monitoring tools, and consider adding tests for each optimization.

Implementing these strategies will save you time and ensure that performance goals keep being met, or that the baseline is deliberately adjusted. A performance improvement is incomplete if there aren't tools and measures in place to prevent future degradation.

Thanks to Alex Watt and Siddhant Bajaj for reviewing this post.
