While most organizations say they continuously monitor the health of their systems, they often underinvest in this critical part of the DevOps process.
Maybe it’s because monitoring is boring, and for years the status quo has sufficed.
Monitoring isn’t as collaborative and engaging as planning, with aesthetically pleasing Kanban boards, or as value-driving as development. Monitoring doesn’t come with the fascinating tooling of deployment.
As organizations increasingly adopt cloud-based infrastructure and microservices, observing and monitoring a rapidly growing application stack is becoming more challenging. The failure of traditional monitoring techniques to keep up has given rise to a new discipline: observability. This discipline has brought great innovations, tools, and platforms that help monitor and analyze the health of systems holistically.
In this article, we analyze the reviews of seven top observability tools that are reinvigorating the world of monitoring.
What is observability?
The concept of observability dates back to the 1960s. It refers to the extent to which you can understand the relationships, health, and behavior of a system by observing its external outputs. Introduced by Hungarian-American engineer Rudolf E. Kálmán, observability is rooted in applied mathematics, particularly control theory and systems theory.
When it comes to cloud computing and application management, observability refers to the ability to measure and monitor a distributed system holistically. An observability tool provides users with the ability to collect, process, and analyze telemetry data, like logs and traces (external outputs), providing insights into system behavior.
Monitoring vs. observability
What is the distinction between monitoring and observability? For one, monitoring and observability are not at odds with each other; they’re related. You can’t have observability without monitoring capabilities. In short:
- Monitoring is the action of capturing system output data to identify, diagnose and manage system health. It is reactive.
- Observability is a state. It is achieved by continually monitoring all output data across the application landscape and, beyond monitoring, analyzing that data to make inferences about and determine the state of the system. It’s proactive.
So, what outputs are needed to create a state of observability? Metrics, events, logs, and traces (handily abbreviated as M.E.L.T.). Let’s take a quick look at each of these:
- Metrics: Metrics are numerical measurements, and which ones to capture is typically defined ahead of time. For example, a point-of-sale (POS) system might be configured to capture an aggregate such as the average transaction amount in a particular hour.
- Events: An event is a specific action that occurs at a moment in time. Continuing with the POS example, an event could capture the time, date, and other metadata about a specific order placed.
- Logs: The most well-known (and dreaded) type of telemetry data, logs are text entries that indicate when and what specific code blocks are executed. Most classic monitoring is focused on logs—an issue happens and engineers comb through lines and lines of text entries until they find some sort of insight into what occurred.
- Traces: Traces are discrete, causal chains of events between different components of the system. A trace is made up of ‘spans’, each representing a single timed operation, such as one microservice handling its part of a request; parent-child links between spans show how the work flowed from service to service. Use trace data when you want to know the relationship between services or entities.
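To make the four data types concrete, here is a minimal Python sketch of what M.E.L.T. telemetry might look like for the hypothetical POS system above. The class names, field names, and helper functions (`Event`, `LogEntry`, `Span`, `average_transaction_metric`, `trace_tree`) are illustrative assumptions, not any vendor’s schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional

# Hypothetical POS telemetry: one record type per M.E.L.T. category.

@dataclass
class Event:
    """Event: a discrete action at a moment in time, e.g. one order placed."""
    name: str
    timestamp: datetime
    attributes: dict = field(default_factory=dict)

@dataclass
class LogEntry:
    """Log: a text entry recording what code ran and when."""
    timestamp: datetime
    level: str
    message: str

@dataclass
class Span:
    """Span: one timed operation; spans sharing a trace_id make up a trace."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start: datetime
    end: datetime

def average_transaction_metric(events, hour_start):
    """Metric: average order amount for the hour starting at hour_start."""
    hour_end = hour_start + timedelta(hours=1)
    amounts = [
        e.attributes["amount"]
        for e in events
        if e.name == "order_placed" and hour_start <= e.timestamp < hour_end
    ]
    return mean(amounts) if amounts else 0.0

def trace_tree(spans):
    """Trace: map each parent span_id to the span_ids it called (root under None)."""
    children = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s.span_id)
    return children

hour = datetime(2024, 1, 1, 10, 0)
orders = [
    Event("order_placed", hour + timedelta(minutes=5), {"amount": 20.0}),
    Event("order_placed", hour + timedelta(minutes=40), {"amount": 30.0}),
    Event("order_placed", hour + timedelta(minutes=70), {"amount": 100.0}),  # next hour
]
print(average_transaction_metric(orders, hour))  # 25.0
```

The metric is derived from events here only to show how the types relate; in practice metrics are usually aggregated at the source rather than recomputed from raw events.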
Monitoring could involve reviewing one or more of these data types for a particular system. Since every application in your stack can produce this data, imagine how difficult it becomes to manage as your application landscape grows! Observability provides better context by analyzing all the telemetry data generated across the entire system.
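As a rough illustration of that added context, the sketch below groups log records from several services by a shared trace ID, so a single request can be followed end to end. It assumes every service propagates and logs the same `trace_id`. The record fields, service names, and function names are all hypothetical.

```python
from collections import defaultdict

def correlate_by_trace(records):
    """Group log records from every service by trace_id, oldest first."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["trace_id"]].append(rec)
    for recs in grouped.values():
        recs.sort(key=lambda r: r["ts"])
    return dict(grouped)

def services_touched(records, trace_id):
    """List the services involved in one request, in the order they logged."""
    seen = []
    for rec in correlate_by_trace(records).get(trace_id, []):
        if rec["service"] not in seen:
            seen.append(rec["service"])
    return seen

# Hypothetical logs from two services; "ts" is seconds since some epoch.
logs = [
    {"trace_id": "t1", "service": "payments", "ts": 2, "msg": "card declined"},
    {"trace_id": "t1", "service": "checkout", "ts": 1, "msg": "order received"},
    {"trace_id": "t2", "service": "checkout", "ts": 3, "msg": "order received"},
]
print(services_touched(logs, "t1"))  # ['checkout', 'payments']
```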
Microservices, macro view
As the adoption of microservice and service-oriented architecture becomes more prevalent, competitive businesses are beginning to deliver more services more often through a decoupled and disparate application landscape.
The larger the application landscape, the more logs, traces, and analytics data there are to review, manage, and optimize. Naturally, companies begin growing their DevOps teams to support, monitor, and debug the rapidly growing number of independent applications added to the mix.
And here lies the problem: we want a decoupled architecture and all the benefits of microservices, but we also want a holistic view of application health. This is where adding observability tools to the DevOps toolchain provides value.
What does an observability tool do?
Put simply, observability tools allow organizations to achieve a state of observability. In practice, an observability tool typically does the following:
- Provides reports and dashboards that display system health
- Monitors key metrics around performance and business objectives
- Provides automated alerts that identify potential threats or issues
- Provides tools that help teams trace, resolve, and collaborate on issues
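Automated alerts are usually rule-based. The minimal sketch below, using a hypothetical `AlertRule` and `evaluate` function, fires only when a metric stays above its threshold for several consecutive samples. Requiring a sustained breach is a common way to avoid paging on a single spike; Prometheus alerting rules, for example, express this with a `for` duration. This is an illustration, not any specific tool’s alerting engine.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class AlertRule:
    """Hypothetical rule: alert when `metric` exceeds `threshold` for
    `for_samples` consecutive readings."""
    metric: str
    threshold: float
    for_samples: int

def evaluate(rule: AlertRule, samples: Sequence[float]) -> Optional[int]:
    """Return the index of the sample at which the alert fires, or None."""
    streak = 0
    for i, value in enumerate(samples):
        # Reset the streak whenever the metric drops back under the threshold.
        streak = streak + 1 if value > rule.threshold else 0
        if streak >= rule.for_samples:
            return i
    return None

rule = AlertRule(metric="p95_latency_ms", threshold=500, for_samples=3)
print(evaluate(rule, [420, 510, 530, 480, 520, 560, 610]))  # 6
```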
Seven top-rated observability tools
With that in mind—and in no specific order—let’s take a look at seven top observability tools, including what makes them special, what users like about them, what they don’t, and how their pricing stacks up in comparison to one another.
1. Datadog
Founded in 2010, Datadog is a New York-based company recently named a Leader in the 2022 Gartner Magic Quadrant for APM and Observability. Datadog offers over 500 rich, vendor-supported integrations and a host of ready-to-use application monitoring visualizations.
What makes Datadog special?
Robust AI and predictive machine learning. Datadog’s Watchdog AI includes root cause analysis and log anomaly detection, automatically identifying causal relationships between events across an enterprise stack and even identifying the exact users impacted.
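The sketch below is not Watchdog’s implementation. It is a deliberately simple rolling z-score check that shows the basic idea behind anomaly detection: flag a reading that deviates from its recent history by more than a chosen number of standard deviations. The readings here are hypothetical error counts per minute. The `window` and `threshold` defaults are assumptions to tune, not recommended values.

```python
from statistics import mean, pstdev

def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical error-log counts per minute, with a spike at index 7.
errors_per_minute = [10, 12, 11, 9, 13, 12, 11, 60, 12]
print(zscore_anomalies(errors_per_minute))  # [7]
```

Commercial tools typically use far richer models, for example ones that account for seasonality. A fixed window like this one will also miss slow drifts.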
What do users like about Datadog?
Users enjoy a quick and simple setup process that gets them up and running with limited support. When customers do need support, Datadog performs well at meeting their needs. Reviewers also point to the rich selection of dashboards and visualizations into the health of their stack.
What do users dislike about Datadog?
Users note that configuring integrations can be challenging and may require considerable help from the Datadog support team. Another common gripe is the price point, specifically the way contracting is modeled.
Datadog pricing
Although it is an industry leader, Datadog is one of the more expensive observability tools. Pricing is per host per month, with tiered levels of functionality and support.
2. New Relic
Another Leader in the Gartner Magic Quadrant, New Relic is based in San Francisco and was founded by Lew Cirne. New Relic offers a deep view into network infrastructure and applications by utilizing a variety of machine learning models.
What makes New Relic special?
Powerful real-time monitoring. Whereas some observability tools, like Datadog, don’t offer real-time monitoring, New Relic does. New Relic is also especially well equipped to handle web and mobile application monitoring.
What do users like about New Relic?
Users like that it takes advantage of historical log data to detect past bottlenecks and deeply analyze consumption trends. They also like NRQL (New Relic Query Language), which makes it easy to query relevant system information. Users also like how New Relic can fit an entire system’s health on a single-page dashboard for an at-a-glance overview.
What do users dislike about New Relic?
A common complaint is that the New Relic user interface is clunky and not as intuitive to navigate as some competitors’. Reviewers also cite response time, with dashboards taking longer than anticipated to retrieve information.
New Relic pricing
New Relic has an awesome freemium product. Outside of that, New Relic is priced on data usage (100 GB of free data per month) and user seats.
3. IBM Instana Observability
One of the best-known and most influential companies in the tech industry, IBM needs no introduction. Since being added to IBM Cloud, Instana has been a successful tool in the observability space, providing real-time service mapping and automatically discovering the application services in your stack.
What makes Instana special?
AI enables behavioral learning. The data analytics capability is perhaps the most powerful of the group, with an entire repository of application request trace data.
What do users like about Instana?
Users like how Instana can manage many complex applications while providing an intuitive framework. Its root cause analysis capabilities can easily detect SLA violations.
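The sketch below does not reproduce Instana’s root cause analysis. It only illustrates the kind of SLA check such analysis builds on. It assumes a hypothetical SLA expressed as a 95th-percentile latency target per time window and uses the nearest-rank percentile method. The SLA shape, data, and function names are assumptions for illustration.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def sla_violations(latencies_by_window, p95_target_ms):
    """Return the windows whose p95 latency exceeds the SLA target."""
    return [
        window
        for window, latencies in latencies_by_window.items()
        if percentile(latencies, 95) > p95_target_ms
    ]

# Hypothetical request latencies (ms) for two hourly windows, 20 requests each.
windows = {
    "09:00": [100] * 19 + [900],
    "10:00": [100] * 18 + [900, 900],
}
print(sla_violations(windows, p95_target_ms=500))  # ['10:00']
```

One slow request in twenty stays within a p95 target, but two do not. This is why percentile-based SLAs tolerate occasional outliers while still catching sustained degradation.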
What do users dislike about Instana?
Users often feel that there is a lack of documentation readily available for understanding and working through common setup. Critics also note several backlogged and requested new features.
Instana pricing
Instana charges a per-host rate with unlimited users, and charges slightly more for on-premises hosting.
4. Grafana
Grafana is one of the most popular open source APM and observability tools on the market. Unique product differentiators like Grafana OnCall provide flexible, automatic escalations so engineers can quickly coordinate and address issues as they arise.
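The sketch below is a generic model of an escalation chain, not Grafana OnCall’s configuration format or API. Each hypothetical step pages someone, then waits a set number of minutes for an acknowledgement before the next step is triggered. The `EscalationStep` and `who_is_paged` names and the roles are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class EscalationStep:
    notify: str        # who is paged at this step (hypothetical roles)
    wait_minutes: int  # time to wait for an acknowledgement before escalating

def who_is_paged(chain, minutes_unacknowledged):
    """Return everyone paged so far for an alert left unacknowledged this long."""
    paged = []
    step_starts_at = 0
    for step in chain:
        if minutes_unacknowledged < step_starts_at:
            break  # this step has not been reached yet
        paged.append(step.notify)
        step_starts_at += step.wait_minutes
    return paged

chain = [
    EscalationStep("primary on-call", 5),
    EscalationStep("secondary on-call", 10),
    EscalationStep("engineering manager", 0),
]
print(who_is_paged(chain, 7))  # ['primary on-call', 'secondary on-call']
```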
What makes Grafana special?
Open source. Grafana is one of the most popular open source projects on GitHub, with a rich community. Grafana makes the tools of observability more accessible.
What do users like about Grafana?
Because Grafana is open source, users enjoy its pluggable extensibility and the depth of community support that typically goes along with it.
What do users dislike about Grafana?
Because of the security concerns that go along with open source, not every enterprise can take advantage of it. It can also be difficult to get support for unique situations. Users also complain about the initial complexity of setup.
Grafana pricing
Grafana comes in a few versions, ranging from a free version all the way to custom enterprise pricing. The company’s Pro version uses a blended pricing model: a per-user base price plus a storage cost based on the GB of logs and traces ingested.
5. Dynatrace
A Leader in the 2022 Gartner Magic Quadrant for APM and Observability, Dynatrace combines application security, observability, and AIOps into a unified, robust platform. An open-source framework allows a uniform view across disparate applications.
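The blended model amounts to simple arithmetic. The sketch below uses placeholder prices, not Grafana’s actual rates. It also adds a hypothetical `included_gb` allowance, which defaults to zero; this parameter is an assumption, not part of the pricing described above.

```python
def blended_monthly_cost(users, gb_logs, gb_traces,
                         price_per_user, price_per_gb, included_gb=0.0):
    """Per-user base price plus a storage charge on GB of logs and traces
    ingested beyond any included allowance. All prices are placeholders."""
    billable_gb = max(0.0, gb_logs + gb_traces - included_gb)
    return users * price_per_user + billable_gb * price_per_gb

# Hypothetical: 10 users, 150 GB of logs and 50 GB of traces in a month.
print(blended_monthly_cost(10, 150, 50, price_per_user=8.0, price_per_gb=0.5))  # 180.0
```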
What is special about Dynatrace?
Omni-channel user experience. A single data model across all channels ties together the complete picture of your entire stack.
What do users like about Dynatrace?
Most reviewers are amazed at the ease of implementation and user experience. Dynatrace allows its users to react quicker and more efficiently to potential issues so that they can focus on driving innovation.
What do users dislike about Dynatrace?
Reviewers say the initial setup can be slow and that transaction times can lag. Users also note that the price seems high relative to alternatives.
Dynatrace pricing
While Dynatrace pricing is higher than the industry standard, Dynatrace offers a breadth of pricing options and customizations.
6. Splunk
Founded in 2003 and headquartered in San Francisco, Splunk is a flexible and powerful observability tool. Splunk is database agnostic and can cover the entire technology landscape.
What is special about Splunk?
Splunkbase. Splunkbase is an application exchange with over 2,000 partner- and customer-built applications that can extend Splunk for your unique use cases. Users can choose from over a dozen categories of applications to serve their business objectives.
What do users like about Splunk?
Users enjoy the flexibility of the dashboards, which provide both high-level and granular views of system health. They also like the ability to create complex filters to dive into the data, and they appreciate fast, responsive widgets with quick load times. Support is also highly rated, with a global presence in over 100 regions.
What do users dislike about Splunk?
A common criticism is that the PaaS version of the platform lacks features available in the on-premises version. Product training is available, but at an additional cost, and certifications are expensive.
Splunk pricing
Splunk’s core offerings are priced per host, with add-ons and services available at an added cost. These add-ons include automatic incident response through Splunk On-Call and synthetic monitoring.
7. AppDynamics
Acquired by Cisco in 2017, AppDynamics is a robust suite of observability tools that offers quick setup times and seamless on-platform collaboration. AppDynamics allows you to correlate stack performance with key business metrics.
What is special about AppDynamics?
Support. As part of Cisco, AppDynamics offers readily available, global support to its customers.
What do users like about AppDynamics?
Users love the speed of implementation, setup, and end-user monitoring. The out-of-the-box features are powerful, and installation is intuitive. Users also especially like the Experience Journey maps – a view of customers’ most traveled journeys and an interactive performance view at each step.
What do users dislike about AppDynamics?
Reviewers would like the ability to make configuration and setup changes through an API, rather than just the UI. This would make agent deployments and upgrades much quicker. Users have also noted a delay in alerts during SLO violations or system failure.
AppDynamics pricing
AppDynamics offers a few pricing options, from the least expensive Infrastructure Monitoring Edition up to the Enterprise Edition. Each plan is charged per CPU core or per host.
Observability tools, with the help of AI and machine learning algorithms, are allowing enterprises to scale safely while still maintaining critical insight into the health and security of their environments.
Application infrastructure is changing. The way organizations monitor and analyze application health needs to evolve to meet those changes.
Disclaimer: The views and opinions expressed above are those of the contributor and do not necessarily represent or reflect the official beliefs or positions of Sofy.