I started using the AWS X-Ray service 3 weeks ago and wanted to pass along my experiences. The service collects and reports data about your applications. That data can include the time required to execute sections of your code. This can provide both the absolute time required and comparison of changes between different software releases or external interface behavior changes.

In 2 cases so far the trace data immediately helped me identify the location, down to the function level, of where a slowdown was occurring in request processing of a Python Flask application running in an AWS Fargate cluster. The value of trace and metric data cannot be underestimated when it is needed. Without that data the labor and time cost of problem isolation and understanding is significantly greater than the effort in adding the instrumentation.

The code changes involved a couple of imports and maybe 10 lines total to begin getting traces. Each area of code that metric data is desired only requires 2 or 3 lines of code surrounding it.

To control cost it is possible to include a sampling rate configuration setting where metrics on every execution are not collected. While other tools, such as licensed ones, also provide data like this they can be much more expensive. I am also a believer in staying in the AWS ecosystem whenever possible. X-Ray makes it much easier to collect metrics in all of your environments and sampling rate controls per environment lets the cost be whatever you like in each case.

More about X-Ray at https://aws.amazon.com/xray/