Distributed Tracing In Grab - GopherCon SG 2017

Title: Distributed Tracing in Grab
Date: 25th-26th May 2017

Description: When we migrated from our Rails monolith to multiple smaller Go apps,
one of the things that we missed the most was the ability to trace a request,
a feature that has in the past helped us pinpoint problems quickly.
In this talk, I will be sharing how we are bringing this feature back into our Go apps.

https://opentracing.io/
https://www.jaegertracing.io/

Example Trace:

  1. Start at the api service
  2. Call the fare, auth and booking services
  3. Request was successful (200)
  4. The fare service takes the most time

If distributed tracing is so great, why is it not already everywhere?

  1. Microservices are fairly new
    In the monolithic world, request tracing is very common
  2. Lock-in is unacceptable
    Instrumentation must be decoupled from vendors
  3. Inconsistent APIs
    Tracing semantics must not be language-dependent

OpenTracing - concepts

  1. Trace
    Think of it as a directed acyclic graph where each node is a span
  2. Span
    The basic unit of OpenTracing: one named, timed operation
  3. Tags/Baggage
    Key-value pairs attached to a span; baggage is also passed on to child spans
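
To make the three concepts concrete, here is a minimal sketch in plain Go. The Span type, NewRoot, NewChild and Path are hypothetical names invented for illustration and are not the opentracing-go API. The sketch only shows the relationships: spans link to a parent, tags stay on the span that set them, and baggage is copied to descendants.

```go
package main

import "fmt"

// Span is a hypothetical, simplified model of an OpenTracing span.
// It is NOT the opentracing-go API.
type Span struct {
	Operation string
	Parent    *Span             // nil for the root span of a trace
	Tags      map[string]string // describe this span only
	Baggage   map[string]string // carried along to descendant spans
}

// NewRoot starts a new trace consisting of a single root span.
func NewRoot(operation string) *Span {
	return &Span{
		Operation: operation,
		Tags:      map[string]string{},
		Baggage:   map[string]string{},
	}
}

// NewChild creates a child span. Baggage is copied to the child; tags are not.
func (s *Span) NewChild(operation string) *Span {
	child := NewRoot(operation)
	child.Parent = s
	for k, v := range s.Baggage {
		child.Baggage[k] = v
	}
	return child
}

// Path returns the chain of operations from the root down to this span.
func (s *Span) Path() string {
	if s.Parent == nil {
		return s.Operation
	}
	return s.Parent.Path() + " -> " + s.Operation
}

func main() {
	api := NewRoot("api")
	api.Tags["http.status_code"] = "200"
	api.Baggage["request_id"] = "abc123" // illustrative value

	fare := api.NewChild("fare")
	fmt.Println(fare.Path())                // api -> fare
	fmt.Println(fare.Baggage["request_id"]) // abc123
	fmt.Println(len(fare.Tags))             // 0
}
```

Because baggage travels with every descendant span, and in a real tracer across process boundaries, it is best kept small.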

Using context.Context

func httpHandler(w http.ResponseWriter, r *http.Request) {
	// Start a span for this handler; childCtx carries it to callees.
	span, childCtx := tracer.CreateSpanFromContext(r.Context(), logTag+".httpHandler")
	defer span.Finish()
	...
	anotherFunctionCall(childCtx, arg1, arg2)
	...
}

func anotherFunctionCall(ctx context.Context, arg1 string, arg2 int) {
	// Use the ctx passed in by the caller, so this span becomes a child of the caller's span.
	span, childCtx := tracer.CreateSpanFromContext(ctx, logTag+".anotherFunctionCall")
	defer span.Finish()
	...
}
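
The talk does not show what tracer.CreateSpanFromContext does internally. The sketch below is one plausible shape for such a helper, using only the standard library. The span type, spanKey, createSpanFromContext, handler, lookup and run are all assumptions made for illustration, not Grab's code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// span is a hypothetical stand-in for a tracer's span type.
type span struct {
	name   string
	parent *span
	start  time.Time
	end    time.Time
}

// Finish records the end time of the span.
func (s *span) Finish() { s.end = time.Now() }

// spanKey is an unexported key type, so no other package can collide with it.
type spanKey struct{}

// createSpanFromContext is an ASSUMED shape for a helper such as
// tracer.CreateSpanFromContext: it starts a span whose parent is whatever
// span is already stored in ctx, and returns a context carrying the new span.
func createSpanFromContext(ctx context.Context, name string) (*span, context.Context) {
	parent, _ := ctx.Value(spanKey{}).(*span) // nil when ctx holds no span yet
	s := &span{name: name, parent: parent, start: time.Now()}
	return s, context.WithValue(ctx, spanKey{}, s)
}

// handler starts a span and passes the derived context down.
func handler(ctx context.Context) *span {
	s, childCtx := createSpanFromContext(ctx, "handler")
	defer s.Finish()
	return lookup(childCtx)
}

// lookup starts its own span from the context it was given.
func lookup(ctx context.Context) *span {
	s, _ := createSpanFromContext(ctx, "lookup")
	defer s.Finish()
	return s
}

// run executes the call chain from an empty context and returns the innermost span.
func run() *span {
	return handler(context.Background())
}

func main() {
	inner := run()
	fmt.Println(inner.name, "is a child of", inner.parent.name) // lookup is a child of handler
}
```

The parent link exists only because handler passes childCtx down. A callee that starts from a fresh context would begin a new, disconnected trace.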

Challenges And Learning

1. Adding distributed tracing to an existing system is hard.

Start using context.Context, even if you are not tracing yet.
A context can also carry request timeouts, request cancellation, and request-scoped information.
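
As a small illustration of the non-tracing uses mentioned above, the sketch below applies a request timeout with the standard context package. The function names slowCall and callWithTimeout and the durations are made up for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCall simulates a downstream call that takes d to complete, but gives
// up early if ctx is cancelled or its deadline passes.
func slowCall(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// callWithTimeout runs slowCall under a request-scoped timeout.
func callWithTimeout(work, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // always release the timer held by the context
	return slowCall(ctx, work)
}

func main() {
	// The work takes longer than the timeout, so the call is abandoned.
	err := callWithTimeout(500*time.Millisecond, 20*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true

	// The work finishes well within the timeout.
	fmt.Println(callWithTimeout(time.Millisecond, time.Second)) // <nil>
}
```

Once functions already accept a ctx for timeouts like this, adding spans later does not require changing any signatures.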

2. Deciding what to create spans for.

Important? Trace it
Time consuming? Trace it
Call something outside your process? Definitely trace it!

3. Differentiating metrics, logging and tracing.

For a single request, tracing is a superset of metrics and logging.
With the right setup (TeeRecorder), you can pipe the metrics and logs out accordingly.
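
The notes do not say how TeeRecorder is built. The following is a hypothetical sketch of the idea only: a recorder that receives each finished span and forwards it to several recorders, here one for metrics and one for logs. All type and function names below are assumptions, not a real library API.

```go
package main

import (
	"fmt"
	"time"
)

// FinishedSpan is a hypothetical record of a completed span.
type FinishedSpan struct {
	Operation string
	Duration  time.Duration
	Tags      map[string]string
}

// Recorder receives each span once it has finished.
type Recorder interface {
	RecordSpan(s FinishedSpan)
}

// TeeRecorder forwards every span to all of its recorders, in order.
type TeeRecorder []Recorder

func (t TeeRecorder) RecordSpan(s FinishedSpan) {
	for _, r := range t {
		r.RecordSpan(s)
	}
}

// MetricsRecorder accumulates total time spent per operation.
type MetricsRecorder struct {
	Total map[string]time.Duration
}

func NewMetricsRecorder() *MetricsRecorder {
	return &MetricsRecorder{Total: map[string]time.Duration{}}
}

func (m *MetricsRecorder) RecordSpan(s FinishedSpan) {
	m.Total[s.Operation] += s.Duration
}

// LogRecorder keeps one log line per span.
type LogRecorder struct {
	Lines []string
}

func (l *LogRecorder) RecordSpan(s FinishedSpan) {
	line := fmt.Sprintf("%s took %s status=%s",
		s.Operation, s.Duration, s.Tags["http.status_code"])
	l.Lines = append(l.Lines, line)
}

func main() {
	metrics := NewMetricsRecorder()
	logs := &LogRecorder{}
	tee := TeeRecorder{metrics, logs}

	tee.RecordSpan(FinishedSpan{
		Operation: "fare",
		Duration:  120 * time.Millisecond, // illustrative value
		Tags:      map[string]string{"http.status_code": "200"},
	})

	fmt.Println(metrics.Total["fare"]) // 120ms
	fmt.Println(logs.Lines[0])         // fare took 120ms status=200
}
```

With this shape, instrumented code creates spans once, and metrics and logs are derived from the same data rather than emitted separately.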

4. OpenTracing limitations

No API to extract a TraceID/SpanID
Requires all communicating systems to use the same tracer implementation