In the pervious blog, we exploration into Golang's memory-related features, but it is just the beginning. In forthcoming articles, we will delve deeper into various facets of Golang, uncovering more advanced topics and best practices.
Golang Profiling
The Profiler runs the user program and configures the operating system to send the SIGPROF signal periodically:
- First, upon receipt of the SIGPRFO signal, user program execution is suspended.
- Then, the profiler collects the status of the user program.
- Finally, the user program is resumed when the collection is complete, and so on.
Profiler is based on sampling, which affects the program performance to some extent.
"Before you profile, you must have a stable environment to get repeatable results."
2.1 Supporting Profiling
By default, all the profiles are listed in runtime/pprof
.
a. CPU Profiling
When CPU profiling is enabled, Golang runtime will interrupt the application every 10ms by default and record the stack information of the Goroutine.
b. Memory Profiling
Memory profiling is the same as CPU profiling, it is also based on sampling, it will record stack information when heap memory is allocated. It will record stack information when heap memory is allocated. The default sampling frequency is once every 1000 heap memory allocations, and this frequency can be configured.
Note: Memory profiling only records heap memory allocation information, ignoring stack memory usage.
c. Block Profiling
Block profiling is similar to CPU profiling, but it records how long the goroutine waits on shared resources. It is useful for checking the concurrency bottleneck of an application. Blocking statistics mainly include:
- Read/write unbuffered channel
- Write full buffer channel, read empty buffer channel
- Locking operations
If based on net/http/pprof, the application needs to call runtime.SetBlockProfileRate to configure the sampling frequency.
d. Mutex Profiling
Go 1.8 introduced a mutex profile that allows you to capture a portion of the stack of goroutines competing for locks. If based on net/http/pprof, you need to call runtime.SetMutexProfileFraction in your application to configure the sampling frequency.
Note: It is not recommended to change the default values of the golang profiler when profiling an online service via net/http/pprof. This is because modifying some of the profiler parameters, such as increasing the memory profile sample rate, may result in a significant degradation of program performance unless you are aware of the potential impact.
2.2 Profiling Commands
We can get the profile file from the go test
command, or from an application using net/http/pprof:
## 1. From unit tests
$ go test \[-blockprofile | -cpuprofile | -memprofile | -mutexprofile\] xxx.out
## 2. From long-running program with `net/http/pprof` enabled
## 2.1 heap profile
$ curl -o mem.out http://localhost:6060/debug/pprof/heap
## 2.2 cpu profile
$ curl -o cpu.out http://localhost:6060/debug/pprof/profile?seconds=30
After getting the profile file, analyze it with the go tool pprof:
# 1. View local profile
$ go tool pprof xxx.out
# 2. View profile via http endpoint
$ go tool pprof http://localhost:6060/debug/pprof/block
$ go tool pprof http://localhost:6060/debug/pprof/mutex
2.3 Golang Trace
We can get the trace file from the go test
command, or from an application using net/http/pprof:
# 1. From unit test
$ go test -trace trace.out
# 2. From long-running program with `net/http/pprof` enabled
curl -o trace.out http://localhost:6060/debug/pprof/trace?seconds=5
After getting the trace file, analyze it with go tool trace, which will automatically open the browser:
$ go tool trace trace.out
2.4 Profiling Hints
If a large amount of time is consumed in the function runtime.mallocgc
, it means that the program has undergone a large heap memory allocation, and Memory Profiling can be used to determine where the code is allocating heap memory.
If a lot of time is spent on synchronization primitives (such as channels, locks, etc.), the program may have concurrency problems, which usually means that the program workflow needs to be redesigned.
If a lot of time is spent on syscall.Read/Write, the program is likely to be performing a lot of small IO.
If the GC component is consuming a lot of time, the program may be allocating a lot of small memory, or allocating a large amount of heap memory.
2.5 Code Demo
The code can be found in the document: contention_test.go
-
Block Profiling with Unit Test
$ go test -run ^TestContention$ -blockprofile block.out $ go tool pprof block.out (pprof) top (pprof) web
-
Mutex Profiling with Unit Test
$ go test -run ^TestContention$ -mutexprofile mutex.out $ go tool pprof mutex.out (pprof) top (pprof) web
-
Trace with Unit Test
$ go test -run ^TestContention$ -trace trace.out $ go tool trace trace.out
2.6 Bibliography
As we conclude our exploration of Golang profiling, it's clear that this is just the beginning of a deeper journey into Golang's performance optimization.