背景
工作中经常会遇到一些常见的Go问题,比如锁竞争、内存占用过高、CPU 占用过高、协程泄露、阻塞操作、GC频繁内存回收等,我们可以通过官方的PProf工具进行迅速排查定位问题,最近工作的2家公司线上服务也都集成了PProf采集和分析的WEB UI工具,所以这是一个非常实在的功能。
使用
一般的基本引入包自动init,并且开启http服务就会默认注册debug路由,如下所示:
import _ "net/http/pprof" //nolint
// ...
_ = http.ListenAndServe(":8080", nil)
常用的性能分析指标有:
类型 | 描述 |
---|---|
profile | CPU占用情况 |
heap | 堆上内存的使用情况 |
goroutine | 当前所有协程的情况 |
allocs | 内存分配情况 |
blocks | 阻塞操作情况 |
trace | 程序运行跟踪情况 |
对于在线采集分析可以用下面的命令:
// cpu
go tool pprof http://localhost:8080/debug/pprof/profile
// 堆内存
go tool pprof http://localhost:8080/debug/pprof/heap
// 协程
go tool pprof http://localhost:8080/debug/pprof/goroutine
// etc ...
命令行模式有top、list、traces、web这几个命令,traces和web都是在浏览器上分析,命令行可以通过top查看占用排行,用list找出具体哪一行代码造成的占用:
➜ play: go tool pprof http://127.0.0.1:8080/debug/pprof/goroutine
Fetching profile over HTTP from http://127.0.0.1:8080/debug/pprof/goroutine
Saved profile in /Users/ileopold/pprof/pprof.goroutine.001.pb.gz
Type: goroutine
Time: Feb 27, 2023 at 2:34pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 62, 100% of 62 total
Showing top 10 nodes out of 93
flat flat% sum% cum cum%
60 96.77% 96.77% 60 96.77% runtime.gopark
1 1.61% 98.39% 1 1.61% runtime.goroutineProfileWithLabels
1 1.61% 100% 1 1.61% runtime.sigNoteSleep
0 0% 100% 5 8.06% bufio.(*Reader).Read
0 0% 100% 1 1.61% bufio.(*Reader).ReadLine
0 0% 100% 1 1.61% bufio.(*Reader).ReadSlice
0 0% 100% 1 1.61% bufio.(*Reader).fill
0 0% 100% 4 6.45% bufio.(*Scanner).Scan
0 0% 100% 1 1.61% database/sql.(*DB).connectionOpener
(pprof) list runtime.gopark
Total: 62
ROUTINE ======================== runtime.gopark in /usr/local/go/src/runtime/proc.go
60 60 (flat, cum) 96.77% of Total
. . 358: gp.waitreason = reason
. . 359: mp.waittraceev = traceEv
. . 360: mp.waittraceskip = traceskip
. . 361: releasem(mp)
. . 362: // can't do anything that might move the G between Ms here.
60 60 363: mcall(park_m)
. . 364:}
. . 365:
. . 366:// Puts the current goroutine into a waiting state and unlocks the lock.
. . 367:// The goroutine can be made runnable again by calling goready(gp).
. . 368:func goparkunlock(lock *mutex, reason waitReason, traceEv byte, traceskip int) {
ROUTINE ======================== runtime.goparkunlock in /usr/local/go/src/runtime/proc.go
0 2 (flat, cum) 3.23% of Total
. . 364:}
. . 365:
. . 366:// Puts the current goroutine into a waiting state and unlocks the lock.
. . 367:// The goroutine can be made runnable again by calling goready(gp).
. . 368:func goparkunlock(lock *mutex, reason waitReason, traceEv byte, traceskip int) {
. 2 369: gopark(parkunlock_c, unsafe.Pointer(lock), reason, traceEv, traceskip)
. . 370:}
. . 371:
. . 372:func goready(gp *g, traceskip int) {
. . 373: systemstack(func() {
. . 374: ready(gp, traceskip, true)
(pprof)
通过这种手段,只需要在问题发生期间抓包,就能线下分析立即找出问题代码的位置,并且上线修复。
如果看不出来问题,还可以抓取正常时期的包以及异常时期的包,通过 -base 指定基准来确定问题代码。