A pitfall when copying a map in Go
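Yesterday afternoon I ran into a problem. Service A opens a batch of TCP connections to service B and periodically sends an RPC request (rpc.Ping()) to check the health of each connection. Service A reported this error in Kibana: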
use of closed network connection
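My first guess was that service B had restarted and dropped the connections, and that this happened to fall in the window between service A's failed health check and its reconnect. An ops colleague pulled the logs for me, and they showed this error: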
{"log":"fatal error: concurrent map iteration and map write\n","stream":"stderr","time":"2019-11-01T02:52:52.771394863Z"}
{"log":"[11/01/19 10:52:52] [DEBG] RouterRPC Put Request =\u003e \u0026{6076362058 1 200178629686 2fadfb7c24bb440cb9dfcb74f515baef server:1:currt:1572229900380812731:498081}\n","stream":"stdout","time":"2019-11-01T02:52:52.771452423Z"}
{"log":"\n","stream":"stderr","time":"2019-11-01T02:52:52.773643972Z"}
{"log":"goroutine 62906840 [running]:\n","stream":"stderr","time":"2019-11-01T02:52:52.773666216Z"}
{"log":"runtime.throw(0x932fd7, 0x26)\n","stream":"stderr","time":"2019-11-01T02:52:52.773672033Z"}
{"log":"\u0009/app/go1.9/go/src/runtime/panic.go:605 +0x95 fp=0xc4202019a8 sp=0xc420201988 pc=0x42d1a5\n","stream":"stderr","time":"2019-11-01T02:52:52.773676889Z"}
{"log":"runtime.mapiternext(0xc420201ac8)\n","stream":"stderr","time":"2019-11-01T02:52:52.773682159Z"}
{"log":"\u0009/app/go1.9/go/src/runtime/hashmap.go:778 +0x6f1 fp=0xc420201a40 sp=0xc4202019a8 pc=0x40b7c1\n","stream":"stderr","time":"2019-11-01T02:52:52.773686889Z"}
{"log":"main.(*RouterRPC).AllRoomCount(0xc4201988c0, 0xc420b08630, 0xc420192158, 0x0, 0x0)\n","stream":"stderr","time":"2019-11-01T02:52:52.773694133Z"}
{"log":"\u0009/path/to/project/rpc.go:185 +0x161 fp=0xc420201b38 sp=0xc420201a40 pc=0x809421\n","stream":"stderr","time":"2019-11-01T02:52:52.773704869Z"}
{"log":"runtime.call64(0xc420187f20, 0xc420192078, 0xc420358b10, 0x1800000028)\n","stream":"stderr","time":"2019-11-01T02:52:52.773713993Z"}
{"log":"\u0009/app/go1.9/go/src/runtime/asm_amd64.s:510 +0x3b fp=0xc420201b88 sp=0xc420201b38 pc=0x45b86b\n","stream":"stderr",
...
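The log shows that a map was being iterated over while another goroutine wrote to it. Here is the line of code in question: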
rc := b.Demo(111,"foo")
// /path/to/project/rpc.go:185
for rid, count = range rc {
	rep.Counter[rid] += count
}
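The type of rc is map[string]int32. Access to it is not protected by a lock because no concurrent writes were expected. Here is where it comes from: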
func (b *B) Demo(s int32, a string) (rc map[string]int32) {
	b.bLock.RLock()
	rc = make(map[string]int32, len(b.rc))
	rc = b.rc[s][a]
	b.bLock.RUnlock()
	return
}
b.rc is a map that is read and written concurrently, so reads of it must hold the lock. The problem lies in the assignment statement:
rc = b.rc[s][a]
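A map value is essentially a *hmap, that is, a pointer to the runtime's hmap struct, so the variable rc and b.rc[s][a] hold the same address of the same underlying hmap. The map allocated by make in Demo() is simply discarded by this assignment. After Demo() returns, the caller reads through that address without holding the lock, which leads to the error. The code was meant to copy the map but used the wrong method. The correct approach is to iterate over the map and assign each element into a new map. For example: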
func (b *B) Demo(s int32, a string) (rc map[string]int32) {
	b.bLock.RLock()
	// m still aliases the shared inner map, so it is only read while the lock is held.
	m := b.rc[s][a]
	// Allocate a fresh map sized for the source map.
	rc = make(map[string]int32, len(m))
	for rid, count := range m {
		if count > 0 {
			rc[rid] = count
		}
	}
	b.bLock.RUnlock()
	return
}
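The aliasing behind this bug can be seen in isolation with a small program. This is only a minimal sketch: aliasOf is a hypothetical helper that stands in for the plain assignment in Demo(), and the map contents are made up for illustration.

```go
package main

import "fmt"

// aliasOf is a hypothetical stand-in for `rc = b.rc[s][a]`:
// it hands back the same map value, so no entries are copied.
func aliasOf(m map[string]int32) map[string]int32 {
	return m
}

func main() {
	orig := map[string]int32{"room1": 1}
	alias := aliasOf(orig)

	// A write through orig is visible through alias, because both
	// variables refer to the same underlying hash table.
	orig["room2"] = 2
	fmt.Println(len(alias), alias["room2"]) // prints "2 2"
}
```

In the service, the write comes from another goroutine instead of the same one, which is why ranging over the returned map can race with it.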
The way the microservice framework go-micro copies header information can serve as a reference.
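A header copy in that style might look roughly like the following. This is a sketch written for this post, not code taken from go-micro: the Metadata type and the Copy function are assumed names used only for illustration.

```go
package main

import "fmt"

// Metadata is a hypothetical header type for this sketch.
type Metadata map[string]string

// Copy returns a new Metadata holding the same key/value pairs as md.
// Because the values are strings, copying entry by entry is enough.
func Copy(md Metadata) Metadata {
	out := make(Metadata, len(md))
	for k, v := range md {
		out[k] = v
	}
	return out
}

func main() {
	hdr := Metadata{"X-Request-Id": "abc"}
	cp := Copy(hdr)

	// Later writes to the original do not reach the copy.
	hdr["X-Request-Id"] = "changed"
	fmt.Println(cp["X-Request-Id"]) // prints "abc"
}
```

If the values were themselves maps or slices, each of them would need its own copy as well, since an entry-by-entry copy like this one is shallow.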