@
lujjjh 感谢兄弟的细心阅读,并且一眼就看出了我的担心,握爪。是这样的,其实问题就消费 Channel 这里。
有一种极端的情况,就是我消费 Channel 的时候,runtime 的 deltimer()还没有来得及调用 sendTime 。但是当我的消费代码走完了以后,它调用了 sendTime,导致了 Channel 里面又有了内容,这样,我下次正儿八经的业务 select 的时候,立马就会返回,导致我错误的认为超时了。
这个问题的出在 Runtime 中 timer 的维护 routine 中调用定时器处理函数 runOneTimer()。该函数的处理分为两部分:
* A) 从 timer 堆中处理定时器状态这部分
* B) 调用 f(arg, seq)来 sentTime 到 Channel 中
从 [代码](
https://github.com/golang/go/blob/891547e2d4bc2a23973e2c9f972ce69b2b48478e/src/runtime/time.go#L818)看,这两个部分操作没有放在一个锁中,因此不是原子操作。```go
func runOneTimer(pp *p, t *timer, now int64) {
.......
if raceenabled {
// Temporarily use the current P's racectx for g0.
gp := getg()
if gp.racectx != 0 {
throw("runOneTimer: unexpected racectx")
}
gp.racectx = gp.m.p.ptr().timerRaceCtx
}
unlock(&pp.timersLock)
f(arg, seq) // 就是那个 sendTime 函数
lock(&pp.timersLock)
.......
}
```
设想在用户调用 Stop()的时候,虽然和 runOneTimer()的 A 部分产生了互斥,但是和 sentTime 并没有产生互斥。极端情况下,如果 runtime 的 routine 中把 A 部分执行完了,碰到 CPU 繁忙或者系统调度等极端场景,B 部分一直没有机会执行。那么 User 的 Stop()、消费 Channel (实际上没有数据)、然后 Reset(),到这里 B 部分都没有执行。
这样,Channel 中一直没有数据,但是当用户调用业务 Select()的时候,runtime routine 中的 B 部分执行了,sendTime 将数据发送到了 Channel 中。select()就因为上一次 Timer 的数据而触发了。
这也就是我在问题中说的,`(理论上)一定概率的立即超时`。
其实 Golang 的相关模块作者也注意到了该问题,修改的次数非常多,提供的注释也非常长。
其中作者提到,Stop()并不会等到 f (即 sentTime)执行完毕后才返回,也就是说 Stop()只反馈 Timer 当时的状态,不反馈 channel 的状态。
```
// Stop prevents the Timer from firing.
// It returns true if the call stops the timer, false if the timer has already
// expired or been stopped.
// Stop does not close the channel, to prevent a read from the channel succeeding
// incorrectly.
//
// To ensure the channel is empty after a call to Stop, check the
// return value and drain the channel.
// For example, assuming the program has not received from t.C already:
//
// if !t.Stop() {
// <-t.C
// }
//
// This cannot be done concurrent to other receives from the Timer's
// channel or other calls to the Timer's Stop method.
//
// For a timer created with AfterFunc(d, f), if t.Stop returns false, then the timer
// has already expired and the function f has been started in its own goroutine;
// Stop does not wait for f to complete before returning.
// If the caller needs to know whether f is completed, it must coordinate
// with f explicitly.
```
而对于 Reset(),作者也是用了大量篇幅,指出 Reset 的返回值并不是准确的,因为竟态导致的问题。提供返回值只是为了兼容过去的接口。另外其实 time 模块的 Reset()调用的其实就是 runtime 的 resetTimer,而 resetTimer()其实就是内部调用 stopTimer()。
```
// Reset changes the timer to expire after duration d.
// It returns true if the timer had been active, false if the timer had
// expired or been stopped.
//
// For a Timer created with NewTimer, Reset should be invoked only on
// stopped or expired timers with drained channels.
//
// If a program has already received a value from t.C, the timer is known
// to have expired and the channel drained, so t.Reset can be used directly.
// If a program has not yet received a value from t.C, however,
// the timer must be stopped and—if Stop reports that the timer expired
// before being stopped—the channel explicitly drained:
//
// if !t.Stop() {
// <-t.C
// }
// t.Reset(d)
//
// This should not be done concurrent to other receives from the Timer's
// channel.
//
// Note that it is not possible to use Reset's return value correctly, as there
// is a race condition between draining the channel and the new timer expiring.
// Reset should always be invoked on stopped or expired channels, as described above.
// The return value exists to preserve compatibility with existing programs.
//
// For a Timer created with AfterFunc(d, f), Reset either reschedules
// when f will run, in which case Reset returns true, or schedules f
// to run again, in which case it returns false.
// When Reset returns false, Reset neither waits for the prior f to
// complete before returning nor does it guarantee that the subsequent
// goroutine running f does not run concurrently with the prior
// one. If the caller needs to know whether the prior execution of
// f is completed, it must coordinate with f explicitly.
```
所以从这些场景看,很难精确的判断出来 Channel 是否有数据,而尝试去 Drain Channel 也只是大概率可以 drain out,极端情况下,drain empty channel 之后,还会立即有数据 send 进来。