The relationship between NIO and epoll
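On Linux, NIO's Selector is implemented on top of epoll. Linux abstracts all I/O as files, including both network I/O and file reads and writes. Each file is identified by a file descriptor (fd), which is a non-negative integer. The fds 0, 1 and 2 correspond to stdin, stdout and stderr respectively.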
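The epoll API consists of three system calls:
- epoll_create: creates an epoll instance and returns an fd that refers to it. This corresponds to Selector.
- epoll_ctl: registers I/O events of interest by adding, modifying or removing an fd in the interest list. This corresponds to SelectableChannel.register().
- epoll_wait: waits for I/O events. This corresponds to Selector.select().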
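The mapping above can be seen from the Java side with a short program. This is a minimal sketch that assumes Linux with the default SelectorProvider. The class name EpollMapping is made up for illustration.

The program opens a Selector and registers a non-blocking ServerSocketChannel for OP_ACCEPT. It then connects one client and checks what select() reports. The comments note which epoll call each step roughly corresponds to. As far as I know, the JDK queues interest changes and only issues epoll_ctl on the next select, so register() does not map one-to-one onto a syscall.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class EpollMapping {
    // Returns the ready-op set of the server key after one client has connected.
    public static int acceptReadyOps() throws IOException {
        // Selector.open(): with the default Linux provider this creates an epoll instance (epoll_create).
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);
            // register(): records interest in OP_ACCEPT; the JDK later passes it to epoll_ctl.
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);

            // Blocking client connect: returns once the TCP handshake has completed.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                // select(): waits in epoll_wait until an event is ready or 1 s passes.
                int n = selector.select(1000);
                return n > 0 ? key.readyOps() : 0;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(acceptReadyOps() == SelectionKey.OP_ACCEPT);
    }
}
```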
OP_ACCEPT and OP_CONNECT
The epoll_ctl(2) man page lists the following event types:
- EPOLLIN: The associated file is available for read(2) operations.
- EPOLLOUT: The associated file is available for write(2) operations.
- EPOLLRDHUP (since Linux 2.6.17): Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using edge-triggered monitoring.)
- EPOLLPRI: There is an exceptional condition on the file descriptor. See the discussion of POLLPRI in poll(2).
- EPOLLERR: Error condition happened on the associated file descriptor. This event is also reported for the write end of a pipe when the read end has been closed. epoll_wait(2) will always report for this event; it is not necessary to set it in events when calling epoll_ctl().
- EPOLLHUP: Hang up happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events when calling epoll_ctl(). Note that when reading from a channel such as a pipe or a stream socket, this event merely indicates that the peer closed its end of the channel. Subsequent reads from the channel will return 0 (end of file) only after all outstanding data in the channel has been consumed.
- EPOLLET: Requests edge-triggered notification for the associated file descriptor. The default behavior for epoll is level-triggered. See epoll(7) for more detailed information about edge-triggered and level-triggered notification. This flag is an input flag for the event.events field when calling epoll_ctl(); it is never returned by epoll_wait(2).
- EPOLLONESHOT (since Linux 2.6.2): Requests one-shot notification for the associated file descriptor. This means that after an event notified for the file descriptor by epoll_wait(2), the file descriptor is disabled in the interest list and no other events will be reported by the epoll interface. The user must call epoll_ctl() with EPOLL_CTL_MOD to rearm the file descriptor with a new event mask. This flag is an input flag for the event.events field when calling epoll_ctl(); it is never returned by epoll_wait(2).
These events differ from the ones in SelectionKey. In particular, nothing here corresponds to OP_ACCEPT or OP_CONNECT. So what are these two events for 🤔?
ACCEPT
```java
public int translateInterestOps(int ops) {
    // ...
```
OP_ACCEPT is translated into Net.POLLIN. For CONNECT:
```java
public int translateInterestOps(int ops) {
    // ...
```
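POLLCONN has the same value as POLLOUT (4), so the two are told apart by the socket's state. If the socket is not yet connected, the event means OP_CONNECT; if it is connected, it means OP_WRITE. Splitting POLLIN into ACCEPT and READ is understandable enough, but why split POLLOUT into WRITE and CONNECT?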
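The collision is easier to see in a toy model of the interest translation. This is a simplified sketch, not the JDK source. The constants are the Linux `<poll.h>` values, and POLLCONN is set equal to POLLOUT as described above. The class and method names are hypothetical.

```java
import java.nio.channels.SelectionKey;

// Simplified model of turning SelectionKey interest ops into poll events.
public class InterestTranslation {
    static final int POLLIN = 0x001;
    static final int POLLOUT = 0x004;
    static final int POLLCONN = POLLOUT; // same value on Linux

    // Server socket: only OP_ACCEPT is meaningful, and it becomes POLLIN.
    static int serverInterest(int ops) {
        int newOps = 0;
        if ((ops & SelectionKey.OP_ACCEPT) != 0) newOps |= POLLIN;
        return newOps;
    }

    // Client socket: OP_WRITE and OP_CONNECT both end up as the same poll bit.
    static int clientInterest(int ops) {
        int newOps = 0;
        if ((ops & SelectionKey.OP_READ) != 0) newOps |= POLLIN;
        if ((ops & SelectionKey.OP_WRITE) != 0) newOps |= POLLOUT;
        if ((ops & SelectionKey.OP_CONNECT) != 0) newOps |= POLLCONN;
        return newOps;
    }

    public static void main(String[] args) {
        System.out.println(serverInterest(SelectionKey.OP_ACCEPT)); // 1
        System.out.println(clientInterest(SelectionKey.OP_CONNECT)
                == clientInterest(SelectionKey.OP_WRITE)); // true
    }
}
```

Once the bits are merged, the kernel cannot say which of the two the application meant. The distinction can only be recovered later, when ready events are translated back, by looking at whether the socket is connected.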
What OP_CONNECT actually does
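This is where it is very easy to be misled. The apparent sequence goes like this:

1. The client calls connect.
2. The server is notified with OP_ACCEPT and calls accept.
3. Only then does the client get OP_CONNECT and call finishConnect.

That looks exactly like the three-way handshake, but it is not what happens at all. Debugging with Wireshark shows that the handshake has already completed by the time the server calls accept. So what are OP_CONNECT and finishConnect actually doing?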
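The Wireshark observation can also be checked from Java without a packet capture. In this minimal sketch (class name hypothetical, assuming a loopback connection on Linux), the client issues a non-blocking connect and never touches finishConnect() before the server's blocking accept() returns. accept() only hands out connections whose handshake is complete, so getting a channel back shows the handshake did not wait for finishConnect().

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class HandshakeBeforeFinishConnect {
    // True if the server accepts the connection before the client calls finishConnect().
    public static boolean acceptBeforeFinishConnect() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open()) {
                client.configureBlocking(false);
                // Non-blocking connect: the kernel starts the handshake and returns immediately.
                client.connect(server.getLocalAddress());
                // Blocking accept: returns once the kernel has completed the handshake.
                try (SocketChannel accepted = server.accept()) {
                    // Only now does the client ask whether the connection is established.
                    boolean finished = client.finishConnect();
                    return accepted != null && finished;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(acceptBeforeFinishConnect());
    }
}
```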
```java
boolean polled = Net.pollConnectNow(fd);
// ...
```
This ends up in native code:
```c
jint fd = fdval(env, fdo);
// ...
```
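As you can see, the JNI method simply uses poll() to check the fd for a POLLOUT event. POLLOUT means the socket's send buffer is writable, which implies that the connection has been established. So a blocking finishConnect() blocks until the connection is established. A non-blocking finishConnect() only reports, through its return value, whether the connection has been established yet. The name is quite misleading; something like doesItFinishConnect would describe it better. Put another way, a non-blocking connect does not seem to be of much use.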
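For comparison, here is the usual non-blocking connect pattern. It is a sketch assuming a local listener; the class name is hypothetical. OP_CONNECT is registered, and after select() wakes up, finishConnect() is asked whether the connection is up. Once it returns true, the interest set is switched away from OP_CONNECT. OP_CONNECT is backed by POLLOUT, which stays ready on a connected socket, so leaving it registered would keep waking the level-triggered select() for no reason.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NonBlockingConnect {
    // Returns true once finishConnect() reports an established connection, false on timeout.
    public static boolean connect() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open();
             SocketChannel client = SocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            client.configureBlocking(false);
            if (client.connect(server.getLocalAddress())) {
                return true; // connected immediately, OP_CONNECT is not needed
            }
            SelectionKey key = client.register(selector, SelectionKey.OP_CONNECT);
            while (selector.select(1000) > 0) {
                selector.selectedKeys().clear();
                // finishConnect() here just asks "is it connected yet?"
                if (key.isConnectable() && client.finishConnect()) {
                    // Stop watching POLLOUT-as-CONNECT; switch to reading.
                    key.interestOps(SelectionKey.OP_READ);
                    return true;
                }
            }
            return false; // select timed out
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(connect());
    }
}
```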
Errors
Translating epoll events into NIO events:
```java
public boolean translateReadyOps(int ops, int initialOps,
// ...
```
From this we can see that:
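- POLLNVAL does not set any ready ops. POLLNVAL has the value 32 and no corresponding entry in the list above. As the comment says, it should only indicate API misuse, so it is simply ignored.
- POLLERR and POLLHUP copy intOps unchanged. In other words, they fire every registered event.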
I also came across a related Netty issue: https://github.com/netty/netty/issues/924
```java
if ((readyOps & SelectionKey.OP_CONNECT) != 0) {
    // ...
```
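Netty handles OP_CONNECT first. When the peer sends a reset (RST), execution goes into unsafe.finishConnect() first, and this path has no logic to cancel the event or close the connection.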
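The RST case can be reproduced with plain NIO. This minimal sketch assumes Linux loopback and connects to a port that (very likely) has no listener; the class and helper names are hypothetical.

The refused connection surfaces as POLLERR/POLLHUP, which NIO reports as every registered op, here OP_CONNECT. The handler checks OP_CONNECT first and drops it from the interest set (the approach discussed around netty/netty#924). It then lets finishConnect() surface the failure; per its Javadoc, a channel whose finishConnect() throws is closed. Depending on timing, the refusal may instead be thrown directly by connect().

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class RefusedConnect {
    // Binds an ephemeral loopback port and releases it, so nothing is listening there.
    static SocketAddress closedPort() throws IOException {
        try (ServerSocketChannel tmp = ServerSocketChannel.open()) {
            tmp.bind(new InetSocketAddress("127.0.0.1", 0));
            return tmp.getLocalAddress();
        }
    }

    // Returns "connected", "refused", "pending", "timeout" or "unexpected".
    public static String tryConnect(SocketAddress target) throws IOException {
        try (Selector selector = Selector.open();
             SocketChannel ch = SocketChannel.open()) {
            ch.configureBlocking(false);
            try {
                if (ch.connect(target)) return "connected";
            } catch (IOException e) {
                return "refused"; // the refusal was already visible to connect()
            }
            SelectionKey key = ch.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(1000) == 0) return "timeout";
            if (key.isConnectable()) {
                // Handle OP_CONNECT first and stop watching it.
                key.interestOps(key.interestOps() & ~SelectionKey.OP_CONNECT);
                try {
                    return ch.finishConnect() ? "connected" : "pending";
                } catch (IOException e) {
                    // finishConnect() closed the channel; report the failure.
                    return "refused";
                }
            }
            return "unexpected";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tryConnect(closedPort()));
    }
}
```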
Finally
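Look at the last line of the code above. Java splits the IN event into ACCEPT and READ, but Netty merges ACCEPT and READ back together: for a server socket, "reading" simply means calling accept: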
```java
protected int doReadMessages(List<Object> buf) throws Exception {
    // ...
```
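The idea can be illustrated with plain NIO. This is a hedged sketch loosely modeled on the Netty behaviour described above, not Netty's code. UnifiedRead and readMessages are hypothetical names.

OP_READ and OP_ACCEPT go through one "read" branch. For a server channel, "reading a message" means accepting one connection, and the loop keeps reading until accept() returns null.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;

public class UnifiedRead {
    // Hypothetical server-side "read": each message is a newly accepted connection.
    static int readMessages(ServerSocketChannel server, List<SocketChannel> out) throws IOException {
        SocketChannel ch = server.accept(); // null when nothing is pending (non-blocking)
        if (ch == null) return 0;
        out.add(ch);
        return 1;
    }

    // Returns how many connections the unified read path accepted.
    public static int acceptViaRead() throws IOException {
        List<SocketChannel> accepted = new ArrayList<>();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                selector.select(1000);
                int readyOps = key.readyOps();
                // One condition for both ops: READ and ACCEPT share the read path.
                if ((readyOps & (SelectionKey.OP_READ | SelectionKey.OP_ACCEPT)) != 0) {
                    while (readMessages(server, accepted) > 0) {
                        // keep accepting until nothing is pending
                    }
                }
            } finally {
                for (SocketChannel ch : accepted) ch.close();
            }
        }
        return accepted.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(acceptViaRead());
    }
}
```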
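My overall impression is that each layer has its own good reasons for doing things its way, but put together the result is rather comical.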