InfiniBand interface doesn't route IPoIB traffic

geoffjay

I have blocks of hosts that I provision with Puppet in exactly the same way. They have identical hardware (same blade chassis) and are definitely connected the same way, yet the interfaces on some of them don't behave like the others. These are all InfiniBand interfaces, so I can test them with commands like ibping and ibsysstat, which show that they have working UVERBS/RDMA connections. For example:

master# ibsysstat 29
sysstat ping succeeded

where the node with that LID, which isn't working quite right, reports:

node10# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 1
    Firmware version: 2.11.1250
    Hardware version: 1
    Node GUID: 0x...
    System image GUID: 0x...
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 29
        LMC: 0
        SM lid: 26
        Capability mask: 0x02594868
        Port GUID: 0x...
        Link layer: InfiniBand

But when I run a simple ping to the IPoIB address, it just sits there without getting a reply. Other commands like ibping are definitely passing traffic, and data shows up in the debug output when I add -d. Watching the interface with tcpdump, I can see the pings go out, but nothing comes back. Meanwhile, the host right next to it, identical in every way, works just fine. The routing tables also look right to me and match those on hosts that work. On a host that doesn't work:

default via 10.10.0.1 dev em1 proto dhcp metric 100 
10.10.0.0/24 dev em1 proto kernel scope link src 10.10.0.110 metric 100 
10.11.0.0/24 dev ib0 proto kernel scope link src 10.11.0.110 
169.254.0.0/16 dev ib0 scope link metric 1005

and on one that does:

default via 10.10.0.1 dev em1 proto dhcp metric 100 
10.10.0.0/24 dev em1 proto kernel scope link src 10.10.0.108 metric 100 
10.11.0.0/24 dev ib0 proto kernel scope link src 10.11.0.108 
169.254.0.0/16 dev ib0 scope link metric 1004

The only difference is the metric on the last route, but that shouldn't matter. Also of note, these hosts worked before they were reprovisioned, so I'm almost positive it's not hardware.
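Comparing routing tables by eye across many hosts is error-prone. A minimal sketch, assuming you save the `ip route` output from each host as text, is to strip the fields that legitimately differ per host (`src` address and `metric`) and diff what remains. The helper names `normalize_routes` and `diff_routes` are hypothetical, not part of any tool.

```python
import re


def normalize_routes(table: str) -> set[str]:
    """Return the routes in `ip route` output with per-host fields removed."""
    routes = set()
    for line in table.strip().splitlines():
        # Drop "src <addr>" and "metric <n>" pairs; these are expected to
        # differ between otherwise identical hosts.
        line = re.sub(r"\s+(src|metric)\s+\S+", "", line.strip())
        if line:
            routes.add(line)
    return routes


def diff_routes(a: str, b: str) -> tuple[set[str], set[str]]:
    """Routes found only in table a, and only in table b, after normalization."""
    na, nb = normalize_routes(a), normalize_routes(b)
    return na - nb, nb - na
```

Fed the two tables from the question, both returned sets are empty, which supports the conclusion that routing is not the difference between the working and failing hosts.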

I'm at a bit of a loss now and any ideas would be appreciated.

Edit: Update with dmesg error

I found an error in the dmesg output for the interface in question that only appears on the hosts that don't work:

ib0: failed to modify QP to RTR: -22

Unfortunately, this isn't very helpful, and searching for it doesn't turn up much.
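The trailing `-22` is a negated errno value; on Linux, errno 22 is `EINVAL` ("Invalid argument"). A small sketch that scans dmesg text for this message and decodes the code is shown below. The regular expression is modeled only on the single line quoted above, and `explain_qp_errors` is a hypothetical helper; it does not explain *why* the kernel rejected the transition.

```python
import errno
import os
import re

# Modeled on the line seen in dmesg: "ib0: failed to modify QP to RTR: -22"
QP_ERR = re.compile(
    r"(?P<iface>\S+): failed to modify QP to (?P<state>\w+): (?P<rc>-\d+)"
)


def explain_qp_errors(dmesg_text: str) -> list[str]:
    """Translate the negative errno in each QP transition failure."""
    out = []
    for m in QP_ERR.finditer(dmesg_text):
        code = -int(m.group("rc"))  # kernel logs -errno; flip the sign
        name = errno.errorcode.get(code, "unknown")
        out.append(
            f"{m.group('iface')}: {m.group('state')} transition failed with "
            f"{name} ({os.strerror(code)})"
        )
    return out
```

Passing it the text of `dmesg` from an affected host should report the RTR transition failure on `ib0` as `EINVAL`.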

Perhaps also worth noting: the hosts in question can ping the switch's IP address, and the switch can ping the hosts on their associated IPs.

Austin Ewens

This is a known issue in kernels 3.10.0-862.11.1 through 3.10.0-862.11.6 (see here and here).

Essentially, in kernels 862.11.1 through 862.11.6, a missing semicolon in drivers/infiniband/core/verbs.c causes all reliable connected (RC) messages to fail, while unreliable datagram (UD) messages still work. You can either patch this driver or boot an earlier kernel to work around the issue until an updated kernel fixes it.
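To check a fleet for the affected kernels, one option is to compare the numeric fields of each host's release string against the range given above. This is a minimal sketch that assumes RHEL/CentOS-style release strings such as `3.10.0-862.11.6.el7.x86_64` and treats both ends of the range as inclusive; `kernel_tuple` and `is_affected` are hypothetical helper names.

```python
import re

# Range quoted in the answer, treated here as inclusive at both ends.
AFFECTED_LOW = (3, 10, 0, 862, 11, 1)
AFFECTED_HIGH = (3, 10, 0, 862, 11, 6)


def kernel_tuple(release: str) -> tuple[int, ...]:
    """Turn '3.10.0-862.11.6.el7.x86_64' into (3, 10, 0, 862, 11, 6).

    Only the leading numeric fields are kept; distro suffixes such as
    '.el7.x86_64' are ignored.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-([\d.]+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    base = tuple(int(g) for g in m.group(1, 2, 3))
    extra = m.group(4) or ""
    rel = tuple(int(x) for x in extra.split(".") if x)
    return base + rel


def is_affected(release: str) -> bool:
    """True if the release falls inside the range reported as broken."""
    # Tuple comparison is lexicographic, so a shorter release such as
    # (3, 10, 0, 862) sorts before (3, 10, 0, 862, 11, 1) and falls
    # outside the quoted range.
    return AFFECTED_LOW <= kernel_tuple(release) <= AFFECTED_HIGH
```

On each host, `is_affected(platform.release())` (using the standard `platform` module) reports whether the running kernel is in the range; hosts that return `True` are candidates for booting an earlier kernel.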
