The correct return code to keep on processing any further TC
attached programs is 'TC_ACT_PIPE' and not 'TC_ACT_OK' (which
is terminal).
Without this the ipv6 tether offload program causes termination
of processing and the ipv6 clatd offload program never actually
handles any packets (while tethering is active).
This results in lack of bpf xlat64 offloading for tethered ipv4
traffic on an ipv6-only (cellular) network.
This in turn means incoming TCP packets get GRO'ed, are not bpf
offloaded, and are delivered to the clat daemon, which cannot
handle them (because GRO made them bigger than the mtu) and
discards them.
This results in poor performance, since tcp falls back to sending
a single mss/mtu sized packet per rtt.
Tested via tethering a linux laptop on an ipv6-only cellular connection
and downloading the linux kernel from kernel.org via 'wget -6' and 'wget -4'.
Before:
IPv6: over 2MB/s, observed:
5805 packets, including 4 sackOK
IPv4: under 1MB/s, observed:
9300 packets, including 8 sackOK, 387 sack 1, 501 sack 2, 2310 sack 3
After:
IPv6: over 7MB/s, observed:
16702 packets, including 4 sackOK
IPv4: over 9MB/s, observed:
32755 packets, including 2 sackOK
Test: builds, TreeHugger, see above
Bug: 195624908
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I623dacb5a37dc689cea34499c3906c11fcaf946c
If a tc ebpf program writes into a packet using direct packet access
then the packet will automatically be uncloned and pulled by
an additional prologue inserted by the kernel itself. See
tc_cls_act_prologue() & bpf_unclone_prologue() in kernel sources
(this is how the clat ebpf program works, which does DPA writes).
However in the forwarding programs we only *read* from the packets
using direct packet access, but never write. All writes happen via
kernel bpf helpers (this is mostly an implementation detail: since
we need to use helpers for checksum updates, I decided to also use
helpers for the writes themselves). As such the inserted 'automatic
unclone/pull' logic doesn't trigger.
It is thus possible (it depends on the skb layout delivered by the
nic driver) for 0 bytes of the packet to be accessible for read
using direct packet access. We thus need to explicitly try to pull
in the header of the packet so that we can inspect it.
In most cases (on most drivers for most packet types) this will
end up being a no-op (because the headers will already be in
the linear portion of the skb). But on some drivers for some
packet types it ends up mattering.
Test: TreeHugger, makes icmpv6 tether forwarding work on bramble
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I4b07e57728ce544ffb908527ea11ecc315e5acec
by marking programs as optional and providing appropriate stub implementations.
Test: TreeHugger
Bug: 181045068
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I021e7bcbfe4236242f517f067f89777fc08ecd8d
This is just a cut'n'paste reordering of programs.
Goal is to put rawip programs above ether ones.
This will make the next change easier to read.
Test: TreeHugger
Bug: 181045068
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Icebf4bf0505136e97b7b6950fb0b790582eb495e
It will map device ifindex to itself (but note that internally in the
kernel this is optimized into a map from ifindex to direct device
pointer), but only for xdp transmit capable devices (other devices
will not have an entry).
This will allow the use of bpf_redirect_map() from xdp tethering programs.
Test: atest, TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I29684e6761727d1115e9b4d75486eccbca3d5e33
For ipv6 we need 1 entry per client, so 64 seems like plenty,
while for ipv4 we need 1 entry per flow, so even 1024 seems
like it might not be enough, but it's much better than 64.
Nucca says:
# cat proc/sys/net/netfilter/nf_conntrack_buckets
65536
# cat proc/sys/net/netfilter/nf_conntrack_max
262144
Per https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt
the default “nf_conntrack_max” is “nf_conntrack_buckets * 4”.
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ib7d1d8c19bc688c442d842cf5c9f45cdf1241754
because it is not appropriate for use in XDP programs
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ibd5dac9676bae7aa5f10fbcfd777291f72bec819
and more importantly unconditionally. This requires less effort
on the part of the in-kernel bpf verifier.
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ibaa94bf096fc81c4d984dfabf515131b1c81ef09
We've backported the necessary support to all 4.14+ ACK kernels,
but we can't actually enforce that these changes will be picked
up by all devices. Thus we can only make the full featured
implementations optional on [4.14..5.8) kernels, with a tcp-only
version for those 4.14+ devices where the full featured version
fails to load.
Note: there's still a fair bit of implementation work left
in the do_forward4() function itself. This is really just
the skeleton.
Test: atest, TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: If78123e00d55a77f2ecd7da1547581797e23f9b2
This will facilitate providing a tcp-only version of the programs
which due to TCP's very long timeouts will not need to use the
Linux 5.8+ bpf_ktime_get_boot_ns() helper.
Test: atest, TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I1e49b6758d3754782ac6f8820e0c15aa20e4c61d
As this is the actual version that is required,
i.e. the version that supports the bpf_ktime_get_boot_ns() helper.
Test: atest, TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I2ea4830597a0bed53950a5d0c483a47208959f35
Currently, debugging the tethering programs is not easy because
in case of any failure they simply return TC_ACT_OK. This CL adds
a number of counters that the program can increment in the case
of interesting events such as malformed packets.
At the moment the counters are stored in a global tethering error
map, which is an ARRAY map of 32-bit counters. This should not
take up much space because there are only a dozen of these.
We might not need all of these counters. In future CLs we can
reduce the number of counters, or perhaps move them to a map of
maps so as to have separate counters on a per-interface basis.
Test: manual
Change-Id: I3fcd7eb8d318700092949ff2f39987bf4ba3656c
The keys are identical, and the values nearly so; this will make everyone's life easier.
Test: git grep 'Tether(Down|Up)stream4(Key|Value)' finds nothing
(note this requires follow up commits)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ifbff2c617ac5834ea80f827eaf89ca81e862baec
We want connection establishment/shutdown to flow through
the kernel code path so connection tracking state is at least
somewhat correct.
Test: atest, TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Iee97baa65750188f3436937b16c9b320f0495a5a
I keep on failing to find this using grep because it
doesn't match how all the other programs are defined,
so change it for consistency.
Test: builds, atest, TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ib61b375bef84d2b489080866b2411c84880e4ef2
This allows for better separation of test vs production code:
we will add more test maps and programs here later.
Test: builds
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I7b22e3e148ebf43fdf43dc68d0dea354f7627688