Due to try_make_writable's implementation:
// try to make the 1st 'len' header bytes r/w via DPA
void try_make_writable(struct __sk_buff* skb, int len) {
if (len > skb->len) len = skb->len;
if (skb->data_end - skb->data < len) bpf_skb_pull_data(skb, len);
}
This *should* normally result in nothing actually being done.
This is because the 'len' we request should trivially be <= skb->len
(by virtue of how we construct the packet / get here),
and because the linear length (skb->data_end - skb->data) was
previously (prior to this patch) already checked against 'len' below
in line 251 (and thus the packet would have been dropped if it were too short).
However, there's a tentative theory that we could somehow end up
with the entire payload in the non-linear portion of the packet,
and thus need to move it into the linear header portion where
we actually have direct packet access to it.
Note also that we already called this in line 71, so it should
be safe to add another call without causing bpf verifier unhappiness...
Test: TreeHugger
Bug: 298879031
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: If3531c3cf6932ac3f1d384a43d28326d17544aa3
On ingress:
(a) the socket is not a normal socket (it's AF_PACKET)
and thus (likely) doesn't hit this code path
[if it did, we'd count any traffic captured by AF_PACKET
sockets two (or more) times; I haven't checked,
but I assume that doesn't happen]
(b) is created by the system server (so not AID_CLAT)
(c) is not tagged by the system server (so not AID_CLAT)
So this is a no-op, but it simplifies the bpf program,
since 'egress' is a compile-time constant.
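A rough sketch of why gating on a compile-time 'egress' simplifies the program. skip_clat_accounting(), ingress_skip() and egress_skip() are hypothetical names, and AID_CLAT = 1029 is assumed from Android's android_filesystem_config.h. Because each program variant passes a literal, an optimizing compiler can fold the ingress variant down to a constant, typically leaving no uid comparison for the verifier to see.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of AID_CLAT (Android's clatd uid). */
#define AID_CLAT 1029

/* Hypothetical helper: should this packet bypass accounting because it
 * belongs to clatd? Only checked on egress. */
static inline bool skip_clat_accounting(const bool egress, uint32_t uid) {
    return egress && uid == AID_CLAT;
}

/* Each program variant passes a constant, so after inlining the
 * compiler can fold 'egress' away. */
static inline bool ingress_skip(uint32_t uid) { return skip_clat_accounting(false, uid); }
static inline bool egress_skip(uint32_t uid) { return skip_clat_accounting(true, uid); }
```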
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Iec693548789eb2752f9f30038e72e35c876f986c
While this is a little more code,
it seems much better for the accumulation operation
to be next to the struct definition itself
(in case we ever add more fields).
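A minimal C sketch of keeping accumulation beside the struct. The StatsValue field names here are assumptions for illustration, and StatsValue_add() is a hypothetical helper. The real change may look different (for example, a C++ operator+=). The point is that a new field and its accumulation line get added in the same place.

```c
#include <stdint.h>

/* Illustrative stats struct; field names are assumed. */
typedef struct {
    uint64_t rxPackets;
    uint64_t rxBytes;
    uint64_t txPackets;
    uint64_t txBytes;
} StatsValue;

/* Accumulate 'v' into 'acc'. Defined right next to the struct, so a
 * new field is hard to forget here. */
static inline void StatsValue_add(StatsValue *acc, const StatsValue *v) {
    acc->rxPackets += v->rxPackets;
    acc->rxBytes += v->rxBytes;
    acc->txPackets += v->txPackets;
    acc->txBytes += v->txBytes;
}
```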
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I26022db4566e69c964298d7b3f2cc4fa4a9a5152
(next step is to replace use of Stats struct with
identical (except field order) StatsValue struct)
Test: TreeHugger
Bug: 294604315
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I9be3c411f9592bf4edc75386b1c5b386ebeb5905
This patch is based on aosp/2535559 from maze@.
Add a source prefix to the upstream key such that only packets whose
source IPv6 address matches it will be forwarded to the upstream
interface.
In this patch, the source prefix is set to zero so there is no
behavior change. The next CL in the patch series will use the real source
prefixes retrieved from the upstream interface.
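One way to picture the key change, as a sketch only. struct upstream6_key, make_upstream6_key() and the use_src_prefix flag are hypothetical and not the real TetherUpstream6Key layout. The /64 prefix of the source address becomes part of the exact-match key. While the prefix is left zero on both the writer and lookup side, packets from different sources still produce identical keys, so forwarding is unchanged.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: input interface plus the upper 64 bits of the
 * IPv6 source address. */
struct upstream6_key {
    uint32_t iif;
    uint8_t src64[8];
};

/* Build a lookup key; with use_src_prefix == false the prefix stays
 * zero, as in this patch. */
static void make_upstream6_key(struct upstream6_key *k, uint32_t iif,
                               const uint8_t src_addr[16], bool use_src_prefix) {
    memset(k, 0, sizeof(*k));
    k->iif = iif;
    if (use_src_prefix) memcpy(k->src64, src_addr, sizeof(k->src64));
}

/* Exact-match comparison, like a hash map lookup on the whole key. */
static bool upstream6_key_equal(const struct upstream6_key *a,
                                const struct upstream6_key *b) {
    return memcmp(a, b, sizeof(*a)) == 0;
}
```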
Test: atest TetheringTests
Bug: 261923493
Change-Id: I43d068a29b937c7dfeb6fab632a8effb47ee2263
This is trivial - as the UDPLITE pseudoheader is identical
to the UDP pseudoheader (except that the UDPLITE pseudo length
is derived from the IPv4 total length / IPv6 payload length
field, instead of being copied from the UDPLITE header 'coverage
length' field - but this doesn't matter, as it [ie. the udplite
payload length] doesn't change during 464xlat translation).
Additionally, UDPLITE never sends a checksum value of 0,
as at least 8 bytes (the UDPLITE header) *must* be covered
by the checksum, and a computed checksum of 0 must be sent as 0xFFFF
(so, unlike UDP over IPv4, there is no 0 'no checksum' value to special-case).
See: https://datatracker.ietf.org/doc/html/rfc3828
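A small sketch of the RFC 3828 transmit rule, independent of the actual bpf code. The helper names are hypothetical. It computes a standard Internet ones' complement checksum and then maps a result of 0 to 0xFFFF, which is why a UDPLITE checksum field is never 0 on the wire.

```c
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words to a running sum; a trailing odd byte
 * is padded with zero. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t n) {
    for (size_t i = 0; i + 1 < n; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (n & 1)
        sum += (uint32_t)(p[n - 1] << 8);
    return sum;
}

/* Fold carries back into 16 bits and take the ones' complement. */
static uint16_t csum_finish(uint32_t sum) {
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 3828: a computed checksum of 0 is transmitted as 0xFFFF. */
static uint16_t udplite_wire_csum(uint32_t sum) {
    uint16_t c = csum_finish(sum);
    return c == 0 ? 0xFFFF : c;
}
```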
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I00a110b793fcf3cf705a9a706811da7866c3e810
This is to cut down bpfloader boot time.
Potential savings might be on the order of 30+% (300ms).
Loading BTF requires fork-execing the btfloader,
and currently BTF is only used to facilitate debugging.
Bug: 286369326
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ifa5f0052135b9dc826b18ca4622784615ed9c3c8
It is just a constant source of bugs with no real tests;
let's stop pretending this is a supported configuration.
The only tested configuration is the out-of-process tethering
updatable apex.
Test: TreeHugger
Bug: 279942846
Change-Id: I4b659a3cd32b89a65549b56006b926a5ac755f7b
Android T beta3/4 haven't been tested in ages,
and were really only tested for the transition to final T
nearly a year ago.
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I520e60026179c078859572231b86184796182142
This will make the code more legible once we switch to using these.
Also move them out of the .c files so we can share the same
constants across multiple files.
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I5cc9058cee8d1ea10d2f9e62a38313d0728f07d3
I don't know if this will truly help:
We'll still drop the expected egress TCP ACK (or FIN-ACK) reply
to the newly allowed ingress TCP FIN...
However, I don't think this will make things worse.
The presence of an ingress packet is proof the hardware already
woke up to receive it, and that doesn't change whether or not we
allow ingress *anything*.
ie. the main reason we don't allow ingress packets is
that it would be illogical to be asymmetrical.
So even if we do immediately send back a reply (I think a RST is
the only real possibility at the moment, since an ACK would still
be dropped), worst case we're waking the hardware up from RX
processing to full-blown TX processing.
Furthermore, if an inbound FIN causes an outbound RST, then that
RST will most likely prevent receiving future FIN retransmits.
So we're trading an RX->TX hardware wakeup now
for fewer RX wakeups in the (near) future.
This *might* just be an overall win.
I think a true solution likely needs to be smarter still
and allow skb->sk state != BPF_TCP_ESTABLISHED (or something).
Bug: 259199087
Bug: 264903985
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I143f12342f72d89f9450560c8d60dad4c79ffe64
Instead of also accounting tag!=0 traffic against the tag==0 slot
when the bpf code writes into the map, move this logic into
the userspace JNI code which reads from the map.
This simplifies the bpf program, making things easier on the
kernel's bpf verifier, and is better for performance,
since a per-packet fixup operation becomes a per-poll fixup.
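A sketch of what the read-side fixup amounts to. It is not the actual JNI code: struct tag_stats and uid_total() are hypothetical. Once the bpf program stops double-writing tagged traffic into the tag==0 slot, the reader reconstructs a uid's total by summing its untagged and tagged entries once per poll.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map entry: per (uid, tag) byte counters. */
struct tag_stats {
    uint32_t uid;
    uint32_t tag;
    uint64_t rxBytes;
    uint64_t txBytes;
};

/* Per-poll fixup: sum every entry of 'uid' (tag == 0 and tag != 0) into
 * a tag == 0 total, matching what the old per-packet double accounting
 * stored in the tag == 0 slot. */
static struct tag_stats uid_total(const struct tag_stats *entries, size_t n,
                                  uint32_t uid) {
    struct tag_stats total = { .uid = uid, .tag = 0, .rxBytes = 0, .txBytes = 0 };
    for (size_t i = 0; i < n; i++) {
        if (entries[i].uid != uid) continue;
        total.rxBytes += entries[i].rxBytes;
        total.txBytes += entries[i].txBytes;
    }
    return total;
}
```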
Test: TreeHugger, atest libnetworkstats_test FrameworksNetTests
Bug: 276296921
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ic220a201781a1170bcffe327fe5664fc12b65dd9
Effectively a no-op, but since it's a trivial check (uid < APP_START),
it's better to do it first, before the complex packet parsing in
skip_owner_match().
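An illustrative sketch of the reordering. APP_START = 10000 is assumed (Android's first app uid), and skip_owner_match_stub(), parse_calls and early_pass() are hypothetical stand-ins. Both checks lead to the same early pass, so order doesn't change the verdict. Putting the cheap uid test first just means system uids never reach the packet parsing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for APP_START (first application uid on Android). */
#define APP_START 10000

static int parse_calls; /* counts how often the costly path runs */

/* Hypothetical stand-in for the complex parsing in skip_owner_match(). */
static bool skip_owner_match_stub(void) {
    parse_calls++;
    return false;
}

/* Returns true if the packet passes without consulting uid rules. */
static bool early_pass(uint32_t uid) {
    if (uid < APP_START) return true;         /* trivial check first */
    if (skip_owner_match_stub()) return true; /* costly parsing, app uids only */
    return false;
}
```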
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I35a9188e108987d48f03a18cdf70ec4cdd715376
We only ever return DROP_UNLESS_DNS on ingress,
so the ordering doesn't actually matter.
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I742b85748433f5319d518bebc05d976d630b72e7
This adds the core BPF implementation of Android network packet tracing.
The new code looks into the skb to pull out various bits of information.
Both the program and the ring buffer are restricted to 5.8+ kernels and
userdebug or eng builds.
With the packet_info_config map defaulting to zero, userdebug and eng
builds won't run any of the tracing today. The only effects will be a 32k
memory increase for the ringbuf and the check on the config array.
Bug: 246985031
Test: build & flash both userdebug and user
Change-Id: I144da2971c0738b565ad58abc17e456209f13bde
These all default to false, never ignoring the maps.
Bug: 246985031
Test: build connectivity module
Change-Id: I404d56dcb311b34587d56dd6edc292029c4ad83f
This change updates callers to include the new ignore_on and bpfloader
arguments as per the change in aosp/2374598.
Bug: 246985031
Test: tethering build & install, full platform build & install
Change-Id: Id940a6003ae4cb0bbfc65db8ff96590c4f3c847b
This is a repeat of:
https://android-review.git.corp.google.com/c/platform/packages/modules/Connectivity/+/2266447
which was reverted in:
https://android-review.git.corp.google.com/c/platform/packages/modules/Connectivity/+/2372509
This time with kver >= 4.14 guards around the bpf_skb_adjust_room()
bpf helper, which isn't present on 4.9 T devices.
Original change comments:
Tested manually on a flame device connected to an ipv6-only wifi
network (GoogleGuest).
On server:
nc -4 -l -u -p 443
On client (phone):
adb shell nc -4 -u my.server 443
On client (phone):
adb shell tcpdump -l -ee -vv -s 1600 -i v4-wlan0
On client send something to server "Hi."
On server send something to client "Hey!"
You should see normal unfragmented IP packets.
Then on server send something really long (I used 57 copies of the 26 letter English alphabet). This should be long enough that fragmentation is required.
You should see tcpdump show 2 ipv4 fragments, and netcat
show the packet being delivered correctly.
(previous versions of the code were buggy, resulting
in corrupt packets and things not working)
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I6758e63d8133215edd26b4cd2d73a5b5f261ffd1
This reverts commit be9685c35c.
Reason for revert:
fails on 4.9 due to bpf_skb_adjust_room requiring a later kernel,
will need an alternative approach
Bug: 261818177
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I26535a96de80febc2fd54dcb564cde4f9ed7b3c9
will make it easier to extend this for 5.4+ behaviour as well
without having to introduce another is_5_4 boolean
Bug: 263884894
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Id4f6512d813dd460cb2b9a7ccb6a5f7b7e937575
easier on bpf verifier with no third case
Bug: 263884894
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I5076de6f83ba522ed4783bca0a9d7fca4024986a
The comment added by:
https://android-review.git.corp.google.com/c/platform/packages/modules/Connectivity/+/2261966
'offload.c - make tether_error_map read only.'
mentions offload.o loading on T when it should talk about S+.
Tethering offload bpf code was mainlined in S.
(T mainlined all the other bpf code)
Bug: 254543135
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I10b89691082e451115e61dedbdc0dac7a58e499c
To quote: https://www.rfc-editor.org/rfc/rfc6145
4.1 Identification:
The low-order 16 bits copied from the Identification field in
the IPv4 header. The high-order 16 bits set to zero.
5.1.1 Identification:
Copied from the low-order 16 bits in the Identification field in
the Fragment Header.
The RFC does not mention endianness. But I'm assuming it thinks
of things as network, ie. big, endian.
This matches userspace external/android-clat/translate.c:214
ip_targ->id = htons(ntohl(frag_hdr->ip6f_ident) & 0xffff);
This takes the 3rd and 4th bytes of the 32-bit ipv6 frag ident field;
see also line 195:
frag_hdr->ip6f_ident = htonl(ntohs(old_header->id));
and
packages/modules/Connectivity/bpf_progs/bpf_net_helpers.h
// Android only supports little endian architectures
#define htons(x) (__builtin_constant_p(x) ? ___constant_swab16(x) : __builtin_bswap16(x))
#define htonl(x) (__builtin_constant_p(x) ? ___constant_swab32(x) : __builtin_bswap32(x))
#define ntohs(x) htons(x)
#define ntohl(x) htonl(x)
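A host-side sketch of the two conversions quoted above. It uses the POSIX htons/htonl/ntohs/ntohl from <arpa/inet.h> in place of the bpf_net_helpers.h macros. The function names are hypothetical. Values are kept in network byte order as on the wire, and the IPv4 ID ends up in the 3rd and 4th bytes of the IPv6 fragment ident.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* IPv4 -> IPv6 (RFC 6145 4.1): low-order 16 bits of the fragment ident
 * come from the IPv4 ID; the high-order 16 bits are zero. */
static uint32_t v4_id_to_v6_ident(uint16_t ip_id_be) {
    return htonl(ntohs(ip_id_be));
}

/* IPv6 -> IPv4 (RFC 6145 5.1.1): keep only the low-order 16 bits. */
static uint16_t v6_ident_to_v4_id(uint32_t ident_be) {
    return htons((uint16_t)(ntohl(ident_be) & 0xffff));
}
```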
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ie4eed30cfd0e3e3e4dfa6c1a54751dcae1f9972b