Around 10% of the traces with Nettrace have "traced_final_flush_failed"
errors. It is believed that Network Tracing isn't writing enough data to
fill one "Chunk" in Perfetto's buffer. Although this should still be
saved by Perfetto, it doesn't seem to be.
This change records the number of packets read from the ring buffer to
understand whether the error coincided with low-data cases. It also
tries to flush the data OnStop to potentially fix the issue.
Bug: 285411033
Test: flash and run trace
(cherry picked from https://android-review.googlesource.com/q/commit:80d705566be6ea8a822eebd088c1f37758fdf0f6)
Merged-In: I92c8d2d8d47d1ed123585e1cfdde802d286f120f
Change-Id: I92c8d2d8d47d1ed123585e1cfdde802d286f120f
Instead of also accounting tag != 0 traffic against the tag == 0 slot
while the bpf code writes into the map, move this logic into
the userspace jni code that reads from the map.
Simplifies the bpf program making things easier on the
kernel's bpf verifier, and is better for performance,
since a per-packet fixup operation becomes a per-poll fixup.
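For illustration only, a per-poll fixup could look roughly like the sketch below. StatsRow and addTaggedTrafficToUntagged are hypothetical names (not the actual jni code), and rows are assumed to be keyed by (uid, tag) only, which is simpler than the real map key.
```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified row as read from the stats map.
struct StatsRow {
    uint32_t uid;
    uint32_t tag;
    uint64_t rxBytes;
    uint64_t txBytes;
};

// Runs once per poll: folds the counters of every tagged row into the
// untagged (tag == 0) row of the same uid.
std::vector<StatsRow> addTaggedTrafficToUntagged(const std::vector<StatsRow>& rows) {
    // Sum the counters of all tagged rows, per uid.
    std::map<uint32_t, StatsRow> tagged;
    for (const StatsRow& row : rows) {
        if (row.tag == 0) continue;
        StatsRow& sum = tagged[row.uid];  // zero-initialized on first use
        sum.uid = row.uid;
        sum.rxBytes += row.rxBytes;
        sum.txBytes += row.txBytes;
    }

    // Add each per-uid sum to the existing tag == 0 row, if there is one.
    std::vector<StatsRow> out = rows;
    for (StatsRow& row : out) {
        if (row.tag != 0) continue;
        auto it = tagged.find(row.uid);
        if (it == tagged.end()) continue;
        row.rxBytes += it->second.rxBytes;
        row.txBytes += it->second.txBytes;
        tagged.erase(it);
    }

    // Uids that only had tagged traffic still need a tag == 0 row.
    for (const auto& entry : tagged) {
        out.push_back(StatsRow{entry.first, 0, entry.second.rxBytes, entry.second.txBytes});
    }
    return out;
}
```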
Test: TreeHugger, atest libnetworkstats_test FrameworksNetTests
Bug: 276296921
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ic220a201781a1170bcffe327fe5664fc12b65dd9
only the test code ever passes in anything other than
the {UID_ALL, INTERFACES_ALL, TAG_ALL} limit (i.e. no limit)
Test: TreeHugger, atest libnetworkstats_test FrameworksNetTests
Bug: 276296921
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ida489f25c4da4b12541c6001b41d9e4b30804eff
This should ensure that we don't have stale packets next time we start a
trace.
Bug: 246985031
Test: flash & trace with long (1m+) poll (used to get stale, now not)
Change-Id: I6085d4a97688a221d26c095ef9073360292fd1ec
It's not obvious from code how the simple false value reduces the binary
size. Adds a comment to explain how it works and a warning about
changing the flag value.
Bug: 272207884
Test: It's a comment
Change-Id: I5ea733ef879e8e6ea1a2a6bfb3aa22b1fbedd6da
Despite extending the handler with a test-only version, calling
HandlerForTest::Register creates a NetworkTraceHandler instead. This
change adds a class-level toggle to skip certain code in tests. Without
this, it's possible for the test to see system-level packets and fail
(although I haven't seen this yet).
Bug: 246985031
Test: atest libnetworkstats_test w/ logs in the poller Start/Stop
Change-Id: If22d91e5449bc774961c58115ba84ca2a4bcde59
The only change required to fix it was to add the "= true" which got
lost at some point in modifications. This change also removes the empty
packet with only the cleared flag (Perfetto may soon filter empty
packets) and removes the flags altogether for non interned cases.
Test: atest libnetworkstats_test
Change-Id: I6781349106a7c157ccace9d0cdff6d90190b69c5
The test has flaked twice, once reporting only half of the packets, the
second time reporting no packets. Although I haven't been able to repro
this locally with 500 attempts, I believe it's a race between the kernel
thread writing and test thread receiving the events. This adds a retry
with a brief sleep only if we don't get all packets on the first try.
This also records all unmatched packets in case something like the port
or uid reporting breaks. Rather than fail saying zero packets, you'll
see the ones we skipped and can (hopefully) tell what went wrong.
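As a rough sketch of the retry idea (not the test's actual code): pollWithRetry and its parameters are hypothetical, and the attempt count is generalized here, whereas this change only retries after a short first result.
```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

// Calls poll() until at least `expected` packets have been seen in total
// or `maxAttempts` is reached. The sleep happens only before a retry, so
// the common case (everything arrives on the first try) pays no delay.
size_t pollWithRetry(const std::function<size_t()>& poll, size_t expected,
                     int maxAttempts, std::chrono::milliseconds delay) {
    size_t seen = 0;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (attempt > 0) std::this_thread::sleep_for(delay);
        seen += poll();
        if (seen >= expected) break;
    }
    return seen;
}
```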
Bug: 273600719
Test: atest libnetworkstats_test
Change-Id: Iee21f30a8dc59be5649f8e8b6509f4cc69ae5ff9
Interning is a feature Perfetto offers. You can store details in the
intern table and associate it with an id. Then, each trace packet can
reference just the id rather than the full proto contents. In our case,
we already identify unique contexts, so all we need to do is give them
unique IDs and record that instead.
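A minimal sketch of the idea, independent of the Perfetto API: InternTable is a hypothetical class, contexts are represented as plain strings, and ids are assumed to start at 1.
```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical intern table: maps full contents to a small id.
class InternTable {
  public:
    // Returns {id, isNew}. isNew is true only the first time the contents
    // are seen, which is when the writer would have to emit the full
    // contents next to the id. Later packets reference the id alone.
    std::pair<uint64_t, bool> intern(const std::string& contents) {
        auto it = mIds.find(contents);
        if (it != mIds.end()) return std::make_pair(it->second, false);
        const uint64_t id = mIds.size() + 1;
        mIds.emplace(contents, id);
        return std::make_pair(id, true);
    }

  private:
    std::map<std::string, uint64_t> mIds;
};
```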
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: I84f7673bc41b89390c02b8ec5460adfadbb36173
Dropping fields removes them from the bucketing and from the trace
output (the important part being the bucketing). The most useful is
dropping the local port which is often meaningless and can occasionally
have very high cardinality (leading to less bundling, less aggregation
and more space used by interning).
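To illustrate the cardinality effect, here is a small sketch with a hypothetical Flow struct and countBuckets helper (neither is from the handler): zeroing the local port before building the bucketing key collapses flows that differ only in that field.
```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical, simplified view of the attributes used for bucketing.
struct Flow {
    uint32_t uid;
    uint16_t localPort;
    uint16_t remotePort;
};

// Counts the distinct buckets the flows fall into. A dropped field is
// replaced by 0 in the key, so it can no longer split buckets.
size_t countBuckets(const std::vector<Flow>& flows, bool dropLocalPort) {
    std::set<std::tuple<uint32_t, uint16_t, uint16_t>> keys;
    for (const Flow& f : flows) {
        const uint16_t local = dropLocalPort ? 0 : f.localPort;
        keys.insert(std::make_tuple(f.uid, local, f.remotePort));
    }
    return keys.size();
}
```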
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: I2ae3289583fa7f29e60d92c58378f189a564bd81
The aggregation threshold determines the number of packets a particular
bundle has to hit before recording only aggregate results. This allows
individual high-throughput streams to be recorded at a significantly
reduced trace buffer cost.
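A sketch of the threshold decision, assuming "hit" means the packet count is greater than or equal to the threshold. BundleOutput and summarize are hypothetical names, not the handler's types.
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output of one bundle: either per-packet lengths or totals.
struct BundleOutput {
    bool aggregated;
    std::vector<uint32_t> lengths;  // only filled when !aggregated
    uint32_t totalPackets;          // only filled when aggregated
    uint64_t totalLength;           // only filled when aggregated
};

BundleOutput summarize(const std::vector<uint32_t>& lengths, size_t threshold) {
    BundleOutput out{};
    if (lengths.size() < threshold) {
        // Below the threshold: keep the individual packets.
        out.lengths = lengths;
        return out;
    }
    // At or above the threshold: record only the aggregate counters.
    out.aggregated = true;
    out.totalPackets = static_cast<uint32_t>(lengths.size());
    for (uint32_t len : lengths) out.totalLength += len;
    return out;
}
```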
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: I9e3c6f013db91919be541860364764e913726cf1
Perfetto has been updated to understand both the bundled and individual
packets, but the change hasn't yet hit stable. Until it does, fall back
to using the single event format.
Also fixes build warning from prior change.
Note: once the change hits Perfetto stable, the two formats behave
identically in the ui/querying/etc.
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: Ifc0e00f3c73aa1ef485f87f6ed4a9873c193ebb3
Bundling groups PacketTraces by their various attributes (e.g. uid, tag,
ports) and outputs a single Perfetto TracePacket per group. Rather than
repeat this information many times, it's only written once. In most
cases, this should reduce the trace size by 5x.
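The grouping step can be sketched as follows. Packet, Bundle and bundlePackets are hypothetical and use a reduced set of attributes; the real PacketTrace has more fields.
```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical, simplified packet record.
struct Packet {
    uint64_t timestampNs;
    uint32_t length;
    uint32_t uid;
    uint32_t tag;
    uint16_t remotePort;
};

// Per-packet data that still has to be stored for each group member.
struct Bundle {
    std::vector<uint64_t> timestamps;
    std::vector<uint32_t> lengths;
};

// The shared attributes, written once per group instead of once per packet.
using BundleKey = std::tuple<uint32_t, uint32_t, uint16_t>;  // uid, tag, remotePort

std::map<BundleKey, Bundle> bundlePackets(const std::vector<Packet>& batch) {
    std::map<BundleKey, Bundle> bundles;
    for (const Packet& p : batch) {
        Bundle& b = bundles[BundleKey(p.uid, p.tag, p.remotePort)];
        b.timestamps.push_back(p.timestampNs);
        b.lengths.push_back(p.length);
    }
    return bundles;
}
```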
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: Ia9cb163fb4c673abdab8d442576cf4b12a98dbc6
This tests the logic of converting bpf PacketTraces into Perfetto
TracePackets. The test stands up an in-process trace and parses the
resulting trace output (filtering to only the events we care about).
On the implementation side, `Write` is added as a non-static method
to do the conversion from a batch of PacketTraces. Because this becomes
part of the instance, per-instance configuration (e.g. interning) can be
applied without causing conflicts between concurrent sessions.
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: I15a26ba720eff308d01e6827a176a6b2f7c60e80
This converts the poller's callback to batch style: it passes all of the
events each time it polls rather than one by one. This requires a bit
more memory (up to 32 KB) but will allow optimizations in the following
changes that should reduce the trace size and cpu by ~10x.
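The shape of a batch-style callback might look like this sketch. The PacketTrace fields are reduced to two for brevity, and BatchCallback and pollOnce are hypothetical names; the pending vector stands in for the ring buffer.
```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Simplified stand-in for the bpf PacketTrace struct.
struct PacketTrace {
    uint64_t timestampNs;
    uint32_t length;
};

// One call per poll with every event, instead of one call per event.
using BatchCallback = std::function<void(const std::vector<PacketTrace>&)>;

// Drains everything that is pending and hands it over in a single call.
// Returns the number of events delivered.
size_t pollOnce(std::vector<PacketTrace>& pending, const BatchCallback& callback) {
    std::vector<PacketTrace> batch;
    batch.swap(pending);  // pending is left empty for the next poll
    if (!batch.empty()) callback(batch);
    return batch.size();
}
```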
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: Ia3223ba8b27b825e2d63d6b3b8ac09b8eb17b3f8
This finishes the split of NetworkTraceHandler into separate DataSource
and Polling parts by moving the Polling part to its own file.
Despite being a large diff, all this change did was copy the
NetworkTraceHandler files to their NetworkTracePoller counterparts and
delete the irrelevant sections in each file. The actual content of the
classes and functions should be identical.
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: Ibc9d945658e89f969fa3d1551863ccd26fd51a78
Perfetto allows multiple trace sessions to run in parallel. Each trace
session creates an instance of the registered DataSource. Bpf ring
buffers only support a single consumer, so we don't want multiple
instances reading concurrently.
This patch fixes things by making the DataSource a very thin wrapper
which delegates everything to a singleton. The singleton counts the
number of active sessions so that start is only called if not already
started, and stop is called if there are no remaining sessions.
Note: it's not clear whether it would be better to take the min or max
of poll_ms for active sessions. Min would be good for callers wanting
high throughput data collection, but doing so could jeopardise callers
using the poll_ms to limit the trace size (e.g. longer traces that are
alright dropping >5kpps scenarios). In this change, we use whichever
poll_ms was set first and make no promises.
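The session counting described above can be sketched like this. SharedPoller is a hypothetical stand-in for the singleton; the real start/stop work is reduced to a boolean result, and the mutex is an assumption of this sketch.
```cpp
#include <cstdint>
#include <mutex>

// Hypothetical singleton-style poller shared by all trace sessions.
class SharedPoller {
  public:
    // Returns true only if this call actually started polling.
    bool start(uint32_t pollMs) {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mSessionCount++ > 0) return false;  // already running
        mPollMs = pollMs;  // whichever poll_ms was set first wins
        return true;
    }

    // Returns true only if this call actually stopped polling.
    bool stop() {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mSessionCount == 0) return false;    // nothing to stop
        if (--mSessionCount > 0) return false;   // other sessions remain
        return true;
    }

    uint32_t pollMs() {
        std::lock_guard<std::mutex> lock(mMutex);
        return mPollMs;
    }

  private:
    std::mutex mMutex;
    int mSessionCount = 0;
    uint32_t mPollMs = 0;
};
```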
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: Ic85cab2205e6d426bcfc913450edff50be373bb0
This adds a mutex to guard access to all fields. The lock is taken
liberally since calls should rarely (if ever) collide (for example,
ConsumeAll is called at most every 100ms and shouldn't overlap).
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: I97791a808771bafe789091c9b54cbec0a31d1721
This splits the polling and DataSource aspects into separate classes
in preparation for making polling a singleton. This change is not
expected to change behavior.
Note:
* OnSetup saves the poll duration and passes it to Start to simplify
the Poller's life cycle.
* OnStart's handling of mTaskRunner/Loop is moved to the end of Start.
* OnStop's handling of mTaskRunner is moved to the end of Stop and Stop
is changed to continue on failure (so that the task runner is reset in
all cases as it was before this change).
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: I521eb50cd00916aa4d98174f0db461508b532732
improves coverage...
(see also cs/ p:android$ combineUidTag showing this 1 hit)
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I85524032b045bdad78876239c968a3008217ed67
The network trace handler only produces, never consumes Perfetto events.
By cutting out the consumer code, we can reduce the binary size by
~280KB.
Bug: 246985031
Test: build and flash
Change-Id: Ie03129630cc425e1759770ef3f3d3391e78331d7
This runs the Perfetto and NetworkTraceHandler initialization on system
initialization along with the existing NetworkStatsService. The code is
run within the context of the system_server and is only executed on U or
later devices running eng or userdebug builds.
Bug: 246985031
Test: build & flash
Change-Id: I2b091e31c3ded54c0d7062d7d73f33ed47e0bf98
This adds the DataSource base class, data source registration and
overrides the lifecycle methods.
Adding the Perfetto SDK increases the size of the tethering apex by
450 KB (~150 KB for the compressed apex).
Bug: 246985031
Test: atest libnetworkstats_test
Change-Id: Ie2e8f3e43c8080434408f752346e575a19e9042e
This adds the base (non-perfetto) NetworkTraceHandler with support for
starting and stopping tracing, and pulling messages from BPF. The
included test covers the end-to-end scenario from socket creation,
socket tagging and data traffic.
Bug: 246985031
Test: atest --iterations 500 libnetworkstats_test
Change-Id: I035d2e03fa7c461ecb93d207b7fd2f53e6a2f52e
See:
https://source.corp.google.com/search?q=p:android$%20OVERFLOW_COUNTERSET
(I'm guessing any remaining uses are in Java not C/C++?)
I'd like to remove the definition from the
frameworks/libs/net/common/native/bpf_headers/include/bpf/BpfUtils.h
header file, where it really doesn't fit.
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I687f4ca0f52c362b29be7fe612f90d8aed2afe9e
This might eliminate a failure mode with dreaded 524 error
from the kernel... or it might not, but it shouldn't hurt.
Bug: 230418056
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: If2479daf77f61c220214ff507582295bd303fd21
frameworks/libs/net/common/native/bpf_headers/include/bpf/BpfMap.h
(which is a mainline automerged path)
already aborts() on failure in the path taking constructor
(by calling abortOnKeyOrValueSizeMismatch), and as such
checking for isValid afterwards is pointless.
Also, there's no reason to waste cpu cycles opening and closing these
maps, and creating and destroying the map objects.
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Icb5567ee32ca9a00fd8aeb565066716d350f1292
We note that:
0 <= BpfMap<K,V>.getMap()
is equivalent to
BpfMap<K,V>.isValid()
but the latter is far more kosher.
Unfortunately there is more non-kosher stuff happening in this
file, so we still need to define BPF_MAP_MAKE_VISIBLE_FOR_TESTING anyway.
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I138709c6a298a2d8511b525a8349e01ab87d9455
In the BPF code, per-UID network access (e.g., for doze mode,
standby, etc.) is stored in UidOwnerValue structures. Each of
these stores that UID's rules in a 32-bit bitmask of
UidOwnerMatchType values, so the code can support ~31 match
types.
However, which match types are enabled is stored in
configuration_map at index UID_RULES_CONFIGURATION_KEY, and
configuration_map only stores 8-bit values. So it's not
possible to define more than 7 match types.
Widen configuration_map from 8 to 32 bits to match the width
of UidOwnerValue.rule. This doesn't impact memory because
configuration_map only has 2 entries.
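The truncation can be demonstrated with a small standalone sketch. matchBit and isMatchEnabled are hypothetical helpers, and the bit positions are arbitrary examples rather than real UidOwnerMatchType values.
```cpp
#include <cstdint>

// Bit for the match type at the given position in the bitmask.
constexpr uint32_t matchBit(unsigned index) {
    return 1u << index;
}

// Checks a match bit against a stored configuration value of any width.
template <typename ConfigT>
bool isMatchEnabled(ConfigT storedConfig, uint32_t match) {
    return (static_cast<uint32_t>(storedConfig) & match) != 0;
}

// Storing the enabled set in 8 bits silently discards the higher bits.
inline uint8_t storeNarrow(uint32_t enabled) {
    return static_cast<uint8_t>(enabled);
}
```
With an 8-bit value, a match type at bit 10 reads back as disabled even though it was enabled; a 32-bit value keeps it.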
Bug: 208371987
Test: TreeHugger
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I7e1eee2daedd66d27965a2dd4ce6b4c3667892f7
In order to get counted by mts code coverage, these native tests need to
be run as part of mts.
Bug: 233904825
Test: m mts && mts-tradefed run mts-tethering-coverage
Change-Id: I79313197b146c7043ffb5e164faa46c2e16dd1d2
Configuration map index 1 (CURRENT_STATS_MAP_CONFIGURATION_KEY) can only
have value 0 (SELECT_MAP_A) or 1 (SELECT_MAP_B). Return an error for any
other value; otherwise, an out-of-bounds array read could cause memory
corruption or security issues.
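A sketch of the check, assuming the value is later used as an index into a two-element array. selectStatsMapIndex is a hypothetical helper and -EINVAL is an assumed error code, not necessarily what this change returns.
```cpp
#include <cerrno>
#include <cstdint>

constexpr uint32_t SELECT_MAP_A = 0;
constexpr uint32_t SELECT_MAP_B = 1;

// Returns an index that is safe to use on a two-element array of stats
// maps, or a negative errno value if the configuration value is invalid.
int selectStatsMapIndex(uint32_t configValue) {
    if (configValue != SELECT_MAP_A && configValue != SELECT_MAP_B) {
        return -EINVAL;  // reject before anything indexes out of bounds
    }
    return static_cast<int>(configValue);
}
```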
Bug: 231420457
Test: TH
Change-Id: Ia800ad78781f72b8118469c0230cc550796d334e
Add new symbols to libservice-connectivity loaded on T only, and the
framework libraries to apex and tests.
Bug: 197717846
Test: atest FrameworksNetTests
(cherry-picked and splitting apex Android.bp to aosp/1994130)
Change-Id: Iae44344701a3267110e5cbf271120201134d59e5
Merged-In: Iae44344701a3267110e5cbf271120201134d59e5
If libraries are moved from the platform to a module, the test
will fail with:
CANNOT LINK EXECUTABLE "/data/local/tmp/libnetworkstats_test/arm64/libnetworkstats_test": library "libnetworkstats.so" not found: needed by main executable
This may be because libnetworkstats.so cannot be found, since it is
in the platform.
In any case, since this is the test for libnetworkstats,
dynamically linking libnetworkstats is incorrect, because it
means if the same CL modifies both libnetworkstats and its test,
that CL will run the modified test against the unmodified
libnetworkstats.so on the device when the updated tests are run
without reflashing the device.
Also remove libutils because it does not seem to be needed.
Test: atest libnetworkstats_test passes with libraries moved
Change-Id: Id49641c0a919129e2c54531c3995ec7421161002
Two reasons for renaming:
1. Avoid module name collision in sc-mainline-prod branch.
2. The libnetdbpf was misnamed before.
Bug: 202086915
Test: atest libnetworkstats_test FrameworksNetTests
ConnectivityCoverageTests FrameworksNetSmokeTests
CtsAppOpsTestCases
Change-Id: I87fcf4b1a9d58780a45743a9aa91b9b936e54266
This is a clean copy (that can't build). Modifications will follow in
subsequent commits.
Bug: 202086915
Test: $ diff -r
packages/modules/Connectivity/service-t/native/libs/libnetdbpf/
system/netd/libnetdbpf/
(no differences)
No-Typo-Check: Clean move
Change-Id: I23aad26d487b4d99e24ffecf79eeef3f8eea664b