diff --git a/tools/repo_diff/README.md b/tools/repo_diff/README.md
new file mode 100644
index 000000000..d4b059635
--- /dev/null
+++ b/tools/repo_diff/README.md
@@ -0,0 +1,107 @@
+# Repo Diff Trees
+
+repo_diff_trees.py compares two repo source trees and outputs reports on the
+findings.
+
+The output is in CSV format and is easy to import into a spreadsheet.
+
+In addition to importing to a spreadsheet, you can also create your own
+Data Studio dashboard like [this one](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).
+
+If you wish to create your own dashboard, follow the instructions below:
+
+1. Sync the two repo workspaces you wish to compare. Example:
+
+```
+mkdir android-8.0.0_r1
+cd android-8.0.0_r1
+repo init \
+  --manifest-url=https://android.googlesource.com/platform/manifest \
+  --manifest-branch=android-8.0.0_r1
+# Adjust the number of parallel jobs to your needs
+repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
+cd ..
+mkdir android-8.0.0_r11
+cd android-8.0.0_r11
+repo init \
+  --manifest-url=https://android.googlesource.com/platform/manifest \
+  --manifest-branch=android-8.0.0_r11
+# Adjust the number of parallel jobs to your needs
+repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
+cd ..
+```
+
+2. Run repo_diff_trees.py. Example:
+
+```
+python repo_diff_trees.py --exclusions_file=android_exclusions.txt \
+  android-8.0.0_r1 android-8.0.0_r11
+```
+
+3. Create a [new Google spreadsheet](https://docs.google.com/spreadsheets/create).
+4. Import project.csv to a new sheet.
+5. Create a [new data source in Data Studio](https://datastudio.google.com/datasources/create).
+6. Connect your new data source to the project.csv sheet in the Google spreadsheet.
+7. Add a "Count Diff Status" field by selecting the menu next to the "Diff
+   Status" field and selecting "Count".
+8. Copy the [Data Studio dashboard sample](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).
+   Make sure you are logged into your Google account and you have agreed to Data Studio's terms of service. Once
+   this is done you should get a link to "Make a copy of this report".
+9. Select your own data source for your copy of the dashboard when prompted.
+10. You may see a "Configuration Incomplete" message under
+    the "Modified Projects" pie chart. To address this, select the pie chart,
+    then replace the "Invalid Metric" field with "Count Diff Status".
+
+## Analysis method
+
+repo_diff_trees.py goes through several stages when comparing two repo
+source trees:
+
+1. Match projects in source tree A with projects in source tree B.
+2. Diff projects that have a match.
+3. Find commits in source tree B that are not in source tree A.
+
+The first two steps are self-explanatory. The method
+of finding commits only in B is explained below.
+
+## Finding commits not upstream
+
+After matching up projects in both source trees
+and diffing, the last stage is to iterate
+through each matching project pair and find
+the commits that exist in the downstream project (B) but not the
+upstream project (A).
+
+'git cherry' is a useful tool that finds changes
+which exist in one branch but not another. It does so
+not only by finding commits that were merged
+to both branches, but also by matching cherry-picked
+commits.
+
+However, there are many instances where a change in one branch
+can have an equivalent in another branch without being a merge
+or a cherry pick. Some examples are:
+
+* Commits that were squashed with other commits
+* Commits that were reauthored
+
+'git cherry' will not recognize these commits as having an equivalent,
+yet they clearly do.
+
+This is addressed in three steps:
+
+1. First we list the "git cherry" commits, which gives us the
+   list of changes for which "git cherry" could not find an equivalent.
+2. Then we "git blame" the entire project's source tree and compile
+   a list of changes that actually have lines of code in the tree.
+3. Finally we find the intersection: 'git cherry' changes
+   that have lines of code in the final source tree.
+
+
+## Caveats
+
+The method described above has proven effective on Android
+source trees. It does have shortcomings.
+
+* It does not find commits that only delete lines of code.
+* It does not take into account merge conflict resolutions.
diff --git a/tools/repo_diff/exclusions.txt b/tools/repo_diff/android_exclusions.txt
similarity index 100%
rename from tools/repo_diff/exclusions.txt
rename to tools/repo_diff/android_exclusions.txt
diff --git a/tools/repo_diff/repo_diff_android.py b/tools/repo_diff/repo_diff_android.py
index 7868df27a..f47e71f0f 100644
--- a/tools/repo_diff/repo_diff_android.py
+++ b/tools/repo_diff/repo_diff_android.py
@@ -12,7 +12,7 @@ This script:
 import argparse
 import os
 import subprocess
-import repo_diff_downstream
+import repo_diff_trees
 
 HELP_MSG = "Diff a repo (downstream) and its upstream"
 
@@ -152,7 +152,7 @@ def diff(manifest_url, manifest_branch, tag, upstream_manifest_url,
       upstream_workspace)
 
   # do the comparison
-  repo_diff_downstream.diff(
+  repo_diff_trees.diff(
       upstream_workspace,
       workspace,
       os.path.abspath("project.csv"),
diff --git a/tools/repo_diff/repo_diff_downstream.py b/tools/repo_diff/repo_diff_trees.py
similarity index 100%
rename from tools/repo_diff/repo_diff_downstream.py
rename to tools/repo_diff/repo_diff_trees.py