Merge "Add README.md"
107
tools/repo_diff/README.md
Normal file
@@ -0,0 +1,107 @@
# Repo Diff Trees

repo_diff_trees.py compares two repo source trees and outputs reports on the
findings.

The output is in CSV format and is easily consumed in a spreadsheet.

In addition to importing to a spreadsheet, you can also create your own
Data Studio dashboard like [this one](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).

If you wish to create your own dashboard, follow the instructions below:

1. Sync the two repo workspaces you wish to compare. Example:

```
mkdir android-8.0.0_r1
cd android-8.0.0_r1
repo init \
  --manifest-url=https://android.googlesource.com/platform/manifest \
  --manifest-branch=android-8.0.0_r1
# Adjust the number of parallel jobs to your needs
repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
cd ..
mkdir android-8.0.0_r11
cd android-8.0.0_r11
repo init \
  --manifest-url=https://android.googlesource.com/platform/manifest \
  --manifest-branch=android-8.0.0_r11
# Adjust the number of parallel jobs to your needs
repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
cd ..
```

2. Run repo_diff_trees.py. Example:

```
python repo_diff_trees.py --exclusions_file=android_exclusions.txt \
  android-8.0.0_r1 android-8.0.0_r11
```

3. Create a [new Google spreadsheet](https://docs.google.com/spreadsheets/create).
4. Import project.csv to a new sheet.
5. Create a [new data source in Data Studio](https://datastudio.google.com/datasources/create).
6. Connect your new data source to the project.csv sheet in the Google spreadsheet.
7. Add a "Count Diff Status" field by selecting the menu next to the "Diff
Status" field and selecting "Count".
8. Copy the [Data Studio dashboard sample](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).
Make sure you are logged into your Google account and have agreed to Data Studio's terms of service. Once
this is done, you should get a link to "Make a copy of this report".
9. Select your own data source for your copy of the dashboard when prompted.
10. You may see a "Configuration Incomplete" message under
the "Modified Projects" pie chart. To address this, select the pie chart,
then replace the "Invalid Metric" field with "Count Diff Status".

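The "Count Diff Status" metric from step 7 can also be checked locally before building a dashboard. The following is a minimal sketch, assuming project.csv has a header row with a column named "Diff Status" (inferred from the Data Studio field name above, not confirmed against the script). `count_diff_status` is a hypothetical helper and is not part of repo_diff_trees.py.

```python
import csv
from collections import Counter


def count_diff_status(csv_path, column="Diff Status"):
    """Count how many rows carry each value of the given column.

    Assumption: the CSV has a header row and `column` is one of its names.
    A missing column raises KeyError.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[column] for row in reader)


# Example usage (assumes project.csv is in the current directory):
#   for status, count in count_diff_status("project.csv").most_common():
#       print(status, count)
```

If the column turns out to have a different name in your output, pass it through the `column` argument.
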
## Analysis method

repo_diff_trees.py goes through several stages when comparing two repo
source trees:

1. Match projects in source tree A with projects in source tree B.
2. Diff projects that have a match.
3. Find commits in source tree B that are not in source tree A.

The first two steps are self-explanatory. The method
of finding commits only in B is explained below.

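As a rough illustration of the matching stage, the sketch below pairs projects purely by their relative path inside each workspace. That matching rule is an assumption made for this example; the README does not say which key repo_diff_trees.py actually uses, and `match_projects` is a hypothetical name.

```python
def match_projects(paths_a, paths_b):
    """Split project paths into matched ones and tree-only leftovers.

    Assumption: two projects match when their relative paths are equal.
    """
    set_a, set_b = set(paths_a), set(paths_b)
    return {
        "matched": sorted(set_a & set_b),
        "only_in_a": sorted(set_a - set_b),
        "only_in_b": sorted(set_b - set_a),
    }
```

Only the projects under "matched" would move on to the diff stage and to the search for commits that exist only in B.
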
## Finding commits not upstream

After matching up projects in both source trees
and diffing, the last stage is to iterate
through each matching project pair and find
the commits that exist in the downstream project (B) but not the
upstream project (A).

'git cherry' is a useful tool that finds changes
which exist in one branch but not another. It does so
not only by finding which commits were merged
to both branches, but also by matching cherry-picked
commits.

However, there are many instances where a change in one branch
can have an equivalent in another branch without being a merge
or a cherry pick. Some examples are:

* Commits that were squashed with other commits
* Commits that were reauthored

'git cherry' will not recognize these commits as having an equivalent,
yet they clearly do.

This is addressed in three steps:

1. First, list the 'git cherry' commits. This gives us the
list of changes for which 'git cherry' could not find an equivalent.
2. Then 'git blame' the entire project's source tree and compile
a list of changes that actually have lines of code in the tree.
3. Finally, find the intersection: 'git cherry' changes
that have lines of code in the final source tree.

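The three steps can be sketched as follows. This is not the script's actual code. It assumes the inputs are the text output of `git cherry` (where a leading `+` marks a commit with no equivalent upstream) and of `git blame --porcelain` for each file in the tree (where each blamed line is introduced by a header starting with a 40-character commit SHA). All function names are hypothetical.

```python
import re

# Porcelain blame headers look like "<40-hex sha> <orig line> <final line> ...".
# File content lines start with a TAB, so they never match this pattern.
_BLAME_HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+")


def unmatched_cherry_commits(cherry_output):
    """Step 1: SHAs that `git cherry` marks with '+' (no equivalent found)."""
    shas = set()
    for line in cherry_output.split("\n"):
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "+":
            shas.add(parts[1])
    return shas


def blamed_commits(porcelain_output):
    """Step 2: SHAs that own at least one line in a blamed file."""
    shas = set()
    for line in porcelain_output.split("\n"):
        match = _BLAME_HEADER.match(line)
        if match:
            shas.add(match.group(1))
    return shas


def commits_not_upstream(cherry_output, blame_outputs):
    """Step 3: '+' commits that still have lines in the final source tree."""
    surviving = set()
    for output in blame_outputs:
        surviving |= blamed_commits(output)
    return unmatched_cherry_commits(cherry_output) & surviving
```

A commit that `git cherry` reports with `+` but that owns no surviving line, for example one whose changes were later rewritten, drops out of the result because it never appears in any blame output.
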
## Caveats

The method described above has proven effective on Android
source trees. It does have shortcomings:

* It does not find commits that only delete lines of code.
* It does not take into account merge conflict resolutions.

@@ -12,7 +12,7 @@ This script:
 import argparse
 import os
 import subprocess
-import repo_diff_downstream
+import repo_diff_trees

 HELP_MSG = "Diff a repo (downstream) and its upstream"

@@ -152,7 +152,7 @@ def diff(manifest_url, manifest_branch, tag, upstream_manifest_url,
       upstream_workspace)

   # do the comparison
-  repo_diff_downstream.diff(
+  repo_diff_trees.diff(
       upstream_workspace,
       workspace,
       os.path.abspath("project.csv"),