this resolves some msan errors.
the same change was done in libaom:
5ab58722c Add missing initializations of HBD buffers
Change-Id: I8882af45b95c90ba43bf138c7d305a6c3b99e61c
since:
77fa51003 Replace deprecated scoped_ptr with unique_ptr
c++11 has been required so <tuple> is safe to use
Change-Id: I873cb953104b361a8503b5839a3372ce2b99e73c
Horizontal filter on 64x64 block: 1.59 times as fast as baseline.
Vertical filter on 64x64 block: 2.5 times as fast as baseline.
2D filter on 64x64 block: 1.96 times as fast as baseline.
Change-Id: I12e46679f3108616d5b3475319dd38b514c6cb3c
The interp filter tap calculation was not accurate to tell the
difference between 2 taps and 4 taps. This patch fixed the bug, and
resolved Jenkins test failures in mips sub-pel filter optimizations.
BUG=webm:1568
Change-Id: I51eb8adb7ed194ef2ea7dd4aa57aa9870ee38cfc
There are Jenkins test failures in mips sub-pel filter optimizations.
[ RUN ] MSA/ConvolveTest.MatchesReferenceSubpixelFilter/5
../libvpx/test/convolve_test.cc:889: Failure
Expected equality of these values:
lookup(ref, y * kOutputStride + x)
Which is: 255
lookup(out, y * kOutputStride + x)
Which is: 11
mismatch at (1,0), filters (4,0,1)
This relates to the 4-tap kernel added recently. This CL is a temporary
fix, while we investigate the issue.
BUG=webm:1568
Change-Id: If64c552b794425687cca4fbed893d8ccb73c89a5
Added the 4-tap interp filter, and used it for speed 1 sub-pel motion
search. Speed 2 motion search still used bilinear filter as before.
Speed 1 borg test showed good bit savings.
avg_psnr: ovr_psnr: ssim:
lowres: -1.125 -1.179 -1.021
midres: -0.717 -0.710 -0.543
hdres: -0.357 -0.370 -0.342
Speed test at speed 1 showed ~10% encoder time increase, which was
partially because of no SIMD version of 4-tap filter.
Change-Id: Ic9b48cdc6a964538c20144108526682d64348301
googletest imports tuple into testing to allow for compatibility across
c++ versions where tuple may be in std::tr1 or std. fixes deprecation
warnings under visual studio 2017
Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
Changed the intrinsics to perform summation similiar to the way the assembly does.
The new code diverges from the assembly by preferring unsaturated additions.
Results for haswell
SSSE3
Horiz/Vert Size Speedup
Horiz x4 ~32%
Horiz x8 ~6%
Vert x8 ~4%
AVX2
Horiz/Vert Size Speedup
Horiz x16 ~16%
Vert x16 ~14%
BUG=webm:1471
Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.
Change-Id: I1042564d1d390589ca203070fe332c6da3315d75
vpx_convolve8_avg works by first running a normal horizontal filter then a
vertical filter averages at the end.
The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.
vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.
Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is casted to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
applied against a x86_64 configure with and without
--enable-vp9-highbitdepth
clang-tidy-3.7.1 \
-checks='-*,google-readability-braces-around-statements' \
-header-filter='.*' -fix
+ clang-format afterward
Change-Id: Ia2993ec64cf1eb3505d3bfb39068d9e44cfbce8d
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.
+ factorized some common expressions
Change-Id: I171c3faaa5606d098e984baa9aa74bb36042f57f