implicit conversion from type 'int' of value 42126 (32-bit, signed)
to type 'tran_low_t' (aka 'short') changed the value to -23410 (16-bit, signed)
BUG=webm:1615
Change-Id: I339c640fce81e9f2dd73ef9c9bee084b6a5638dc
since:
77fa51003 Replace deprecated scoped_ptr with unique_ptr
c++11 has been required so <tuple> is safe to use
Change-Id: I873cb953104b361a8503b5839a3372ce2b99e73c
Calculate the high bits of dqcoeff and store them appropriately in high
bit depth builds.
Low bit depth builds still do not pass. C truncates the results after
division. X86 only supports packing with saturation at this step.
BUG=webm:1448
Change-Id: Ic80def575136c7ca37edf18d21e26925b475da98
Calculate the high bits of dqcoeff in high bit depth builds and store
them appropriately.
BUG=webm:1448
Change-Id: I61a2f8bfcf2e30765f10a94073c4d58321d2fa24
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
32x32 C time = 93.1 ms (±0.4 ms), VSX time = 6.5 ms (±0.2 ms) [14.4x]
Change-Id: I7f1fd0fc987af86baf2b74147a25aee811289112
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
4x4 C time = 86.3 ms (±0.7 ms), VSX time = 18.2 ms (±0.0 ms) [ 4.7x]
8x8 C time = 57.7 ms (±0.3 ms), VSX time = 7.6 ms (±0.0 ms) [ 7.6x]
16x16 C time = 50.7 ms (±0.1 ms), VSX time = 4.9 ms (±0.0 ms) [10.3x]
Change-Id: Ic09bc786c57cc89bba14624064216b52996075eb
functions: upper camelcase
members: lowercase with trailing '_'
decl order: functions (overrides marked virtual), members
after:
656e8ac61 VSX version of vpx_post_proc_down_and_across_mb_row
766d875b9 VSX version of vpx_mbpost_proc_ip
35e98a70b VSX version of vpx_mbpost_proc_down
b2898a9ad Bench Class For More Robust Speed Tests
Change-Id: Ib257bd607c5c1248d30e619ec9e8a47cc629825b
To make speed testing more robust, the AbstractBench runs the
desired code multiple times and report the median run time with
mean absolute deviation around the median.
To use the AbstractBench, simply add it as a parent to your test
class, and implement the run() method (with the code you want to
benchmark).
Sample output for VP9QuantizeTest
[ BENCH ] Bypass calculations 4x4 165.8 ms ( ±1.0 ms )
[ BENCH ] Full calculations 4x4 165.8 ms ( ±0.9 ms )
[ BENCH ] Bypass calculations 8x8 129.7 ms ( ±0.9 ms )
[ BENCH ] Full calculations 8x8 130.3 ms ( ±1.4 ms )
[ BENCH ] Bypass calculations 16x16 110.3 ms ( ±1.4 ms )
[ BENCH ] Full calculations 16x16 110.1 ms ( ±0.9 ms )
Change-Id: I1dd649754cb8c4c621eee2728198ea6a555f38b3
Low bit depth version only. Passes the VP9QuantizeTest.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
Full calculations:
C time = 1456 ms, VSX time = 80 ms (18x)
Change-Id: I1b1d6d03b1aeff63640efbdeb222cab857ddd95e
googletest imports tuple into testing to allow for compatibility across
c++ versions where tuple may be in std::tr1 or std. fixes deprecation
warnings under visual studio 2017
Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
This c version uses the shortcuts found in the x86
vp9_quantize_fp functions.
The test was updated to use the correct quant/round range.
Change-Id: Ie5871f710d9eb39047d8d9f48b907c0633e1f830
This reverts commit 86842855d3.
SSSE3/VP9QuantizeTest.EOBCheck/1 fails on Mac and the build breaks under
visual studio due to a #if within another macro.
Change-Id: I475095a04aafcc714fade2b24e4df7b682be2cd1
This c version uses the shortcuts found in the x86
vp9_quantize_fp functions.
The test was updated to use the correct quant/round range.
Change-Id: I5d19f8af2fddda8e50910249eafb740acb29415b
This reverts commit 8c42237bb2.
Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
This reverts commit f60d1dcd3d.
Reason for revert: <INSERT REASONING HERE>
Failures in AVX/VP9QuantizeTest in nightly tests.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org
Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.
Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.
Very close in speed to the assembly.
Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.
Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
About 4x faster when values are below the dequant threshold and 10x
faster if everything needs to be calculated.
Both numbers would improve if the division for dqcoeff could be
simplified.
BUG=webm:1426
Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.
Add assert() and comments regarding the usage of skip_block.
Removing the parameter is a fairly involved process so leave it be for
the moment.
Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
this test fails with the configuration similar to the assembly prior to:
d52cb5972 quantize: copy ssse3 optimizations to intrinsics
BUG=webm:1458
Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf
With skip block the neon is about twice as fast as C.
The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even if the C can take the shortcut, it is over
twice as fast in neon. If it can't, that gap increases to over 10x.
BUG=webm:1426
Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
With skip block or coeff < zbin it is about twice as fast as C.
If most coeff values are > zbin it is about 10-15x as fast as C.
BUG=webm:1426
Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
Avoid unsigned overflow warning:
unsigned integer overflow: 19974 - 32703 cannot be represented in type
'unsigned int'
Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12