Commit Graph

62 Commits

Author SHA1 Message Date
James Zern
fad865c54a namespace ARCH_* defines
this prevents redefinition warnings if a toolchain sets one

BUG=b/117240165

Change-Id: Ib5d8c303cd05b4dbcc8d42c71ecfcba8f6d7b90c
2019-09-30 11:13:29 -07:00
Hien Ho
2f52ae2384 test/vp9_quantize_test: fix int sanitizer warning
implicit conversion from type 'int' of value 42126 (32-bit, signed)
to type 'tran_low_t' (aka 'short') changed the value to -23410 (16-bit, signed)

BUG=webm:1615

Change-Id: I339c640fce81e9f2dd73ef9c9bee084b6a5638dc
2019-08-22 23:15:17 +00:00
Jerome Jiang
8894c766c6 Fix saturation issue in vp9_quantize_fp_neon
Change-Id: I7850a5c5aea3633e50e9a2efc8116b9e16383a8f
2019-08-01 14:57:28 -07:00
James Zern
8f03f719af test/*: use std::*tuple
since:
77fa51003 Replace deprecated scoped_ptr with unique_ptr

c++11 has been required so <tuple> is safe to use

Change-Id: I873cb953104b361a8503b5839a3372ce2b99e73c
2018-12-07 17:55:21 -08:00
Johann
26dbf9eba8 quantize neon: fix hbd builds
BUG=webm:1448

Change-Id: I2140fb9b6ce92716d2d9509f3031244088a62127
2018-12-03 10:55:00 -08:00
Johann
5fbc7a286b quantize 32x32: saturate dqcoeff on x86
This slows down low bitdepth builds but is necessary to obtain correct
values.

BUG=webm:1448

Change-Id: I4ca9145f576089bb8496fcfeedeb556dc8fe6574
2018-11-30 16:27:14 -08:00
Johann
d566160f32 quantize 32x32: fix dqcoeff
Calculate the high bits of dqcoeff and store them appropriately in high
bit depth builds.

Low bit depth builds still do not pass. C truncates the results after
division. X86 only supports packing with saturation at this step.

BUG=webm:1448

Change-Id: Ic80def575136c7ca37edf18d21e26925b475da98
2018-11-28 11:30:37 -05:00
Johann
0eeb797512 quantize: fix x86 hbd builds
Calculate the high bits of dqcoeff in high bit depth builds and store
them appropriately.

BUG=webm:1448

Change-Id: I61a2f8bfcf2e30765f10a94073c4d58321d2fa24
2018-11-28 11:30:02 -05:00
Luc Trudeau
e769aeee80 include msvc.h for snprintf support in benchmarks
include vpx_ports/msvc.h to avoid issues with snprintf issues with MSVC.

Change-Id: Ida09cff8ee3b84e09fd61de131f84b32c113fa1a
2018-06-18 15:18:43 +00:00
Luc Trudeau
74a0b04f57 VSX Version of vp9_quantize_fp_32x32
Low bit depth version only. Passes the VP9QuantizeTest test suite.

VP9QuantizeTest Speed Test (POWER8 Model 2.1)
32x32 C time = 93.1 ms (±0.4 ms), VSX time = 6.5 ms (±0.2 ms) [14.4x]

Change-Id: I7f1fd0fc987af86baf2b74147a25aee811289112
2018-06-11 19:18:22 +00:00
Luc Trudeau
b1434f3125 VSX Version of vp9_quantize_fp
Low bit depth version only. Passes the VP9QuantizeTest test suite.

VP9QuantizeTest Speed Test (POWER8 Model 2.1)
 4x4  C time = 86.3 ms (±0.7 ms), VSX time = 18.2 ms (±0.0 ms) [ 4.7x]
 8x8  C time = 57.7 ms (±0.3 ms), VSX time =  7.6 ms (±0.0 ms) [ 7.6x]
16x16 C time = 50.7 ms (±0.1 ms), VSX time =  4.9 ms (±0.0 ms) [10.3x]

Change-Id: Ic09bc786c57cc89bba14624064216b52996075eb
2018-06-11 19:18:01 +00:00
James Zern
3a0dc0e4b7 test,cosmetics: fix func/member naming, decl order
functions: upper camelcase
members: lowercase with trailing '_'
decl order: functions (overrides marked virtual), members

after:
656e8ac61 VSX version of vpx_post_proc_down_and_across_mb_row
766d875b9 VSX version of vpx_mbpost_proc_ip
35e98a70b VSX version of vpx_mbpost_proc_down
b2898a9ad Bench Class For More Robust Speed Tests

Change-Id: Ib257bd607c5c1248d30e619ec9e8a47cc629825b
2018-06-04 16:14:33 -07:00
Luc Trudeau
b2898a9ade Bench Class For More Robust Speed Tests
To make speed testing more robust, the AbstractBench runs the
desired code multiple times and report the median run time with
mean absolute deviation around the median.

To use the AbstractBench, simply add it as a parent to your test
class, and implement the run() method (with the code you want to
benchmark).

Sample output for VP9QuantizeTest

  [    BENCH ]      Bypass calculations       4x4  165.8 ms ( ±1.0 ms )
  [    BENCH ]        Full calculations       4x4  165.8 ms ( ±0.9 ms )
  [    BENCH ]      Bypass calculations       8x8  129.7 ms ( ±0.9 ms )
  [    BENCH ]        Full calculations       8x8  130.3 ms ( ±1.4 ms )
  [    BENCH ]      Bypass calculations     16x16  110.3 ms ( ±1.4 ms )
  [    BENCH ]        Full calculations     16x16  110.1 ms ( ±0.9 ms )

Change-Id: I1dd649754cb8c4c621eee2728198ea6a555f38b3
2018-05-29 13:04:47 +00:00
Luc Trudeau
d1aede92ec VSX version of vpx_quantize_b_32x32_vsx
Low bit depth version only. Passes the VP9QuantizeTest.

VP9QuantizeTest Speed Test (POWER8 Model 2.1)
Full calculations:
C time = 1456 ms, VSX time = 80 ms (18x)

Change-Id: I1b1d6d03b1aeff63640efbdeb222cab857ddd95e
2018-05-14 19:50:11 +00:00
Luc Trudeau
1251bf2a63 VSX version of vpx_quantize_b_vsx
Low bit depth version only. Passes the VP9QuantizeTest.

Change-Id: I6546f872864bd404a7e353348b0554aab1de5bf0
2018-05-09 17:54:27 +00:00
James Zern
db49a22cfa test: use testing::*tuple instead of std::tr1
googletest imports tuple into testing to allow for compatibility across
c++ versions where tuple may be in std::tr1 or std. fixes deprecation
warnings under visual studio 2017

Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
2018-03-28 12:45:35 -07:00
Scott LaVarnway
c7449b482c vp9_quantize_fp_avx2()
Started from vp9_quantize_fp_sse2 and tweaked to use avx2.

Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
2018-01-18 13:33:30 -08:00
Scott LaVarnway
fe5d87aaeb Add quantize_fp_32x32_nz_c()
This c version uses the shortcuts found in the
vp9_quantize_fp_32x32_ssse3 function.

Change-Id: I2e983adb00064e070b7f2b1ac088cc58cf778137
2017-12-26 06:11:21 -08:00
Scott LaVarnway
8a4336ed2e Add vp9_quantize_fp_nz_c() -- 2
This c version uses the shortcuts found in the x86
vp9_quantize_fp functions.

The test was updated to use the correct quant/round range.

Change-Id: Ie5871f710d9eb39047d8d9f48b907c0633e1f830
2017-12-21 15:26:36 -08:00
James Zern
7a245adb18 Revert "Add vp9_quantize_fp_nz_c()"
This reverts commit 86842855d3.

SSSE3/VP9QuantizeTest.EOBCheck/1 fails on Mac and the build breaks under
visual studio due to a #if within another macro.

Change-Id: I475095a04aafcc714fade2b24e4df7b682be2cd1
2017-12-21 06:05:19 -08:00
Scott LaVarnway
86842855d3 Add vp9_quantize_fp_nz_c()
This c version uses the shortcuts found in the x86
vp9_quantize_fp functions.

The test was updated to use the correct quant/round range.

Change-Id: I5d19f8af2fddda8e50910249eafb740acb29415b
2017-12-19 12:48:45 -08:00
Johann
eb4238ac70 Revert "Revert "quantize avx: copy 32x32 implementation""
This reverts commit 8c42237bb2.

Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.

Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c

Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
2017-09-12 14:25:38 -07:00
Marco Paniconi
3e069846b9 Merge "Revert "quantize avx: copy 32x32 implementation"" 2017-08-25 18:20:31 +00:00
Marco Paniconi
8c42237bb2 Revert "quantize avx: copy 32x32 implementation"
This reverts commit f60d1dcd3d.

Reason for revert: <INSERT REASONING HERE>
Failures in AVX/VP9QuantizeTest in nightly tests.
Original change's description:
> quantize avx: copy 32x32 implementation
> 
> Ensure avx and ssse3 stay in sync by testing them against each other.
> 
> Change-Id: I699f3b48785c83260825402d7826231f475f697c

TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org

Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
2017-08-25 16:56:08 +00:00
Johann Koenig
6c21650c0e Merge "quantize avx: copy 32x32 implementation" 2017-08-24 18:55:03 +00:00
Johann Koenig
258122fdc6 Merge "quantize test: skip block was removed" 2017-08-24 17:43:10 +00:00
Johann
f60d1dcd3d quantize avx: copy 32x32 implementation
Ensure avx and ssse3 stay in sync by testing them against each other.

Change-Id: I699f3b48785c83260825402d7826231f475f697c
2017-08-24 10:42:34 -07:00
Johann
1787e7dbe0 quantize ssse3: copy implementation to intrinsics
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.

Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
2017-08-24 07:47:51 -07:00
Johann
92aafefa1e quantize test: skip block was removed
Change-Id: I1d93698bc27529b0544d79dd7b9fe37afa51ef87
2017-08-24 07:21:42 -07:00
Johann
e89344d61a quantize test: set threshold for 32x32
Change-Id: I77be617c7d7c64929dd51c6077322f4f8ad23897
2017-08-23 15:59:11 -07:00
Johann Koenig
f53b656207 Merge "quantize avx: copy implementation to intrinsics" 2017-08-23 21:14:13 +00:00
Johann
7c27872164 quantize avx: copy implementation to intrinsics
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.

Very close in speed to the assembly.

Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.

Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
2017-08-23 09:19:16 -07:00
Johann
e83d99d7b8 quantize fp: neon implementation
About 4x faster when values are below the dequant threshold and 10x
faster if everything needs to be calculated.

Both numbers would improve if the division for dqcoeff could be
simplified.

BUG=webm:1426

Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2
2017-08-23 08:01:30 -07:00
Johann
661efeca97 quantize test: test _fp_ version of quantize
None of the x86 optimizations pass the tests.

Change-Id: Ic67f2ba1977b657e68f2a13b0711fc5fcbafd909
2017-08-21 12:29:41 -07:00
Johann
13eed991f9 Remove skip_block from quantize
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.

Add assert() and comments regarding the usage of skip_block.

Removing the parameter is a fairly involved process so leave it be for
the moment.

Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
2017-08-21 09:49:04 -07:00
Johann
08cb7b5c68 quantize test: quiet overflow warning
Promote the result of RandRange to signed

Change-Id: I89313cace3bcbe9af96946bef00b6857fc48b128
2017-08-15 08:28:09 -07:00
Johann Koenig
ff184e482a Merge changes I4b4beab1,I02f74dec
* changes:
  quantize test: check skip_block
  quantize test: use negative input
2017-08-14 20:52:52 +00:00
James Zern
746c0eab3b disable SSSE3/VP9QuantizeTest* in hbd builds
this test fails with the configuration similar to the assembly prior to:
d52cb5972 quantize: copy ssse3 optimizations to intrinsics

BUG=webm:1458

Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf
2017-08-14 09:31:14 -07:00
Johann Koenig
9bb8ce5efb Merge "neon: vpx_quantize_b_32x32" 2017-08-10 15:42:49 +00:00
Johann
357adb68b2 quantize test: check skip_block
Not all sizes were tested previously. Only 4x4 and 32x32

Change-Id: I4b4beab1b92a810a097a7306de04cc9e0e260315
2017-08-08 14:21:58 -07:00
Johann
1092cc7f1a quantize test: use negative input
coeff contains signed values.

Change-Id: I02f74decf30379a28122169ab3e844d0f3bd7d23
2017-08-08 14:19:56 -07:00
Johann
93166c5e51 neon: vpx_quantize_b_32x32
With skip block the neon is about twice as fast as C.

The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even if the C can take the shortcut, it is over
twice as fast in neon. If it can't, that gap increases to over 10x.

BUG=webm:1426

Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
2017-08-08 14:05:18 -07:00
Johann
d52cb59729 quantize: copy ssse3 optimizations to intrinsics
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.

Fixes test failures in HBD and OS X builds.

Allows using it in 32bit builds, where it is about 40% faster than sse2.

Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.

Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
2017-08-08 12:22:14 -07:00
Johann
9578a84205 quantize test: consolidate sizes
Pass a max txfm size parameter and combine the base quantize
test with the 32x32 test.

Change-Id: I72ddf020fe6888e864ea9f3642ee2d9a8e48a04b
2017-08-04 12:45:32 -07:00
Johann
1059b5cc52 quantize test: add speed comparison
Test some possible scenarios.

Change-Id: I1a612e7153b31756be66390ceea55877856d5a33
2017-08-02 09:33:35 -07:00
Johann
2d6b5df657 neon: vpx_quantize_b
With skip block or coeff < zbin it is about twice as fast as C.

If most coeff values are > zbin it is about 10-15x as fast as C.

BUG=webm:1426

Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
2017-07-31 10:38:46 -07:00
Johann
af08fbb444 quantize test: promote RandRange() result to signed
Avoid unsigned overflow warning:
unsigned integer overflow: 19974 - 32703 cannot be represented in type
'unsigned int'

Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12
2017-07-20 08:17:48 -07:00
Johann
c782f27ead quantize test: lowbd functions do not pass in highbd
qcoeff output looks OK but dqcoeff is no good.

BUG=webm:1448

Change-Id: I07211db8a8b74f1f45fdd059852e2de0e5ee18fd
2017-07-20 08:17:48 -07:00
Johann
bde2e4aa36 quantize test: eob is output
eob values are generated by the function.

Change-Id: I8ce92100e83022bff99888a5a7e6ef378c49fda3
2017-07-19 14:17:19 -07:00
Johann
101981b736 quantize test: test sse2 and avx optimizations
ssse3 does not pass either of the tests.

avx 32x32 does not pass.

Change-Id: I62c2e31336fd2327327afaa0da896ad79a3def44
2017-07-18 12:08:16 -07:00