Commit Graph

111 Commits

Author SHA1 Message Date
James Zern
fad865c54a namespace ARCH_* defines
this prevents redefinition warnings if a toolchain sets one

BUG=b/117240165

Change-Id: Ib5d8c303cd05b4dbcc8d42c71ecfcba8f6d7b90c
2019-09-30 11:13:29 -07:00
Hien Ho
073b326565 update .clang-format for version clang-7.0.1 update.
added files that are affected by clang-format version 7.

BUG=b/120815481

Change-Id: I40662ce962e4f4b1fcdf183b700f85cc5c0f9f82
2019-03-29 18:25:26 +00:00
James Zern
495282774e convolve_test: Add missing init of HBD buffers
this resolves some msan errors.
the same change was done in libaom:
5ab58722c Add missing initializations of HBD buffers

Change-Id: I8882af45b95c90ba43bf138c7d305a6c3b99e61c
2019-01-11 16:36:45 -08:00
James Zern
8f03f719af test/*: use std::*tuple
since:
77fa51003 Replace deprecated scoped_ptr with unique_ptr

c++11 has been required so <tuple> is safe to use

Change-Id: I873cb953104b361a8503b5839a3372ce2b99e73c
2018-12-07 17:55:21 -08:00
Angie Chiang
49b6b99f5c Fix scan_build warnings in convolve_test.cc
Change-Id: I87e1c3f0492cde805b54b048385ea200652dfccc
2018-11-21 10:40:52 -08:00
Angie Chiang
9a848af54d Fix scan_build warnings in convolve_test.cc
BUG=webm:1575

Change-Id: Ic90b09e596fa68bc516237d31b7f4540831becfd
2018-11-19 18:46:22 -08:00
chiyotsai
272f46212e Add SSE2 support for 4-tap interpolation filter for width 16.
Horizontal filter on 64x64 block: 1.59 times as fast as baseline.
Vertical filter on 64x64 block: 2.5 times as fast as baseline.
2D filter on 64x64 block: 1.96 times as fast as baseline.

Change-Id: I12e46679f3108616d5b3475319dd38b514c6cb3c
2018-10-17 09:58:30 -07:00
Yunqing Wang
bcd17e32c9 Fix the filter tap calculation in mips optimizations
The interp filter tap calculation could not reliably tell the
difference between 2 taps and 4 taps. This patch fixes the bug and
resolves the Jenkins test failures in mips sub-pel filter optimizations.

BUG=webm:1568

Change-Id: I51eb8adb7ed194ef2ea7dd4aa57aa9870ee38cfc
2018-10-16 09:35:23 -07:00
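The 2-tap versus 4-tap distinction mentioned in bcd17e32c9 can be illustrated with a minimal sketch. It assumes each kernel is stored as 8 coefficients centered in the array, so a shorter filter leaves the outer slots at zero. The helper name CountFilterTaps is hypothetical and is not taken from the patch; the actual fix may work differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the libvpx function): infer how many taps an
// 8-entry sub-pel kernel actually uses from which coefficients are nonzero.
// Assumption: the kernel is centered, so an N-tap filter occupies the
// middle N of the 8 slots and the remaining slots are zero.
inline int CountFilterTaps(const int16_t (&kernel)[8]) {
  // A usable kernel needs at least one nonzero center coefficient.
  assert(kernel[3] != 0 || kernel[4] != 0);
  if (kernel[0] != 0 || kernel[7] != 0) return 8;
  if (kernel[1] != 0 || kernel[6] != 0) return 6;
  // Checking slots 2 and 5 is what separates a 4-tap kernel from a 2-tap one.
  if (kernel[2] != 0 || kernel[5] != 0) return 4;
  return 2;  // only the two center coefficients remain
}
```

A check that inspects only the outermost coefficients would report the same answer for bilinear and 4-tap kernels, which is the kind of ambiguity the commit message describes.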
Yunqing Wang
be51a7731d A temporary fix to mips sub-pel filters
There are Jenkins test failures in mips sub-pel filter optimizations.
[ RUN      ] MSA/ConvolveTest.MatchesReferenceSubpixelFilter/5
../libvpx/test/convolve_test.cc:889: Failure
Expected equality of these values:
  lookup(ref, y * kOutputStride + x)
    Which is: 255
  lookup(out, y * kOutputStride + x)
    Which is: 11
mismatch at (1,0), filters (4,0,1)

This relates to the 4-tap kernel added recently. This CL is a temporary
fix, while we investigate the issue.

BUG=webm:1568

Change-Id: If64c552b794425687cca4fbed893d8ccb73c89a5
2018-10-15 16:48:02 -07:00
Yunqing Wang
50b91aff52 Use 4-tap interp filter in speed 1 sub-pel motion search
Added the 4-tap interp filter, and used it for speed 1 sub-pel motion
search. Speed 2 motion search still used bilinear filter as before.

Speed 1 borg test showed good bit savings.
         avg_psnr  ovr_psnr  ssim
lowres:  -1.125    -1.179    -1.021
midres:  -0.717    -0.710    -0.543
hdres:   -0.357    -0.370    -0.342
Speed test at speed 1 showed ~10% encoder time increase, which was
partly because there is no SIMD version of the 4-tap filter.

Change-Id: Ic9b48cdc6a964538c20144108526682d64348301
2018-10-09 09:47:22 -07:00
guxiwei-hf@loongson.cn
15dad6bcbc vp9: [loongson] optimize vpx_convolve8 with mmi
1. vpx_convolve_avg_mmi
2. vpx_convolve8_avg_horiz_mmi

Change-Id: Ie544aac45b4b1c0a0e51b44b650189ae5e88aee1
2018-04-25 09:55:05 +08:00
James Zern
d636fe53af Merge "test: use testing::*tuple instead of std::tr1"
2018-03-29 19:01:34 +00:00
James Zern
db49a22cfa test: use testing::*tuple instead of std::tr1
googletest imports tuple into testing to allow for compatibility across
c++ versions where tuple may be in std::tr1 or std. fixes deprecation
warnings under visual studio 2017

Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
2018-03-28 12:45:35 -07:00
gxw
25d9adb74b vp9: [loongson] optimize vpx_convolve8 with mmi.
1. vpx_convolve8_vert_mmi
2. vpx_convolve8_horiz_mmi
3. vpx_convolve8_mmi
4. vpx_convolve8_avg_mmi
5. vpx_convolve8_avg_vert_mmi

Change-Id: I41a6b3b4f327d6b67d282e0163cfa0aee8648abe
2018-03-28 18:11:16 +00:00
Linfeng Zhang
c101a5f5c4 Fix dangling-else warnings
Compiler -- gcc (Debian 7.3.0-5) 7.3.0

Change-Id: If2dcc6e215a2990cde575f0e744ce0c7a44a15f1
2018-03-22 11:44:01 -07:00
Kaustubh Raste
339f4dcaee mips msa optimize vpx_scaled_2d function
Change-Id: I638507b360c71489ab0e87bd558d2719ad995333
2017-11-29 13:27:04 +05:30
Kyle Siefring
ae35425ae6 Optimize convolve8 SSSE3 and AVX2 intrinsics
Changed the intrinsics to perform summation similar to the way the assembly does.

The new code diverges from the assembly by preferring unsaturated additions.

Results for haswell

SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%

AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%

BUG=webm:1471

Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
2017-10-24 10:39:48 -04:00
Linfeng Zhang
0d2e95193b Merge "Generalize CheckScalingFiltering in ConvolveTest"
2017-10-17 16:03:07 +00:00
Linfeng Zhang
54f7d68c5c Generalize CheckScalingFiltering in ConvolveTest
Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.

Change-Id: I1042564d1d390589ca203070fe332c6da3315d75
2017-10-10 14:12:43 -07:00
Kyle Siefring
1b2f92ee8e Extend 16 wide AVX2 convolve8 code to support averaging.
Also adds vpx_convolve8_avg_horiz_avx2.

Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf
2017-10-09 19:10:03 -04:00
Kyle Siefring
9ca06bcdd2 Add AVX2 version of vpx_convolve8_avg.
vpx_convolve8_avg works by first running a normal horizontal filter, then a
vertical filter that averages at the end.

The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.

vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.

Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
2017-10-07 23:37:48 -04:00
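The two-pass structure described in 9ca06bcdd2 (horizontal filter first, then a vertical filter that averages into the destination) can be sketched in scalar form. This is a simplified model, not the libvpx implementation: it assumes one fixed 8-tap kernel per direction with no sub-pel stepping, 7-bit kernel precision, and 8-bit pixels. All names (ClipPixel, FilterPixel, Convolve8Avg) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kTaps = 8;
constexpr int kFilterBits = 7;  // assumption: kernels sum to 1 << 7

inline uint8_t ClipPixel(int v) {
  return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

// Applies one 8-tap kernel along a line. `src` points at the output
// position; taps are read from src[-3 * step] through src[4 * step].
inline uint8_t FilterPixel(const uint8_t *src, int step,
                           const int16_t (&kernel)[kTaps]) {
  int sum = 0;
  for (int k = 0; k < kTaps; ++k) sum += src[(k - 3) * step] * kernel[k];
  return ClipPixel((sum + (1 << (kFilterBits - 1))) >> kFilterBits);
}

// Hypothetical scalar model: a horizontal pass into a temporary buffer,
// then a vertical pass whose result is averaged with what is already in
// dst. The caller must provide 3 valid pixels before and 4 after the
// w x h block in both directions.
inline void Convolve8Avg(const uint8_t *src, int src_stride, uint8_t *dst,
                         int dst_stride, const int16_t (&kernel_x)[kTaps],
                         const int16_t (&kernel_y)[kTaps], int w, int h) {
  assert(w > 0 && h > 0);
  const int tmp_h = h + kTaps - 1;  // extra rows feed the vertical taps
  std::vector<uint8_t> tmp(static_cast<std::size_t>(w) * tmp_h);
  for (int y = 0; y < tmp_h; ++y) {
    for (int x = 0; x < w; ++x) {
      tmp[y * w + x] =
          FilterPixel(src + (y - 3) * src_stride + x, 1, kernel_x);
    }
  }
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int v = FilterPixel(&tmp[(y + 3) * w + x], w, kernel_y);
      // Rounded average with the existing destination pixel.
      dst[y * dst_stride + x] =
          static_cast<uint8_t>((dst[y * dst_stride + x] + v + 1) >> 1);
    }
  }
}
```

In this model only the final step touches the existing destination pixels, so the horizontal pass can be shared with a non-averaging convolve. That is consistent with the commit reusing pre-existing AVX2 code for the horizontal step.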
Linfeng Zhang
6543213e87 Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
2017-10-03 13:02:05 -07:00
Linfeng Zhang
9d0d13e939 Add vpx_scaled_2d_neon()
BUG=webm:1419

Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
2017-09-26 09:22:39 -07:00
Linfeng Zhang
d331e7a1c0 Remove get_filter_base() and get_filter_offset() in convolve
so that the convolve functions are independent of table alignment.

Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
2017-09-05 15:22:36 -07:00
Yi Luo
a3452996a1 High bit depth inter prediction horizontal/vertical filters AVX2
User-level speed improvement on i7-6700, cpu-used=1,
  x86_64 Linux; bitrates: 1080p at 8 Mbps, 4K at 16 Mbps:
- Decoder:
  1080p: ~4%
  4K: ~5%
- Encoder:
  1080p: ~1%
  4K: ~3%

Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640
2017-05-03 12:18:01 -07:00
Luca Barbato
e2ad89092d ppc: Add convolve8_vsx and convolve8_avg_vsx
Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857
2017-05-02 20:27:47 -07:00
Luca Barbato
e6ca81ee67 ppc: Add convolve8_avg_vert_vsx
Only the generic one again, speedups for 8x8 and larger blocks to
come later.

Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d
2017-05-02 20:27:42 -07:00
Luca Barbato
a65f1771ad ppc: Add convolve8_vert
Only the generic one again, speedups for 8x8 and larger blocks
to come later.

Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f
2017-05-02 20:27:33 -07:00
Luca Barbato
77772350f3 ppc: Add convolve8_horiz_avg
The 8x8 and larger blocks cases can be sped up further.

Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15
2017-05-02 20:27:28 -07:00
Luca Barbato
08edb85bd0 ppc: Add convolve8_horiz
The 8x8 and larger blocks cases can be sped up further.

Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046
2017-05-02 20:27:16 -07:00
Luca Barbato
d51d3934f5 ppc: Add convolve_avg
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
2017-04-29 15:47:25 +02:00
Luca Barbato
63860ba7b8 ppc: Add convolve_copy
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
2017-04-29 15:47:25 +02:00
Linfeng Zhang
51dc998f3a Update highbd convolve functions arguments to use uint16_t src/dst
BUG=webm:1388

Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-25 14:22:19 -07:00
Linfeng Zhang
bf8a49abbd Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is cast to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.

BUG=webm:1388

Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
2017-04-19 12:13:49 -07:00
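The offset rule stated in bf8a49abbd can be illustrated with a small sketch. It assumes a plain pointer cast between uint8_t * and uint16_t * with no address shifting. The helper name AdvanceSamples is hypothetical and does not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: advance a byte pointer that really addresses 16-bit
// samples by `n` samples. Adding `n` directly to the byte pointer would move
// only n bytes (half the intended distance), so the pointer is cast to a
// short pointer first, offset, and then cast back.
inline uint8_t *AdvanceSamples(uint8_t *byte_ptr, int n) {
  assert(byte_ptr != nullptr);
  // The underlying buffer is assumed to be uint16_t storage.
  assert(reinterpret_cast<uintptr_t>(byte_ptr) % alignof(uint16_t) == 0);
  uint16_t *short_ptr = reinterpret_cast<uint16_t *>(byte_ptr);
  return reinterpret_cast<uint8_t *>(short_ptr + n);
}
```

The result is the same address as byte_ptr + 2 * n, which is the "doubled" offset the commit message refers to. Doing the arithmetic on the short pointer keeps the doubling in one place.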
Yi Luo
aa5a941992 Add AVX2 optimization to copy/avg functions
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
2017-04-14 16:50:10 -07:00
Linfeng Zhang
9c8981c666 add vpx high bitdepth convolve8 NEON intrinsics optimization
BUG=webm:1299

Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
2016-10-17 15:23:54 -07:00
Linfeng Zhang
f910d14a1a add vpx_highbd_convolve_{copy,avg}_neon()
BUG=webm:1299

Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
2016-10-13 15:21:14 -07:00
Linfeng Zhang
85a9e48d25 Refine vpx_convolve_copy_neon() and vpx_convolve_avg_neon()
BUG=webm:1290

Change-Id: Ia27e58521eba5a4852b50381c56746fa5767f6d6
2016-09-29 16:19:39 -07:00
Linfeng Zhang
81ff7a065f Clean convolve_test.cc
Combine test MatchesReferenceSubpixelFilter and
MatchesReferenceAveragingSubpixelFilter.

Change-Id: I75f96befbbb118cdc6b8c6001b4cdda8d88fbbd3
2016-09-27 13:36:31 -07:00
clang-format
9c9d92ae3a test: apply clang-tidy google-readability-braces-around-statements
applied against a x86_64 configure with and without
--enable-vp9-highbitdepth

clang-tidy-3.7.1 \
  -checks='-*,google-readability-braces-around-statements' \
  -header-filter='.*' -fix
+ clang-format afterward

Change-Id: Ia2993ec64cf1eb3505d3bfb39068d9e44cfbce8d
2016-08-05 20:02:28 -07:00
clang-format
33e40cb5db test: apply clang-format
Change-Id: I0d9ab85855eb723f653a7bb09b3d0d31dd6cfd2f
2016-07-27 01:58:52 +00:00
Johann Koenig
e616012d69 Merge changes I59a11921,I296a0b81,I397d7753
* changes:
  configure: remove x86inc.asm distinction
  test: remove x86inc.asm distinction
  vpx_dsp: remove x86inc.asm distinction
2016-07-01 18:13:41 +00:00
Johann
0266e70c52 test: remove x86inc.asm distinction
BUG=b:29583530

Change-Id: I296a0b81755e3086bc0a40cb126d0200ff03c095
2016-06-30 11:14:10 -07:00
James Zern
f5a6079141 convolve_test: fix byte offsets in hbd build
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.

+ factorized some common expressions

Change-Id: I171c3faaa5606d098e984baa9aa74bb36042f57f
2016-06-29 20:39:07 -07:00
Tom Finegan
9a56a5ea18 convolve_test: Fix high bit depth IOC runtime errors.
Add a cast.

BUG=webm:1225

Change-Id: I34ea18ee816569485c1f1046a81fd2a0ce527ac8
2016-05-13 09:42:58 -07:00
Tom Finegan
6042d68851 convolve_test: Fix IOC runtime errors.
Add a cast.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1216

Change-Id: I40627de387bc9cfba37860e7a0a4f2d4524f3431
2016-05-09 16:33:59 -04:00
Alex Converse
2f97b7cbfe Port convolve test refactor to master.
Brings f03e238f to master.

Change-Id: I7f7754e7d1288b103a4510303d10afc68a7d8ca8
2016-04-27 16:53:33 -07:00
James Zern
cffef113b9 tests: quiet some unused parameter warnings
Change-Id: Iff8b0d77234f78bf407676891bccad92825bfcc6
2016-02-11 19:25:48 -08:00
Alex Converse
0c00af126d Add vpx_highbd_convolve_{copy,avg}_sse2
single-threaded:
swanky (silvermont): ~1% faster overall
peppy (celeron,haswell): ~1.5% faster overall

Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073
2015-10-09 11:50:25 -07:00
Alex Converse
7e77938d72 Generate convolve_test wrapper functions with a macro
Change-Id: Iccb4cdc23c1845cf9cb7d69101c9f4f43675d368
2015-10-09 11:42:05 -07:00